Notes - NMA

U. Washington AMATH 352 - Spring 2009

Applied Linear Algebra and Numerical Analysis

Prof. U. Hetmaniuk, University of Washington

The course goals are to understand the basic concepts of linear algebra and to obtain an introduction to some aspects of computational techniques used in matrix methods. We will study basic concepts in linear algebra, including vectors, vector spaces, linear transformations, matrix-vector manipulations, solving linear systems, least squares problems, and eigenvalue problems. Matrix decompositions (e.g. LU, QR, SVD, etc.) will play a fundamental role throughout the course. The emphasis will be on practical aspects of linear algebra and numerical methods for solving these problems. Such problems arise constantly in science, engineering, finance, and computer graphics.

Professor Randall J. LeVeque (U. Washington) created the notes. He kindly allowed me to modify them. All the mistakes and typos are mine.


    1 Column Vectors

In geometry, the two-dimensional plane is denoted $\mathbb{R}^2$ and the three-dimensional space $\mathbb{R}^3$. A vector is an object consisting of a magnitude and a direction. In the plane, we can draw a vector as an arrow with some length, pointing somewhere. A vector can also be thought of as a displacement. A displacement does not depend on where it starts. Consequently, two vectors are equal, even though they start from different places, when they have equal length and equal direction. The basic idea here, combining magnitude with direction, is the key to extending to higher dimensions. In this section, we define the generalization of vectors in the two-dimensional plane and three-dimensional space.

Let $m$ be a positive integer. We denote by $\mathbb{R}^m$ the set of all real $m$-tuples, i.e. the set of all sequences with $m$ components, each of which is a real number. The standard notation for an element $\mathbf{x}$ of $\mathbb{R}^m$ is the column vector notation:

$$\mathbf{x} \in \mathbb{R}^m, \qquad \mathbf{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_m \end{pmatrix}. \tag{1}$$

It is important to remember that, in many applications of linear algebra, the elements of the vector represent something different from the three physical coordinates of ordinary space. There is often nothing unphysical about considering vectors with many more than 3 components.

Example. We have

$$\begin{pmatrix}1\\3\end{pmatrix} \in \mathbb{R}^2, \qquad \begin{pmatrix}7\\0\\3\end{pmatrix} \in \mathbb{R}^3, \qquad \text{and} \qquad \begin{pmatrix}1\\2/\sqrt{5}\\3/\sqrt{5}\\4\end{pmatrix} \in \mathbb{R}^4.$$


1.1 Addition of column vectors

We can define the addition of two vectors $\mathbf{x}$ and $\mathbf{y}$ of $\mathbb{R}^m$:

$$\mathbf{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_m \end{pmatrix}, \quad \mathbf{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}, \qquad \mathbf{x} + \mathbf{y} = \begin{pmatrix} x_1 + y_1 \\ \vdots \\ x_m + y_m \end{pmatrix}. \tag{2}$$

The set $\mathbb{R}^m$ is closed under addition, meaning that whenever the addition is applied to vectors in $\mathbb{R}^m$, we obtain another vector in the same set $\mathbb{R}^m$. For example, we have

$$\begin{pmatrix}1\\2\\3\end{pmatrix} + \begin{pmatrix}2\\-4\\8\end{pmatrix} = \begin{pmatrix}1+2\\2-4\\3+8\end{pmatrix} = \begin{pmatrix}3\\-2\\11\end{pmatrix}.$$

Exercise 1. Prove that the addition is associative: $(\mathbf{x} + \mathbf{y}) + \mathbf{z} = \mathbf{x} + (\mathbf{y} + \mathbf{z})$.

Exercise 2. Prove that the addition is commutative: $\mathbf{x} + \mathbf{y} = \mathbf{y} + \mathbf{x}$.

Exercise 3. Find the unique vector $\mathbf{z}$ such that $\mathbf{x} + \mathbf{z} = \mathbf{x}$ for any vector $\mathbf{x}$. $\mathbf{z}$ is called the zero vector and is sometimes denoted $\mathbf{0}$.

Exercise 4. Show that every vector $\mathbf{x} \in \mathbb{R}^m$ has an additive inverse $\mathbf{y} \in \mathbb{R}^m$ such that $\mathbf{x} + \mathbf{y} = \mathbf{0}$.

1.2 Scalar multiplication of a column vector

We can also define the scalar multiplication: if $\mathbf{x} \in \mathbb{R}^m$ and $\alpha \in \mathbb{R}$, then the vector $\alpha\mathbf{x}$ belongs to $\mathbb{R}^m$ and is defined by multiplying each component of $\mathbf{x}$ by the scalar $\alpha$:

$$\mathbf{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_m \end{pmatrix}, \quad \alpha \in \mathbb{R}, \qquad \alpha\mathbf{x} = \begin{pmatrix} \alpha x_1 \\ \vdots \\ \alpha x_m \end{pmatrix}. \tag{3}$$

The set $\mathbb{R}^m$ is also closed under scalar multiplication. For example, we have

$$2\begin{pmatrix}1\\2\\3\end{pmatrix} = \begin{pmatrix}2 \cdot 1\\2 \cdot 2\\2 \cdot 3\end{pmatrix} = \begin{pmatrix}2\\4\\6\end{pmatrix}, \qquad (-3)\begin{pmatrix}2\\-4\\8\end{pmatrix} = \begin{pmatrix}(-3) \cdot 2\\(-3) \cdot (-4)\\(-3) \cdot 8\end{pmatrix} = \begin{pmatrix}-6\\12\\-24\end{pmatrix}.$$
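The componentwise operations in equations (2) and (3) translate directly into code. A minimal sketch in Python (the notes use Matlab; Python is used here only for illustration, and the helper names are our own), reproducing the worked examples above:

```python
# Componentwise vector addition and scalar multiplication for
# vectors stored as plain Python lists.

def vec_add(x, y):
    """Return x + y, adding the vectors component by component."""
    assert len(x) == len(y)
    return [xi + yi for xi, yi in zip(x, y)]

def scal_mult(alpha, x):
    """Return alpha * x, scaling each component by the scalar alpha."""
    return [alpha * xi for xi in x]

# The worked examples from the text:
print(vec_add([1, 2, 3], [2, -4, 8]))  # [3, -2, 11]
print(scal_mult(2, [1, 2, 3]))         # [2, 4, 6]
print(scal_mult(-3, [2, -4, 8]))       # [-6, 12, -24]
```
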

Exercise 5. Prove that $\forall \alpha, \beta \in \mathbb{R}$, $\alpha(\beta\mathbf{x}) = (\alpha\beta)\mathbf{x}$.

Exercise 6. Prove that $\forall \alpha \in \mathbb{R}$, $\forall \mathbf{x}, \mathbf{y} \in \mathbb{R}^m$, $\alpha(\mathbf{x} + \mathbf{y}) = \alpha\mathbf{x} + \alpha\mathbf{y}$.

Exercise 7. Prove that $\forall \alpha, \beta \in \mathbb{R}$, $\forall \mathbf{x} \in \mathbb{R}^m$, $(\alpha + \beta)\mathbf{x} = \alpha\mathbf{x} + \beta\mathbf{x}$.

Exercise 8. Show that, for every vector $\mathbf{x} \in \mathbb{R}^m$, $1\mathbf{x} = \mathbf{x}$.


1.3 Norms

To measure the magnitude or the length of a vector, we use a vector norm. A vector norm is simply a map from vectors in $\mathbb{R}^m$ to nonnegative real numbers, satisfying the following conditions (which generalize important properties of the absolute value for scalars):

$$\forall \mathbf{x} \in \mathbb{R}^m,\ \|\mathbf{x}\| \geq 0, \tag{4a}$$
$$\|\mathbf{x}\| = 0 \text{ if and only if } \mathbf{x} = \mathbf{0}, \tag{4b}$$
$$\forall \alpha \in \mathbb{R},\ \|\alpha\mathbf{x}\| = |\alpha|\,\|\mathbf{x}\|, \tag{4c}$$
$$\forall \mathbf{x}, \mathbf{y} \in \mathbb{R}^m,\ \|\mathbf{x} + \mathbf{y}\| \leq \|\mathbf{x}\| + \|\mathbf{y}\| \quad (\text{triangle inequality}). \tag{4d}$$

Note that a norm satisfies the following properties:
$$\|-\mathbf{x}\| = \|\mathbf{x}\| \quad \text{and} \quad \|\mathbf{x} + \mathbf{x}\| = \|2\mathbf{x}\| = 2\|\mathbf{x}\|.$$
The triangle inequality cannot always be an equality because we have, for any non-zero vector $\mathbf{x}$,
$$0 = \|\mathbf{x} - \mathbf{x}\| < \|\mathbf{x}\| + \|-\mathbf{x}\| = 2\|\mathbf{x}\|.$$

One common choice is the max-norm (or infinity-norm), denoted by
$$\|\mathbf{x}\|_\infty = \max_{1 \leq i \leq m} |x_i|. \tag{5}$$
A bound on the max-norm of the error is nice because we know that every component of the error can be no greater than the max-norm,
$$\forall i,\ 1 \leq i \leq m, \quad |x_i| \leq \|\mathbf{x}\|_\infty.$$
It is easy to verify that $\|\cdot\|_\infty$ satisfies the required properties (4).

- $\forall \mathbf{x} \in \mathbb{R}^m$, $\|\mathbf{x}\|_\infty \geq 0$ because the absolute value is always greater than or equal to zero.
- $\|\mathbf{x}\|_\infty = 0$ if and only if $\mathbf{x} = \mathbf{0}$:
  - If $\mathbf{x} = \mathbf{0}$, then all the components of $\mathbf{x}$ are zero and, consequently, $\|\mathbf{x}\|_\infty = 0$.
  - If $\|\mathbf{x}\|_\infty = 0$, then every component $x_i$ must be zero. This implies that the vector $\mathbf{x}$ is $\mathbf{0}$.
- $\forall \alpha \in \mathbb{R}$, $\|\alpha\mathbf{x}\|_\infty = |\alpha|\,\|\mathbf{x}\|_\infty$: note that, for every entry $i$, we have
$$|\alpha x_i| = |\alpha|\,|x_i|.$$


We get
$$\max_{1 \leq i \leq m} |\alpha x_i| = \max_{1 \leq i \leq m} |\alpha|\,|x_i| = |\alpha| \max_{1 \leq i \leq m} |x_i|$$
since we are multiplying every entry by the same factor $|\alpha|$. We obtain
$$\|\alpha\mathbf{x}\|_\infty = |\alpha|\,\|\mathbf{x}\|_\infty.$$

- $\forall \mathbf{x}, \mathbf{y} \in \mathbb{R}^m$, $\|\mathbf{x} + \mathbf{y}\|_\infty \leq \|\mathbf{x}\|_\infty + \|\mathbf{y}\|_\infty$: the absolute value satisfies a triangle inequality. Consequently, we have, for every $i$,
$$|x_i + y_i| \leq |x_i| + |y_i| \leq \|\mathbf{x}\|_\infty + \|\mathbf{y}\|_\infty.$$
So every entry in the vector $\mathbf{x} + \mathbf{y}$ is smaller than $\|\mathbf{x}\|_\infty + \|\mathbf{y}\|_\infty$. In particular, the largest entry also satisfies that bound. So we get
$$\|\mathbf{x} + \mathbf{y}\|_\infty = \max_{1 \leq i \leq m} |x_i + y_i| \leq \|\mathbf{x}\|_\infty + \|\mathbf{y}\|_\infty.$$

For some problems, however, there are other norms which are either more appropriate or easier to bound using our analytical tools. The 2-norm is frequently used,
$$\|\mathbf{x}\|_2 = \sqrt{\sum_{i=1}^m |x_i|^2}. \tag{6}$$
The 2-norm is often called the Euclidean norm. The 1-norm is defined as follows:
$$\|\mathbf{x}\|_1 = \sum_{i=1}^m |x_i|. \tag{7}$$
The 1-norm is also known as the Manhattan norm because it corresponds to the distance traveled on a grid of city streets.

These norms are special cases of the general family of $p$-norms, defined by
$$\|\mathbf{x}\|_p = \left( \sum_{i=1}^m |x_i|^p \right)^{1/p}. \tag{8}$$

Note that the max-norm can be obtained as the limit as $p \to +\infty$ of the $p$-norm. For example, we have
$$\left\| \begin{pmatrix}1\\2\\3\end{pmatrix} \right\|_\infty = 3, \quad \left\| \begin{pmatrix}1\\2\\3\end{pmatrix} \right\|_1 = 6, \quad \left\| \begin{pmatrix}1\\2\\3\end{pmatrix} \right\|_2 = \sqrt{14},$$
and
$$\left\| \begin{pmatrix}2\\-4\\8\end{pmatrix} \right\|_\infty = 8, \quad \left\| \begin{pmatrix}2\\-4\\8\end{pmatrix} \right\|_1 = 14, \quad \left\| \begin{pmatrix}2\\-4\\8\end{pmatrix} \right\|_2 = \sqrt{84}.$$
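Formulas (5)-(7) are one-liners in code. A short Python sketch (for illustration only; in Matlab these are all provided by the built-in norm command), checking the worked values above:

```python
import math

# The max-, 1-, and 2-norms of equations (5)-(7) for a vector
# stored as a list of numbers.

def norm_inf(x):
    """Max-norm: largest absolute value of a component."""
    return max(abs(xi) for xi in x)

def norm_1(x):
    """1-norm: sum of absolute values of the components."""
    return sum(abs(xi) for xi in x)

def norm_2(x):
    """2-norm (Euclidean norm): square root of the sum of squares."""
    return math.sqrt(sum(abs(xi) ** 2 for xi in x))

x = [2, -4, 8]
print(norm_inf(x))      # 8
print(norm_1(x))        # 14
print(norm_2(x) ** 2)   # 84.0, i.e. ||x||_2 = sqrt(84)
```
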


Exercise 9. Check whether the following maps, defined on $\mathbb{R}^3$, are norms or not:
$$\mathbf{x} \mapsto x_1 + x_2 + x_3, \qquad \mathbf{x} \mapsto |x_1 + x_2 + x_3|, \qquad \mathbf{x} \mapsto x_1^4 + x_2^4 + x_3^4.$$

Exercise 10. Prove that the $p$-norm satisfies the properties (4).

The closed unit ball $\{\mathbf{x} \in \mathbb{R}^m \mid \|\mathbf{x}\| \leq 1\}$ is the set of all vectors with a norm smaller than or equal to 1. The shape of this ball depends on the norm. The unit circle (or sphere) is the set of all vectors with a norm equal to 1,
$$S_1 = \{\mathbf{x} \in \mathbb{R}^m \mid \|\mathbf{x}\| = 1\}. \tag{9}$$
In $\mathbb{R}^2$, we can draw the unit circle as a curve composed of the points $(x, y)$ such that
$$\left\| \begin{pmatrix}x\\y\end{pmatrix} \right\| = 1.$$

There exists an infinite number of points on this curve. For example, for the Euclidean norm, the unit circle contains the vectors
$$\begin{pmatrix}1\\0\end{pmatrix}, \quad \begin{pmatrix}0\\1\end{pmatrix}, \quad \begin{pmatrix}\sqrt{2}/2\\\sqrt{2}/2\end{pmatrix}, \quad \begin{pmatrix}1/2\\\sqrt{3}/2\end{pmatrix}, \quad \ldots$$

The equation governing the unit circle for the Euclidean norm is
$$\left\| \begin{pmatrix}x\\y\end{pmatrix} \right\|_2 = 1 \iff x^2 + y^2 = 1.$$
Several ways are possible to draw this curve. Among them, we can parametrize $x$ and $y$ as follows:
$$x = \cos(\theta), \qquad y = \sin(\theta),$$
where $\theta$ belongs to $[0, 2\pi]$. Figure 1 illustrates the unit circle for the Euclidean norm.

The equation governing the unit circle for the 1-norm is
$$\left\| \begin{pmatrix}x\\y\end{pmatrix} \right\|_1 = 1 \iff |x| + |y| = 1.$$
The curve is included in the square $[-1, 1] \times [-1, 1]$ and it is composed of the following four branches:
$$\begin{aligned} x + y &= 1, & 0 \leq x, y \leq 1,\\ x - y &= 1, & x \in [0, 1] \text{ and } y \in [-1, 0],\\ -x - y &= 1, & x, y \in [-1, 0],\\ -x + y &= 1, & x \in [-1, 0] \text{ and } y \in [0, 1]. \end{aligned}$$
Figure 2 illustrates the unit circle for the 1-norm.
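Both descriptions can be spot-checked numerically. A small sketch (Python for illustration; the sample points are our own choices), sampling the parametrization of the Euclidean circle and one point from each branch of the 1-norm circle:

```python
import math

# Points x = cos(theta), y = sin(theta) lie on the Euclidean unit
# circle: they satisfy x^2 + y^2 = 1 for every theta in [0, 2*pi].
for k in range(8):
    theta = 2 * math.pi * k / 8
    x, y = math.cos(theta), math.sin(theta)
    assert abs(x * x + y * y - 1) < 1e-12

# One point from each of the four branches of the 1-norm unit
# circle; all satisfy |x| + |y| = 1.
branch_points = [(0.25, 0.75), (0.75, -0.25), (-0.5, -0.5), (-0.25, 0.75)]
for x, y in branch_points:
    assert abs(abs(x) + abs(y) - 1) < 1e-12

print("all sampled points lie on their unit circles")
```
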


Figure 1: Unit circle for the Euclidean norm.

Figure 2: Unit circle for the 1-norm.

Exercise 11. Draw the closed unit ball $\{\mathbf{x} \in \mathbb{R}^2 \mid \|\mathbf{x}\| \leq 1\}$ corresponding to the 4-norm and the $\infty$-norm.

Trefethen and Bau1 note that

    The Sergel plaza in Stockholm, Sweden, has the shape of the unit ball in the 4-norm. The Danish poet Piet Hein popularized this superellipse as a pleasing shape for objects such as conference tables.

Other useful norms include the weighted $p$-norms, where each component of a vector is weighted. For example, a weighted 2-norm can be specified as follows:
$$\|\mathbf{x}\| = \sqrt{\sum_{i=1}^m w_i |x_i|^2}, \tag{10}$$
where the weights $w_i$ are strictly positive real numbers.

1 L. N. Trefethen and D. Bau, Numerical Linear Algebra, SIAM, Philadelphia, 1997.


Exercise 12. Show that the map $\mathbf{x} \mapsto \sqrt{|x_1|^2 + 3|x_2|^2}$ is a norm. Draw its closed unit ball.

Exercise 13. Show that the map $\mathbf{x} \mapsto \sqrt{(x_1 - 3x_2)^2 + (3x_1 + x_2)^2}$ is a norm. Draw its closed unit ball.

For a given non-zero vector, its length depends on the norm chosen to measure it. However, in $\mathbb{R}^m$, all the norms are related. For example, we have
$$\|\mathbf{x}\|_\infty \leq \|\mathbf{x}\|_2 \leq \sqrt{m}\,\|\mathbf{x}\|_\infty. \tag{11}$$
In $\mathbb{R}^3$, consider the vector
$$\mathbf{x} = \begin{pmatrix}1\\-8\\2\end{pmatrix}.$$
Then we have
$$\|\mathbf{x}\|_\infty = 8 = \sqrt{8^2} \leq \sqrt{1^2 + (-8)^2 + 2^2} = \|\mathbf{x}\|_2 \leq \sqrt{8^2 + 8^2 + 8^2} = \sqrt{3}\,\|\mathbf{x}\|_\infty, \tag{12}$$
where the first inequality is due to the fact that we add positive numbers to $8^2$. The second inequality comes from the fact that the absolute value of any component is smaller than or equal to 8. To extend this proof to $\mathbb{R}^m$, we assume that the max-norm is reached at the component $I$. Then we have
$$\|\mathbf{x}\|_\infty = |x_I| = \sqrt{|x_I|^2} \leq \sqrt{\sum_{i=1}^{I-1} |x_i|^2 + |x_I|^2 + \sum_{i=I+1}^{m} |x_i|^2} = \|\mathbf{x}\|_2 \leq \sqrt{|x_I|^2 + \cdots + |x_I|^2} = \sqrt{m}\,\|\mathbf{x}\|_\infty. \tag{13}$$
Note that these inequalities are sharp because they are attained by the following vectors:
$$\left\|\begin{pmatrix}1\\0\\\vdots\\0\end{pmatrix}\right\|_\infty = \left\|\begin{pmatrix}1\\0\\\vdots\\0\end{pmatrix}\right\|_2 = 1 \quad \text{and} \quad \left\|\begin{pmatrix}1\\\vdots\\1\end{pmatrix}\right\|_\infty = 1, \quad \left\|\begin{pmatrix}1\\\vdots\\1\end{pmatrix}\right\|_2 = \sqrt{m}.$$

Exercise 14. Prove that $\|\mathbf{x}\|_\infty \leq \|\mathbf{x}\|_1 \leq m\,\|\mathbf{x}\|_\infty$. Find vectors for which equality holds.

Exercise 15. Prove that $\frac{1}{\sqrt{m}}\|\mathbf{x}\|_1 \leq \|\mathbf{x}\|_2 \leq \|\mathbf{x}\|_1$. Find vectors for which equality holds.
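The chain of inequalities (11), and its sharpness, can be checked numerically. A short Python sketch (illustrative only), using the example vector $(1, -8, 2)$ and the extremal vectors above:

```python
import math

# Check (11): ||x||_inf <= ||x||_2 <= sqrt(m) ||x||_inf
# for the example vector x = (1, -8, 2).
x = [1, -8, 2]
m = len(x)

ninf = max(abs(xi) for xi in x)
n2 = math.sqrt(sum(xi ** 2 for xi in x))

assert ninf <= n2 <= math.sqrt(m) * ninf
# 8 <= sqrt(69) ~ 8.307 <= sqrt(3)*8 ~ 13.856
print(ninf, n2, math.sqrt(m) * ninf)

# The bounds are attained: e_1 gives equality on the left,
# the all-ones vector gives equality on the right.
e1 = [1] + [0] * (m - 1)
ones = [1] * m
assert max(abs(v) for v in e1) == math.sqrt(sum(v ** 2 for v in e1)) == 1
assert abs(math.sqrt(sum(v ** 2 for v in ones)) - math.sqrt(m)) < 1e-12
```
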


Another important concept in linear algebra is the idea of two vectors being orthogonal to one another, which is a generalization of perpendicular. We say that $\mathbf{x}, \mathbf{y} \in \mathbb{R}^m$ are orthogonal for the inner product (14) when $\mathbf{x} \cdot \mathbf{y} = 0$, i.e. their inner product is 0. In $\mathbb{R}^2$ or $\mathbb{R}^3$, two vectors are orthogonal if and only if the lines drawn from the origin to the points with coordinates defined by $\mathbf{x}$ and $\mathbf{y}$ are perpendicular to one another.

Example 16. Consider the vectors
$$\begin{pmatrix}1\\3\end{pmatrix} \quad \text{and} \quad \begin{pmatrix}2\\-2\end{pmatrix}.$$
For the Euclidean inner product, the angle between the two vectors is
$$\theta = \arccos\left(\frac{1 \cdot 2 + 3 \cdot (-2)}{\sqrt{1^2 + 3^2}\,\sqrt{2^2 + (-2)^2}}\right) = \arccos\left(\frac{-4}{\sqrt{10}\,\sqrt{8}}\right) = \arccos\left(-\frac{1}{\sqrt{5}}\right) \approx 2.0344,$$
approximately 116 degrees.

Example 17. The vectors
$$\mathbf{x} = \begin{pmatrix}1\\3\end{pmatrix}, \qquad \mathbf{y} = \begin{pmatrix}-3\\1\end{pmatrix}$$
are orthogonal:
$$\mathbf{x} \cdot \mathbf{y} = 1 \cdot (-3) + 3 \cdot 1 = -3 + 3 = 0.$$
So are the vectors
$$\mathbf{x} = \begin{pmatrix}1\\3\\-2\\4\end{pmatrix}, \qquad \mathbf{y} = \begin{pmatrix}4\\1\\4\\1/4\end{pmatrix}.$$
Indeed, we have
$$\mathbf{x} \cdot \mathbf{y} = 1 \cdot 4 + 3 \cdot 1 + (-2) \cdot 4 + 4 \cdot \tfrac14 = 4 + 3 - 8 + 1 = 0.$$

Exercise 18. Compute the angle between the following pairs of vectors:
$$\begin{pmatrix}1\\0\end{pmatrix}, \begin{pmatrix}-1\\0\end{pmatrix}; \qquad \begin{pmatrix}1\\0\end{pmatrix}, \begin{pmatrix}1\\1\end{pmatrix}; \qquad \begin{pmatrix}1\\3\end{pmatrix}, \begin{pmatrix}-1\\0\end{pmatrix}.$$
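The angle formula of Example 16 and the orthogonality checks of Example 17 are easy to reproduce. A Python sketch (illustrative; in Matlab one would use dot, norm, and acos):

```python
import math

def dot(x, y):
    """Euclidean inner product of two vectors of the same length."""
    assert len(x) == len(y)
    return sum(xi * yi for xi, yi in zip(x, y))

def angle(x, y):
    """Angle (in radians) between two nonzero vectors,
    via cos(theta) = (x . y) / (||x||_2 ||y||_2)."""
    nx = math.sqrt(dot(x, x))
    ny = math.sqrt(dot(y, y))
    return math.acos(dot(x, y) / (nx * ny))

# Example 16: angle between (1,3) and (2,-2) is arccos(-1/sqrt(5)).
theta = angle([1, 3], [2, -2])
print(round(theta, 4))  # 2.0344 rad, about 116.6 degrees

# Example 17: orthogonal pairs have zero inner product.
print(dot([1, 3], [-3, 1]))                 # 0
print(dot([1, 3, -2, 4], [4, 1, 4, 0.25]))  # 0.0
```
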

1.5 Useful commands in Matlab

Here are a few commands in Matlab useful for this section.

x = [1;2;3]; defines the vector $\mathbf{x} \in \mathbb{R}^3$ with components 1, 2, and 3.

z = zeros(m,1) defines the zero vector in $\mathbb{R}^m$.

x = ones(m,1) defines the vector $\mathbf{x} \in \mathbb{R}^m$ with components equal to 1.


size(x) returns the dimensions of x in a row vector [m 1].

x = rand(m,1) makes a vector of length m with random values, uniformly distributed between 0 and 1.

x = randn(m,1) makes a vector of length m with random values, normally distributed (with mean 0 and variance 1).

norm(x) computes the 2-norm of the vector x. norm(x,inf), norm(x,1), and norm(x,p) compute, respectively, the $\infty$-norm, the 1-norm, and the $p$-norm.

dot(x,y) computes the Euclidean inner product between vectors x and y.

max([1;-4;3]) computes the maximum entry in the vector (here 3).

min([1;-4;3]) computes the minimum entry in the vector (here -4).

The next sequence generates a vector x with 1000 components linearly distributed between 0 and 5:

x = [];
for i=1:1000,
    x = [x; (i-1)*5.0/1000.0];
end

Whenever possible, it is recommended to declare x with its final size as follows:

x = zeros(1000,1);
for i=1:1000,
    x(i) = (i-1)*5.0/1000.0;
end


2 Linear Spaces

Recall that $\mathbb{R}^m$ denotes the set of all real $m$-vectors, i.e. the set of all vectors with $m$ components, each of which is a real number. We have also defined what we mean by the addition of two vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^m$: we obtain the sum by adding each component, and the sum $\mathbf{x} + \mathbf{y}$ is another vector in $\mathbb{R}^m$. We have also defined scalar multiplication: if $\mathbf{x} \in \mathbb{R}^m$ and $\alpha \in \mathbb{R}$, then the vector $\alpha\mathbf{x} \in \mathbb{R}^m$ is defined by multiplying each component of $\mathbf{x}$ by the scalar $\alpha$. The set $\mathbb{R}^m$ is closed under addition and scalar multiplication. This just means that whenever these operations are applied to vectors in $\mathbb{R}^m$, we obtain another vector in the same set $\mathbb{R}^m$.

The set $\mathbb{R}^m$ is an example of a linear space, which can be defined more generally as follows.

Definition 19. A real linear space, or $\mathbb{R}$-linear space, consists of a set of objects $V$ along with two operations $+$ (addition) and $\cdot$ (scalar multiplication) subject to these conditions:

1. If $u, v \in V$ then $u + v \in V$ (closed under addition);
2. If $u, v \in V$ then $u + v = v + u$ (addition is commutative);
3. If $u, v, w \in V$ then $(u + v) + w = u + (v + w)$ (addition is associative);
4. There is a zero vector $0 \in V$ such that $v + 0 = v$ for every $v \in V$;
5. Every $v \in V$ has an additive inverse $w \in V$ such that $v + w = 0$;
6. If $v \in V$ and $\alpha \in \mathbb{R}$ then $\alpha \cdot v \in V$ (closed under scalar multiplication);
7. If $v \in V$ and $\alpha, \beta \in \mathbb{R}$ then $(\alpha + \beta) \cdot v = \alpha \cdot v + \beta \cdot v$;
8. If $u, v \in V$ and $\alpha \in \mathbb{R}$ then $\alpha \cdot (u + v) = \alpha \cdot u + \alpha \cdot v$;
9. If $v \in V$ and $\alpha, \beta \in \mathbb{R}$ then $(\alpha\beta) \cdot v = \alpha \cdot (\beta \cdot v)$;
10. If $v \in V$ then $1 \cdot v = v$.

It is also possible to define a complex linear space or $\mathbb{C}$-linear space where the scalars are now complex.

Example 20. Verify the properties of Definition 19 for the set of column vectors $\mathbb{R}^m$.

1. For any vector $\mathbf{x}$ and any vector $\mathbf{y}$ in $\mathbb{R}^m$, we have
$$\mathbf{x} = \begin{pmatrix}x_1\\\vdots\\x_m\end{pmatrix}, \quad \mathbf{y} = \begin{pmatrix}y_1\\\vdots\\y_m\end{pmatrix}, \quad \mathbf{x} + \mathbf{y} = \begin{pmatrix}x_1 + y_1\\\vdots\\x_m + y_m\end{pmatrix}.$$
So the vector $\mathbf{x} + \mathbf{y}$ is also a column vector with $m$ rows and belongs to $\mathbb{R}^m$.


2. For any vector $\mathbf{x}$ and any vector $\mathbf{y}$ in $\mathbb{R}^m$, the vectors $\mathbf{x} + \mathbf{y}$ and $\mathbf{y} + \mathbf{x}$ are equal. Indeed, we have
$$\mathbf{x} + \mathbf{y} = \begin{pmatrix}x_1 + y_1\\\vdots\\x_m + y_m\end{pmatrix}, \qquad \mathbf{y} + \mathbf{x} = \begin{pmatrix}y_1 + x_1\\\vdots\\y_m + x_m\end{pmatrix},$$
and the $m$ components are equal because the addition of scalar numbers is commutative.

3. The addition of column vectors is associative. Indeed, we have
$$(\mathbf{x} + \mathbf{y}) + \mathbf{z} = \begin{pmatrix}(x_1 + y_1) + z_1\\\vdots\\(x_m + y_m) + z_m\end{pmatrix}, \qquad \mathbf{x} + (\mathbf{y} + \mathbf{z}) = \begin{pmatrix}x_1 + (y_1 + z_1)\\\vdots\\x_m + (y_m + z_m)\end{pmatrix}.$$
The $m$ components are equal because the addition of scalar numbers is associative.

4. The zero vector is the vector with all its $m$ components equal to 0:
$$\mathbf{0} = \begin{pmatrix}0\\\vdots\\0\end{pmatrix}.$$

5. For any vector $\mathbf{x}$ in $\mathbb{R}^m$, the vector $\mathbf{y}$, defined by
$$\mathbf{x} = \begin{pmatrix}x_1\\\vdots\\x_m\end{pmatrix}, \qquad \mathbf{y} = \begin{pmatrix}-x_1\\\vdots\\-x_m\end{pmatrix},$$
is such that $\mathbf{x} + \mathbf{y} = \mathbf{0}$. The vector $\mathbf{y}$ is the additive inverse of $\mathbf{x}$.

6. For any vector $\mathbf{x}$ in $\mathbb{R}^m$ and any scalar $\alpha$ in $\mathbb{R}$, we have
$$\mathbf{x} = \begin{pmatrix}x_1\\\vdots\\x_m\end{pmatrix}, \qquad \alpha\mathbf{x} = \begin{pmatrix}\alpha x_1\\\vdots\\\alpha x_m\end{pmatrix}.$$
So the vector $\alpha\mathbf{x}$ is also a column vector with $m$ rows and belongs to $\mathbb{R}^m$.

7. For any vector $\mathbf{x}$ in $\mathbb{R}^m$ and any scalars $\alpha$ and $\beta$ in $\mathbb{R}$, we have
$$(\alpha + \beta)\mathbf{x} = \begin{pmatrix}(\alpha + \beta)x_1\\\vdots\\(\alpha + \beta)x_m\end{pmatrix} = \begin{pmatrix}\alpha x_1 + \beta x_1\\\vdots\\\alpha x_m + \beta x_m\end{pmatrix} = \begin{pmatrix}\alpha x_1\\\vdots\\\alpha x_m\end{pmatrix} + \begin{pmatrix}\beta x_1\\\vdots\\\beta x_m\end{pmatrix} = \alpha\mathbf{x} + \beta\mathbf{x}.$$


8. For any vectors $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^m$ and any scalar $\alpha$ in $\mathbb{R}$, we have
$$\alpha(\mathbf{x} + \mathbf{y}) = \alpha\begin{pmatrix}x_1 + y_1\\\vdots\\x_m + y_m\end{pmatrix} = \begin{pmatrix}\alpha(x_1 + y_1)\\\vdots\\\alpha(x_m + y_m)\end{pmatrix} = \begin{pmatrix}\alpha x_1 + \alpha y_1\\\vdots\\\alpha x_m + \alpha y_m\end{pmatrix} = \alpha\mathbf{x} + \alpha\mathbf{y}.$$

9. For any vector $\mathbf{x}$ in $\mathbb{R}^m$ and any scalars $\alpha$ and $\beta$ in $\mathbb{R}$, we have
$$(\alpha\beta)\mathbf{x} = \begin{pmatrix}(\alpha\beta)x_1\\\vdots\\(\alpha\beta)x_m\end{pmatrix} = \begin{pmatrix}\alpha(\beta x_1)\\\vdots\\\alpha(\beta x_m)\end{pmatrix} = \alpha\begin{pmatrix}\beta x_1\\\vdots\\\beta x_m\end{pmatrix} = \alpha(\beta\mathbf{x}).$$

10. For any vector $\mathbf{x}$ in $\mathbb{R}^m$, we have
$$1 \cdot \mathbf{x} = 1 \cdot \begin{pmatrix}x_1\\\vdots\\x_m\end{pmatrix} = \begin{pmatrix}1 \cdot x_1\\\vdots\\1 \cdot x_m\end{pmatrix} = \begin{pmatrix}x_1\\\vdots\\x_m\end{pmatrix} = \mathbf{x}.$$

So $\mathbb{R}^m$ is a real linear space.

Example 21. The set of all possible functions mapping real numbers to real numbers, $f : \mathbb{R} \to \mathbb{R}$, is denoted $F(\mathbb{R}, \mathbb{R})$. We will verify that $F(\mathbb{R}, \mathbb{R})$ is a real linear space.

1. For any functions $f$ and $g$ in $F(\mathbb{R}, \mathbb{R})$, we have
$$(f + g)(x) = f(x) + g(x).$$
So the function $f + g$ is a function whose input is a real number and whose output is also a real number. $f + g$ belongs to $F(\mathbb{R}, \mathbb{R})$. For instance, if $f(x) = 3x^2$ and $g(x) = \cos(x)$, then $f + g$ is the function defined by
$$(f + g)(x) = 3x^2 + \cos(x), \quad \forall x \in \mathbb{R}.$$

2. For any functions $f$ and $g$ in $F(\mathbb{R}, \mathbb{R})$, the functions $f + g$ and $g + f$ are equal. Indeed, we have
$$\forall x \in \mathbb{R}, \quad (f + g)(x) = f(x) + g(x) = g(x) + f(x) = (g + f)(x).$$
We used the fact that the addition of scalar numbers is commutative.

3. The addition of functions is associative. Indeed, we have
$$\forall x \in \mathbb{R}, \quad [(f + g) + h](x) = (f + g)(x) + h(x) = f(x) + g(x) + h(x) = f(x) + (g + h)(x) = [f + (g + h)](x).$$
We used the fact that the addition of scalar numbers is associative.


4. The zero function $0$ is the function identically 0 in $\mathbb{R}$,
$$0(x) = 0, \quad \forall x \in \mathbb{R}. \tag{19}$$

5. The additive inverse of a function $f$ is the function $-f$ defined by
$$(-f)(x) = -f(x), \quad \forall x \in \mathbb{R}. \tag{20}$$
Indeed, we have $f(x) + (-f)(x) = f(x) - f(x) = 0$ for every $x$ in $\mathbb{R}$.

6. For any function $f$ in $F(\mathbb{R}, \mathbb{R})$ and any scalar $\alpha$ in $\mathbb{R}$, we have
$$\forall x \in \mathbb{R}, \quad (\alpha f)(x) = \alpha f(x).$$
So the function $\alpha f$ is a function whose input is a real number and whose output is also a real number. $\alpha f$ belongs to $F(\mathbb{R}, \mathbb{R})$. For example, if $g(x) = \cos(x)$, $5g$ is the function defined by
$$(5g)(x) = 5\cos(x), \quad \forall x \in \mathbb{R}.$$

7. For any function $f$ in $F(\mathbb{R}, \mathbb{R})$ and any scalars $\alpha$ and $\beta$ in $\mathbb{R}$, we have
$$\forall x \in \mathbb{R}, \quad [(\alpha + \beta)f](x) = (\alpha + \beta)f(x) = \alpha f(x) + \beta f(x) = (\alpha f)(x) + (\beta f)(x).$$
So the functions $(\alpha + \beta)f$ and $\alpha f + \beta f$ are equal.

8. For any functions $f$ and $g$ in $F(\mathbb{R}, \mathbb{R})$ and any scalar $\alpha$ in $\mathbb{R}$, we have
$$\forall x \in \mathbb{R}, \quad [\alpha(f + g)](x) = \alpha f(x) + \alpha g(x) = (\alpha f)(x) + (\alpha g)(x).$$
So the functions $\alpha(f + g)$ and $\alpha f + \alpha g$ are equal.

9. For any function $f$ in $F(\mathbb{R}, \mathbb{R})$ and any scalars $\alpha$ and $\beta$ in $\mathbb{R}$, we have
$$\forall x \in \mathbb{R}, \quad [(\alpha\beta)f](x) = (\alpha\beta)f(x) = \alpha[\beta f(x)] = \alpha(\beta f)(x).$$
So the functions $(\alpha\beta)f$ and $\alpha(\beta f)$ are equal.

10. For any function $f$ in $F(\mathbb{R}, \mathbb{R})$, we have
$$\forall x \in \mathbb{R}, \quad (1 \cdot f)(x) = 1 \cdot f(x) = f(x).$$

Example 22. The set of all continuous functions mapping real numbers to real numbers, $f : \mathbb{R} \to \mathbb{R}$, is denoted $C^0(\mathbb{R}, \mathbb{R})$. $C^0(\mathbb{R}, \mathbb{R})$ is a real linear space.

We emphasize that the zero vector takes different meanings according to the linear space $V$. When $V = \mathbb{R}^m$, the zero vector is
$$\mathbf{0} = \begin{pmatrix}0\\0\\\vdots\\0\end{pmatrix}, \tag{21}$$


the vector with $m$ zero components. When $V = C^0(\mathbb{R}, \mathbb{R})$, the zero vector $0$ is the function identically 0 in $\mathbb{R}$,
$$0(x) = 0, \quad \forall x \in \mathbb{R}. \tag{22}$$

We will mostly study linear algebra in the context of linear spaces of vectors, but the study of other linear spaces, particularly function spaces, is extremely important in many branches of mathematics, and many of the ideas introduced here carry over to other linear spaces.

Exercise 23. Verify that $\mathbb{R}^m$ is a $\mathbb{Q}$-linear space, where $\mathbb{Q}$ is the set of signed rational numbers.

Exercise 24. Verify that $\mathbb{C}$ is a real linear space.

Exercise 25. Verify that $\mathbb{C}$ is a complex linear space.

Exercise 26. Show that the set $P_n$ of all polynomials with real coefficients of degree at most $n$ is a real linear space. What is the zero vector?

2.1 Subsets and subspaces

Suppose we have a linear space $V$ and $S$ is a subset of $V$, which just means every element of $S$ is also an element of $V$:
$$v \in S \implies v \in V. \tag{23}$$
A subset might contain a finite or infinite number of elements.

Example 27. The subset
$$S_1 = \left\{ \begin{pmatrix}1\\2\end{pmatrix}, \begin{pmatrix}-2.3\\7\end{pmatrix}, \begin{pmatrix}\pi\\1\end{pmatrix} \right\} \tag{24}$$
is a subset of $\mathbb{R}^2$ with 3 elements.

Example 28. The subset
$$S_2 = \left\{ \mathbf{x} \in \mathbb{R}^3 : x_2 = x_1^2 + 3x_3 \right\} \tag{25}$$
is a subset of $\mathbb{R}^3$ with an infinite number of elements, including
$$\begin{pmatrix}0\\0\\0\end{pmatrix}, \quad \begin{pmatrix}2\\1\\-1\end{pmatrix}, \quad \begin{pmatrix}1\\4\\1\end{pmatrix}.$$

Example 29. The subset
$$S_3 = \left\{ \mathbf{x} \in \mathbb{R}^2 : x_2 = 3x_1 \right\} \tag{26}$$
is a subset of $\mathbb{R}^2$ with an infinite number of elements, including
$$\begin{pmatrix}0\\0\end{pmatrix}, \quad \begin{pmatrix}2.3\\6.9\end{pmatrix}, \quad \begin{pmatrix}\pi\\3\pi\end{pmatrix}.$$


Definition 30. If $S$ is a subset of a linear space $V$ and $S$ is closed under addition and scalar multiplication, then we say that $S$ is a subspace of $V$.

Consider the previous subsets.

- The subset $S_1$ is not a subspace of $\mathbb{R}^2$ because adding two vectors from $S_1$ does not give a vector in $S_1$.
- The subset $S_2$ is not a subspace of $\mathbb{R}^3$ because
$$2\begin{pmatrix}2\\1\\-1\end{pmatrix} = \begin{pmatrix}4\\2\\-2\end{pmatrix} \notin S_2.$$
- The subset $S_3$ is a subspace of $\mathbb{R}^2$.

Example 31. $C^0(\mathbb{R}, \mathbb{R})$ denotes the set of functions $f : \mathbb{R} \to \mathbb{R}$ that are continuous. For instance, $f(x) = 3x^2 + \cos(x)$ and $g(x) = |x|$ belong to $C^0(\mathbb{R}, \mathbb{R})$. $C^0(\mathbb{R}, \mathbb{R})$ is a subspace of $F(\mathbb{R}, \mathbb{R})$. Indeed, it is a subset of $F(\mathbb{R}, \mathbb{R})$, which is a real linear space. For any functions $f$ and $g$ in $C^0(\mathbb{R}, \mathbb{R})$, the sum function $f + g$ is continuous, so it belongs to $C^0(\mathbb{R}, \mathbb{R})$: $C^0(\mathbb{R}, \mathbb{R})$ is closed under addition. For any function $f$ in $C^0(\mathbb{R}, \mathbb{R})$ and any scalar $\alpha$ in $\mathbb{R}$, the function $\alpha f$ is continuous, so it belongs to $C^0(\mathbb{R}, \mathbb{R})$: $C^0(\mathbb{R}, \mathbb{R})$ is closed under scalar multiplication. Consequently, $C^0(\mathbb{R}, \mathbb{R})$ is a subspace of $F(\mathbb{R}, \mathbb{R})$. It is also a real linear space.

Example 32. $C^1(\mathbb{R}, \mathbb{R})$ denotes the set of functions $f : \mathbb{R} \to \mathbb{R}$ that are continuous and differentiable and whose derivative $f'$ is also continuous. For instance, $f(x) = 3x^2 + \cos(x)$ is in $C^1(\mathbb{R}, \mathbb{R})$. On the other hand, $g(x) = |x|$ belongs to $C^0(\mathbb{R}, \mathbb{R})$ but not to $C^1(\mathbb{R}, \mathbb{R})$. $C^1(\mathbb{R}, \mathbb{R})$ is a subspace of $F(\mathbb{R}, \mathbb{R})$. Indeed, it is a subset of $F(\mathbb{R}, \mathbb{R})$, which is a real linear space. For any functions $f$ and $g$ in $C^1(\mathbb{R}, \mathbb{R})$, the sum function $f + g$ is continuous; it is also differentiable and its derivative, equal to $f' + g'$, is also continuous. So the sum function $f + g$ belongs to $C^1(\mathbb{R}, \mathbb{R})$: $C^1(\mathbb{R}, \mathbb{R})$ is closed under addition. For any function $f$ in $C^1(\mathbb{R}, \mathbb{R})$ and any scalar $\alpha$ in $\mathbb{R}$, the function $\alpha f$ is continuous; it is also differentiable and its derivative, equal to $\alpha f'$, is also continuous. So the function $\alpha f$ belongs to $C^1(\mathbb{R}, \mathbb{R})$: $C^1(\mathbb{R}, \mathbb{R})$ is closed under scalar multiplication. Consequently, $C^1(\mathbb{R}, \mathbb{R})$ is a subspace of $F(\mathbb{R}, \mathbb{R})$. It is also a real linear space.

Example 33. $C^p(\mathbb{R}, \mathbb{R})$ denotes the set of functions $f : \mathbb{R} \to \mathbb{R}$ that are continuous and differentiable $p$ times and whose $p$-th derivative $f^{(p)}$ is also continuous. $C^p(\mathbb{R}, \mathbb{R})$ is a subspace of $C^0(\mathbb{R}, \mathbb{R})$.


Exercise 34. Show that the set $P_n$ of all polynomials with real coefficients of degree at most $n$ is a subspace of $C^0(\mathbb{R}, \mathbb{R})$.

Note the following about subspaces:

- The set $S = V$ is a subspace of $V$ ($\mathbb{R}^2$ is a subspace of $\mathbb{R}^2$). If $S$ is a subspace of $V$ that is not all of $V$, then it is called a proper subspace of $V$.
- A subspace $S$ of a real linear space $V$ is also a real linear space.
- The set $Z = \{0\}$ that contains only the zero element of the linear space $V$ is a subspace of $V$, since $0 + 0 = 0$ and $\alpha \cdot 0 = 0$, so this set is closed under these operations.
- The set $Z = \{0\}$ is the only subspace of $V$ that contains a finite number of elements (just 1 element). All other subspaces contain an infinite number of elements. Why? Because if $v \in S$ then $\alpha v \in S$ for any real number $\alpha$ (of which there are infinitely many). If $\alpha_1 v = \alpha_2 v$ then, by the rules of Definition 19, we can rewrite this as $(\alpha_1 - \alpha_2)v = 0$. But this can be true only if $v = 0$ or if $\alpha_1 - \alpha_2 = 0$.

The fact that a subspace is also a real linear space suggests a technique to prove that a set $U$ is a real linear space:

1. Find a real linear space $U'$ such that $U \subset U'$;
2. Show that $U$ is a subspace of $U'$.

Example 35. Consider the set
$$S = \left\{ f \in C^0(\mathbb{R}, \mathbb{R}) \mid f(0) = 0 \right\}.$$
We would like to check that $S$ is a real linear space. Two approaches are possible:

1. Use Definition 19 and check all 10 items.
2. Find a superset $U$, which is a real linear space, such that $S \subset U$. Show that $S$ is a subspace of $U$ by checking the properties of being closed under addition and scalar multiplication (only 2 checks):
(a) $\forall u, v \in S$, $u + v \in S$;
(b) $\forall \alpha \in \mathbb{R}$, $\forall u \in S$, $\alpha u \in S$.

To prove that $S$ is a real linear space, we will use the second approach. We need to find a superset. The definition of $S$ suggests $C^0(\mathbb{R}, \mathbb{R})$ as superset. Indeed, every function in $S$ is a continuous function. We know that $C^0(\mathbb{R}, \mathbb{R})$ is a real linear space. So we just need to check the closure properties.

Let $f$ and $g$ be two functions in $S$. The sum $f + g$ is a continuous function because $f$ and $g$ are continuous. To check whether $f + g$ belongs to $S$, we also need to compute the value of $f + g$ at 0:
$$(f + g)(0) = f(0) + g(0) = 0 + 0 = 0.$$


So $f + g$ belongs to $S$.

Let $f$ be a function in $S$ and $\alpha \in \mathbb{R}$. The product $\alpha f$ is a continuous function because $f$ is continuous. To check whether $\alpha f$ belongs to $S$, we also need to compute the value of $\alpha f$ at 0:
$$(\alpha f)(0) = \alpha f(0) = \alpha \cdot 0 = 0.$$
So $\alpha f$ belongs to $S$.

We conclude that $S$ is a subspace of $C^0(\mathbb{R}, \mathbb{R})$ and it is also a real linear space.

2.2 Linear dependence and independence

If $\mathbf{x}, \mathbf{y} \in \mathbb{R}^m$ are any two vectors and $\alpha = \beta = 0$, then
$$\alpha\mathbf{x} + \beta\mathbf{y} = 0\,\mathbf{x} + 0\,\mathbf{y} = \mathbf{0} + \mathbf{0} = \mathbf{0}.$$
So this trivial linear combination of $\mathbf{x}$ and $\mathbf{y}$ is always the zero vector.

In a real linear space $V$, two vectors $u, v \in V$ are said to be linearly dependent if there is some nontrivial linear combination of $u$ and $v$ that gives 0, i.e. if there are scalars $\alpha, \beta \in \mathbb{R}$ that are not both equal to zero but for which $\alpha u + \beta v = 0$. In general, two vectors in $V$ are linearly dependent if (and only if) one is a scalar multiple of the other, for example if $v = \lambda u$ for some scalar $\lambda$, since then $\lambda u - v = 0$ (or any other linear combination with $\beta \neq 0$ and $\alpha = -\lambda\beta$ gives the zero vector).

Two vectors are said to be linearly independent when there is no nontrivial linear combination of $u$ and $v$ that gives 0. In other words, they are linearly independent if the equation
$$\alpha u + \beta v = 0 \tag{27}$$
has only the trivial solution $\alpha = \beta = 0$. Two vectors are linearly independent when one is not a scalar multiple of the other.

Example 36. The vectors
$$\mathbf{x} = \begin{pmatrix}1\\2\end{pmatrix}, \qquad \mathbf{y} = \begin{pmatrix}3\\0\end{pmatrix}$$
are linearly independent since neither one is a scalar multiple of the other. Another way to see this is that the equation $\alpha\mathbf{x} + \beta\mathbf{y} = \mathbf{0}$ is
$$\begin{pmatrix}\alpha + 3\beta\\2\alpha\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix}.$$
The second component is zero only if $\alpha = 0$. But then the first component becomes $3\beta$, which is zero only if $\beta = 0$. So (27) is satisfied only when $\alpha = \beta = 0$.


Example 37. The polynomials $p$ and $q$, defined by
$$p(x) = 1, \qquad q(x) = x, \qquad \forall x \in \mathbb{R},$$
are linearly independent. Indeed, the equation $\alpha p + \beta q = 0$ becomes
$$\alpha + \beta x = 0, \quad \forall x \in \mathbb{R}.$$
Taking $x = 0$ gives that $\alpha$ must be 0. Then any nonzero value of $x$ implies that $\beta = 0$. So (27) is satisfied only when $\alpha = \beta = 0$.

The idea of linear dependence and independence can be extended to sets of more than 2 vectors.

Definition 38. The set of $r$ vectors $u^{(1)}, u^{(2)}, \ldots, u^{(r)} \in V$ is linearly independent if the equation
$$\alpha_1 u^{(1)} + \alpha_2 u^{(2)} + \cdots + \alpha_r u^{(r)} = 0 \tag{28}$$
has only the trivial solution $\alpha_1 = \alpha_2 = \cdots = \alpha_r = 0$, i.e. if every nontrivial linear combination of the vectors is nonzero. When the set is not linearly independent, it is said to be linearly dependent.

Example 39. The vectors
$$\mathbf{x} = \begin{pmatrix}1\\2\end{pmatrix}, \qquad \mathbf{y} = \begin{pmatrix}3\\0\end{pmatrix}, \qquad \mathbf{z} = \begin{pmatrix}1\\1\end{pmatrix}$$
are linearly dependent. Consider a linear combination resulting in the zero vector:
$$\alpha\mathbf{x} + \beta\mathbf{y} + \gamma\mathbf{z} = \begin{pmatrix}0\\0\end{pmatrix}.$$
Then we have
$$\begin{pmatrix}\alpha + 3\beta + \gamma\\2\alpha + \gamma\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix}.$$
Identifying each entry results in the following system of equations:
$$\alpha + 3\beta + \gamma = 0, \qquad 2\alpha + \gamma = 0.$$
These equations imply that $\gamma = -2\alpha$ and $\beta = \alpha/3$. However, $\alpha$ remains arbitrary. For example, the linear combination with $\alpha = 3$, $\beta = 1$, and $\gamma = -6$ results in the zero vector. This non-trivial combination implies that the 3 vectors are linearly dependent.

Example 40. The vectors
$$\mathbf{x}^{(1)} = \begin{pmatrix}7\\0\\0\end{pmatrix}, \qquad \mathbf{x}^{(2)} = \begin{pmatrix}1\\2\\0\end{pmatrix}, \qquad \mathbf{x}^{(3)} = \begin{pmatrix}3\\4\\5\end{pmatrix}$$


are linearly independent. Indeed, $\alpha_1\mathbf{x}^{(1)} + \alpha_2\mathbf{x}^{(2)} + \alpha_3\mathbf{x}^{(3)} = \mathbf{0}$ if and only if $\alpha_1 = \alpha_2 = \alpha_3 = 0$. Let us prove this result. If $\alpha_1 = \alpha_2 = \alpha_3 = 0$, then $\alpha_1\mathbf{x}^{(1)} + \alpha_2\mathbf{x}^{(2)} + \alpha_3\mathbf{x}^{(3)} = \mathbf{0}$. If a linear combination of $\mathbf{x}^{(1)}$, $\mathbf{x}^{(2)}$, and $\mathbf{x}^{(3)}$ is zero, then we have
$$\alpha_1\mathbf{x}^{(1)} + \alpha_2\mathbf{x}^{(2)} + \alpha_3\mathbf{x}^{(3)} = \begin{pmatrix}7\alpha_1 + \alpha_2 + 3\alpha_3\\2\alpha_2 + 4\alpha_3\\5\alpha_3\end{pmatrix} = \begin{pmatrix}0\\0\\0\end{pmatrix}.$$
By matching the third component, we get that $\alpha_3$ has to be 0. Plugging this value into the second component, we obtain that $\alpha_2$ is also 0. Then the first component implies that $\alpha_1$ is 0. A linear combination of $\mathbf{x}^{(1)}$, $\mathbf{x}^{(2)}$, and $\mathbf{x}^{(3)}$ is zero if and only if $\alpha_1 = \alpha_2 = \alpha_3 = 0$.

Exercise 41. Check the linear dependency of the vectors
$$\mathbf{x}^{(1)} = \begin{pmatrix}1\\2\\3\end{pmatrix}, \qquad \mathbf{x}^{(2)} = \begin{pmatrix}2.1\\4.2\\6.3\end{pmatrix}.$$

Exercise 42. Check the linear dependency of the vectors
$$\mathbf{x}^{(1)} = \begin{pmatrix}1\\2\\3\end{pmatrix}, \qquad \mathbf{x}^{(2)} = \begin{pmatrix}2\\1\\1.5\end{pmatrix}, \qquad \mathbf{x}^{(3)} = \begin{pmatrix}1\\0\\0\end{pmatrix}.$$

Example 43. The functions 1, $x$, and $x^2$ are linearly independent. Indeed, consider the linear combination
$$\alpha + \beta x + \gamma x^2 = 0, \quad \forall x \in \mathbb{R}. \tag{29}$$
Taking $x = 0$ implies that $\alpha$ must be 0. Then taking $x = 1$ and $x = -1$ gives
$$\beta + \gamma = 0, \qquad -\beta + \gamma = 0,$$
which implies that $\beta = \gamma = 0$. Indeed, summing the two equations results in $\gamma = 0$; plugging $\gamma = 0$ into either equation gives $\beta = 0$. So the linear combination (29) is zero if and only if $\alpha = \beta = \gamma = 0$.

Example 44. We will check the linear dependency of the functions 1, $e^x$, and $e^{-x}$. Consider a linear combination that is equal to the zero function:
$$\alpha + \beta e^x + \gamma e^{-x} = 0, \quad \forall x \in \mathbb{R}.$$
Differentiating this relationship, we also have
$$\beta e^x - \gamma e^{-x} = 0, \quad \forall x \in \mathbb{R}.$$
Taking a second derivative, we get
$$\beta e^x + \gamma e^{-x} = 0, \quad \forall x \in \mathbb{R}.$$


Summing the last two equations, we obtain
$$2\beta e^x = 0, \quad \forall x \in \mathbb{R},$$
which implies that $\beta = 0$. Plugging this value into the equation after one differentiation, we have
$$-\gamma e^{-x} = 0, \quad \forall x \in \mathbb{R},$$
so $\gamma = 0$. Finally, the first relation gives that $\alpha = 0$. Consequently, the functions 1, $e^x$, and $e^{-x}$ are linearly independent.

Exercise 45. Check the linear dependency of the functions 1, $\cos(\pi x)$, and $\sin(\pi x)$ on $[-1, 1]$.

2.3 Span of a set of vectors

Let $V$ denote a real linear space and $u^{(1)}, \ldots, u^{(r)} \in V$ be a set of $r$ vectors. Then the span of this set of vectors is the space of all linear combinations of these vectors:
$$\mathrm{span}(u^{(1)}, \ldots, u^{(r)}) = \left\{ \alpha_1 u^{(1)} + \cdots + \alpha_r u^{(r)} \;;\; \alpha_1, \ldots, \alpha_r \in \mathbb{R} \right\}. \tag{30}$$
This is a subspace of $V$, since any linear combination of vectors in this set is again a linear combination of $u^{(1)}, \ldots, u^{(r)}$.

Example 46. The subspace $S_3$, given by (26), can be written as
$$S_3 = \mathrm{span}\left(\begin{pmatrix}1\\3\end{pmatrix}\right).$$

Example 47. The space
$$S = \mathrm{span}\left(\begin{pmatrix}1\\0\end{pmatrix}, \begin{pmatrix}0\\1\end{pmatrix}\right)$$
is all of $\mathbb{R}^2$, since any vector $\mathbf{x} \in \mathbb{R}^2$ can be written as a linear combination of these two vectors:
$$\begin{pmatrix}x_1\\x_2\end{pmatrix} = x_1\begin{pmatrix}1\\0\end{pmatrix} + x_2\begin{pmatrix}0\\1\end{pmatrix}.$$

Example 48. The space

S = span((1, 0)^T, (0, 1)^T, (2, 3)^T)

is all of R², since any vector x ∈ R² can be written as a linear combination of these three vectors:

(x1, x2)^T = x1 (1, 0)^T + x2 (0, 1)^T + 0 · (2, 3)^T.


Actually, in this case, there are infinitely many different ways to write an arbitrary vector x ∈ R² as a linear combination of these three vectors. For example, we could write it as

(x1, x2)^T = −x1 (1, 0)^T + (x2 − 3x1)(0, 1)^T + x1 (2, 3)^T.
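A quick numerical confirmation of the two representations above, for one sample vector (this check is not part of the original notes):

```python
def combo(coeffs, vectors):
    """Linear combination sum_i coeffs[i] * vectors[i] of vectors in R^2."""
    return [sum(c * v[k] for c, v in zip(coeffs, vectors)) for k in range(2)]

vecs = [[1, 0], [0, 1], [2, 3]]
x1, x2 = 7, -2
print(combo([x1, x2, 0], vecs))             # [7, -2]
print(combo([-x1, x2 - 3 * x1, x1], vecs))  # [7, -2] as well
```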

Example 49. The set P₂ of real polynomials of degree at most 2 is a real linear space. Any polynomial in P₂ is written as

p(x) = α + βx + γx².

It is a linear combination of the functions 1, x, and x². So we have

P₂ = span(1, x, x²).

2.4 Basis vectors

Consider a real linear space V and let u^(1), …, u^(r) ∈ V be a set of r vectors. Then the span of this set of vectors,

span(u^(1), …, u^(r)) = {α₁u^(1) + ⋯ + α_r u^(r) ; α₁, …, α_r ∈ R}, (31)

defines a subspace of V. In fact, it can be shown that any subspace of R^m has this form: it is the span of some set of vectors. A minimal set of vectors that defines a space is called a basis for the space. What do we mean by minimal?

The subspace S₃, given by (26), can be written as

S₃ = span((1, 3)^T) = span((2, 6)^T) = span((1, 3)^T, (2, 6)^T).

In the latter case, note that the two vectors are linearly dependent. Clearly, we require at least one vector to define this particular space, but specifying two vectors is redundant. We say that either one of these vectors alone is a basis for this particular space. More generally, we make the following definition to make this idea precise:

Definition 50. If S is a subspace of a linear space V, then the vectors u^(1), …, u^(r) ∈ V form a basis for S if

S = span(u^(1), …, u^(r)), (32a)
u^(1), …, u^(r) are linearly independent. (32b)

If a set of vectors spanning the space is not linearly independent, then we can find a basis consisting of fewer vectors.


The vector αx is a vector with 3 rows and its last component is equal to 0. So αx belongs to S, and S is closed for the scalar multiplication. So S is a subspace of R³ and a real linear space. To find a basis for S and the dimension, we write a general formula for any vector in S:

x ∈ S ⟺ x = (x1, x2, 0)^T.

This general formula has two parameters, x1 and x2. So we can write

x ∈ S ⟺ x = (x1, x2, 0)^T = x1 (1, 0, 0)^T + x2 (0, 1, 0)^T.

So any vector in S is a linear combination of the two vectors (1, 0, 0)^T and (0, 1, 0)^T, which implies that

S = span((1, 0, 0)^T, (0, 1, 0)^T).

To obtain a basis, we need to check whether these two vectors are linearly independent. Consider a linear combination equal to the vector 0:

α (1, 0, 0)^T + β (0, 1, 0)^T = (0, 0, 0)^T ⟺ (α, β, 0)^T = (0, 0, 0)^T.

Identifying the entries, we obtain that α = β = 0. The vectors are linearly independent and they span S. So they form a basis for S. The dimension of S is 2.

Example 63. Check whether the set {f ∈ C¹(R, R) | f′(x) = f(x)} is a real linear space. Determine a basis and the dimension. Denote by S the set to study. We notice that S is a subset of C¹(R, R), which is a real linear space. So we will show that S is a subspace of C¹(R, R); this will also prove that S is a real linear space. So we need to check whether S is closed for the addition and closed for the scalar multiplication. Consider any functions f and g in S; then the function f + g is a function in C¹(R, R) because C¹(R, R) is a real linear space. To check whether f + g belongs to S, we have to compute the derivative to see if it is equal to f + g:

(f + g)′(t) = f′(t) + g′(t) = f(t) + g(t) = (f + g)(t).

So f + g belongs to S. Finally, we need to check whether S is closed for the scalar multiplication. Consider any function f in S and any real number λ; then the function λf is a function in C¹(R, R) because C¹(R, R) is a real linear space.


To check whether λf belongs to S, we have to compute the derivative to see if it is equal to λf:

(λf)′(t) = λf′(t) = λf(t) = (λf)(t).

So λf belongs to S. We conclude that S is a subspace of C¹(R, R) and it is also a real linear space. Any function in S is a solution to the equation f′(t) = f(t). A general solution is f(t) = αe^t, for any α ∈ R. No other solution exists: any solution is proportional to e^t. So we have S = span(e^t). A non-zero function is necessarily linearly independent, so (e^t) is a basis for S. The dimension of S is 1.

Example 64. Consider the set S = {p ∈ P₃ | p(0) = p(1) = p(2) = 0}. S is composed of polynomials of degree at most 3 that take the value 0 when evaluated at x = 0, x = 1, and x = 2. S is a real linear space. Indeed, it is a subset of P₃, which is a real linear space. S is closed for the addition because, for any polynomials p and q in S, p + q is a polynomial in P₃. When evaluated at 0 (or 1 or 2), p + q is equal to zero because p(0) = q(0) = 0 (or p(1) = q(1) = 0 or p(2) = q(2) = 0). So S is closed for the addition. It is also closed for the scalar multiplication because the value at 0 of λp is equal to λp(0) = 0 (the same thing holds for x = 1 and x = 2). So S is a real linear space. We can write a general formula for any polynomial in S. Write p(x) = α + βx + γx² + δx³. Then

p(0) = 0 ⟹ α = 0,
p(1) = 0 ⟹ β + γ + δ = 0,
p(2) = 0 ⟹ 2β + 4γ + 8δ = 0.

Subtracting two times the next-to-last equation from the last equation, we get

2β + 4γ + 8δ − (2β + 2γ + 2δ) = 2γ + 6δ = 0 ⟹ γ = −3δ.

Then we get β = −γ − δ = 3δ − δ = 2δ. So any polynomial in S has the form

p(x) = δ(2x − 3x² + x³).

The space S is spanned by the polynomial 2x − 3x² + x³, which is non-zero. The polynomial 2x − 3x² + x³ forms a basis. The dimension of S is 1.
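A quick check of Example 64 (not in the original notes): the basis polynomial p(x) = 2x − 3x² + x³ vanishes exactly at the three prescribed points, consistent with the factorization p(x) = x(x − 1)(x − 2).

```python
def p(x):
    """Basis polynomial of Example 64."""
    return 2 * x - 3 * x ** 2 + x ** 3

print([p(x) for x in (0, 1, 2)])        # [0, 0, 0]
print(p(3) == 3 * (3 - 1) * (3 - 2))    # True: matches x(x-1)(x-2) at x = 3
```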


    3 Linear Functions

First, recall what we mean by the notation

f : U → V. (35)

The function f takes an element of U as input. It is defined for any element of U. The function value f(u) is an element of V.

We are now ready to define what we mean by a linear function.

Definition 65. Consider two real linear spaces U and V. The function f : U → V is a linear function if both of the following conditions are satisfied:

∀ u^(1), u^(2) ∈ U, f(u^(1) + u^(2)) = f(u^(1)) + f(u^(2)), (36a)
∀ u ∈ U and ∀ α ∈ R, f(αu) = αf(u). (36b)

When U and V are two complex linear spaces, condition (36b) is modified so that α belongs to C. Note that f(0) = 0, by taking α = 0.

3.1 Linear functions from R to R

The conditions (36) are very restrictive. The only functions satisfying these conditions are functions of the form

f(x) = ax, (37)

where a is some fixed real number. Indeed, we have

f(x) = f(x · 1) = xf(1) (38)

and

f(x + y) = f(x) + f(y) = f(x · 1) + f(y · 1) = xf(1) + yf(1) = (x + y)f(1). (39)

The graph of such a function is simply a line through the origin with slope a, as illustrated in Figure 3 for two choices of a. The fact that the graph is a line helps us remember why these functions are called linear.

Now consider the function g(x) = 6x − 3, whose graph is shown in Figure 4. This graph is also a straight line, but this is not a linear function according to the strict terms of Definition 65. It cannot be, since g(0) = −3 ≠ 0, and we can also easily check that g(1 + 2) = 15 while g(1) + g(2) = 12, for example. The function g(x) is properly called an affine function, although people are often imprecise and call it linear. It really consists of a linear function 6x plus a translation (shifting each point downwards by 3), and this is what is meant by an affine function more generally: a linear function shifted by a constant (in this case −3).

Another way to define an affine function is to say that g(x) is affine if g(x) − g(y) is a linear function of x − y, or equivalently if, for any fixed point x₀, the function f(s) = g(x₀ + s) − g(x₀) is a linear function of s. You can check


Figure 3: (Left) Graph of f(x) = 2x. (Right) Graph of f(x) = −0.5x.

Figure 4: Graph of the affine function g(x) = 6x − 3.

that this is true for any function of the form g(x) = ax + b, and these are the only affine functions for the case we are currently considering, the simple case of a function mapping R to R. (Soon these ideas will be generalized to more interesting situations, so make sure you understand these basic ideas even if they seem trivial now!)

3.2 Linear functions from R to R^m

We might have a situation where there are two different output values f₁(x) and f₂(x) that depend on the same input value x. We could then use the symbol f(x) to denote the vector

f(x) = (f₁(x), f₂(x))^T (40)

containing these two output values. If the outputs are real numbers and are defined for any real number x, then we would say that f maps R to R².

This function is said to be linear if both real functions f₁(x) and f₂(x) are linear functions in the sense of Definition 65. From what we noticed in the previous section, we see that a function f : R → R² is linear only if it has the


form

f(x) = (a₁x, a₂x)^T, (41)

where the values a₁ and a₂ are two scalar constants. More generally, a function f : R → R^m is linear if and only if it has the form

f(x) = (a₁x, …, a_m x)^T, (42)

where the values a₁, …, a_m are m scalar constants.

3.3 Linear functions from R^n to R

Suppose f is a function that depends on several input values, say x1, x2, and x3. To make the notation simple, we can still talk about the function f(x) if we now think of x as the vector

x = (x1, x2, x3)^T. (43)

Using the standard unit basis (33), we write a linear combination in terms of the unit basis vectors:

(x1, x2, x3)^T = (x1, 0, 0)^T + (0, x2, 0)^T + (0, 0, x3)^T = x1 (1, 0, 0)^T + x2 (0, 1, 0)^T + x3 (0, 0, 1)^T. (44)

It is easy to show that

f(x) = x1 f((1, 0, 0)^T) + x2 f((0, 1, 0)^T) + x3 f((0, 0, 1)^T). (45)

An essential property of a linear function is the following: if we know the values of

f((1, 0, 0)^T), f((0, 1, 0)^T), and f((0, 0, 1)^T), (46)

then we can easily find the value of f(x) for any x ∈ R³. Indeed, any vector of R³ is a linear combination of the basis vectors

(1, 0, 0)^T, (0, 1, 0)^T, and (0, 0, 1)^T. (47)

Taking linear combinations of vectors is a fundamental operation in linear algebra.


It turns out that if f : R^n → R is a linear function, then all we need to do is evaluate the function f for some well-chosen set of n linearly independent vectors, and then we can evaluate f(x) for any x in all of R^n. This is one of the reasons why linear problems and linear equations are so important in applications.

You might guess that every linear function mapping R³ to R must have the form

f(x) = a1 x1 + a2 x2 + a3 x3 (48)

for some constant real numbers a1, a2, and a3. You would be right! More generally, a function f : R^n → R is linear if and only if it has the form

f(x) = a1 x1 + a2 x2 + ⋯ + an xn, (49)

where the values a1, …, an are n scalar constants.

3.4 Linear functions from R^n to R^m

Now suppose that the function f is a vector with m components that depends on n input values. We can write

f(x) = (f₁(x), …, f_m(x))^T. (50)

According to Definition 65, f is a linear function if and only if the m components f₁, …, f_m are linear functions of x.

From what we noticed in the previous sections, the ith function f_i(x) is then given by

f_i(x) = a_i1 x1 + a_i2 x2 + ⋯ + a_in xn = ∑_{j=1}^n a_ij x_j (51)

for i = 1, 2, …, m.

We often work with linear functions with many inputs and outputs, and so it is nice to simplify the notation for describing these functions. This was recognized many years ago, and so the notation of a matrix was invented. The function is uniquely determined by mn numbers that are naturally arranged in a matrix with m rows and n columns:

A = [ a11 a12 … a1n ]
    [ a21 a22 … a2n ]
    [  ⋮   ⋮      ⋮  ]
    [ am1 am2 … amn ]. (52)

A common notation is

A = (a_ij)_{i=1,…,m; j=1,…,n} = (a_ij)_{m×n}. (53)


For shorthand, we might write

f(x) = Ax, (54)

where x is the vector of inputs with n components. This notation suggests that we multiply the m × n matrix A by the vector x to obtain the vector f(x).

Definition 66. The set of all real matrices with m rows and n columns is denoted R^{m×n}.

We define the concept of matrix-vector multiplication so that this is correct:

Definition 67. If A is an m × n matrix of the form (52) and x ∈ R^n is an n-vector, then the product b = Ax is an m-vector (b ∈ R^m) and the ith component of b is

b_i = a_i1 x1 + a_i2 x2 + ⋯ + a_in xn = ∑_{j=1}^n a_ij x_j. (55)

The matrix-vector multiplication can also be displayed as follows:

b = [a_1 a_2 … a_n] (x1, x2, …, xn)^T = x1 a_1 + x2 a_2 + ⋯ + xn a_n,

where b is expressed as a linear combination of the columns a_j. It is a slight change of notation to highlight how x acts on the columns of A to produce b. For example, we have

[ 1 0 2 ; −1 3 1 ] (3, 2, 1)^T = (1·3 + 0·2 + 2·1, (−1)·3 + 3·2 + 1·1)^T = (5, 4)^T. (56)
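Definition 67 translates directly into code. The minimal Python sketch below (standard library only, not part of the original notes) reproduces the computation in (56):

```python
def matvec(A, x):
    """b_i = sum_j a_ij * x_j, as in Definition 67; A is a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

A = [[1, 0, 2],
     [-1, 3, 1]]
x = [3, 2, 1]
print(matvec(A, x))  # [5, 4]
```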

Note that once we have defined matrix-vector multiplication properly, the fact that any linear function can be written in the form (54) (for some particular matrix A) is a nice generalization of the fact that any linear function of a single variable has the form f(x) = ax for some single number a. That simple case is just the case m = n = 1, in which the matrix a is 1 × 1. Next we define the addition of two matrices and the scalar multiplication.

Definition 68. For any matrices A = (a_ij)_{m×n} and B = (b_ij)_{m×n} in R^{m×n}, the addition A + B is a matrix in R^{m×n} whose entries are the sums of the entries of A and B, i.e. A + B = (a_ij + b_ij)_{m×n}.

Definition 69. For any matrix A = (a_ij)_{m×n} in R^{m×n} and any scalar λ in R, the scalar multiplication λA is a matrix in R^{m×n} whose entries are the entries of A multiplied by λ, i.e. λA = (λa_ij)_{m×n}.


An alternative definition of the addition of two matrices uses linear functions. Suppose we have two linear functions f(x) = Ax and g(x) = Bx that are defined by two different matrices A and B (both of which are m × n, so both f and g map R^n to R^m). Now define a new function h(x) by

h(x) = f(x) + g(x).

This means that, for any vector x ∈ R^n, to compute the value of h(x) ∈ R^m, we first compute the two vectors f(x) and g(x). Then we add them together, using the vector addition rule from R^m. So the ith component of h(x) is just the ith component of f(x) added to the ith component of g(x).

Since f and g are both linear functions, it turns out that the function h is also a linear function. To see this, note that the ith component of h(x) is

h_i(x) = f_i(x) + g_i(x) = ∑_{j=1}^n a_ij x_j + ∑_{j=1}^n b_ij x_j = ∑_{j=1}^n (a_ij + b_ij) x_j. (57)

But this means that h(x) is defined by the matrix-vector multiplication

h(x) = Cx,

where C is the m × n matrix with components c_ij = a_ij + b_ij. In other words,

C = A + B,

where we define the sum of two matrices of the same shape in the obvious way, by adding the corresponding elements of the two matrices. So h is a linear function, and the matrix that defines it is simply the sum of the matrices A and B defining the functions f and g.

Proposition 70. R^{m×n} is a real linear space.

Remark. The set of all matrices with complex entries, m rows, and n columns is denoted C^{m×n}, which is a complex linear space.

Next, we give a proof that R^{m×n} is a real linear space.

1. For any matrix A and any matrix B in R^{m×n}, we have

A = (a_ij)_{m×n}, B = (b_ij)_{m×n} ⟹ A + B = (a_ij + b_ij)_{m×n}.

So the matrix A + B is also a matrix with m rows and n columns, and it belongs to R^{m×n}.


2. For any matrix A and any matrix B in R^{m×n}, the matrices A + B and B + A are equal. Indeed, we have

A + B = (a_ij + b_ij)_{m×n}, B + A = (b_ij + a_ij)_{m×n},

and the mn components are equal because the addition of scalar numbers is commutative.

3. The addition of matrices is associative. Indeed, we have

(A + B) + C = ((a_ij + b_ij) + c_ij)_{m×n}, A + (B + C) = (a_ij + (b_ij + c_ij))_{m×n}.

The mn components are equal because the addition of scalar numbers is associative.

4. The zero matrix in R^{m×n} is the matrix with all its mn components equal to 0.

5. For any matrix A in R^{m×n}, the matrix B, defined by

A = (a_ij)_{m×n} ⟹ B = (−a_ij)_{m×n},

is such that A + B = 0. The matrix B is the additive inverse of A.

6. For any matrix A in R^{m×n} and any scalar λ in R, we have

A = (a_ij)_{m×n} ⟹ λA = (λa_ij)_{m×n}.

So the matrix λA is also a matrix with m rows and n columns, and it belongs to R^{m×n}.

7. For any matrix A in R^{m×n} and any scalars λ and μ in R, we have

(λ + μ)A = ((λ + μ)a_ij)_{m×n} = (λa_ij + μa_ij)_{m×n} = (λa_ij)_{m×n} + (μa_ij)_{m×n} = λA + μA.

8. For any matrices A and B in R^{m×n} and any scalar λ in R, we have

λ(A + B) = (λ(a_ij + b_ij))_{m×n} = (λa_ij + λb_ij)_{m×n} = (λa_ij)_{m×n} + (λb_ij)_{m×n} = λA + λB.

9. For any matrix A in R^{m×n} and any scalars λ and μ in R, we have

(λμ)A = ((λμ)a_ij)_{m×n} = (λ(μa_ij))_{m×n} = λ(μa_ij)_{m×n} = λ(μA).


10. For any matrix A in R^{m×n}, we have

1 · A = 1 · (a_ij)_{m×n} = (1 · a_ij)_{m×n} = (a_ij)_{m×n} = A.

So R^{m×n} is a real linear space.

Remark. Note that R^m is the set of column vectors with m rows and 1 column. R^{m×1} denotes the set of matrices with m rows and 1 column.

Proposition 71. The matrix-vector multiplication satisfies

Ax + Bx = (A + B)x (58)

and

Ax + Ay = A(x + y). (59)

The proofs simply use the distributivity of real numbers to rewrite a_ij x_j + b_ij x_j as (a_ij + b_ij) x_j. Indeed, we have

Ax = (∑_{j=1}^n a_1j x_j, …, ∑_{j=1}^n a_mj x_j)^T, Bx = (∑_{j=1}^n b_1j x_j, …, ∑_{j=1}^n b_mj x_j)^T,

so that

Ax + Bx = (∑_{j=1}^n a_1j x_j + ∑_{j=1}^n b_1j x_j, …, ∑_{j=1}^n a_mj x_j + ∑_{j=1}^n b_mj x_j)^T = (∑_{j=1}^n (a_1j + b_1j) x_j, …, ∑_{j=1}^n (a_mj + b_mj) x_j)^T = (A + B)x.

So we have proved something nontrivial about matrix-vector algebra using the rules of standard algebra of real numbers. It is also true for the other property

of matrix-vector multiplication.

Exercise 72. Consider the matrices A = (a_ij)_{m×n} and B = (b_ij)_{m×n}. Let λ, μ ∈ R. What are the entries of the matrix λA + μB?

Exercise 73. Consider the square matrix A = (a_ij)_{n×n}, i.e. m = n. The function

tr(A) = a11 + a22 + ⋯ + ann = ∑_{i=1}^n a_ii (60)

evaluates the trace of a square matrix. Show that the trace function is a linear function on R^{n×n}.
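As a warm-up for Exercise 73, linearity of the trace can at least be checked numerically on one example. This is only a sanity check, not a proof; the matrices below are arbitrary choices:

```python
def trace(A):
    """tr(A) = sum of diagonal entries, as in (60)."""
    return sum(A[i][i] for i in range(len(A)))

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
alpha, beta = 2, -3
C = [[alpha * A[i][j] + beta * B[i][j] for j in range(2)] for i in range(2)]
print(trace(C), alpha * trace(A) + beta * trace(B))  # -29 -29, the same value
```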

3.5 Linear differential operators

Recall that algebra and the idea of linearity apply to situations other than vectors and matrices. Many of the fundamental concepts we learn about linear algebra for vectors and matrices carry over directly to functions and differential operators. Seeing these ideas in a different context now may help you to understand what linearity means more generally and why it is important.


Let us introduce the concept of an operator, which is just a function that takes a function as input and produces some other function as output. This gets a bit confusing, so we call it an operator instead of a function. A differential operator computes the output function by combining various derivatives of the input function.

The simplest differential operator is just D = d/dx. For example, if f(x) = x³ then D(f) is the function f′(x) = 3x². The operator D is a linear operator: it satisfies the same linearity properties as in Definition 65, where U = C¹(R, R) and V = C⁰(R, R). Indeed, it is true that

D(u + v) = D(u) + D(v), since d/dx (u(x) + v(x)) = d/dx u(x) + d/dx v(x), (61)

and

D(αu) = αD(u), since d/dx (αu(x)) = α d/dx u(x). (62)

These linearity properties extend of course to arbitrary linear combinations. You are used to using this when finding the derivatives of complicated functions by splitting them up, e.g.

d/dx [5x⁴ + 2 cos(6x)] = 5 d/dx (x⁴) + 2 d/dx cos(6x) = 20x³ − 12 sin(6x).

Linearity was used in the first step.

The second derivative operator d²/dx² and all higher-order derivative operators are also linear operators. We obtain a general linear differential operator by taking a linear combination of these differential operators (and also the zeroth-order derivative operator, which is just the identity operator that maps any function to itself). For example, the operator

L = 8 d²/dx² − 4 d/dx + 6I (63)

is a linear differential operator, where I denotes the identity operator. Applying L to a function u(x) results in the function

(Lu)(x) = 8u″(x) − 4u′(x) + 6u(x). (64)

Now consider the differential equation

8u″(x) − 4u′(x) + 6u(x) = x³ (65)

for 0 ≤ x ≤ 1 with u(0) = 1 and u(1) = 3. The problem is to find a function u(x) that satisfies the equation (65) everywhere in the interval and also satisfies the two boundary conditions. This is a linear differential equation since it has the form (Lu)(x) = g(x), where L is the linear operator of (64) and g(x) is the given function g(x) = x³.
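The action of L in (64) can be illustrated numerically. The sketch below (an illustration added to these notes, not the author's code) applies L to u(x) = x³ using centered finite differences and compares with the exact value 8·(6x) − 4·(3x²) + 6x³ obtained by differentiating by hand:

```python
def L_apply(u, x, h=1e-4):
    """Approximate (Lu)(x) = 8u''(x) - 4u'(x) + 6u(x) by centered differences."""
    d1 = (u(x + h) - u(x - h)) / (2 * h)          # approximates u'(x)
    d2 = (u(x + h) - 2 * u(x) + u(x - h)) / h**2  # approximates u''(x)
    return 8 * d2 - 4 * d1 + 6 * u(x)

u = lambda x: x ** 3
x = 0.5
exact = 48 * x - 12 * x ** 2 + 6 * x ** 3
print(abs(L_apply(u, x) - exact) < 1e-5)  # True
```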


    4 Matrices

4.1 Space of m × n matrices

The set of m × n real matrices is denoted R^{m×n}. The previous section proved that R^{m×n} is a real linear space. Its dimension is equal to mn. A simple basis for R^{m×n} consists of the mn matrices with only one non-zero entry.

For example, consider R^{3×2}. Any matrix with 3 rows and 2 columns is written as follows:

A = [ a11 a12 ]
    [ a21 a22 ]
    [ a31 a32 ],  where a11, a12, a21, a22, a31, a32 ∈ R.

The matrix A depends on 6 parameters: a11, a12, a21, a22, a31, and a32. We write a linear combination of matrices where each parameter multiplies a matrix (semicolons separate the rows of each 3 × 2 matrix):

A = a11 [1 0; 0 0; 0 0] + a12 [0 1; 0 0; 0 0] + a21 [0 0; 1 0; 0 0] + a22 [0 0; 0 1; 0 0] + a31 [0 0; 0 0; 1 0] + a32 [0 0; 0 0; 0 1].

This last formula illustrates that the 6 matrices span R^{3×2}:

R^{3×2} = span([1 0; 0 0; 0 0], [0 1; 0 0; 0 0], [0 0; 1 0; 0 0], [0 0; 0 1; 0 0], [0 0; 0 0; 1 0], [0 0; 0 0; 0 1]).

These 6 matrices are also linearly independent. Indeed, we have

α [1 0; 0 0; 0 0] + β [0 1; 0 0; 0 0] + γ [0 0; 1 0; 0 0] + δ [0 0; 0 1; 0 0] + ε [0 0; 0 0; 1 0] + ζ [0 0; 0 0; 0 1] = [α β; γ δ; ε ζ] = [0 0; 0 0; 0 0].

By identifying each entry to zero, we obtain the following 6 equations:

α = 0 (from entry (1,1)), β = 0 (from entry (1,2)),
γ = 0 (from entry (2,1)), δ = 0 (from entry (2,2)),
ε = 0 (from entry (3,1)), ζ = 0 (from entry (3,2)).


So these 6 matrices are linearly independent because the only linear combination resulting in the zero matrix is the trivial combination with

α = β = γ = δ = ε = ζ = 0.

Consequently, a basis for R^{3×2} is given by

([1 0; 0 0; 0 0], [0 1; 0 0; 0 0], [0 0; 1 0; 0 0], [0 0; 0 1; 0 0], [0 0; 0 0; 1 0], [0 0; 0 0; 0 1]). (66)

The dimension of R^{3×2} is 6.

Remark. The set of m × n complex matrices is denoted C^{m×n}. It is a real linear space of dimension 2mn.

Remark. The set of m × n complex matrices is denoted C^{m×n}. It is a complex linear space of dimension mn.

4.2 Matrix-matrix multiplication

If A ∈ R^{m×r} and B ∈ R^{r×n}, then we can define the product C = AB, which will be a matrix in R^{m×n}. Note that this product is only defined if the number of columns in A is equal to the number of rows in B.

The elements of the product matrix C are given by

c_ij = ∑_{k=1}^r a_ik b_kj. (67)

For example, if

A = [ a11 a12 ]
    [ a21 a22 ]
    [ a31 a32 ]
    [ a41 a42 ],  B = [ b11 b12 b13 ]
                      [ b21 b22 b23 ], (68)

then C ∈ R^{4×3} and

C = [ c11 c12 c13 ]
    [ c21 c22 c23 ]
    [ c31 c32 c33 ]
    [ c41 c42 c43 ], (69)

where, for example,

c11 = a11 b11 + a12 b21,
c12 = a11 b12 + a12 b22,
…
c43 = a41 b13 + a42 b23.
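Formula (67) translates into a short triple loop. A minimal Python sketch (standard library only, added as an illustration), with a small numerical example:

```python
def matmul(A, B):
    """C = AB with c_ij = sum_k a_ik * b_kj; A is m x r, B is r x n (lists of rows)."""
    m, r, n = len(A), len(B), len(B[0])
    assert all(len(row) == r for row in A), "inner dimensions must agree"
    return [[sum(A[i][k] * B[k][j] for k in range(r)) for j in range(n)]
            for i in range(m)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
```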


    Figure 5: Illustration of Matrix-Matrix product (Diagram from Wikipedia).

    Figure 6: Example of Matrix-Matrix product (Diagram from Wikipedia).

    The matrix-matrix product is illustrated in Figure 5. A numerical example isdepicted in Figure 6. The diagram highlights the origins of each coefficient.

Written in terms of columns, the product is

C = [c_1 ⋯ c_n] = A [b_1 ⋯ b_n] = [Ab_1 ⋯ Ab_n]. (70)

Remark. In the case where n = 1, B only has one column, and the matrix-matrix multiplication AB agrees with matrix-vector multiplication.

Remark. Yet another way to view matrix-matrix multiplication is in terms of rows. The ith row of the product C = AB is the ith row of A multiplied by the matrix B.

Suppose A, B ∈ R^{n×n} are both square and of the same size. Then the products AB and BA are both defined. Each product is again an n × n matrix. Note, however, that in general AB ≠ BA. Matrix multiplication is not commutative in general!


Example 76. Let

A = [1 2; 0 3],  B = [4 5; 1 0]. (71)

Then

AB = [6 5; 3 0],  BA = [4 23; 1 2]. (72)
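The sketch below (an added illustration, not part of the original notes) reproduces Example 76 numerically and shows the non-commutativity concretely:

```python
def matmul(A, B):
    """C = AB with c_ij = sum_k a_ik * b_kj (lists of rows)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2], [0, 3]]
B = [[4, 5], [1, 0]]
print(matmul(A, B))  # [[6, 5], [3, 0]]  -- AB of (72)
print(matmul(B, A))  # [[4, 23], [1, 2]] -- BA differs, so A and B do not commute
```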

Definition 77. Consider A, B ∈ R^{n×n}. When the products AB and BA are equal, we say that A and B commute.

Example 78. A particular example of a matrix-matrix product is the outer product. Consider the product of an m-dimensional column vector, u ∈ R^{m×1}, with an n-dimensional row vector, v ∈ R^{1×n}. The outer product is an m × n matrix that can be written

u [v1 ⋯ vn] = [v1 u ⋯ vn u] = [ u1 v1 ⋯ u1 vn ]
                              [   ⋮         ⋮  ]
                              [ um v1 ⋯ um vn ]. (73)

The columns are all multiples of the same vector u and, similarly, the rows are all multiples of the same vector v.
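A minimal Python sketch of the outer product (73), added here as an illustration:

```python
def outer(u, v):
    """m x n matrix with entries u_i * v_j, as in (73)."""
    return [[ui * vj for vj in v] for ui in u]

print(outer([1, 2, 3], [4, 5]))  # [[4, 5], [8, 10], [12, 15]]
# each row is a multiple of v = [4, 5]; each column is a multiple of u = [1, 2, 3]
```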

4.3 Range and rank of a matrix

Definition 79. Let a^(1), …, a^(n) ∈ R^m denote the columns of a matrix A ∈ R^{m×n}. The column space of A, also called the range of A, is the subspace of R^m spanned by the columns of A. It is denoted by

R(A) = span(a^(1), …, a^(n)). (74)

Definition 80. The dimension of the range of A is called the rank of A:

rank(A) = dim(R(A)). (75)

Note the following:

- rank(A) ≤ m, since R(A) is a subspace of R^m;
- rank(A) ≤ n, since R(A) is spanned by the n columns of A, so a basis for R(A) has at most n vectors;
- rank(A) = n if and only if the columns of A are linearly independent. In this case, the columns form a basis for R(A);
- if rank(A) < n, then the columns are linearly dependent and there exists a vector z ≠ 0 such that Az = 0.


Similarly, the row rank of a matrix is the dimension of the space spanned by its rows. Row rank always equals column rank, so we refer to this number simply as the rank of a matrix. An m × n matrix of full rank is a matrix with the maximal possible rank, min(m, n). For example, a matrix of full rank with m ≥ n must have n linearly independent columns.

Remark. The rank is defined similarly for complex matrices in C^{m×n}.

Example 81. Consider the matrix

A = [2 6; 5 3; 1 2].

We denote by a_1 and a_2 the two columns of A. The range of A is spanned by these two column vectors:

R(A) = span(a_1, a_2).

To find the dimension of R(A), we need to find a basis; the dimension will be the number of vectors in the basis. To get a basis, we need to find linearly independent vectors that span the range of A. Assume there exists a linear combination of a_1 and a_2 equal to the zero vector:

α (2, 5, 1)^T + β (6, 3, 2)^T = (0, 0, 0)^T ⟺ 2α + 6β = 0, 5α + 3β = 0, α + 2β = 0.

The last equation gives that α = −2β. Replacing this value into the first equation gives −4β + 6β = 2β = 0, which implies that β = 0. Consequently, we have α = β = 0 and the two column vectors are linearly independent. So (a_1, a_2) is a basis for R(A). The rank of A is 2.

Example 82. Consider the matrix

A = [3 0; 2 4].

We denote by a_1 and a_2 the two columns of A. The range of A is spanned by these two column vectors:

R(A) = span(a_1, a_2).

To find the dimension of R(A), we need to find a basis; the dimension will be the number of vectors in the basis. To get a basis, we need to find linearly independent vectors that span the range of A. Assume there exists a linear combination of a_1 and a_2 equal to the zero vector:

α (3, 2)^T + β (0, 4)^T = (0, 0)^T ⟺ 3α = 0, 2α + 4β = 0.

The first equation gives that α = 0. Replacing this value into the last equation gives β = 0. The two column vectors are linearly independent. So (a_1, a_2) is a basis for R(A). The rank of A is 2.


Example 89. Consider the matrix

A = [3 0 −1; 2 8 2; −1 2 1].

We have seen that the rank of this matrix is 2. The rank-nullity theorem gives

dim N(A) = 3 − 2 = 1.

The dimension of its null space is 1. We have N(A) = span(z), where z is a nonzero vector. Assume z = (z1, z2, z3)^T. We have

Az = 0 ⟺ 3z1 − z3 = 0, 2z1 + 8z2 + 2z3 = 0, −z1 + 2z2 + z3 = 0.

The first equation gives that z3 = 3z1. Replacing this value into the second and third equations gives

z3 = 3z1, 2z1 + 8z2 + 6z1 = 8z1 + 8z2 = 0, −z1 + 2z2 + 3z1 = 2z1 + 2z2 = 0,

which implies that z2 = −z1 but leaves the value of z1 arbitrary. It is normal that z1 is arbitrary because the dimension of the null space is 1, so we should expect 1 parameter. For example, we can choose z1 = 1, z2 = −1, and z3 = 3. Then we have

[3 0 −1; 2 8 2; −1 2 1] (1, −1, 3)^T = (3 + 0 − 3, 2 − 8 + 6, −1 − 2 + 3)^T = (0, 0, 0)^T.

This vector is nonzero. It spans the null space. So a basis of N(A) is ((1, −1, 3)^T).
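A quick check (added as an illustration; the entries of A are taken as reconstructed above) that the vector found in Example 89 is indeed in the null space:

```python
def matvec(A, z):
    """b_i = sum_j a_ij * z_j (A is a list of rows)."""
    return [sum(a * zi for a, zi in zip(row, z)) for row in A]

A = [[3, 0, -1],
     [2, 8, 2],
     [-1, 2, 1]]
z = [1, -1, 3]
print(matvec(A, z))  # [0, 0, 0]
```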

Example 90. Consider the matrix

A = [1 1 1; 1 1 1; 1 1 1].

Assume z = (z1, z2, z3)^T is a nonzero vector belonging to the null space of A. We have

Az = 0 ⟺ [1 1 1; 1 1 1; 1 1 1] (z1, z2, z3)^T = (z1 + z2 + z3, z1 + z2 + z3, z1 + z2 + z3)^T = 0.


Note that rank(A^T) = rank(A), since the column rank and row rank are equal. However, the dimensions of N(A) and N(A^T) can differ. For example, the matrix

A = [3 0 −1; 2 4 6]

has a null space of dimension 1 and its transpose a null space of dimension 0. The null spaces are

N(A) = span((1, −5, 3)^T),  N(A^T) = {0} = {0_{2×1}}.

Remark 94. A symmetric matrix must be square.

Note that we have:

- (A^T)^T = A;
- the identity matrix is a symmetric matrix;
- a diagonal matrix, whose non-zero entries are only on the diagonal, is symmetric.

If A ∈ R^{m×r} and B ∈ R^{r×n}, then the product C = AB ∈ R^{m×n} exists. We can take the transpose of this matrix and will get an n × m matrix C^T ∈ R^{n×m}. Note that in this case B^T ∈ R^{n×r} and A^T ∈ R^{r×m}. So the matrix product B^T A^T is defined and is an n × m matrix. In fact, this is just equal to C^T ∈ R^{n×m}. So, in general, it is true that

(AB)^T = B^T A^T. (79)

The transpose of the product is the product of the transposes, but with the order reversed!

If A ∈ R^{m×n} is a matrix, then the product A^T A exists and it is a square matrix of dimension n. The matrix A^T A is a symmetric matrix. Indeed, we have

(A^T A)^T = A^T (A^T)^T = A^T A.

    Denition 95. The adjoint of a complex m n matrix A , denoted A , is then m matrix whose (i, j ) entry is the conjugate ( j,i ) entry of A ,(A ) ij = a ij , (80)

    obtained by negating the imaginary part of a ij .

    Definition 96. When A^* = A, the matrix A is called hermitian. When A^* = -A, the matrix A is called skew-hermitian.

    Note that a hermitian matrix must be square. Similarly to the transpose, the adjoint of a product of matrices is the product of the adjoints, with the order reversed.


    Exercise 97. Let A be a square real matrix. Show that A + A^T is a symmetric matrix. Show that A - A^T is a skew-symmetric matrix. Prove that any square matrix is the sum of a symmetric and a skew-symmetric matrix.

    Exercise 98. Characterize the diagonal entries of a real skew-symmetric matrix.

    Exercise 99. Characterize the diagonal entries of a complex hermitian matrix.

    Consider x ∈ R^3,

        x = [x1; x2; x3],

    then we have

        x^T x = [x1 x2 x3] [x1; x2; x3] = x1^2 + x2^2 + x3^2.

    We have proved that, for x ∈ R^3,

        ‖x‖_2^2 = x^T x.    (81)

    This relation is also true in R^m. For the inner product defined in (14), we similarly prove that

        ∀ x, y ∈ R^m,   x · y = x^T y.    (82)

    In C^m, we replace the transpose operation ^T with the adjoint operation ^*.
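    Relation (81) is easy to check numerically; a minimal sketch in Python:

```python
import math

# Check that the Euclidean norm squared equals the inner product x^T x.
x = [3.0, 4.0, 12.0]
xTx = sum(xi * xi for xi in x)   # x^T x
norm2 = math.sqrt(xTx)           # ||x||_2
print(xTx, norm2)                # 169.0 13.0
```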

    4.6 Matrix inverse

    The n×n matrix that has 1's on the diagonal and 0's everywhere else is called the identity matrix and denoted by I. It is called the identity matrix because for any x ∈ R^n,

        Ix = x.

    Also, if B ∈ R^{n×n},

        IB = [Ib_1 ... Ib_n] = [b_1 ... b_n] = B.

    So multiplying a matrix by I leaves it unchanged. More generally, if B ∈ R^{m×n}, then multiplying B on the left by an m×m identity matrix or on the right by an n×n identity matrix leaves B unchanged.

    Definition 100. A nonsingular or invertible matrix is a square matrix of full rank.


    Note that a full rank n×n square matrix has columns that form a basis for the linear space R^n (or C^n for complex matrices). Therefore, any vector in R^n has a unique expression as a linear combination of the column vectors. In particular, every basis vector e_j of the standard unit basis (33) has a unique expansion in the column vectors,

        e_j = Σ_{i=1}^{n} a_i z_ij.    (83)

    Let z_j denote the column vector with entries z_ij. Then we have

        e_j = A z_j

    and, combining all the vectors z_j in the matrix Z, we get

        [e_1 ... e_n] = I = AZ.

    The matrix Z is called the inverse matrix of A and written as A^{-1}. It should not be confused with the additive inverse of A, which is -A. The matrix A^{-1} is the inverse for multiplication and satisfies the relation

        A A^{-1} = I.    (84)

    It is the matrix version of the familiar expression

        a a^{-1} = 1

    for any nonzero scalar a ∈ R.

    Theorem 101. For A ∈ R^{n×n}, the following conditions are equivalent:

    - A has a unique inverse A^{-1} such that A A^{-1} = A^{-1} A = I;
    - rank(A) = n;
    - R(A) = R^n;
    - N(A) = {0}.

    A similar result holds for complex matrices when replacing R with C.

    For any nonzero scalar a ∈ R, the expression

        a a^{-1} = a^{-1} a = 1

    holds. The same is true for invertible matrices, i.e.

        A A^{-1} = A^{-1} A = I.    (85)


    Starting from (84), we multiply on the right by A to get

        A A^{-1} A = A,   i.e.   A (A^{-1} A x - x) = 0   ∀ x ∈ R^n.

    Since the null space of A is trivial, we get

        A^{-1} A x = x   ∀ x ∈ R^n,

    proving that A^{-1} A = I. The inverse matrix commutes with A and the product in either order is the identity matrix.

    Corollary 102. If A is invertible, then A^{-1} is invertible and we have (A^{-1})^{-1} = A.

    Recall, from Section 4.5, that the transpose of a matrix product is the product of the transposes in reversed order. A similar formula holds for inverses in the square case. If A, B ∈ R^{n×n} are both nonsingular, then so is their product AB and

        (AB)^{-1} = B^{-1} A^{-1}.    (86)

    Again the order is reversed. Indeed, we can write

        (B^{-1} A^{-1}) (AB) = B^{-1} (A^{-1} A) B = B^{-1} I B = B^{-1} B = I

    and use the uniqueness of the inverse matrix. If either A or B is singular, then the products AB and BA will also be singular and noninvertible. We emphasize that the inverse of a matrix is defined only for square matrices. So equation (86) holds only when A and B are both invertible and, in particular, square.

    Note that, for any square invertible matrix A, we have

        (A^T)^{-1} = (A^{-1})^T.    (87)

    Proposition 103. The inverse of a 2×2 matrix is equal to

        [a b; c d]^{-1} = 1/(ad - bc) [d -b; -c a]    (88)

    if and only if ad - bc ≠ 0.

    Exercise 104. Prove formula (88).
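    Formula (88) can be checked directly; a minimal sketch in Python, where inv2x2 is a hypothetical helper name:

```python
# Inverse of a 2x2 matrix [a b; c d] via formula (88):
# (1/(ad - bc)) [d -b; -c a], provided ad - bc != 0.

def inv2x2(a, b, c, d):
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det],
            [-c / det, a / det]]

A = [[2, 1],
     [1, 1]]
Ainv = inv2x2(A[0][0], A[0][1], A[1][0], A[1][1])

# Check that A * Ainv = I.
prod = [[sum(A[i][k] * Ainv[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)]
print(prod)  # [[1.0, 0.0], [0.0, 1.0]]
```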

    The scalar ad - bc is called the determinant of the matrix,

        det [a b; c d] = ad - bc.    (89)

    A 2×2 matrix is invertible if and only if its determinant is nonzero. The determinant can be defined for any square n×n matrix. For example, for 3×3


    matrices, we have

        det [a b c; d e f; g h i] = a det [e f; h i] - b det [d f; g i] + c det [d e; g h]
                                  = a (ei - hf) - b (di - gf) + c (dh - ge)
                                  = (aei + bfg + cdh) - (ahf + bdi + cge).

    We can generalize that an n×n matrix is invertible if and only if its determinant is nonzero. However, the determinant rarely finds a useful role in numerical algorithms. The determinant satisfies the following properties:

    - det(I) = 1;
    - det(AB) = det(A) det(B);
    - det(A^T) = det(A);
    - det(A^{-1}) = 1 / det(A);
    - det(α A) = α^n det(A), where α is a scalar and A is a square matrix of dimension n.
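    The cofactor expansion above translates directly into code; a sketch in plain Python (det2 and det3 are hypothetical helper names):

```python
# Cofactor expansion along the first row for a 3x3 determinant.

def det2(a, b, c, d):
    return a * d - b * c

def det3(M):
    (a, b, c), (d, e, f), (g, h, i) = M
    return (a * det2(e, f, h, i)
            - b * det2(d, f, g, i)
            + c * det2(d, e, g, h))

M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
print(det3(M))  # 0: this matrix is singular, so it is not invertible
```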

    Remark. When writing the product x = A^{-1} b, we should not think of x as the result of applying A^{-1} to b. Instead, we should understand x as the unique vector that satisfies the equation Ax = b. This means that x is the vector of coefficients of the unique expansion of b in the basis of columns of A. Multiplication by A^{-1} is a change of basis operation.

    Remark. Numerically, we rarely work with inverse matrices. If we knew the inverse matrix, then we could solve any linear system Ax = b simply by multiplying b by A^{-1}. However, in practice, there are better ways to solve the system (e.g., Gaussian elimination) that require less work than computing the inverse matrix and often give more accurate solutions (when the rounding errors of computer arithmetic are taken into account).

    4.7 Orthogonal/Unitary matrices

    Orthogonal and unitary matrices play an important role in numerical algorithms. We give their definitions here. Some of their properties will be described later in the notes.

    Definition 105. A square n×n real matrix O is orthogonal if O^T = O^{-1}, or O^T O = I.


    Definition 106. A square n×n complex matrix Q is unitary if Q^* = Q^{-1}, or Q^* Q = I.

    When n = 2, the orthogonal matrices are

        [cos θ  -sin θ; sin θ  cos θ]   and   [cos θ  sin θ; sin θ  -cos θ],    (90)

    where θ belongs to R.

    Let O denote an n×n real orthogonal matrix and x an n×1 real vector. The 2-norm of Ox is equal to

        ‖Ox‖_2 = √((Ox)^T (Ox)) = √(x^T O^T O x) = √(x^T x) = ‖x‖_2.    (91)

    Multiplication by an orthogonal matrix preserves the 2-norm of a vector. Orthogonal matrices also preserve the inner product (14). Note that they do not preserve the ∞-norm or the 1-norm. For example,

        ‖[1; 0]‖_∞ = 1   and   [√2/2  -√2/2; √2/2  √2/2] [1; 0] = [√2/2; √2/2],

    whose ∞-norm is √2/2. In C^m, the unitary matrices preserve the 2-norm of complex vectors and the inner product.
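    The norm-preservation property (91), and its failure for the ∞-norm, can be observed numerically with a rotation by 45 degrees (a sketch in Python):

```python
import math

# Rotation by 45 degrees: an orthogonal matrix. It preserves the 2-norm
# of (1, 0) but changes its infinity-norm.
t = math.pi / 4
O = [[math.cos(t), -math.sin(t)],
     [math.sin(t),  math.cos(t)]]

x = [1.0, 0.0]
Ox = [O[0][0] * x[0] + O[0][1] * x[1],
      O[1][0] * x[0] + O[1][1] * x[1]]

norm2 = lambda v: math.sqrt(v[0] ** 2 + v[1] ** 2)
norminf = lambda v: max(abs(v[0]), abs(v[1]))

print(norm2(x), norm2(Ox))      # both equal 1 up to rounding
print(norminf(x), norminf(Ox))  # 1.0 versus sqrt(2)/2, about 0.707
```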

    4.8 Useful commands in Matlab

    Here are a few commands in Matlab useful for this section.

    - C = A*B computes the matrix-matrix product.
    - rank(A) computes the rank of a matrix.
    - null(A) generates an orthonormal basis for the null space of A.
    - A' computes the adjoint matrix A^*. When A has only real entries, the adjoint matrix A^* is equal to the transpose matrix A^T. To compute the transpose (without the complex conjugate), Matlab uses A.' .
    - I = eye(n) creates the identity matrix in R^{n×n}.
    - inv(A) computes the inverse of matrix A when it exists.
    - det(A) computes the determinant of a matrix.


    5 Norms and Inner Products

    In this section, we generalize the norms and the inner product defined on R^m (see Section 1.3) to linear spaces. In particular, we discuss norms and inner products for matrices.

    5.1 Norms

    Definition 107. Consider U a real linear space. A norm ‖·‖ is a map from U → R_+ satisfying the following conditions:

        ∀ u ∈ U,  ‖u‖ ≥ 0,    (92a)
        ‖u‖ = 0 if and only if u = 0,    (92b)
        ∀ α ∈ R, ∀ u ∈ U,  ‖α u‖ = |α| ‖u‖,    (92c)
        ∀ u^(1), u^(2) ∈ U,  ‖u^(1) + u^(2)‖ ≤ ‖u^(1)‖ + ‖u^(2)‖   (triangle inequality).    (92d)

    To define a norm for matrices, several approaches are possible. First, we look at the matrix just through its entries. In that case, the resulting matrix norms are very similar to vector norms. For example, the map

        A = (a_ij)_{m×n}  ↦  ( Σ_{i=1}^{m} Σ_{j=1}^{n} |a_ij|^p )^{1/p}

    is a norm on R^{m×n}. Note that it is similar to the vector p-norm on R^{mn}.

    Example 108. Consider the linear space of real matrices U = R^{m×n} and the matrix A = (a_ij)_{m×n}. The map

        ‖A‖_F = ( Σ_{i=1}^{m} Σ_{j=1}^{n} |a_ij|^2 )^{1/2},    (93)

    is called the Frobenius norm. It is similar to the 2-norm of R^{mn}. It is easy to show that this map is a norm according to Definition 107. The Frobenius norm can be used to bound the product of matrices. Let C = AB with entries c_ij, and let a_row,i be the i-th row of A and b_j the j-th column of B. Then we have c_ij = a_row,i b_j. So, by the Cauchy-Schwarz inequality, we have

        |c_ij| ≤ ‖a_row,i‖_2 ‖b_j‖_2.


    Squaring both sides and summing over i and j, we get

        ‖AB‖_F^2 = Σ_{i=1}^{m} Σ_{j=1}^{n} |c_ij|^2
                 ≤ Σ_{i=1}^{m} Σ_{j=1}^{n} ‖a_row,i‖_2^2 ‖b_j‖_2^2
                 = ( Σ_{i=1}^{m} ‖a_row,i‖_2^2 ) ( Σ_{j=1}^{n} ‖b_j‖_2^2 ) = ‖A‖_F^2 ‖B‖_F^2.
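    The resulting bound ‖AB‖_F ≤ ‖A‖_F ‖B‖_F is easy to observe numerically; a small sketch in plain Python:

```python
import math

# Check of ||AB||_F <= ||A||_F ||B||_F on a small example.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def fro(A):
    """Frobenius norm: square root of the sum of squared entries."""
    return math.sqrt(sum(a * a for row in A for a in row))

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 1]]
print(fro(matmul(A, B)) <= fro(A) * fro(B))  # True
```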

    Another approach to define a matrix norm is to assess the effect of the matrix on the length of a vector. This approach is intimately related to a vector norm. The resulting norm is called the induced or associated matrix norm. Considering all the nonzero vectors x, the induced matrix norm compares the lengths of the vectors x and Ax and is equal to the largest possible ratio between these lengths.

    Definition 109. Consider the linear space of real matrices R^{m×n} and the vector norms ‖·‖_(n) on the domain space R^n → R_+ and ‖·‖_(m) on the range space R^m → R_+. The induced matrix norm is defined as

        ‖A‖_(m,n) = max_{x ≠ 0} ‖Ax‖_(m) / ‖x‖_(n) = max_{‖x‖_(n) = 1} ‖Ax‖_(m).    (94)

    The induced matrix norm is the largest value of the ratios ‖Ax‖_(m) / ‖x‖_(n). It is easy to verify that ‖·‖_(m,n) satisfies the properties (92) defining a norm. Note that we used the subscripts (n) and (m) to avoid any confusion with the vector p-norms.

    Sometimes, the ratio ‖Ax‖_(m) / ‖x‖_(n) is called the amplification factor or gain in the direction x. So the norm ‖A‖_(m,n) is the maximum gain over all directions.

    Exercise 110. Prove that the norm ‖·‖_(m,n) satisfies the properties (92).

    Example 111. Consider the matrix A = [2 0; 0 1]. We have the following relations using, for example, the 1-norm of the vectors, i.e. the (1,1) induced matrix norm:

    - [2 0; 0 1] [1; 0] = [2; 0];  ‖[1; 0]‖_1 = 1 and ‖A [1; 0]‖_1 = 2. The ratio is here 2.
    - [2 0; 0 1] [0; 1] = [0; 1];  ‖[0; 1]‖_1 = 1 and ‖A [0; 1]‖_1 = 1. The ratio is here 1.


    - [2 0; 0 1] [1; 1] = [2; 1];  ‖[1; 1]‖_1 = 2 and ‖A [1; 1]‖_1 = 3. The ratio is here 3/2.
    - [2 0; 0 1] [2; 1] = [4; 1];  ‖[2; 1]‖_1 = 3 and ‖A [2; 1]‖_1 = 5. The ratio is here 5/3.

    From these examples, we see that the ratio depends on the vector x. It seems that 2 will be the largest one. It is, indeed, the case. Consider an arbitrary vector x. Then we have

        x = [x1; x2],   ‖x‖_1 = |x1| + |x2|,

    and

        Ax = [2 x1; x2],   ‖Ax‖_1 = 2 |x1| + |x2|.

    So the ratio is equal to

        ‖Ax‖_1 / ‖x‖_1 = (2 |x1| + |x2|) / (|x1| + |x2|) ≤ (2 |x1| + 2 |x2|) / (|x1| + |x2|) = 2.

    This bound shows that the ratio cannot be larger than 2. Since we have found one vector for which the ratio is exactly 2, the (1,1) induced matrix norm for A is equal to 2. This norm value indicates that the matrix A can multiply the length of a vector by 2, at most. It could be less than 2 for some vectors. However, it does not say anything about the direction of the vector.

    Example 112. Let A be an m×n matrix. The (1,1) induced norm is often denoted ‖A‖_(1,1) = ‖A‖_1. This notation is not confusing with the vector 1-norm because A is a matrix. For any vector x ∈ R^n, we have

        ‖Ax‖_1 = ‖ Σ_{j=1}^{n} x_j a_j ‖_1 ≤ Σ_{j=1}^{n} |x_j| ‖a_j‖_1 ≤ ‖x‖_1 max_{1≤j≤n} ‖a_j‖_1,

    where a_j is the j-th column vector of A. Note that ‖a_j‖_1 is the vector 1-norm because a_j is a vector. Therefore, the induced matrix 1-norm satisfies

        ‖A‖_1 ≤ max_{1≤j≤n} ‖a_j‖_1.

    By choosing x = e_j, the standard unit basis vector of R^n, where j is maximizing ‖a_j‖_1, we attain the bound. Thus the matrix 1-norm is

        ‖A‖_1 = max_{1≤j≤n} ‖a_j‖_1 = max_{1≤j≤n} Σ_{i=1}^{m} |a_ij|,    (95)

    which is equal to the maximum column sum. We can write

        ‖Ax‖_1 ≤ ‖A‖_1 ‖x‖_1.
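    Formula (95) gives a very cheap way to evaluate ‖A‖_1; a sketch in plain Python, using the matrix of Example 111:

```python
# Induced matrix 1-norm as the maximum column sum, formula (95).

def one_norm(A):
    m, n = len(A), len(A[0])
    return max(sum(abs(A[i][j]) for i in range(m)) for j in range(n))

A = [[2, 0],
     [0, 1]]
print(one_norm(A))  # 2

# The bound is attained at the standard basis vector picking out the
# column with the largest 1-norm (here e_1):
x = [1, 0]
Ax = [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
ratio = sum(abs(v) for v in Ax) / sum(abs(v) for v in x)
print(ratio)  # 2.0
```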


    Example 113. Let A be an m×n matrix. The (∞,∞) induced norm is often denoted ‖A‖_(∞,∞) = ‖A‖_∞. This notation is not confusing with the vector ∞-norm because A is a matrix. The matrix ∞-norm is

        ‖A‖_∞ = max_{1≤i≤m} Σ_{j=1}^{n} |a_ij|,    (96)

    which is equal to the maximum row sum. We can write

        ‖Ax‖_∞ ≤ ‖A‖_∞ ‖x‖_∞.

    When comparing equations (95) and (96), we remark that, for any matrix A ∈ R^{m×n}, we have

        ‖A‖_∞ = ‖A^T‖_1.    (97)

    A similar property holds for complex matrices when using the adjoint matrix A^*.

    Example 114. Let A be an m×n matrix. The (2,2) induced norm is denoted ‖A‖_(2,2) = ‖A‖_2. This notation is not confusing with the vector 2-norm because A is a matrix. The matrix 2-norm is defined by

        ‖A‖_2 = max_{x ≠ 0} ‖Ax‖_2 / ‖x‖_2 = max_{x ≠ 0} √( x^T A^T A x / x^T x ).    (98)

    We can write

        ‖Ax‖_2 ≤ ‖A‖_2 ‖x‖_2.

    Later in the notes, we will introduce the spectral radius, which describes precisely the 2-norm of a matrix. For the moment, just remember that the 2-norm of a matrix A is related to the matrix A^T A. As a consequence of (98), we remark that the 2-norm of a real matrix is invariant under multiplication by orthogonal matrices,

        ∀ A ∈ R^{m×n},  ‖OA‖_2 = ‖A‖_2,  ∀ O ∈ R^{m×m} satisfying O^T O = I.    (99)

    The 2-norm of a complex matrix is invariant under multiplication by unitary matrices.

    Similarly to norms in R^m, all the matrix norms are related to each other. Recall that we have

        ‖x‖_∞ ≤ ‖x‖_2 ≤ √n ‖x‖_∞,    (100)

    for any vector x ∈ R^n. We can write also, for x ≠ 0,

        (1/√n) (1/‖x‖_∞) ≤ 1/‖x‖_2 ≤ 1/‖x‖_∞.

    For any vector y ∈ R^m, we have

        ‖y‖_∞ ≤ ‖y‖_2 ≤ √m ‖y‖_∞.


    The induced matrix norm of a matrix product can also be bounded. Let A be an m×r real matrix and B be an r×n real matrix. For any x ∈ R^n, we have

        ‖ABx‖_(m) ≤ ‖A‖_(m,r) ‖Bx‖_(r) ≤ ‖A‖_(m,r) ‖B‖_(r,n) ‖x‖_(n).

    Therefore, the induced matrix norms satisfy

        ‖AB‖_(m,n) ≤ ‖A‖_(m,r) ‖B‖_(r,n).    (104)

    In particular, we have

        ‖AB‖_1 ≤ ‖A‖_1 ‖B‖_1,   ‖AB‖_2 ≤ ‖A‖_2 ‖B‖_2,   ‖AB‖_∞ ≤ ‖A‖_∞ ‖B‖_∞.    (105)

    Exercise 118. Consider U = R^n and W a nonsingular matrix. Show that the map

        x ↦ ‖x‖_W = ‖W x‖_2

    is a norm.

    Remark. When U is a complex linear space, a norm remains a map from U → R_+. The notions introduced in this section can be extended to C^{m×n}.

    Exercise 119. Consider U = C^0([0, 1], R). Show that the map

        f ↦ ( ∫_0^1 f(t)^2 dt )^{1/2}

    is a norm.

    5.2 Inner products

    Definition 120. Consider U a real linear space. An inner product <·,·> is a map U × U → R satisfying the following properties:

        ∀ u ∈ U,  <u, u> ≥ 0,    (106a)
        <u, u> = 0 if and only if u = 0,    (106b)
        ∀ α ∈ R, ∀ u, v ∈ U,  <α u, v> = α <u, v>,    (106c)
        ∀ u, v ∈ U,  <u, v> = <v, u>,    (106d)
        ∀ u, v, w ∈ U,  <u + v, w> = <u, w> + <v, w>.    (106e)

    An inner product is a symmetric positive-definite bilinear form.

    Exercise 121. Prove that an inner product <·,·> on a real linear space U satisfies

        ∀ u, v, w ∈ U, ∀ α, β ∈ R,  <u, α v + β w> = α <u, v> + β <u, w>.    (107)


    Example 122. The Euclidean inner product of two column vectors in R^m (14) satisfies the properties (106).

    The Cauchy-Schwarz inequality holds for any inner product. For any vectors u and v of U, we write

        |<u, v>| ≤ √(<u, u>) √(<v, v>),    (108)

    with equality if and only if v is proportional to u. The proof uses

        <u - λ v, u - λ v> ≥ 0   with   λ = <u, v> / <v, v>.
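    A quick numerical illustration of (108) for the Euclidean inner product (a sketch in Python; any inner product would do):

```python
import math

# Cauchy-Schwarz inequality: |<u, v>| <= sqrt(<u, u>) sqrt(<v, v>).
u = [1.0, 2.0, 3.0]
v = [4.0, -1.0, 0.5]

dot = lambda a, b: sum(x * y for x, y in zip(a, b))
lhs = abs(dot(u, v))
rhs = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
print(lhs <= rhs)  # True
```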

    Exercise 123. Prove the Cauchy-Schwarz inequality (108).

    Proposition 124. Given U a real linear space and an inner product <·,·> on U, the map from U to R_+ defined by

        u ↦ √(<u, u>),    (109)

    is a norm.

    Exercise 125. Prove Proposition 124.

    Example 126. Consider U = R^{m×n}. The map

        (A, B) ↦ tr(A^T B)    (110)

    is an inner product. It satisfies also

        tr(A^T A) = ‖A‖_F^2.    (111)

    From this last expression, we remark that the Frobenius norm is also invariant under multiplication by orthogonal matrices,

        ∀ A ∈ R^{m×n},  ‖OA‖_F = ‖A‖_F,  ∀ O ∈ R^{m×m} satisfying O^T O = I.    (112)

    The associated inner product is also invariant under multiplication by an orthogonal matrix,

        tr( (OA)^T (OB) ) = tr( A^T O^T O B ) = tr( A^T B ).
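    Relation (111) can be confirmed on a small example (a plain Python sketch; trace, transpose, and matmul are hypothetical helper names):

```python
# Check of tr(A^T A) = ||A||_F^2 on a small rectangular matrix.

def transpose(A):
    return [list(r) for r in zip(*A)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

A = [[1, 2],
     [3, 4],
     [5, 6]]
lhs = trace(matmul(transpose(A), A))
rhs = sum(a * a for row in A for a in row)  # ||A||_F^2
print(lhs, rhs)  # 91 91
```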

    Exercise 127. Consider U = C^0([0, 1], R). Show that the map

        (f, g) ↦ ∫_0^1 f(t) g(t) dt

    is an inner product.


    The concept of orthogonality also extends to general real linear spaces.

    Definition 128. Let U be a real linear space. We say that u, v ∈ U are orthogonal for the inner product <·,·> when <u, v> = 0, i.e. their inner product is 0.

    We conclude by giving the definition of an inner product on a complex linear space. The Cauchy-Schwarz inequality and the orthogonality still hold in the complex case.

    Definition 129. Consider U a complex linear space. An inner product <·,·> is a map U × U → C satisfying the following properties:

        ∀ u ∈ U,  <u, u> ≥ 0,    (113a)
        <u, u> = 0 if and only if u = 0,    (113b)
        ∀ α ∈ C, ∀ u, v ∈ U,  <α u, v> = α <u, v>,    (113c)
        ∀ u, v ∈ U,  <u, v> = conj(<v, u>),    (113d)
        ∀ u, v, w ∈ U,  <u + v, w> = <u, w> + <v, w>.    (113e)

    An inner product is a hermitian positive-definite sesquilinear form.

    5.3 Errors

    In order to discuss the accuracy of a numerical solution, or the relative merits of one numerical method versus another, it is necessary to choose a manner of measuring the error. It may seem obvious what is meant by the error. But there are often many different ways to measure the error, and these can sometimes give quite different impressions as to the accuracy of an approximate solution.

    Consider a problem to which the true answer is a vector x ∈ R^m. Denote an approximation by x̂. Then the error in this approximation is

        e = x̂ - x.

    5.3.1 Absolute error

    A natural measure of this error would be the norm of e,

        ‖e‖ = ‖x̂ - x‖.

    This is called the absolute error in the approximation.

    As an example, suppose that x = 2.2 while some numerical method produced an approximate value x̂ = 2.20345. Then the absolute error is

        |x̂ - x| = 0.00345 = 3.45 × 10^{-3}.

    This seems quite reasonable: we have a fairly accurate solution with three correct digits and the absolute error is fairly small, on the order of 10^{-3}. We


    problem so that some numbers are orders of magnitude larger than others for nonphysical reasons. Unless otherwise noted, we will generally assume that the problem is scaled in such a way that the absolute error is meaningful.

    5.4 Conditioning

    Conditioning is a fundamental issue of numerical analysis that, until now, we have skirted. It pertains to the perturbation behavior of a mathematical problem. A well-conditioned problem is one with the property that all small perturbations of the input lead to only small changes in the output of the problem. An ill-conditioned problem is one with the property that some small perturbation of the input can lead to a large change in the output.

    For example, data errors form a source of perturbations. A mathematical model of some real-world phenomenon typically involves some parameters or other data describing the particular situation being modeled. These values are almost never known exactly. There may be measurement errors, or there may be values that cannot be measured in any precise way and must be guessed by the modeler. Errors in the data mean that, even if the model is very good and the equation is then solved exactly, it may not match reality. It is important for the numerical analyst to evaluate the effect of these data errors.

    Consider the matrix

        A = (1/2) [1  1; 1 + 10^{-10}  1 - 10^{-10}].

    Its inverse matrix A^{-1} is

        A^{-1} = [1 - 10^{10}  10^{10}; 1 + 10^{10}  -10^{10}].

    The linear system

        Ax = [1; 1]

    has the solution

        x = [1; 1].

    The linear system

        Ax̂ = [1.1; 0.9]

    has the solution

        x̂ = [1.1 - 0.2 × 10^{10}; 1.1 + 0.2 × 10^{10}].

    Here a relatively small variation in the right-hand side leads to extremely large variations in the vector x. The problem is ill-conditioned.
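    The explosion can be reproduced in a few lines of Python (a sketch: the inverse matrix A^{-1} from the text is applied explicitly, writing t = 10^10 for readability):

```python
# Conditioning demo: a tiny perturbation of the right-hand side changes
# the solution of A x = b enormously.
t = 1e10   # 10^10, exactly representable in double precision
# Inverse of A = (1/2) [1, 1; 1 + 1/t, 1 - 1/t], as computed in the text:
Ainv = [[1 - t, t],
        [1 + t, -t]]

def apply(M, v):
    """2x2 matrix-vector product."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

x = apply(Ainv, [1.0, 1.0])    # solution of A x = (1, 1)
xh = apply(Ainv, [1.1, 0.9])   # perturbed right-hand side
print(x)   # [1.0, 1.0]
print(xh)  # entries of magnitude about 2e9
```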

    In the following, we define a number, called the condition number, to assess the conditioning of a problem.


    5.4.1 Error when computing Ax = b

    Suppose that the vector x is perturbed to obtain the vector x̂. If b and b̂ are, respectively, satisfying Ax = b and Ax̂ = b̂, we would like to bound the difference between b and b̂.

    First, we bound the difference in absolute terms. We have

        ‖b - b̂‖_2 = ‖Ax - Ax̂‖_2 = ‖A (x - x̂)‖_2.

    So we obtain

        ‖b - b̂‖_2 ≤ ‖A‖_2 ‖x - x̂‖_2.    (114)

    This bound gives an absolute measure of the perturbation in the right-hand side b. A small norm ‖A‖_2 means that a small perturbation in the vector x leads to a small perturbation in the vector b. On the other hand, a large value for ‖A‖_2 means that the perturbation in b can be large, even when ‖x - x̂‖_2 is small. Note that the bound (114) is sharp because there exists a perturbation x - x̂ such that ‖b - b̂‖_2 = ‖A‖_2 ‖x - x̂‖_2.

    To estimate the relative perturbation, we write

        ‖b - b̂‖_2 / ‖b‖