
CHAPTER I

    LINEARITY: BASIC CONCEPTS AND EXAMPLES

In this chapter we start with the concept of general linear spaces, whose elements are called vectors, to "set up the stage". Then we introduce "actors" called linear mappings, which act upon vectors. In the mathematical literature, "vector space" is synonymous with "linear space", and these terms will be used interchangeably. Likewise, "linear transformation", "linear mapping", and simply "linear map" are synonymous.

    §1. Linear Spaces and Linear Maps

    1.1. A vector space is an entity containing objects called vectors. A vector is usually

    conceived to be something which has a magnitude and direction, so that it can be drawn

    as an arrow:

    You can add or subtract two vectors:

    You can also multiply vectors by scalars:

    Such things should be familiar to you.

However, we should not be so narrow-minded as to think that only those objects represented geometrically by arrows in a 2D or 3D space can be regarded as vectors. As long

    as we have a collection of objects among which two algebraic operations called addition

    and scalar multiplication can be performed, so that certain rules of such operations are

    obeyed, we may regard this collection as a vector space and call the objects in this col-

    lection vectors. Our definition of vector spaces should be so general that we encounter

vector spaces almost every day and almost everywhere. Examples of vector spaces include many spaces of functions, spaces of polynomials, spaces of sequences, etc. (in addition to the well-known 3D space in which vectors are represented as arrows). The universality and the omnipresence of vector spaces are one good reason for placing linear algebra in the

    position of paramount importance in basic mathematics.

    1.2. After this propaganda, we come to the technical side of the definition. We start

    with a collection V of objects called vectors, designated by block letters u, v, etc. Suppose

    that V is equipped with two algebraic operations:

    1. Addition, allowing us to add two vectors u and v to obtain their sum u + v.

    2. Scalar multiplication, allowing us to multiply a vector v by a scalar a to form av.

Then V will be legitimately called a vector space if these operations obey a set of "natural" axioms.

    The complete set of axioms will be listed in Appendix A at the end of the present chapter.

    There is no need to memorize them, but here we mention some to show that they are

    indeed very natural.

    u + v = v + u (addition is commutative)

    u + (v + w) = (u + v) + w (addition is associative)

u + 0 = u
a(u + v) = au + av

    We are a bit vague about “scalars” in the above definition. What are scalars? The

    obvious answer is: they are numbers. But what sort of numbers are they?

    If we allow scalars to be complex numbers, then we say that V is a complex vector

    space, or a vector space over (the complex field) C. If we restrict scalars to real numbers,

    then V is called a real vector space, or a vector space over (the real field) R.

    More generally, scalars are taken from something called a field. If the “field of scalars”

is denoted by F, we call V a vector space over F. In recent years, finite fields have become an important subject due to their vast applications in areas such as cryptography and coding theory. We only briefly describe fields other than R and C in Appendix B at the end of the

    present chapter. In this course we only consider vector spaces over R or C.

1.3. Examples. The best way to understand the concept of vector spaces is to go

    through a large number of examples and work on them in the future.

    Example 1.3.1. Rn, a real vector space.

    The vectors in Rn are n-tuples of real numbers. A typical vector may be written as

    v = (v1, v2, . . . , vn),

where v1, v2, . . . , vn are real numbers. Two n-tuples are equal exactly when their corresponding components are equal: for u = (u1, . . . , un) and v = (v1, . . . , vn) in Rn, u = v exactly when u1 = v1, u2 = v2, . . . , un = vn. The addition and scalar multiplication in this space are defined in the componentwise manner: for u = (u1, u2, . . . , un), v = (v1, v2, . . . , vn)

    in Rn and a in R (that is, a is a real number),

    u + v = (u1 + v1, u2 + v2, . . . , un + vn), au = (au1, au2, . . . , aun).

    These two algebraic operations are quite simple and natural. Notice that, when n = 1, we

    have the vector space R1, which can be identified with R. This shows that R itself can

    be considered as a real vector space.
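Aside: the componentwise operations are easy to phrase in code. Here is a minimal Python sketch (our own illustration; the names add and scale are made up for this purpose):

def add(u, v):
    # componentwise sum of two n-tuples
    assert len(u) == len(v)
    return tuple(uk + vk for uk, vk in zip(u, v))

def scale(a, u):
    # scalar multiple of an n-tuple
    return tuple(a * uk for uk in u)

u = (1.0, 2.0, 3.0)
v = (4.0, 5.0, 6.0)
print(add(u, v))       # (5.0, 7.0, 9.0)
print(scale(2.0, u))   # (2.0, 4.0, 6.0)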

    Example 1.3.2. Cn, a complex vector space.

The space Cn consists of all n-tuples of complex numbers. Everything in this space works in the same way as in the space Rn of the previous example: all we have to do is replace real scalars by complex numbers. Sometimes we would like to make a statement for both spaces

    Rn and Cn. To avoid repetition, we use the letter F for R or C, or even a general field of

    scalars. We write Fn for the space of all n–tuples of scalars in F. Using the identification

    of F1 with F, we see that scalars can be regarded as vectors, if we wish.

    Example 1.3.3. Mmn(F) (the space of m× n matrices over F.)

    A vector in Fn is formed by arranging n scalars in a row. A variant of Fn is to form a

vector by arranging mn scalars in an m × n matrix. Denote by Mmn(F) the set of all m × n matrices with entries in F. With the usual addition and scalar multiplication, Mmn(F)

    is a vector space. A vector in this space is actually a matrix. In the future, to simplify

our notation, we will write Mmn for Mmn(F) when the field F we are working with is understood.

    Example 1.3.4. Direct product.

    This example tells us how to construct a new vector space from given vector spaces. Let

    V1, V2, . . . , Vn be vector spaces over the same field F. Consider the set V of all n-tuples

of the form v = (v1, v2, . . . , vn), where v1 is a vector in V1, v2 is a vector in V2, etc.

The addition and the scalar multiplication for V are defined in the same fashion as the

    corresponding operations of Rn described in Example 1.3.1: for u = (u1,u2, . . . ,un) and

    v = (v1,v2, . . . ,vn) in V , and for a ∈ F,

u + v = (u1 + v1, u2 + v2, . . . , un + vn),
au = (au1, au2, . . . , aun).

    This space V is a generalization of the previous example of Fn by replacing numbers in

the entries of n-tuples by vectors. If we take V1 = V2 = · · · = Vn = F, then we recover Fn. The vector space V constructed in this way is called the direct product of V1, V2, . . . , Vn, and the usual notation to express this is V = V1 × V2 × · · · × Vn, or V = \prod_{k=1}^{n} V_k.

    Example 1.3.5. Space of functions.

    This example is a space of functions with a common domain, say X (some nonempty set).

    A function f on X is just a way to assign to each point x in X a value denoted by f(x).

This value could be a scalar or a vector. Since scalars can be regarded as vectors, it

    is enough to consider the vector–valued case. Take any vector space V with F as its field

of scalars. Denote by F(X,V) the set of all functions f from X to V: f assigns to each point x in X a vector denoted by f(x) in V. Given "vectors" (which are functions in this case) f and g in F(X,V), the sum f + g and the scalar multiple af of f by a scalar a are formally defined as follows:

(f + g)(x) = f(x) + g(x), (af)(x) = a·f(x), x ∈ X.

    If we treat x as a variable symbol so that f is written as f(x), then the sum of “vectors”

    f(x) and g(x) in F(X,V ) is simply f(x) + g(x). In the special case with V = C and

    X = N = {0, 1, 2, 3, . . .},

    each f ∈ F(N,C) represents a sequence (of complex numbers):

    {f(n)}n≥0 ≡ (f(0), f(1), f(2), · · · )

Conversely, each sequence {an}n≥0 of complex numbers determines a function f on N, given by f(n) = an. Thus we can identify the space F(N,C) with the space of sequences, which will be denoted by S. For a = {an}n≥0 and b = {bn}n≥0 in S, we define the sum a + b and the scalar multiple λa of a by a complex number λ as follows:

    a+ b = (a0, a1, a2, . . . ) + (b0, b1, b2, . . . ) = (a0 + b0, a1 + b1, a2 + b2, . . . )

    λa = λ(a0, a1, a2, . . . ) = (λa0, λa1, λa2, . . . )

We will need the sequence space S to study difference equations and recursion relations.
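Aside: the pointwise operations on F(X,V) can be mimicked with closures; a minimal Python sketch (our own illustration) of (f + g)(x) = f(x) + g(x) and (af)(x) = a·f(x):

def fadd(f, g):
    # the sum of two functions is again a function
    return lambda x: f(x) + g(x)

def fscale(a, f):
    return lambda x: a * f(x)

f = lambda x: x * x
g = lambda x: 2 * x + 1
h = fadd(f, fscale(3.0, g))   # the "vector" f + 3g
print(h(2.0))                 # f(2) + 3*g(2) = 4 + 15 = 19.0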

    Example 1.3.6. The space of all polynomials.

    We specialize the previous example by taking X = C, the complex plane, and V = C,

considered as a complex vector space. In some sense, the space F(C,C) is too big. We should look at something smaller inside. Recall that a polynomial is a function p on C

    which can be written in the form

p(x) = a_0 + a_1x + a_2x^2 + a_3x^3 + · · · + a_nx^n,

where a_0, a_1, . . . , a_n are certain complex numbers and n is a certain positive integer. It is clear

    that the sum of two polynomials is also a polynomial, and scalar multiples of polynomials

are polynomials. Thus, if we denote by P the set of all polynomials, we can define the sum of two polynomials and a scalar multiple of a polynomial in the same way as we do

    for functions to give a linear structure to P.

1.4. A smaller vector space "sitting" inside a bigger one, such as P in F(C,C) in the last example, is a very common phenomenon. To describe this in formal language we

    introduce the following:

Definition 1.4.1. A (nonempty) subset M of a vector space V satisfying the following

    condition is called a subspace of V :

(S) For all u and v in M and for all scalars a and b, au + bv is in M.

According to this definition, P is a subspace of F(C,C). A subspace of some vector space is a vector space in its own right.
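Aside: condition (S) is easy to test numerically. A tiny Python sketch (our own illustration) for the subspace M = {(x, y) in R2 : x + y = 0}:

def in_M(w):
    # membership test for M = {(x, y) : x + y = 0}
    return abs(w[0] + w[1]) < 1e-12

u, v = (1.0, -1.0), (2.5, -2.5)                  # two vectors in M
a, b = 3.0, -7.0                                 # arbitrary scalars
w = (a * u[0] + b * v[0], a * u[1] + b * v[1])   # au + bv
print(in_M(w))   # True: au + bv stays in M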

    Example 1.4.2. Pn, the space of polynomials of degree ≤ n.

    Recall that a polynomial of degree n is a function of the form

p(x) = a_0 + a_1x + a_2x^2 + · · · + a_nx^n (P)

with a_n ≠ 0. If we drop the condition a_n ≠ 0 here, we cannot tell the exact degree of p(x) – all we can say is that the degree of p(x) is at most n. Denote by Pn the subset of

P consisting of polynomials of degree at most n, that is, polynomials of the form given in (P). It is clear that condition (S) in the definition of subspace above is satisfied for

    M = Pn and V = P. Therefore Pn is a subspace of P and hence itself is a vector space.

1.5. Now we consider the concept of linear mappings, or linear transformations.

    By a mapping (or simply a map) or a transformation we usually mean a way of sending

objects from one space into another. A transformation T from one vector space V to

    another W (over the same field) is linear if the following identity holds:

    T (αx + βy) = αTx + βTy (LT )

for all vectors x and y in V and all scalars α and β. (Notice that the author deliberately uses letters x, y instead of u, v for vectors, and α, β instead of a, b for scalars, in order to broaden the scope of our notation. Also, following the usual custom in linear algebra, we omit the round brackets in T(x) and simply write T(x) as Tx.)

    The linearity condition (LT ) can be split into two:

    T (x + y) = Tx + Ty (LT1)

    for all x and y in V , and

    T (αx) = αTx (LT2)

    for all vectors x in V and for all scalars α. Identity (LT1) is a special case of (LT ) obtained

    by setting α = β = 1 in (LT ). It says: the transformation T preserves the operation of

    addition. The following figure helps to illuminate its meaning:

    Identity (LT2) is another special case of (LT ) obtained by setting β = 0. It says: T

    preserves the scalar multiplication. You should draw a figure of arrows similar to the

    above one to clarify its meaning. Condition (LT ) can be replaced by conditions (LT1) and

    (LT2) together, (although we don’t see any advantage of doing so). Indeed, we can check

    that (LT ) is a consequence of this pair of conditions:

    T (αx + βy) = T (αx) + T (βy) (by (LT1))

    = αTx + βTy (applying (LT2) twice.)

    By mathematical induction, we can establish the following extension of (LT ):

    T (α1x1+α2x2+ · · · +αnxn) = α1Tx1+α2Tx2+ · · · +αnTxn

We note that (LT) is just the case n = 2 written in a different way.

    1.6. Examples of linear transformations are so many that you can find them almost

    everywhere, almost any time. Here we consider a few.

    Example 1.6.1. Sampling functions

Let FS be a linear space of functions (allowed to take complex values) defined on a set S. Pick some points s1, s2, . . . , sn in S as "observation sites". The observed values of a

function f in FS are arranged in a row as a vector in Cn, denoted by Tf:

    Tf = (f(s1), f(s2), . . . , f(sn)).

Then the transformation T sending f (from FS) to Tf (in Cn) is linear. To show the linearity of T, we need to check the identity T(af + bg) = aTf + bTg for all f, g in FS and scalars a, b. The left hand side of this identity is, according to the definition of T,

    T (af + bg) = ((af + bg)(s1), (af + bg)(s2), . . . , (af + bg)(sn))

    = (af(s1) + bg(s1), af(s2) + bg(s2), . . . , af(sn) + bg(sn))

    = a(f(s1), f(s2), . . . , f(sn)) + b(g(s1), g(s2), . . . , g(sn))

    which is aTf + bTg, that is, the right hand side. This proves the linearity of T .
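Aside: the sampling transformation is short enough to run. A Python sketch (our own illustration) that checks T(af + bg) = aTf + bTg at three sample sites:

sites = [0.0, 0.5, 1.0]          # the "observation sites" s1, s2, s3

def T(f):
    return tuple(f(s) for s in sites)

f = lambda x: x ** 2
g = lambda x: 1.0 - x
a, b = 2.0, -3.0
lhs = T(lambda x: a * f(x) + b * g(x))                      # T(af + bg)
rhs = tuple(a * fs + b * gs for fs, gs in zip(T(f), T(g)))  # aTf + bTg
print(lhs == rhs)   # True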

    Example 1.6.2. Tax = x · a

    Recall that the inner product (or the scalar product, or the dot product) of two vectors

    x = (x1, x2, · · · , xn) and y = (y1, y2, · · · , yn) in Rn is given by

x · y = \sum_{k=1}^{n} x_ky_k (= x_1y_1 + x_2y_2 + · · · + x_ny_n).

    Fix an arbitrary vector a in Rn. Then the transformation Ta from Rn to R sending a vector

x in Rn to x · a is linear. To prove this, we have to check Ta(αx + βy) = αTax + βTay. This can be done as follows:

Ta(αx + βy) = (αx + βy) · a = \sum_{k=1}^{n} (αx_k + βy_k)a_k
= α \sum_{k=1}^{n} x_ka_k + β \sum_{k=1}^{n} y_ka_k = α x · a + β y · a = αTax + βTay.

    Aside: As you can see, checking things like this is rather routine. The important thing to

    learn is: how to present it neatly and correctly.

    Example 1.6.3. Let a be a fixed vector in R3. Then the transformation Ca from R3

into R3 itself given by Ca(x) = x × a (the cross product of x and a) is linear. The linearity of Ca can be proved in the same fashion as that of the previous examples.

Example 1.6.4. A differential operator

Recall that Pn stands for the space of all polynomials of degree at most n. For p ∈ Pn, let Dp = dp/dx, the derivative of p. Then D : Pn → Pn is linear. Why? Well, the linearity

of D is manifested by the identity D(ap + bq) = aDp + bDq, where p, q ∈ Pn and a, b are scalars. But this identity is nothing but another way to write down the well-known

equality

d/dx (ap(x) + bq(x)) = a d/dx p(x) + b d/dx q(x).
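Aside: on coefficient lists [a0, a1, . . . , an] (so that p(x) = a0 + a1x + · · · + anx^n), the operator D has a one-line implementation. A Python sketch (our own illustration):

def D(coeffs):
    # d/dx sends a_k x^k to k * a_k x^(k-1); pad with 0 to stay in P_n
    return [k * coeffs[k] for k in range(1, len(coeffs))] + [0.0]

p = [1.0, 1.0, 0.0, 1.0]   # p(x) = 1 + x + x^3
print(D(p))                # [1.0, 0.0, 3.0, 0.0], i.e. p'(x) = 1 + 3x^2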

    Example 1.6.5. Translation

    Same space as the previous example: Pn. Let h be a fixed real number. Consider the

transformation T : Pn → Pn sending a polynomial p(x) ∈ Pn to the polynomial p(x + h). Here p(x + h) is of course obtained by replacing x in p(x) by x + h. In other words, it is p

evaluated at x + h. For instance, if p(x) is 1 + x + x^3 and h = 1, then T(p) is the polynomial

1 + (x + 1) + (x + 1)^3 = 1 + x + 1 + (x^3 + 3x^2 + 3x + 1) = 3 + 4x + 3x^2 + x^3. We claim that

    T is linear. To prove this, we have to verify T (ap+ bq) = aTp+ bTq. What is T (ap+ bq)?

    It is ap + bq evaluated at x + h, namely, ap(x + h) + bq(x + h). Now Tp and Tq are the

    polynomials p(x+h) and q(x+h) respectively, so aTp+ bTq is also ap(x+h) + bq(x+h).

    Hence T is linear.

    Example 1.6.6. Shift S

    This is the operator denoted by S on the space S of sequences, defined by

    S(a0, a1, a2, . . . ) = (a1, a2, a3, . . . ).

If we write a = (a0, a1, a2, . . . ), then (Sa)_n = a_{n+1}. In many books on difference equations, the last equality is written as Sa_n = a_{n+1}. Strictly speaking, this is incorrect. But we accept this and interpret it as the correct one, namely (Sa)_n = a_{n+1}. It is not

    hard to check that S is indeed linear.
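Aside: on finitely many stored terms the shift operator is just a slice. A Python sketch (our own illustration):

def shift(a):
    # S(a0, a1, a2, ...) = (a1, a2, a3, ...)
    return a[1:]

a = [5, 8, 13, 21, 34]
print(shift(a))   # [8, 13, 21, 34]; indeed (Sa)_n = a_{n+1}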

"Nonexample" 1.6.7. Let V = P, the space of all polynomials, and let T be the transformation from V into itself, sending a polynomial to its square: Tp = p^2. Then T is

not linear. One way to see this is by noticing T(−x) = (−x)^2 = x^2. If T were linear, then T(−x) should be −T(x) = −x^2, instead of x^2. Aside: This is by no means the only way to prove that the T given here is nonlinear. You may also argue that T(2 · 1) = T(2) = 2^2 = 4, which is not the same as 2T(1) = 2. Another argument: T(1 + x) = (1 + x)^2 = 1 + 2x + x^2, which is not the same as T(1) + T(x) = 1 + x^2. There are many ways to tell that T is not linear. One is as good as another. Don't waste your time by presenting more than one.

We should mention a term widely used in scientific literature: linear operator. By

    a linear operator, or simply an operator, on a vector space V we mean a linear transfor-

    mation from V into V itself. Thus Examples 1.6.3, 1.6.4, 1.6.5 and 1.6.6 above are linear

operators. Another term also widely used: linear functional (or covector in physics

    and engineering literature). If V is a vector space over F, by a linear functional of V we

mean a linear transformation from V into F1 ≡ F. Thus a linear functional of V is a function φ : V → F such that

    φ(a1v1 + a2v2) = a1φ(v1) + a2φ(v2)

for all v1, v2 ∈ V and a1, a2 ∈ F. The mapping Ta in Example 1.6.2 is an example of a linear functional of Rn.

1.7. Given vector spaces U and V over the same field, we denote by L(U, V) the set of all linear mappings from U to V. For S, T in L(U, V) and a scalar a, we define the sum S + T and scalar multiple aS by putting

(S + T)x = Sx + Tx, (aS)x = aSx, x ∈ U. (1.7.1)

We check that both S + T and aS are linear. To simplify our presentation, we let R = aS + bT and check its linearity. Take u1, u2 ∈ U and scalars c1 and c2. We have to show R(c1u1 + c2u2) = c1Ru1 + c2Ru2. Indeed,

    R(c1u1 + c2u2) = (aS + bT )(c1u1 + c2u2)

    = a S(c1u1 + c2u2) + b T (c1u1 + c2u2) [because of (1.7.1)]

    = a(c1Su1 + c2Su2) + b(c1Tu1 + c2Tu2) [because S, T are linear]

    = c1(aSu1 + bTu1) + c2(aSu2 + bTu2) = c1Ru1 + c2Ru2.

    With addition and scalar multiplication defined here, L (U, V ) becomes a vector space.

    According to our notation introduced above, given a vector space V , the symbol

L(V, V) stands for the set of all linear operators on V. This symbol looks a bit clumsy and hence we will rewrite it simply as L(V). On the other hand, L(V,F), the set of all linear functionals, will be denoted by V′ (or V∗ in some books), called the dual space of V. We

    summarize our notation here:

L(U, V) = the set of all linear transformations from U to V.
L(V) = the set of all linear operators on V.
V′ = the set of all linear functionals of V = the dual space of V.

We have seen that L(U, V) is a vector space under some natural way to define addition and scalar multiplication. A fortiori, L(V) and V′ are vector spaces. Actually, L(V) is more than just a vector space. Besides addition and scalar multiplication, it has a third

operation: composition. The composite, or the product ST, of operators S, T ∈ L(V), is defined by putting

    (ST )v = S(Tv). (1.7.2)

    Aside: To apply ST to a vector v, we apply T to v first to get Tv, followed by applying

    S. Writing (1.7.2) as (ST )(v) = S(v)T (v) is a horrendous mistake.

To give quick examples: let V = P, the space of all polynomials, and let D, M, T on V be given by D(p(x)) = p′(x) ≡ d/dx p(x), M(p(x)) = xp(x) and T(p(x)) = p(x + 1).

    Then (MD)(p(x)) = M(p′(x)) = xp′(x), (DM)(p(x)) = D(xp(x)) = p(x) + xp′(x),

    (TM)(p(x)) = T (xp(x)) = (x + 1)p(x + 1), (MT )(p(x)) = M(p(x + 1)) = xp(x + 1),

    (DT )(p(x)) = D(p(x+ 1)) = p′(x+ 1) and (TD)(p(x)) = T (p′(x)) = p′(x+ 1).

As you can see, MD is not the same as DM (also MT ≠ TM). Thus, in general, the products ST and TS are different. In case ST and TS are the same, i.e. ST = TS, then

    we say T, S commute, or T commutes with S. For example, the operators D and T

    given above commute.
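Aside: the non-commutativity of M and D can be watched numerically on coefficient lists (a Python sketch of our own; polynomials are stored as [a0, a1, . . . ]):

def D(c):
    # differentiation: a_k x^k -> k a_k x^(k-1)
    return [k * c[k] for k in range(1, len(c))]

def M(c):
    # multiplication by x shifts every coefficient up one slot
    return [0.0] + list(c)

p = [1.0, 2.0, 3.0]   # p(x) = 1 + 2x + 3x^2
print(M(D(p)))        # (MD)p = x p'(x) = 2x + 6x^2 -> [0.0, 2.0, 6.0]
print(D(M(p)))        # (DM)p = p(x) + x p'(x) = 1 + 4x + 9x^2 -> [1.0, 4.0, 9.0]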

To justify our definition of the product ST for S, T ∈ L(V) given above, we must check its linearity. Again, it is a routine matter. For v1, v2 ∈ V and a1, a2 ∈ F,

    (ST )(a1v1 + a2v2) = S(T (a1v1 + a2v2)) = S(a1Tv1 + a2Tv2)

    = a1S(Tv1) + a2S(Tv2) = a1(ST )v1 + a2(ST )v2.

    We have the following elementary properties concerning three operations (addition, scalar

    multiplication, composition) among operators: for R, S, T ∈ L (V ) and scalar a,

    (RS)T = R(ST )

    R(S + T ) = RS +RT

    (R+ S)T = RT + ST

    a(ST ) = (aS)T = S(aT ).

They are verified in a routine manner, e.g., to show (R + S)T = RT + ST, we compute:

    ((R+ S)T )x = (R+ S)(Tx) = R(Tx) + S(Tx)

    = RTx + STx = (RT + ST )x.

There are two special linear operators on V worth mentioning: the zero operator O and the

    identity operator I: O sends every vector to the zero vector and I sends every vector to

    itself, that is, for all v ∈ V , Ov = 0 and Iv = v.

    We say that a linear mapping T from a vector space V to a vector space W is invertible

if there is a linear mapping S from W to V such that ST = I_V and TS = I_W. Here I_V stands for the identity operator on V. In the future we simply write I for I_V so that the identities ST = I_V and TS = I_W become ST = I and TS = I. The linear map S in this

    case is uniquely determined by T and will be denoted by T−1. Thus ST = I and TS = I

    become T−1T = I and TT−1 = I. To give a quick example, consider the linear operator

T on the space P of all polynomials given by T(p(x)) = p(x + h), where h is a constant. Then T is invertible and its inverse T^{-1} is given by T^{-1}(p(x)) = p(x − h). The following fact is basic:

    Proposition 1.7.1. If linear operators S and T on a vector space V are invertible,

    then ST is also invertible with (ST )−1 = T−1S−1.

    To prove this, we check directly that T−1S−1 is the inverse of ST :

    (ST )(T−1S−1) = STT−1S−1 = SIS−1 = SS−1 = I,

    (T−1S−1)(ST ) = T−1S−1ST = T−1IT = T−1T = I.

The reversal of the order of S and T in the identity (ST)^{-1} = T^{-1}S^{-1} has the following analogy, attributed to the famous mathematician Hermann Weyl: we put on socks first before

    putting on shoes; but taking them off, we remove shoes first.
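Aside: Proposition 1.7.1 is easy to check numerically for matrix operators. A NumPy sketch (our own illustration, with two arbitrary invertible 2 × 2 matrices):

import numpy as np

S = np.array([[2.0, 1.0], [1.0, 1.0]])
T = np.array([[0.0, 1.0], [-1.0, 3.0]])
lhs = np.linalg.inv(S @ T)                 # (ST)^{-1}
rhs = np.linalg.inv(T) @ np.linalg.inv(S)  # T^{-1} S^{-1}
print(np.allclose(lhs, rhs))               # True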

EXERCISE SET I.1.

    Review Questions: Do I grasp the basic concepts of vector spaces and linear mappings?

    Am I able to write down properly the formal definitions of linear mappings, linear operators

    and linear functionals (as a professional mathematician does, at which I would not be

    embarrassed if it were in print)? What are the major examples of vector spaces and linear

    maps described in this section? Do I recognize the following symbols and understand

    perfectly well what they stand for?

R3, C2, F5, M4,5, P3, f + g, 2f − 3g,
L(V,W), S + T, ST, ST − TS, L(V), V′

    Why do we need the abstract conceptual framework of vector spaces and linear mappings?

    What would we miss if we restrict ourselves to the standard spaces Rn and Cn instead of

    working with this abstract notion?

    Drills

    1. Write down the general form of a vector in each of the following vector spaces:

    (a) F4 (F = R or C), (b) P2, (c) P4, (d) R3 × P1.

2. Find u + (−2)v, where u, v ∈ V, in each of the following cases:
(a) V = C2, u = (2 + 3i, 3 − 2i) and v = (1 + i, 1 − i).
(b) V = P2; u and v are the polynomials 1 − 2x + x^2 and 1 + x − x^2 respectively.
(c) V = F(R,C); u and v are the functions cos x + i sin x and cos x − i sin x respectively.

    3. True or false:

(a) A set {0} consisting of a single element 0, with addition and scalar multiplication defined in the following way, is a vector space: 0 + 0 = 0, a·0 = 0.

(b) All polynomials of the form a + (1 + a)x + bx^2 form a vector space under the usual

    addition and scalar multiplication.

    (c) P2 is a subspace of P3.

    (d) If U is a subspace of V and if V is a subspace of U , then U = V .

    (e) If U is a subspace of V and if W is a subspace of U , then W is a subspace of V .

4. Let p(x) = x^2 − x + 2 and q(x) = 2x^2 + 1. Find: p(x + 1), p(1), p(q(x)), xp(x), p(x)^2, p(x^2), (p + q)(x), q(x + p(x)).

    5. In each of the following cases, is the given transformation from one vector space into

    another linear? Why? (Questions like this should be answered in the following way:

if your answer is "Yes", then you have to prove it. If your answer is "No", you should

disprove it by pointing out one instance where the linearity fails; (only one is needed — don't waste your time finding or writing down another.)

(a) T : P → P given by T(p(x)) = p(x^2).
(b) T : P → P given by T(p(x)) = p(x)^2.
(c) T : P → R given by T(p(x)) = p(1); (here, of course, p(1) stands for the value of

    p at 1.)

(d) T : P → R given by T(p(x)) = p(1) + 1.
(e) The map T : M2,2 → M3,3 (Reminder: Mm,n stands for the vector space of all m × n matrices) given by T(X) = AXB, where A and B are fixed matrices of sizes 3 × 2 and 2 × 3 respectively.

(f) The map T : M2,2 → M2,2 given by T(X) = X + A, where A is a fixed 2 × 2 matrix. (♠ Caution: Be careful about the way you put down your answer.)

(g) T : M2,2 → M2,2 given by T(X) = X^2.
(h) V is a vector space and v is a fixed vector in V; T : L(V) → V is given by T(X) = X(v); (here, of course, X ∈ L(V), i.e. X is an operator on V, and X(v) is the vector in V obtained by applying X to v.)

    Exercises

    1. Let R, S, and T be linear operators on a vector space V . Simplify the following

    expressions:

(a) R(S + T) − S(T + R) − (R − S)T.
(b) S(S^{-1}T + TS^{-1}) + (S^{-1}T + TS^{-1})S + S^{-1}(ST − TS) − (ST − TS)S^{-1}. (S is

    assumed to be invertible.)

    (c) [[R, S], T ] + [[S, T ], R] + [[T,R], S]; (for linear operators P , Q on V , their commu-

    tator [P,Q] is defined to be the linear operator PQ−QP .)

2. On the linear space P of all polynomials, define the operators M, D and U by putting M(p(x)) = xp(x), D(p(x)) = p′(x) (the derivative of p(x)), U(p(x)) = p(x + 1).

    Compute

(a) [M,D] (= MD − DM).
(b) UMU^{-1} − M; (notice that U^{-1}(p(x)) = p(x − 1)).

    3. Let S and T be two linear operators satisfying the relation STS = S. (In this situation

    we may call T a generalized inverse of S.) Let P = ST , Q = TS and T0 = QTP .

Verify that (a) P^2 = P, (b) Q^2 = Q, (c) PS = S, (d) SQ = S, (e) ST_0S = S, (f) T_0 = TST and (g) T_0ST_0 = T_0.

4. In each of the following cases, find S^2, T^2, ST and TS for S, T ∈ L(V).
(a) V = R2; S((x1, x2)) = (x2, x1), T((x1, x2)) = (1/2)(x1 + x2, x1 + x2).

    (b) V = P1; S(p(x)) = p′(x), T (p(x)) = p(0).

(c) V = M2,2 (the space of all 2 × 2 matrices); S(X) = AX, T(X) = XB, where

A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, B = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}.

    Problems

    1*. Prove that, for linear operators S, T on a vector space V , if S, T and S + T are

    invertible, then S−1 + T−1 is also invertible.

    2*. (This is a very hard problem) Let S and T be linear operators on a vector space V .

    Prove that I − ST is invertible if and only if I − TS is invertible.

§2. Linear Transformations and Matrices

    2.1. The main goal of the present section is to study the relation between linear

mappings and matrices. We show that every m × n matrix A naturally gives rise to a linear mapping MA : Fn → Fm and vice versa. Then we show that, given coordinate systems in vector spaces V and W, every linear mapping T : V → W can be represented by a matrix. The device set up here tells us that matrices can be regarded as linear mappings

    and problems about linear mappings can often be reduced to problems in matrices.

    To understand linear transformations better, let us look at the simplest situation in

which both the domain and the range are the one-dimensional space F1 ≡ F; here F is either R or C. Let T : F → F be linear. Then, for every x ∈ F, Tx = T(x·1) = xT1. Certainly T1 ∈ F, i.e. T1 is just a scalar. Why don't we give it a name—let's call it a. Thus Tx = ax. On the other hand, it is easy to check that a transformation T from

    F to F of the form Tx = ax is linear, where a is a fixed scalar. We have shown that

    linear transformations from F to F are exactly those of the form T (x) = ax. This gives a

thorough description of such transformations. Nothing more can be said!

    Once we succeed in tackling a special case, we should move to a more interesting

    general situation in which the domain of a given linear transformation is Fn and its range

    is in Fm. The problem now is to describe all such transformations. As you will see,

    the answer turns out to be similar to the one for our special case: the real number a is

    replaced by an m × n matrix, say A, and x is replaced by an n× 1 column vector x andthe “outcome” Tx is the m×1 column vector Ax. Thus linear transformations from Fn toFm are exactly those T which can be put in the form Tx = Ax, where A is a fixed m× nmatrix.

    2.2. But the above answer seems to have a serious conflict with the previous conven-

tion: a vector in Fn used to be a row, i.e. something like x = (x1, x2, . . . , xn) instead of

    a column:

x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}. (2.2.1)

    The rationale of our previous convention is clear: the column in (2.2.1) is awkward to write.

    Worse: its outstanding look draws unwarranted attention! This prompts us to adopt the

    following rule:

    Something in a row surrounded by the round brackets “(” and “)” is the same thing

    as the transpose of this row (which becomes a column) surrounded by the square

brackets "[" and "]", such as

(cow, pig, dog, cat) = \begin{bmatrix} cow \\ pig \\ dog \\ cat \end{bmatrix}.

    In order to work under this rule, we have to be very careful about the brackets. This

    rule works well, but is too “brutal”. We try to avoid applying it directly as much as

    possible. Certainly we may regard the column (2.2.1) as the transpose of a row and put

x = [x1 x2 · · · xn]⊤ or x = [x1, x2, . . . , xn]⊤ with commas for clarity, and sometimes we shall do so in the future. But this still looks rather clumsy.

    2.3. We proclaim:

Every matrix is associated with a God-given linear transformation.

In detail, given an m × n matrix A with entries in the field F, we define a linear transformation MA from Fn to Fm simply by putting

    MAx = Ax (M)

    for all x = [x1, x2, . . . , xn]⊤ ∈ Fn. The transformation MA defined in this way may be

    called the multiplication by A. (That is why we choose the symbol MA to denote it.) The

    linearity of MA follows immediately from some well-known properties of matrix algebra:

    for all u and v in Fn and a, b ∈ F, we have

    MA(au + bv) = A(au + bv) = aAu + bAv = aMAu + bMAv.

To give a quick example, suppose A = \begin{bmatrix} 2 & 3 \\ 4 & 5 \end{bmatrix}. Then

MAx = MA \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 4 & 5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2x_1 + 3x_2 \\ 4x_1 + 5x_2 \end{bmatrix}.

    Hence MA is a linear operator on R2 sending (x1, x2) to (2x1 + 3x2, 4x1 + 5x2).
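Aside: this computation is a one-liner in NumPy (our own illustration):

import numpy as np

A = np.array([[2.0, 3.0], [4.0, 5.0]])
x = np.array([1.0, 1.0])
print(A @ x)   # [5. 9.], i.e. M_A sends (1, 1) to (2 + 3, 4 + 5)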

    The converse of the above proclamation is also true:

Theorem 2.3.1. Every linear transformation T from Fn to Fm is of the form MA for some m × n matrix A, i.e. there exists an m × n matrix A such that Tx = Ax for all x ∈ Fn. Furthermore, the matrix A here is uniquely determined by T.

This theorem tells us that there is a one-to-one correspondence between the set L(Fn, Fm) of all linear transformations from Fn to Fm and the set Mm,n(F) of all m × n matrices with entries in F:

A ∈ Mm,n(F) ←→ T = MA ∈ L(Fn, Fm).

    Before we embark on the proof of this theorem, let us take a look at the linear trans-

    formations Ta : Rn → R and Ca : R3 → R3 of Examples 1.6.2 and 1.6.3 in the previous

section, defined by Ta(x) = a · x and Ca(x) = a × x respectively. According to the above theorem, there are matrices A and B of sizes 1 × n and 3 × 3 respectively such that Tax = Ax and Cax = Bx. What are A and B? Well, let us write

Tax = a_1x_1 + a_2x_2 + · · · + a_nx_n = [a_1 a_2 · · · a_n] \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix},

Cax = \begin{bmatrix} a_2x_3 − a_3x_2 \\ a_3x_1 − a_1x_3 \\ a_1x_2 − a_2x_1 \end{bmatrix} = \begin{bmatrix} 0 & −a_3 & a_2 \\ a_3 & 0 & −a_1 \\ −a_2 & a_1 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}.

Therefore A = [a_1 a_2 · · · a_n], while B is the skew-symmetric matrix given as

B = \begin{bmatrix} 0 & −a_3 & a_2 \\ a_3 & 0 & −a_1 \\ −a_2 & a_1 & 0 \end{bmatrix}.

    Notice that B⊤ = −B; (here B⊤ is the transpose of B).

    2.4. Now we turn to the proof of Theorem 2.3.1 above. There are two parts in the

    conclusion of the theorem: first, there is a matrix A with the property that Tx = Ax, and

    second, such a matrix is completely determined by T . The first part concerns the existence

    of A and the second part its uniqueness. To prove the existence part, we have to find A

    (which seems to be hiding somewhere.) To prove the uniqueness part, it is enough to find

out in what way T determines A. Normally we prove the existence part first, because

    normally we think: if it didn’t exist, what would be the point of proving (or even talking

    about) its uniqueness? However, logically they are independent entities and it doesn’t

    matter which comes first. Here we prove the uniqueness part first—this strategy of proof

    is at odds with our normal thinking. There are two reasons for taking this strategy. First,

the uniqueness part is easier. Second, the proof of the uniqueness part actually points out a

    way to find A. It helps us to figure out how to prove the existence part! (You may not be

used to this unusual way of thinking. But people usually become smart when they begin

    to think in an unusual way.)

    Proof of Theorem 2.3.1. Suppose that Tx = Ax and we have to prove that A is

    uniquely determined by T . Let A1, A2, . . . , An be columns of A so that

    A = [A1 A2 · · · An].

It is easy to check that, for x = (x1, x2, · · · , xn) ≡ [x1 x2 · · · xn]⊤ ∈ Fn, T(x) is given by

Ax = [A_1 A_2 · · · A_n] \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = A_1x_1 + A_2x_2 + · · · + A_nx_n. (2.4.1)

    Let us find the result of T acting on the standard basis vectors:

    e1 = (1, 0, . . . , 0), e2 = (0, 1, . . . , 0), . . . , en = (0, 0, . . . , 1).

    Putting x = e1 = (1, 0, 0, . . . , 0) in (2.4.1), i.e. x1 = 1, x2 = 0, x3 = 0 etc., we obtain

    Te1 = A1. In the same way we can obtain Te2 = A2 etc. Now T completely determines

    Te1 = A1, Te2 = A2, . . . , Ten = An

    which in turn determines the matrix A = [A1 A2 · · · An].

Next we prove the "existence part". Let T : Fn → Fm be a linear transformation. We have to find a matrix A such that Tx = Ax for all x ∈ Fn. Let A be the m × n matrix with Tek as its kth column (1 ≤ k ≤ n). In other words, A = [A1 A2 · · · An], where Ak = Tek for 1 ≤ k ≤ n. We have to check Tx = Ax in order to show that A obtained in this way will do the job. Let us "recycle" the computation given in (2.4.1) and write

    Ax = A1x1 +A2x2 + · · · +Anxn := x1A1 + x2A2 + · · · + xnAn,

(x1A1 is the usual way to write a scalar multiple of a column vector, and A1x1 is the correct way when x1 is regarded as a 1 × 1 matrix), from which it follows that

Ax = x1Te1 + x2Te2 + · · · + xnTen = T(x1e1 + x2e2 + · · · + xnen) = Tx.

    Here we have used the linearity of T and the following elementary manipulation

x1e1 + x2e2 + · · · + xnen = x1(1, 0, . . . , 0) + x2(0, 1, . . . , 0) + · · · + xn(0, 0, . . . , 1) = (x1, x2, . . . , xn) = x.

Hence T = MA. The proof of Theorem 2.3.1 is complete. From the proof here, we see that

Fact. The columns of an m × n matrix A are MAe1, MAe2, . . . , MAen, where e1, e2, . . . , en are the standard basis vectors.
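Aside: the Fact gives a recipe for recovering the matrix of a linear map T : Fn → Fm in code: feed in the standard basis vectors and stack the outputs as columns. A NumPy sketch (our own illustration; matrix_of is a made-up name):

import numpy as np

def matrix_of(T, n):
    # column k of the matrix is T(e_k)
    return np.column_stack([T(np.eye(n)[:, k]) for k in range(n)])

T = lambda x: np.array([x[0] + x[1], x[0] - x[1], 2 * x[1]])   # a linear map R^2 -> R^3
print(matrix_of(T, 2))
# [[ 1.  1.]
#  [ 1. -1.]
#  [ 0.  2.]]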

2.5. Now we use the fact stated at the end of the last subsection to determine the matrices of some linear operators on vector spaces of the form Fn.

    Example 2.5.1. Permutation matrices

Let π be a permutation of the set [n] ≡ {1, 2, . . . , n}; in other words, π is a bijection of [n]. Consider the map Tπ : Fn → Fn given by

    Tπ(x1, x2, . . . , xn) = (xπ(1), xπ(2), . . . , xπ(n)).

Then it is routine to check that Tπ is linear. Hence, by Theorem 2.3.1, there is a unique n × n matrix P such that Tπx = Px, called the permutation matrix associated with π. Now

    we want to give a specific description of P . According to the fact stated above, the first

    column of P is given by Tπe1. For x = e1, we have x1 = 1, x2 = 0, x3 = 0 etc. Its image

    is Tπx = (xπ(1), xπ(2), . . . , xπ(n)), where the kth entry xπ(k) is 1 precisely when π(k) = 1,

or k = π^{-1}(1). Thus, except for the π^{-1}(1) entry, which is 1, all of the other entries of Tπe1 are 0. Hence Tπe1 = e_{π^{-1}(1)}. In the same way we have Tπe2 = e_{π^{-1}(2)}, Tπe3 = e_{π^{-1}(3)}, etc.

    So we have

P = [e_{π^{-1}(1)} e_{π^{-1}(2)} · · · e_{π^{-1}(n)}].

    To give a specific example, let n = 3 and let π be given by π(1) = 2, π(2) = 3 and π(3) = 1.

Then π^{-1}(1) = 3, π^{-1}(2) = 1, and π^{-1}(3) = 2. Thus Tπ(x1, x2, x3) = (x2, x3, x1) and the

    permutation matrix for Tπ is

P = [e_{π^{-1}(1)} e_{π^{-1}(2)} e_{π^{-1}(3)}] = [e_3 e_1 e_2] = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}.

Notice that each row and each column has exactly one entry equal to 1 and the remaining entries are zeros. (Remark: Permutation matrices play an important role in the theory of doubly stochastic

    matrices, which are useful for establishing some matrix inequalities. It turns out that all

doubly stochastic matrices form a convex set and the permutation matrices are exactly the so-called "extreme points" of this convex set.)
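Aside: a NumPy sketch (our own illustration) that builds P for this π and checks Tπx = Px:

import numpy as np

pi = {1: 2, 2: 3, 3: 1}          # pi(1) = 2, pi(2) = 3, pi(3) = 1
n = len(pi)
P = np.zeros((n, n))
for i in range(1, n + 1):
    P[i - 1, pi[i] - 1] = 1.0    # row i has its 1 in column pi(i), so column k is e_{pi^{-1}(k)}

x = np.array([10.0, 20.0, 30.0])
print(P)         # [[0 1 0], [0 0 1], [1 0 0]]
print(P @ x)     # [20. 30. 10.] = (x_2, x_3, x_1), as required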

    Example 2.5.2. Rotations

    For a fixed real number θ, let T ≡ Tθ be the operator on R2 sending an arbitrary vector

v to Tθv, which is obtained by turning v through the angle θ in the anticlockwise direction:

    It is not too hard to see that Tθ is a linear operator on R2. The question is: what is the

    2 × 2 matrix inducing this linear operator? Let

A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}

be the matrix inducing T ≡ Tθ: Tv = Av for all v ∈ R2. Letting e1 = (1, 0) (= [1, 0]⊤) and e2 = (0, 1), we have Te1 = (a, c) and Te2 = (b, d). On the other hand, from the

    following figure, we see that Te1 = (cos θ, sin θ), Te2 = (− sin θ, cos θ):

    We conclude that the matrix which induces Tθ is

Aθ ≡ \begin{bmatrix} \cos θ & −\sin θ \\ \sin θ & \cos θ \end{bmatrix}. (2.5.1)

    It is a priori clear that the result of rotating a vector by an angle α, followed by a rotation

    through an angle β, is the same as a single rotation with angle α + β. Putting this in

mathematical symbols, we have TαTβ = T_{α+β}. The matrices inducing the operators in this identity obey the same relation: A_{α+β} = AαAβ. From this relation and (2.5.1) above, it follows immediately that

cos(α + β) = cos α cos β − sin α sin β,
sin(α + β) = cos α sin β + sin α cos β,

    which are well-known (but by no means obvious) identities in trigonometry.
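Aside: the identity A_{α+β} = AαAβ can be confirmed numerically (a NumPy sketch of our own):

import numpy as np

def A(theta):
    # the rotation matrix (2.5.1)
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

alpha, beta = 0.7, 1.1
print(np.allclose(A(alpha + beta), A(alpha) @ A(beta)))   # True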

2.6. In this subsection we discuss a book-keeping device to label vectors by columns of

    numbers and to represent linear transformations by matrices, relative to some coordinate

    system, or more precisely, a basis. By means of such a device, a problem about vectors

    or linear transformations is converted to the corresponding problem about columns and

    matrices, which is in general easier to manipulate. Bases in linear algebra serve the same

    purpose as frames of reference in physics.

    Definition 2.6.1. An ordered set of vectors b1,b2, . . . ,bn in a vector space V is

    called a basis if each vector v in V can be written in a unique way as

    v = v1b1 + v2b2 + . . .+ vnbn (2.6.1)

    for some scalars v1, v2, . . . , vn. (Remark: Notice that there are two ingredients in this

    definition: the possibility to write (2.6.1) and the uniqueness of such an expression.)

With a fixed basis B = {b1, b2, . . . , bn} in a vector space V, each vector v in V determines (in a unique manner) the scalars v1, v2, . . . , vn, via (2.6.1) above. These scalars are called the coordinates of v relative to the basis B. We will arrange them into a matrix with a single column, denoted by [v]B, and call it the column representation or the coordinate

    vector of v relative to B:

    [v]B = (v1, v2, . . . , vn) ≡ [v1 v2 . . . vn]⊤.

    The coordinates (or the column representation) of a vector depend on our choice of basis

at the outset. The subscript B in [v]B emphasizes this dependence. For convenience, sometimes we drop this subscript and write [v] if our choice of basis is understood.

    Example 2.6.2. Standard examples of bases:

    1. The standard basis for Fn. The vectors

    e1 = (1, 0, 0, 0, . . . , 0), e2 = (0, 1, 0, 0, . . . , 0), . . . , en = (0, 0, 0, 0, . . . , 1)

    form a basis of Fn, called the standard basis (or the natural basis, or the usual basis)

of Fn. The kth vector ek in this basis has 1 in the kth entry and 0 elsewhere. With

    respect to this basis, the column representation of a vector v = (v1, v2, . . . , vn) in Fn is

    [v] = [v1, v2, . . . , vn]⊤; (convince yourself this is the case.)

    2. The standard basis for Pn. The monomials

1, x, x^2, . . . , x^n

form a basis of Pn. With respect to this basis, the column representing a polynomial

p(x) = a_0 + a_1x + a_2x^2 + · · · + a_nx^n in Pn is given by [p] = [a_0, a_1, a_2, . . . , a_n]⊤.

    Example 2.6.3. (a) Find the column representation of (1, 3) in R2 relative to the

basis B = {(1, 1), (1, −1)}. (b) Find the column representation of x^3 relative to the basis E = {1, x − 1, (x − 1)^2, (x − 1)^3} in P3.

    Solution. (a) Suppose [(1, 3)]B = [a, b]⊤. Then we have (1, 3) = a(1, 1) + b(1,−1),

which gives 1 = a + b, 3 = a − b. Thus a = 2, b = −1. So the answer is [(1, 3)]B = [2, −1]⊤.

(b) Let [x^3]_E = (a_0, a_1, a_2, a_3). Then

x^3 = a_0 + a_1(x − 1) + a_2(x − 1)^2 + a_3(x − 1)^3.

We have to find a_0 to a_3. There are several ways to do so. Here is one. Let y = x − 1. Then x = y + 1 and (y + 1)^3 = a_0 + a_1y + a_2y^2 + a_3y^3. Now (y + 1)^3 = 1 + 3y + 3y^2 + y^3. Thus 1 + 3y + 3y^2 + y^3 = a_0 + a_1y + a_2y^2 + a_3y^3. So, by comparing coefficients of powers of y, we have a_0 = 1, a_1 = 3, a_2 = 3, a_3 = 1. Thus [x^3]_E = (1, 3, 3, 1), which is our answer.
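Aside: part (a) is a 2 × 2 linear system whose coefficient matrix has the basis vectors as columns; NumPy solves it directly (our own illustration):

import numpy as np

B = np.column_stack([(1.0, 1.0), (1.0, -1.0)])   # basis vectors as columns
v = np.array([1.0, 3.0])
print(np.linalg.solve(B, v))   # [ 2. -1.], i.e. [(1, 3)]_B = [2, -1]^T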

Once we have a basis B = (b1, b2, . . . , bn) in a vector space V over the field F, we can define a linear mapping T from V to Fn (or simply write T : V → Fn) by putting Tv = [v]B. The linearity of T means that the identity

    [au + bv] = a[u] + b[v] (2.6.2)

    holds for all vectors u, v in V and all scalars a, b in F; (for simplicity, we write [v]

    for [v]B). To see this, write [u] = (u1, u2, . . . , un) and [v] = (v1, v2, . . . , vn). Then

    u = u1b1 + u2b2 + · · · + unbn and v = v1b1 + v2b2 + · · · + vnbn. Hence

au + bv = a(u1b1 + u2b2 + · · · + unbn) + b(v1b1 + v2b2 + · · · + vnbn)
= (au1 + bv1)b1 + (au2 + bv2)b2 + · · · + (aun + bvn)bn.

    and hence

    [au + bv] = (au1 + bv1, au2 + bv2, . . . , aun + bvn)

= a(u1, u2, . . . , un) + b(v1, v2, . . . , vn) = a[u] + b[v].

    Notice that T is invertible. Its inverse T−1 simply sends any (u1, u2, . . . , un) in Fn to

    u = u1b1 + u2b2 + · · · + unbn in V .

    2.7. We have seen that, by introducing a basis to a vector space, we can “label” a

    vector in this space by a bunch of numbers arranged in a column, giving us the column

representation of this vector. Next we explain how to "label" a linear mapping by a bunch

    of numbers arranged into a rectangular array — of course here I mean a matrix. Let us

    start with a linear transformation T from a vector space V to a vector space W . Suppose

that V = {v1, v2, . . . , vn} is a basis of V and W = {w1, w2, . . . , wm} is a basis of W. We can use a matrix to represent T. This matrix is called the representing matrix of T, or the matrix representing T, or just the matrix of T, relative to the bases V and W, and is denoted by

[T]^V_W, or simply [T].

(The subscript W and the superscript V in [T]^V_W emphasize the dependence of this matrix on these two bases; when their presence is understood, we simply write [T] for this matrix.) The matrix [T] is constructed column by column in the following way. To find its first

column, we apply T to the first basis vector v1 in V to get Tv1. As Tv1 is a vector in W, we can express it in a unique way as a linear combination of basis vectors in W, say

    Tv1 = t11w1 + t21w2 + · · · + tm1wm.

    The coefficients of this linear combination, namely, t11, t21 etc. will fill up the first column

    of [T ]. To find the second column of [T ], we apply T to v2 to get Tv2 in W and express

    it as a linear combination of vectors in W , say

    Tv2 = t12w1 + t22w2 + · · · + tm2wm.

    Then fill up the second column by coefficients of this linear combination. The other columns

of [T] are obtained in a similar fashion. Thus, we come up with

[T] = \begin{bmatrix} t_{11} & t_{12} & \cdots & t_{1n} \\ t_{21} & t_{22} & \cdots & t_{2n} \\ \vdots & \vdots & & \vdots \\ t_{m1} & t_{m2} & \cdots & t_{mn} \end{bmatrix},

    where the jth column of [T] (consisting of t1j , t2j , . . . , tmj) comes from

    Tvj = t1jw1 + t2jw2 + · · · + tmjwm.

In other words, the jth column is just [Tvj]_W, the column representing Tvj relative to the basis W in W. In case V = W and V = W, T is a linear operator on V and the matrix [T] ≡ [T]^V_V is a square matrix. In this case we will write [T]_V instead of [T]^V_V.

    Example 2.7.1. (a) Let T be the operator on V = P2 sending p(x) to p(x+ 1). Find

    the matrix [T ] of this operator relative to the standard basis B = {1, x, x2}. (b) Find the

matrix of the differential operator D defined on P2 by D(p) = p′ (the derivative of p), relative to the standard basis B. (c) Consider the linear map M from P1 to P2 defined by the recipe M(p(x)) = xp(x). In P1 we take the standard basis B = {1, x} but in P2 we take the basis C = {1, 1 + x, (1 + x)^2}. Find the matrix representation relative to these bases.

Solution: (a) and (b) Since T(1) = 1, T(x) = x + 1 = 1 + x, T(x^2) = (x + 1)^2 = 1 + 2x + x^2 and D(1) = 0, D(x) = 1, D(x^2) = 2x, we have

[T]_B = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}, [D]_B = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{bmatrix}.

(c) It is easy to get M(1) = x, M(x) = x^2. But, in order to get the matrix [M]^B_C, we have to write M(1) = x and M(x) = x^2 as linear combinations of 1, 1 + x, (1 + x)^2; in other words, we have to find a_0, a_1, a_2 and b_0, b_1, b_2 in the following identities: x = a_0 + a_1(1 + x) + a_2(1 + x)^2, x^2 = b_0 + b_1(1 + x) + b_2(1 + x)^2. Let s = 1 + x so that x = s − 1. Then x = −1 + s = −1 + (1 + x) and x^2 = (s − 1)^2 = 1 − 2s + s^2 = 1 − 2(1 + x) + (1 + x)^2. The matrix [M] ≡ [M]^B_C can be read off from these identities:

[M] = \begin{bmatrix} −1 & 1 \\ 1 & −2 \\ 0 & 1 \end{bmatrix}.
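Aside: the column-by-column recipe is mechanical enough to code. A Python sketch (our own illustration) that rebuilds [T]_B for T(p(x)) = p(x + 1) on P2 from the expansion (x + 1)^k = Σ_j C(k, j) x^j:

from math import comb
import numpy as np

n = 2
# entry (j, k) is the coefficient of x^j in T(x^k) = (x + 1)^k
T_B = np.array([[comb(k, j) for k in range(n + 1)] for j in range(n + 1)])
print(T_B)
# [[1 1 1]
#  [0 1 2]
#  [0 0 1]]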

    Relative to a basis of V , for S, T ∈ L (V ), a, b ∈ F and v ∈ V , we have

    [aS + bT ] = a[S] + b[T ], [ST ] = [S][T ], [Sv] = [S][v]. (2.7.1)

    This tells us that the representation by matrices captures the essence of operators. But

we defer the proof of this fact to Chapter 3, where the summation symbol ∑ will be systematically used.

EXERCISE SET I.2.

    Review Questions: What is the meaning of “the linear transformation induced by a

    matrix”? What is the basic fact concerning such induced transformations? How to find

    the matrix which induces a given linear transformation (in principle)? Do I understand

    each of the following concepts?

    basis, column representation of vectors, matrix representation of operators

    Can I describe the procedure of finding column representation of vectors and matrix rep-

    resentation of linear transformations to any first year student?

    Drills

    1. Write down the matrices which induce the following operators on F3:

(a) T, sending (x1, x2, x3) to (x1 + x2 − x3, x1 − x2, x2 + x3).
(b) D, sending (x1, x2, x3) to (−x1, 2x2, x3).
(c) R, sending (x1, x2, x3) to (x1 cos θ + x3 sin θ, x2, −x1 sin θ + x3 cos θ).
(d) (with F = R) Q, sending (x1, x2, x3) to (1, 2, 3) × (x1, x2, x3).

2. Find the 2 × 2 real matrix A such that its induced operator MA projects each vector v in R2 orthogonally to the line passing through the origin and (1, 1), as indicated in

    the left figure below:

3. Find the 3 × 3 matrix A such that MA is the 120° rotation in R3 about the axis through the origin and the point (1, 1, 1), indicated in the right figure above.

4. In each of the following cases, write down the matrix A of the operator T ≡ MA on R2 satisfying the given conditions:
(a) Te1 = (1, 1) and T^2 = O.
(b) Te1 = (2, 2) and T^2 = T.
(c) Te1 = (cos θ, sin θ) and T^2 = I, where the given angle θ satisfies 0 < θ < π/2.

6. Give a basis for each of the following vector spaces:

    (a) P3, (b) R2, (c) C3, (d) P1 × P2, (e) M2,2.

    7. In each of the following cases, write down a basis for the kernel ker(T ) and the range

    T (V ) of the given linear transformation T :

(a) T : P2 → P2, Tp = p′, the derivative of p.
(b) T : P2 → P3, T(p(x)) = xp(x).

(c) T : F2 → F2, T = MA, where A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}.

(d) T : R2 → P2, T([a, b]⊤) = a + bx^2.

(e) T : M2,2 → M2,2, T(X) = BX, where B = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}.

(f) T : M2,2 → M2,2, T(X) = XB, where B is the same as the one in (e) above.
(g) T : M2,2 → M2,2, T(X) = BX − XB, where B is the same as the one in (e).

    8. In each of the following cases, find the column representation of the vector v relative

to the given basis B in V. (Justifying that B is a basis is not required.)
(a) V = R2, v = (2, 3), B = {(1, 0), (1, 1)}.
(b) V = C2, v = (2 + 2i, 0), B = {b1, b2} with b1 = (1 + i, 1 − i), b2 = (1 − i, 1 + i).
(c) V = P1, v is the polynomial 2 + 3x and B = {1 + x, 1 − x}.
(d) V = P2, v is 1 + x + x^2, B = {1 + x, 1 + x^2, x + x^2}.

(e) V = M2,2, v is \begin{bmatrix} 5 & 2 \\ 0 & 1 \end{bmatrix} and B = { \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & −1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} }.

(d) V, W, B and C are the same as those in (c), T(p(x)) = p(x^2).

(e) V = W = M2,2, T(X) = AX − XA with A = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}; B and C are the standard basis of M2,2:

B = C = { \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} }.

    Exercises

    1. Use Theorem 2.3.1 to describe (a) the correspondence between the vector spaces

L(Fn, F) and M1,n, and (b) the correspondence between the vector spaces L(F, Fn) and Mn,1.

2*. Show that a 2 × 2 matrix A induces a rotation of R2 if and only if it is of the form

A = \begin{bmatrix} a & −b \\ b & a \end{bmatrix} with det(A) ≡ a^2 + b^2 = 1.

Check that A^{-1} = A⊤.

    3*. According to special relativity, a boost is a linear operator on R2 induced by a matrix

of the form

\begin{bmatrix} a & b \\ b & a \end{bmatrix} with \begin{vmatrix} a & b \\ b & a \end{vmatrix} ≡ a^2 − b^2 = 1 and a > 0.

    Show that the product of two boosts is a boost.

    4*. Let v = (a, b) be a fixed nonzero vector in R2. Find the matrix A which induces the

    projection P onto the one-dimensional subspace spanned by v, as indicated in the left

    figure below:

    5*. Let Lα be the line through the origin in the plane R2, so that the angle between

Lα and the horizontal axis is α. (a) Find the 2 × 2 matrix which induces the mirror reflection Tα about Lα indicated in the right figure above. (b) Show that the product

    TαTβ of two such reflections is a rotation Rθ. Find the angle θ of rotation in terms of

    α and β.

6*. Let n = (n1, n2, n3) be a unit vector in R3 (|n|^2 ≡ n_1^2 + n_2^2 + n_3^2 = 1) and let H be

    the plane in R3 through the origin and perpendicular to n. Let Tn be the mirror

    reflection with respect to H, indicated in the following figure:

(a) Show that Tnv = v − 2(v · n)n for all v ∈ R3. (b) Write down the matrix which induces this reflection. (c) Suppose that m = (m1, m2, m3) is another unit vector and

Tm is the reflection defined in a similar fashion. Show that the vector w ≡ n × m is invariant for TmTn, that is, TmTnw = w.

7*. Find the 3 × 3 matrix A which induces the 60° rotation T in R3 about the axis through the origin and the point (1, 1, 1). Compute A^2. Explain why your answer for A^2 is

    the expected one.

§3. Linear Equations

3.1. Let V and W be vector spaces over the field F of scalars and let T : V → W be a linear mapping. Let b be a vector in W. Consider the equation

    Tx = b (3.1.1)

    We say that a vector v in V is a solution to this equation if Tv = b.

    Example 3.1.1. System of linear equations

    Let A be an m× n matrix over F and let b be a vector in Fm, say

A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}, b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}.

    Let T = MA, that is, the linear map T : Fn → Fm given by Tx = Ax. Then the

    equation (3.1.1), that is, Tx = b, is the following system of linear equations

a_{11}x_1 + a_{12}x_2 + · · · + a_{1n}x_n = b_1
a_{21}x_1 + a_{22}x_2 + · · · + a_{2n}x_n = b_2
· · · · · · · · ·
a_{m1}x_1 + a_{m2}x_2 + · · · + a_{mn}x_n = b_m        (3.1.2)

    The reader is assumed to be familiar with some general method to solve this system.

    When the spaces V and W are finite dimensional, using coordinate systems, it is possible

to convert a general equation (3.1.1) into this form. Thus, (3.1.1) can in principle be solved.
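Aside: for a square invertible A, solving (3.1.2) is one library call (a NumPy sketch of our own):

import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])
x = np.linalg.solve(A, b)
print(x)                       # [1. 3.]
print(np.allclose(A @ x, b))   # True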

    Example 3.1.2. Interpolation Problem

Let us briefly recall Example 1.6.1. Let FS be a linear space of functions (allowed to take complex values) defined on a set S and let s1, s2, . . . , sn be some selected points in S.

    Consider the linear transformation T : FS → Cn defined by

    Tf = (f(s1), f(s2), . . . , f(sn)).

Given b = (b1, b2, . . . , bn), a solution to the equation Tf = b is a function f in FS satisfying f(s1) = b1, f(s2) = b2, . . . , f(sn) = bn. Finding such a solution is called an

    interpolation problem.
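Aside: a concrete interpolation instance, solved with NumPy's polynomial fitting (our own illustration): find p of degree ≤ 2 with p(0) = 1, p(1) = 3, p(2) = 7.

import numpy as np

s = np.array([0.0, 1.0, 2.0])                 # observation sites
b = np.array([1.0, 3.0, 7.0])                 # prescribed values
coeffs = np.polyfit(s, b, deg=len(s) - 1)     # degree n-1 fit through n points is exact
print(np.round(coeffs, 6))                    # [1. 1. 1.] -> p(x) = x^2 + x + 1
print(np.allclose(np.polyval(coeffs, s), b))  # True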

Example 3.1.3. Linear ordinary differential equations

Consider the space V = F(R,C) of differentiable functions and the operator T on V given by Ty = dy/dt − ay, where a is a fixed complex number and t is the variable of the

    “unknown” function y. For any f in V , the equation Ty = f (which has the same form as

    (3.1.1)) is a linear ODE (ordinary differential equation):

dy/dt − ay = f(t). (3.1.3)

    To solve this, we take two steps. First, consider the corresponding homogeneous equation

dy/dt − ay = 0. (3.1.4)

    We can solve this equation by the method called separation of variables as follows. Multiply

(3.1.4) by dt to rewrite it as dy − ay dt = 0, or dy = ay dt. Next, divide both sides by y to get dy/y = a dt. Now, put an integral sign on each side to arrive at ∫ dy/y = ∫ a dt. Thus we have ln y = at + c, where c is a constant, or y = e^{at+c} = Ce^{at}, where C = e^c

    is also a constant. The expression

y = Ce^{at} (3.1.5)

    is the general solution of the homogeneous equation (3.1.4). The second step we take is

    to look for a special solution of (3.1.3) by using a method called variation of constants

(or variation of parameters, according to some books). Following this method, we seek a solution of (3.1.3) of the form y = u(t)e^{at}, which is obtained from (3.1.5) by changing the constant C to a function u(t). For y = u(t)e^{at}, we have

dy/dt − ay = u′(t)e^{at} + u(t)(ae^{at}) − au(t)e^{at} = u′(t)e^{at}.

    Thus, (3.1.3) becomes u′(t) eat = f(t), or u′(t) = e−atf(t), which gives u(t) =∫

    e−atf(t) dt.

    A particular solution of (3.1.3) is y(t) =(∫ t

    0e−asf(s) ds

    )

    eat =∫ t

    0ea(t−s)f(s) ds. The

    expression y = Ceat +∫ t

    0ea(t−s)f(s) ds is the general solution of (3.1.3), according to a

    general principle explained in the next subsection. Although this is a closed formula for

    solving (3.1.3), finding the integral on its right hand side may still cause some difficulty.

    In §3.5 we will describe a practical method for solving inhomogeneous equations.
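As a quick sanity check (an addition to the text, with the sample choices a = 2 and f(t) = e^t), a computer algebra system can confirm that the variation-of-constants formula really solves (3.1.3):

```python
import sympy as sp

t, s = sp.symbols('t s')
a = 2                # sample value of the constant a
f = sp.exp(t)        # sample forcing function f(t)

# Particular solution from the formula y(t) = integral_0^t e^{a(t-s)} f(s) ds.
y_p = sp.integrate(sp.exp(a*(t - s)) * f.subs(t, s), (s, 0, t))

# Verify that y' - a*y - f(t) vanishes identically.
print(sp.simplify(sp.diff(y_p, t) - a*y_p - f))   # 0
```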

Example 3.1.4. Systems of linear ordinary differential equations

We may extend equation (3.1.3) by changing a into a matrix A and y into a vector-valued function y. Let A = [a_{ij}] be an n × n matrix over C. Consider the vector space V = F(R, C^n) of (smooth) functions defined on the real line taking values in C^n. Let f ∈ V be given. For convenience, we use the letter t as the real variable. Thus a “vector” in V can be written as y(t) = (y_1(t), y_2(t), . . . , y_n(t)). For each j, the jth component of y(t) is y_j(t), which is a function of the real variable t. Write y′_j(t) for the derivative of y_j(t) and let y′(t) = (y′_1(t), y′_2(t), . . . , y′_n(t)). Define a linear operator T on V by putting Ty = y′ − Ay. Given f(t) = (f_1(t), f_2(t), . . . , f_n(t)), the equation Ty = f becomes y′ − Ay = f, or

    y′(t) = Ay(t) + f(t).        (3.1.6)

This is the “vector form” of the general system of linear ordinary differential equations with constant coefficients. If we spell out all components of this equation, we have

    y′_1 = a_{11}y_1 + a_{12}y_2 + · · · + a_{1n}y_n + f_1(t)
    y′_2 = a_{21}y_1 + a_{22}y_2 + · · · + a_{2n}y_n + f_2(t)
        ⋮
    y′_n = a_{n1}y_1 + a_{n2}y_2 + · · · + a_{nn}y_n + f_n(t)        (3.1.7)

We may consider the more general case, where the matrix entries a_{ij} (also called coefficients) are nonconstant, that is, they are allowed to be functions of t: a_{ij} = a_{ij}(t). In this case A = [a_{ij}] is a matrix-valued function of t and, to emphasize this, we write A = A(t) = [a_{ij}(t)]. Equation (3.1.6) then becomes y′(t) = A(t)y(t) + f(t). When the coefficients a_{ij} are nonconstant, the basic theory of linear ordinary differential equations changes little, but the solutions usually can no longer be expressed explicitly in terms of elementary functions.
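When a closed form is out of reach, (3.1.6) can be integrated numerically. Below is a minimal sketch (an addition to the text; the matrix A, the forcing f, the initial value, and the step size are all made up) using the forward Euler method:

```python
import numpy as np

# A made-up 2 x 2 system y'(t) = A y(t) + f(t).
A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])
f = lambda t: np.array([0.0, np.sin(t)])

y = np.array([1.0, 0.0])          # initial value y(0)
t, h = 0.0, 1e-4                  # time and step size
while t < 1.0:
    y = y + h * (A @ y + f(t))    # one forward Euler step
    t += h

print(y)                          # crude approximation to y(1)
```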

Example 3.1.5. Higher order linear ordinary differential equations

In practice we often deal with higher order equations of the form

    y^(n) + a_{n−1}y^(n−1) + · · · + a_2 y^(2) + a_1 y′ + a_0 y = f(t)        (3.1.8)

where y^(k) stands for the kth order derivative of y. We can rewrite (3.1.8) in the form Ty = f as follows. Let D be the operator of taking the derivative (we can write D = d/dt if we wish). Introduce the polynomial p(λ) = λ^n + a_{n−1}λ^{n−1} + · · · + a_1 λ + a_0, where the coefficients a_k come from (3.1.8). Let

    T = p(D) ≡ D^n + a_{n−1}D^{n−1} + · · · + a_1 D + a_0 I.        (3.1.9)

Then (3.1.8) can be rewritten as Ty = f. Equation (3.1.8) can also be reduced to a system of first order equations; thus, in theory, equations of the form (3.1.8) are covered by systems of the form (3.1.7). This can be seen by introducing the new functions y_1 = y, y_2 = y′, . . . , y_n = y^(n−1). We give an example to show how this is done; the general case is given in Appendix C. Consider the equation

    D³y − 2D²y + 3Dy − 5y = (t − 2)e^t.        (3.1.10)

Introduce new functions y_1 = y, y_2 = Dy, y_3 = D²y. Then Dy_3 = D³y = 5y − 3Dy + 2D²y + (t − 2)e^t. Hence

    Dy_1 = y_2,   Dy_2 = y_3,   and   Dy_3 = 5y_1 − 3y_2 + 2y_3 + (t − 2)e^t.

We can rewrite (3.1.10) as y′ = Ay + f, where

    y = [ y_1 ]        A = [ 0    1    0 ]        f = [     0      ]
        [ y_2 ] ,          [ 0    0    1 ] ,          [     0      ]
        [ y_3 ]            [ 5   −3    2 ]            [ (t − 2)e^t ] .

Example 3.1.6. Linear difference equations

A general linear difference equation of order N can be written as

    a_0 y_k + a_1 y_{k+1} + a_2 y_{k+2} + · · · + a_N y_{k+N} = b_k        (3.1.11)

Recall from Example 1.6.6 that on the space of all sequences y = (y_0, y_1, y_2, . . .), the shift operator S is defined by (Sa)_k = a_{k+1}. Notice that (Iy)_k = y_k, (Sy)_k = y_{k+1}, (S²y)_k = (S(Sy))_k = (Sy)_{k+1} = y_{k+2}, (S³y)_k = (S²(Sy))_k = (Sy)_{k+2} = y_{k+3}, etc. In general,

    (S^m y)_k = y_{k+m}.        (3.1.12)

Let p(λ) = a_0 + a_1 λ + a_2 λ² + · · · + a_N λ^N (here we use λ as the variable) and

    T = p(S) = a_0 I + a_1 S + a_2 S² + · · · + a_N S^N.        (3.1.13)

Then, in view of (3.1.12), we have

    (Ty)_k = a_0 (Iy)_k + a_1 (Sy)_k + a_2 (S²y)_k + · · · + a_N (S^N y)_k
           = a_0 y_k + a_1 y_{k+1} + a_2 y_{k+2} + · · · + a_N y_{k+N},

which is the left-hand side of (3.1.11). Thus we have (Ty)_k = b_k for all k, and therefore Ty = b, with y = (y_0, y_1, . . .) and b = (b_0, b_1, . . .). We have shown that (3.1.11) can be written as Ty = b, where T is given in (3.1.13).
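To make the operator T = p(S) concrete, here is a small sketch (an addition to the text) that applies p(S) to a finite window y_0, . . . , y_{M−1} of a sequence; the choice p(x) = x² − x − 2 and the sequence y_k = 2^k are made up for the illustration.

```python
import numpy as np

def apply_p_of_S(coeffs, y):
    """Apply T = a_0 I + a_1 S + ... + a_N S^N to a finite window y.

    coeffs = [a_0, ..., a_N]; the result has len(y) - N entries, with
    (Ty)_k = a_0 y_k + a_1 y_{k+1} + ... + a_N y_{k+N}, as in (3.1.12).
    """
    N = len(coeffs) - 1
    M = len(y) - N
    return sum(a * y[j:j + M] for j, a in enumerate(coeffs))

y = np.array([2.0**k for k in range(8)])    # the sequence y_k = 2^k
print(apply_p_of_S([-2.0, -1.0, 1.0], y))   # p(x) = x^2 - x - 2; all zeros,
# so y_k = 2^k solves y_{k+2} - y_{k+1} - 2 y_k = 0 (cf. Example 3.4.4 below)
```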

We mention that there are many other important types of linear equations which can be put in the form (3.1.1) but are not treated here, such as linear partial differential equations.


3.2. Now we begin with some elementary theory of the equation

    Tx = b. (3.2.1)

Here T is a linear mapping from V to W. If we replace the right-hand side b by the zero vector 0, we get the corresponding homogeneous equation

    Tx = 0.        (3.2.2)

All solutions to (3.2.2) form a vector space, called the kernel of T and denoted by ker T.

Indeed, if u, v are in ker T, then Tu = 0 and Tv = 0, and hence

    T(au + bv) = aTu + bTv = 0

(for any scalars a, b), which entails au + bv ∈ ker T. In mathematical symbols, we write

    ker T = {x ∈ V : Tx = 0}.

Now suppose that v_0 ∈ V is a solution to (3.2.1), that is, Tv_0 = b. We claim: the solution set of (3.2.1) is v_0 + ker T ≡ {v_0 + x : x ∈ ker T}; in other words, the general solution to (3.2.1) is the sum of the particular solution v_0 and the general solution to the homogeneous equation (3.2.2). Indeed, if x ∈ ker T, then

    T(v_0 + x) = Tv_0 + Tx = b + 0 = b,

showing that v_0 + x is indeed a solution to (3.2.1). On the other hand, if v is another solution of (3.2.1), then, letting x = v − v_0, we have v = v_0 + x and

    Tx = T(v − v_0) = Tv − Tv_0 = b − b = 0,

telling us that v = v_0 + x with x in ker T. This explains why sometimes (but not always) we solve (3.2.1) in two steps: step one, find the general solution to the corresponding homogeneous equation; step two, find a particular solution to (3.2.1).
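For a matrix equation this structure is easy to see on the computer. In the sketch below (an addition to the text, with a made-up rank-deficient matrix A), SymPy produces a basis of the kernel, and every vector v_0 + x with x ∈ ker A solves Ax = b:

```python
import sympy as sp

# A made-up rank-one system Ax = b (second row = 2 * first row).
A = sp.Matrix([[1, 2, 3],
               [2, 4, 6]])
b = sp.Matrix([6, 12])

v0 = sp.Matrix([1, 1, 1])            # one particular solution: A*v0 = b
print(A * v0 == b)                   # True

kernel = A.nullspace()               # basis of ker A (two vectors here)
x = v0 + 5*kernel[0] - 2*kernel[1]   # any element of v0 + ker A ...
print(A * x == b)                    # ... is again a solution: True
```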

Example 3.2.1. Consider the differential equation y′ − y = x. A particular solution to this equation is y_p = −1 − x, as we can check directly (a method for finding such a particular solution will be discussed in §3.5 below). The general solution to the homogeneous equation y′ − y = 0 can be found by the method of separation of variables: rewrite this equation as dy/y = dx and integrate: ∫ dy/y = ∫ dx, which gives ln y = x + c, or y = Ce^x, where C = e^c. From the above discussion we see that y = Ce^x − 1 − x is the general solution to y′ − y = x.

Whether equation (3.2.1) has a solution depends on the vector b on the right-hand side. We say that b is in the range of T if (3.2.1) has a solution. Thus the range of T is the set of all vectors b in W for which (3.2.1) has a solution, that is, for which there exists v in V such that Tv = b. We denote the range of T by T(V). Thus, in mathematical symbols,

    T(V) = {y ∈ W : there exists v ∈ V such that Tv = y} = {Tv : v ∈ V}.

Notice that T(V) is a subspace of W. To see this, take y_1, y_2 in T(V). Then there exist v_1 and v_2 such that Tv_1 = y_1 and Tv_2 = y_2. For any scalars a_1, a_2, we have

    a_1 y_1 + a_2 y_2 = a_1 Tv_1 + a_2 Tv_2 = T(a_1 v_1 + a_2 v_2),

showing that a_1 y_1 + a_2 y_2 is indeed in T(V).

In the rest of the present section, we discuss methods of solving linear equations of the types mentioned in the examples of §3.1.

3.3. Denote by P the space of all polynomials and let s_1, s_2, . . . , s_n be distinct points in the complex plane. Let T : P → C^n be the map given by

    Tp = (p(s_1), p(s_2), . . . , p(s_n)).

We are asked to solve Tp = b for given b = (b_1, b_2, . . . , b_n). The corresponding homogeneous equation is Tp = 0, or (p(s_1), p(s_2), . . . , p(s_n)) = 0. Thus p is a solution to Tp = 0 if and only if s_1, s_2, . . . , s_n are roots of p; in other words,

    Q(x) ≡ (x − s_1)(x − s_2) · · · (x − s_n)

is a factor of p(x). Thus the general solution to Tp = 0 is p(x) = Q(x)f(x), where f(x) is any polynomial. Denote by Q_k(x) the polynomial obtained from Q(x) by deleting the factor x − s_k; in other words, Q_k(x) is the polynomial such that the identity

    Q(x) = (x − s_k)Q_k(x)

holds.

    holds. Notice that Qk(sj) = 0 for all j �= k and Qk(sk) �= 0. Let Lk(x) = Qk(x)/Qk(sk).Then we have Lk(sj) = 0 for j �= k and Lk(sk) = 1. (Using the Kronecker delta δjk,we can write Lk(sj) = δjk.) Recall the standard basis of C

    n:

    e1 = (1, 0, 0, . . . , 0, 0), e2 = (0, 1, 0, . . . , 0, 0), . . . en = (0, 0, 0, . . . , 0, 1).

We have proved that TL_k = e_k for all k. Thus,

    b = ∑_{k=1}^n b_k e_k = ∑_{k=1}^n b_k TL_k = T( ∑_{k=1}^n b_k L_k ),

showing that the Lagrange polynomial L(x) = ∑_{k=1}^n b_k L_k(x) is a special solution to the interpolation problem Tp = b. The general solution to this problem can be written as p(x) = L(x) + Q(x)f(x), where f(x) is an arbitrary polynomial and L(x), Q(x) are the polynomials given above.

Example 3.3.1. Solve the following Lagrange interpolation problem: find a polynomial p of degree 2 such that p(1) = 3, p(2) = 6, p(3) = 13.

Solution. Using the above notation, we have Tp = (p(1), p(2), p(3)) and b = (3, 6, 13). We are asked to solve Tp = b. Now Q(x) = (x − 1)(x − 2)(x − 3), Q_1(x) = (x − 2)(x − 3), Q_2(x) = (x − 1)(x − 3) and Q_3(x) = (x − 1)(x − 2), with Q_1(1) = 2, Q_2(2) = −1 and Q_3(3) = 2. So L_1(x) = (1/2)(x − 2)(x − 3), L_2(x) = −(x − 1)(x − 3) and L_3(x) = (1/2)(x − 1)(x − 2). The Lagrange polynomial is

    L(x) = (3/2)(x − 2)(x − 3) − 6(x − 1)(x − 3) + (13/2)(x − 1)(x − 2) = 2x² − 3x + 4.

So the general solution of our problem is 2x² − 3x + 4 + (x − 1)(x − 2)(x − 3)f(x), where f(x) is any polynomial.
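The computation is easy to double-check mechanically. The sketch below (an addition to the text) rebuilds L(x) from the data of Example 3.3.1 with SymPy:

```python
import sympy as sp

x = sp.symbols('x')
points = [(1, 3), (2, 6), (3, 13)]    # the data (s_k, b_k) of Example 3.3.1

L = 0
for k, (sk, bk) in enumerate(points):
    # Q_k(x): product of (x - s_j) over j != k; then L_k = Q_k / Q_k(s_k).
    Qk = sp.Mul(*[(x - sj) for j, (sj, _) in enumerate(points) if j != k])
    L += bk * Qk / Qk.subs(x, sk)

print(sp.expand(L))                   # 2*x**2 - 3*x + 4, as found above
```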

3.4. A common situation in a homogeneous equation Tx = 0 is that T = p(S), where p is a polynomial and S is an operator considerably simpler than T. Example 3.1.5 with the operator in (3.1.9) and Example 3.1.6 with the operator in (3.1.13) are typical of this situation. Let λ_1, λ_2, . . . , λ_r be all the roots of the polynomial p, with multiplicities m_1, m_2, . . . , m_r respectively. Then we have

    p(x) = a(x − λ_1)^{m_1} (x − λ_2)^{m_2} · · · (x − λ_r)^{m_r}

and hence T = p(S) = a(S − λ_1 I)^{m_1} (S − λ_2 I)^{m_2} · · · (S − λ_r I)^{m_r}. We claim that solving

    p(S)x = 0        (3.4.1)

can be reduced to solving each of

    (S − λ_1 I)^{m_1} x = 0,   (S − λ_2 I)^{m_2} x = 0,   . . . ,   (S − λ_r I)^{m_r} x = 0.        (3.4.2)

    More precisely, we have

Proposition 3.4.1. With the above notation, the general solution to p(S)x = 0 can be written as x = x_1 + x_2 + · · · + x_r, where x_k (1 ≤ k ≤ r) is the general solution to the kth equation in (3.4.2), namely (S − λ_k I)^{m_k} x = 0.


Before proving the above proposition, we give some examples to illustrate it.

Example 3.4.2. Find the general solution to the differential equation y′′ − y′ − 2y = 0.

Solution. Rewrite the equation as p(D)y = 0, where p(x) = x² − x − 2 = (x − 2)(x + 1) and D is the differentiation operator given by Dy = y′. Reduce p(D)y = 0 to the two equations (D − 2I)y = 0 and (D + I)y = 0, that is, y′ − 2y = 0 and y′ + y = 0, which have general solutions y = Ce^{2t} and y = Ce^{−t} respectively (here we use t for the variable of the function y). Hence the general solution to y′′ − y′ − 2y = 0 is y = C_1 e^{2t} + C_2 e^{−t}.

Example 3.4.3. Find the general solution to the differential equation y′′ + y = 0.

Solution. Rewrite the equation as p(D)y = 0, where p(x) = x² + 1 = (x − i)(x + i) and D is the differentiation operator given by Dy = y′. Reduce p(D)y = 0 to the two equations (D − iI)y = 0 and (D + iI)y = 0, which have general solutions y = Ce^{it} and y = Ce^{−it} respectively. The general solution to y′′ + y = 0 is y = C_1 e^{it} + C_2 e^{−it}. Using Euler's formula e^{it} = cos t + i sin t and e^{−it} = cos t − i sin t, we can put it in another form: y = A cos t + B sin t.

Example 3.4.4. Solve the difference equation y_{n+2} − y_{n+1} − 2y_n = 0.

Solution. Rewrite the equation as p(S)y = 0, where p(x) = x² − x − 2 = (x − 2)(x + 1) and S is the shift operator given by (Sy)_n = y_{n+1}. Reduce p(S)y = 0 to the two equations (S − 2I)y = 0 and (S + I)y = 0, that is, y_{n+1} − 2y_n = 0 and y_{n+1} + y_n = 0, which have general solutions y_n = a·2^n and y_n = b(−1)^n respectively. Hence the general solution to y_{n+2} − y_{n+1} − 2y_n = 0 is y_n = a·2^n + b(−1)^n.
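SymPy can verify this answer directly (an addition to the text; rsolve names its arbitrary constants C0 and C1):

```python
import sympy as sp

n = sp.symbols('n', integer=True)
y = sp.Function('y')

# Solve y_{n+2} - y_{n+1} - 2 y_n = 0.
print(sp.rsolve(y(n + 2) - y(n + 1) - 2*y(n), y(n)))
# C0*(-1)**n + C1*2**n, i.e. a*2^n + b*(-1)^n as above
```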

Now we return to the proof of Proposition 3.4.1. Since (S − λ_k I)^{m_k} is a factor of T = p(S), solutions to (S − λ_k I)^{m_k} x = 0 are also solutions to p(S)x = 0. Hence expressions of the form x = x_1 + x_2 + · · · + x_r described in the proposition are solutions to Tx = 0. Next we prove that all solutions to Tx = 0 can be put in the form x = x_1 + x_2 + · · · + x_r as described in the proposition (this is the hard part). Let

    p_1(x) = (x − λ_1)^{m_1},   p_2(x) = (x − λ_2)^{m_2} · · · (x − λ_r)^{m_r}.

Then p_1(x) and p_2(x) are coprime (because they have no root in common) and hence there are polynomials q_1(x) and q_2(x) such that p_1(x)q_1(x) + p_2(x)q_2(x) = 1. Suppose that v is a solution to Tx = 0. Let v_1 = p_2(S)q_2(S)v and v_2 = p_1(S)q_1(S)v. Then

    v = Iv = (p_1(S)q_1(S) + p_2(S)q_2(S))v = v_2 + v_1 = v_1 + v_2

with p_1(S)v_1 = p_1(S)p_2(S)q_2(S)v = a^{−1} p(S)q_2(S)v = a^{−1} q_2(S)p(S)v = a^{−1} q_2(S)Tv = 0 (recall that p(S) = a·p_1(S)p_2(S)), and similarly p_2(S)v_2 = 0. Thus v = v_1 + v_2, where v_1 is a solution to p_1(S)x = 0 and v_2 is a solution to p_2(S)x = 0. Now the proof can be completed by induction on r.
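The key identity p_1 q_1 + p_2 q_2 = 1 can be produced by the extended Euclidean algorithm for polynomials. A small sketch (an addition to the text, with made-up factors p_1 = x − 2 and p_2 = (x + 1)²):

```python
import sympy as sp

x = sp.symbols('x')
p1 = x - 2          # (x - lambda_1)^{m_1} with lambda_1 = 2, m_1 = 1
p2 = (x + 1)**2     # the remaining factor, coprime to p1

q1, q2, g = sp.gcdex(p1, p2)       # extended Euclid: q1*p1 + q2*p2 = g
print(g)                           # 1, since p1 and p2 are coprime
print(sp.expand(q1*p1 + q2*p2))    # 1, the identity used in the proof
```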

3.5. For a linear operator T on a linear space V, we can sometimes find a particular solution of Tx = b by making a judicious choice of a subspace M such that b belongs to M and M is invariant under T, in the sense that T(M) ⊆ M, that is, for all v in M, Tv is also in M. This method will become clear by going through some examples. (Our choice of M in these examples will be illuminated in §1.3 of the next chapter.)

Example 3.5.1. Find the indefinite integral ∫ e^x sin x dx by using functions of the form a e^x sin x + b e^x cos x.

Solution. This integral is a solution to the equation Du = e^x sin x, where D = d/dx. Our choice of subspace M consists of functions of the form u = a e^x sin x + b e^x cos x. Now

    Du = D(a e^x sin x + b e^x cos x)
       = a(e^x sin x + e^x cos x) + b(e^x cos x − e^x sin x)
       = (a − b)e^x sin x + (a + b)e^x cos x.

Thus, if u is indeed a solution to Du = e^x sin x, then we set a − b = 1 and a + b = 0, which gives a = 1/2 and b = −1/2. So u = (1/2)e^x sin x − (1/2)e^x cos x is a solution to Du = e^x sin x, and

    ∫ e^x sin x dx = (e^x/2)(sin x − cos x) + C.

This is originally a question in calculus, usually answered by integration by parts. As you see, it is easier to answer by using linear algebra.
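Again the answer is easy to double-check by differentiation (an addition to the text):

```python
import sympy as sp

x = sp.symbols('x')
u = sp.exp(x)/2 * (sp.sin(x) - sp.cos(x))   # the antiderivative found above

# Du - e^x sin x should vanish identically.
print(sp.simplify(sp.diff(u, x) - sp.exp(x)*sp.sin(x)))   # 0
```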

Example 3.5.2. Find a particular solution to y′′ + y′ + y = x² by using the subspace P_2, which consists of functions of the form a + bx + cx².

Solution. Here our choice of the subspace is P_2. Suppose that y = a + bx + cx² is a particular solution. Then

    x² = y′′ + y′ + y
       = 2c + (b + 2cx) + (a + bx + cx²)
       = (a + b + 2c) + (b + 2c)x + cx²,

which gives a + b + 2c = 0, b + 2c = 0 and c = 1. So c = 1, b = −2 and a = 0. We conclude that y = x² − 2x is a particular solution.


The method described here is not limited to finding a particular solution to a nonhomogeneous equation, as shown in the following example.

Example 3.5.3. Find the general solution to y′′ − 2y′ + y = x².

Solution. Using the method described in the last example, we find a particular solution of this equation: y_p = x² + 4x + 6. Next we solve the homogeneous equation y′′ − 2y′ + y = 0, which can be rewritten as (D² − 2D + I)y = 0, or (D − I)²y = 0. Let z = (D − I)y. Then we have (D − I)z = 0, with z = ce^x as its general solution. Next we write (D − I)y = z as y′ − y = ce^x. Choose the subspace M of functions of the form a x e^x + b e^x. If y = a x e^x + b e^x is a solution to y′ − y = ce^x, then from y′ − y = (a e^x + a x e^x + b e^x) − (a x e^x + b e^x) = a e^x we get a e^x = ce^x and hence a = c. Thus y = a x e^x + b e^x (b arbitrary) is the general solution to y′′ − 2y′ + y = 0. Therefore the general solution to the original equation y′′ − 2y′ + y = x² is y = a x e^x + b e^x + x² + 4x + 6.

    Once we know in which subspace we should look for a solution, a complicated expres-

    sion for the operator T in the equation Tx = b does not intimidate us, as shown in the

    following example.

Example 3.5.4. Let T be the operator on the space of functions given by Tf(x) = x f′(x) + f(x + 1) + f(0)x². Find a (special) solution to Tf(x) = 1 − 2x + 5x².

Solution. Since the right-hand side is in P_2 and since, as we can check, P_2 is an invariant subspace of T, it is natural to look for a solution in P_2. So we set f(x) = a + bx + cx². Then

    Tf(x) = x(b + 2cx) + a + b(x + 1) + c(x + 1)² + ax²
          = (a + b + c) + (2b + 2c)x + (a + 3c)x².

In order to satisfy the equation Tf(x) = 1 − 2x + 5x², we set a + b + c = 1, 2b + 2c = −2 and a + 3c = 5, which gives a = 2, b = −2 and c = 1. Hence f(x) = 2 − 2x + x² is a solution to the equation Tf(x) = 1 − 2x + 5x².

In the next chapter we provide some clues for making the “right guess” when solving Tx = b in certain circumstances, based on dimensional considerations. In Chapter 3 we describe a general method of solving systems of equations called diagonalization.


EXERCISE SET I.3.

    Review Questions: What are linear equations and their solutions? What are the major

    examples of linear equations? What is the relation between solutions of Tx = b and solu-

    tions of its homogeneous equation Tx = 0? How do we use factorization of a polynomial

    p(x) to solve a higher order linear differential equation p(D)y = 0 or difference equation

    p(S)y = 0?

    Drills

1. Convert each of the following differential equations into a system of first order equations of the form y′ = Ay + f in vector notation.

   (a) y′′ + 2y′ − 3y = 1 + t.
   (b) y′′ + y = sin t.
   (c) y^(3) − 2y^(2) + 3y′ − y = 0.
   (d) y′′ + ty′ + (cos t)y = sin t.

2. Convert each of the following difference equations into a system of first order equations of the form y_{n+1} = Ay_n + f in vector notation.

   (a) y_{n+2} + 2y_{n+1} − 3y_n = 1 + n.
   (b) y_{n+2} + y_n = sin n.
   (c) y_{n+3} − 2y_{n+2} + 3y_{n+1} − y_n = 1.
   (d) y_{n+2} − 2n y_{n+1} + 3e^{−n} y_n = n².

3. Solve each of the following Lagrange interpolation problems.

   (a) p ∈ P_1, p(−1) = 3, p(2) = 9.
   (b) p ∈ P_1, p(−1) = 2 − i, p(i) = 4 − i.
   (c) p ∈ P_2, p(0) = 1, p(1) = 3, p(2) = 7.
   (d) p ∈ P_3, p(−1) = −4, p(0) = −1, p(1) = 0, p(2) = 5.

4. Solve each of the following homogeneous linear differential equations.

   (a) y′′ − 2y′ − 3y = 0.
   (b) y′′ − 2y′ + 2y = 0.
   (c) y′′ − 4y′ + 4y = 0.
   (d) y^(4) − y = 0.

5. Solve each of the following homogeneous linear difference equations.

   (a) y_{n+2} − 2y_{n+1} − 3y_n = 0.
   (b) y_{n+2} − 2y_{n+1} + 2y_n = 0.
   (c) y_{n+2} − 4y_{n+1} + 4y_n = 0.
   (d) y_{n+4} − y_n = 0.

6. Use the method of linear algebra suggested in the present section to find the following indefinite integrals.

   (a) ∫ (4 + x + 5x² − 2x³)e^x dx. (Hint: Use the linear space of functions which can be expressed in the form a e^x + b x e^x + c x²e^x + d x³e^x.)
   (b) ∫ (2 sin²x − cos²x + sin x cos x)e^x dx. (Hint: Use the linear space of functions which can be expressed in the form a e^x sin²x + b e^x cos²x + c e^x sin x cos x.)

7. Use the method suggested in the present section to solve each of the following linear differential equations.

   (a) y′′ − 2y′ − 3y = 8 − 4t − 3t². (Hint: Use the linear space of functions which can be expressed in the form a + bt + ct².)
   (b) y′′ − 2y′ + 2y = e^t. (Hint: Use the linear space of functions which can be expressed in the form ae^t.)
   (c) y′′ − 4y′ + 4y = te^t − 4e^t. (Hint: Use the linear space of functions which can be expressed in the form ae^t + bte^t.)
   (d) y^(4) − y = cos t − sin t. (Hint: Use the linear space of functions which can be expressed in the form a sin t + b cos t.)

8. Use the method suggested in the present section to solve each of the following linear difference equations.

   (a) y_{n+2} − 2y_{n+1} − 3y_n = 2 + 4n − 4n².
   (b) y_{n+2} − 2y_{n+1} + 2y_n = n + 1.
   (c) y_{n+2} − 4y_{n+1} + 4y_n = 2n + 2.
   (d) y_{n+4} − y_n = 1. (Hint: Try y_n = an + b.)


Appendices for Chapter I

    Appendix A*: Axioms for vector spaces

    By a vector space V over a field F we mean a set V with an algebraic structure endowed

    by the following two operations

    1. Addition, allowing us to add two vectors u and v to obtain their sum u + v.

    2. Scalar multiplication, allowing us to multiply a vector v by a scalar a to form av.

    such that the following 8 axioms are satisfied:

    (V1) u + v = v + u (for all u,v in V .)

    (V2) u + (v + w) = (u + v) + w, (for all u,v,w in V .)

    (V3) There is a unique object 0 called zero vector such that u + 0 = u for all u.

(V4) For each u in V, there is a v in V such that u + v = 0. (We will call v the negative of u and denote it by −u.)

(V5) (a + b)u = au + bu (for all u in V and for all scalars a, b.)

    (V6) a(u + v) = au + av (for all u,v in V and for all scalars a.)

    (V7) a(bu) = (ab)u (for all u in V and for all scalars a, b.)

    (V8) 1u = u (for all u in V .)

    There is a systematic way to read these rules governing the operations (addition and scalar

    multiplication) of a vector space. (V1) to (V4) concern addition only. (V1) and (V2) are

    commutative law and associative law respectively, same as those governing addition of

    numbers. (V3) and (V4) concern the existence of the zero vector and the “negative” to a

    vector. Any algebraic system with an addition operation satisfying (V1) to (V4) is called

an abelian group. (V5) and (V6) involve both operations; they resemble the distributive

    law. (V7) and (V8) concern scalar multiplication only. (V7) resembles the associative law

    and (V8) is a rule for normalizing this operation.

    We can derive elementary properties of vector space operations from (V1)–(V8) such as

    (1) a0 = 0. (Here 0 is the zero vector given in (V3).)

    (2) 0v = 0.

    (3) If av = 0, then either a = 0 or v = 0.

    (4) (−1)v = −v. (Here −v is the negative of v described in (V4).)


To prove (1), notice that from (V3) we have 0 + 0 = 0 (u + 0 = u holds for u = 0). Hence

    a0 = a(0 + 0) = a0 + a0

by (V6). Adding −(a0) to both sides and applying the associative law (V2) on the right-hand side, we obtain 0 = a0, giving (1). The proof of (2) goes in the same manner:

    0v = (0 + 0)v = 0v + 0v

by (V5); then add −(0v) to both sides. The proof of (3) is a bit more subtle. In case a = 0, there is nothing to prove. So we may (and do) assume that a ≠ 0. Thus it is legitimate to consider a⁻¹ and to multiply both sides of av = 0 by a⁻¹. Thus

    a⁻¹(av) = a⁻¹0 = 0

in view of (1), which we proved a minute ago. On the other hand, by (V7),

    a⁻¹(av) = (a⁻¹a)v = 1v,

which is v by (V8). Hence v = 0, as desired. Assertion (4) says that (−1)v is the negative of v. This means, according to (V4), that v + (−1)v = 0. The last identity can be checked as follows:

    v + (−1)v = 1v + (−1)v = (1 + (−1))v = 0v = 0.

Hence (4) is valid.

We only consider natural examples of vector spaces, but we should be aware that weird-looking vector spaces do exist. Here is an example of a weird space, suggested by special relativity. Let V consist of the real numbers v strictly between −1 and 1: −1 < v < 1. We use ⊕ and ⊙ to indicate the two basic operations in V, defined as follows, so as to distinguish them from the usual addition and multiplication of real numbers: for u, v ∈ V and a ∈ R,

    u ⊕ v = (u + v)/(1 + uv),   a ⊙ u = ((1 + u)^a − (1 − u)^a) / ((1 + u)^a + (1 − u)^a).

To check (V1) to (V8) by hand, you need a large piece of paper, small handwriting, and a lot of patience.
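A computer can at least spot-check the axioms numerically before one commits to the paper-and-pencil verification. A sketch (an addition to the text; the sample points and tolerances are arbitrary):

```python
import random

def oplus(u, v):                       # u (+) v = (u + v) / (1 + uv)
    return (u + v) / (1 + u*v)

def odot(a, u):                        # a (.) u from the formula above
    return ((1 + u)**a - (1 - u)**a) / ((1 + u)**a + (1 - u)**a)

random.seed(0)
for _ in range(100):
    u, v, w = (random.uniform(-0.99, 0.99) for _ in range(3))
    a, b = random.uniform(-3, 3), random.uniform(-3, 3)
    assert abs(oplus(u, v) - oplus(v, u)) < 1e-9                       # (V1)
    assert abs(oplus(u, oplus(v, w)) - oplus(oplus(u, v), w)) < 1e-9   # (V2)
    assert abs(odot(a + b, u) - oplus(odot(a, u), odot(b, u))) < 1e-9  # (V5)
    assert abs(odot(1, u) - u) < 1e-9                                  # (V8)
print("axioms (V1), (V2), (V5), (V8) spot-checked")
```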

    Appendix B*: Fields

    Besides R and C, we briefly mention other fields. First, the smallest field in R is the

    field of all rational numbers, denoted by Q. Recall that a rational number is a number


which can be written as a fraction of integers, that is, it can be expressed as m/n, where m and n are integers with n ≠ 0. Between Q and C there are many so-called algebraic number fields. A simple example of an algebraic number field is Q(√2), consisting of numbers of the form a + b√2, where a and b are rational numbers. Algebraic number theory is a big industry having many mathematicians as workers, so to speak. Beyond

    C there is the field of rational functions. Recall that a rational function is a function

    which can be expressed as p(x)/q(x), where p(x) and q(x) are polynomials. Going further

    are so called algebraic function fields. They can be defined by pure algebraic means via

    “field extensions” or by analytic means so that they appear as the field of “meromorphic

    functions on Riemann surfaces”.

Now we describe finite fields, which have found many practical uses in recent years, for example in coding and cryptography. The simplest finite field is F_2, which consists of two elements denoted by 0 and 1. The addition and multiplication in F_2 are given by 0 + 0 = 0, 0 + 1 = 1 + 0 = 1, 1 + 1 = 0, 1 × 0 = 0 × 1 = 0 × 0 = 0, 1 × 1 = 1. More generally, for any prime number p, we can define the finite field F_p, which has p elements. Every finite field can be written as F_q, where q = p^n for a prime number p and a positive integer n. An interesting aspect of the theory is that F_q can be regarded as a linear space over F_p, and the basic theory of linear algebra is a good guide to understanding the finite field F_q. Because of the multiplication operation of a finite field, each element of F_q is naturally associated with a linear operator on this vector space, and a large portion of the operator theory in linear algebra applies.
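A minimal illustration of arithmetic in F_p (an addition to the text; p = 7 is an arbitrary choice of prime):

```python
p = 7   # any prime p gives a field F_p

add = lambda a, b: (a + b) % p
mul = lambda a, b: (a * b) % p
inv = lambda a: pow(a, p - 2, p)   # Fermat: a^(p-2) is the inverse of a != 0

print(add(5, 4))        # 2, since 9 = 7 + 2
print(mul(3, inv(3)))   # 1, so every nonzero element is invertible
```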

A fancy way to do number theory is to introduce the so-called “valuation fields”, such as the field of p-adic numbers. Mathematically it is one of the most fascinating and challenging research areas. It seems to me that, though greatly respected, it is practically useless. However, some experts in number theory have begun to speculate about its links to some of the deepest mysteries of our universe.

    Appendix C*: Converting higher order linear ODEs to systems of first order ODEs

In Example 3.1.5 we considered higher order equations of the form

    y^(n) + a_{n−1}y^(n−1) + · · · + a_2 y^(2) + a_1 y′ + a_0 y = f(t)        (3.1.8)

Introduce the new functions y_1 = y, y_2 = y′, . . . , y_n = y^(n−1). Then we have

    y′_1 = y_2,   y′_2 = y_3,   . . . ,   y′_{n−1} = y_n,
    y′_n = −a_0 y_1 − a_1 y_2 − · · · − a_{n−1} y_n + f(t).


We can rewrite (3.1.8) as y′ = Ay + f, where A is the n × n matrix

    A = [   0     1     0     0   · · ·     0     ]
        [   0     0     1     0   · · ·     0     ]
        [   0     0     0     1   · · ·     0     ]
        [   ⋮                                ⋮    ]
        [   0     0     0     0   · · ·     1     ]
        [ −a_0  −a_1  −a_2  −a_3  · · ·  −a_{n−1} ]

with

    y = (y_1, y_2, . . . , y_n) = (y, y′, . . . , y^(n−1))   and   f = (0, . . . , 0, f),

both regarded as column vectors.

(As we shall see, the matrix A has many interesting properties.) In practice there is no apparent

    advantage of converting a higher order equation into a system of first order equations. But,

    in developing the theory, a system of first order equations is easier to deal with than a

    higher order equation.
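A short sketch (an addition to the text) that builds this companion-form matrix for given coefficients, checked against the matrix of Example 3.1.5:

```python
import numpy as np

def companion(coeffs):
    """A for y' = Ay, from y^(n) + a_{n-1} y^(n-1) + ... + a_0 y = 0,
    where coeffs = [a_0, a_1, ..., a_{n-1}]."""
    n = len(coeffs)
    A = np.zeros((n, n))
    A[:-1, 1:] = np.eye(n - 1)        # ones on the superdiagonal
    A[-1, :] = [-a for a in coeffs]   # last row: -a_0, ..., -a_{n-1}
    return A

# (3.1.10) reads y''' - 2y'' + 3y' - 5y = ..., so a_0 = -5, a_1 = 3, a_2 = -2.
print(companion([-5.0, 3.0, -2.0]))
# [[ 0.  1.  0.]
#  [ 0.  0.  1.]
#  [ 5. -3.  2.]]   (the matrix A of Example 3.1.5)
```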

    In the same vein we can convert a higher order linear difference equation to a system

    of first order linear difference equations.
