
CHAPTER I

    LINEARITY: BASIC CONCEPTS AND EXAMPLES

In this chapter we start with the concept of general linear spaces, whose elements are called vectors, to "set up the stage". Then we introduce "actors" called linear mappings, which act upon vectors. In the mathematical literature, "vector space" is synonymous with "linear space", and these terms will be used interchangeably. Likewise, "linear transformation", "linear mapping", and simply "linear map" are synonymous.

    §1. Linear Spaces and Linear Maps

    1.1. A vector space is an entity containing objects called vectors. A vector is usually

    conceived to be something which has a magnitude and direction, so that it can be drawn

    as an arrow:

    You can add or subtract two vectors:

    You can also multiply vectors by scalars:

    Such things should be familiar to you.

However, we should not be so narrow-minded as to think that only those objects represented geometrically by arrows in a 2D or 3D space can be regarded as vectors. As long

    as we have a collection of objects among which two algebraic operations called addition

    and scalar multiplication can be performed, so that certain rules of such operations are

    obeyed, we may regard this collection as a vector space and call the objects in this col-

    lection vectors. Our definition of vector spaces should be so general that we encounter

vector spaces almost every day and almost everywhere. Examples of vector spaces include many spaces of functions, spaces of polynomials, spaces of sequences, etc. (in addition to the well-known 3D space in which vectors are represented as arrows). The universality and the omnipresence of vector spaces are one good reason for placing linear algebra in the

    position of paramount importance in basic mathematics.

    1.2. After this propaganda, we come to the technical side of the definition. We start

    with a collection V of objects called vectors, designated by block letters u, v, etc. Suppose

    that V is equipped with two algebraic operations:

    1. Addition, allowing us to add two vectors u and v to obtain their sum u + v.

    2. Scalar multiplication, allowing us to multiply a vector v by a scalar a to form av.

Then V will be legitimately called a vector space if these operations obey a set of "natural" axioms.

    The complete set of axioms will be listed in Appendix A at the end of the present chapter.

    There is no need to memorize them, but here we mention some to show that they are

    indeed very natural.

    u + v = v + u (addition is commutative)

    u + (v + w) = (u + v) + w (addition is associative)

u + 0 = u
a(u + v) = au + av

    We are a bit vague about “scalars” in the above definition. What are scalars? The

    obvious answer is: they are numbers. But what sort of numbers are they?

    If we allow scalars to be complex numbers, then we say that V is a complex vector

    space, or a vector space over (the complex field) C. If we restrict scalars to real numbers,

    then V is called a real vector space, or a vector space over (the real field) R.

    More generally, scalars are taken from something called a field. If the “field of scalars”

is denoted by F, we call V a vector space over F. In recent years, finite fields have become an important subject due to their vast applications in areas such as cryptography and coding theory. We only briefly describe fields other than R and C in Appendix B at the end of the

    present chapter. In this course we only consider vector spaces over R or C.

1.3. Examples. The best way to understand the concept of vector spaces is to go

    through a large number of examples and work on them in the future.

    Example 1.3.1. Rn, a real vector space.

    The vectors in Rn are n-tuples of real numbers. A typical vector may be written as

    v = (v1, v2, . . . , vn),

where v1, v2, . . . , vn are real numbers. Two n-tuples are equal exactly when their corresponding components are equal: for u = (u1, . . . , un) and v = (v1, . . . , vn) in Rn, u = v exactly when u1 = v1, u2 = v2, . . . , un = vn. The addition and scalar multiplication in this space are defined in the componentwise manner: for u = (u1, u2, . . . , un), v = (v1, v2, . . . , vn)

    in Rn and a in R (that is, a is a real number),

    u + v = (u1 + v1, u2 + v2, . . . , un + vn), au = (au1, au2, . . . , aun).

    These two algebraic operations are quite simple and natural. Notice that, when n = 1, we

    have the vector space R1, which can be identified with R. This shows that R itself can

    be considered as a real vector space.
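Aside: the componentwise operations are easy to phrase in code. Here is a minimal Python sketch (our own illustration; the names add and scale are made up for this purpose):

def add(u, v):
    # componentwise sum of two n-tuples
    assert len(u) == len(v)
    return tuple(uk + vk for uk, vk in zip(u, v))

def scale(a, u):
    # scalar multiple of an n-tuple
    return tuple(a * uk for uk in u)

u = (1.0, 2.0, 3.0)
v = (4.0, 5.0, 6.0)
print(add(u, v))       # (5.0, 7.0, 9.0)
print(scale(2.0, u))   # (2.0, 4.0, 6.0)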

    Example 1.3.2. Cn, a complex vector space.

The space Cn consists of all n-tuples of complex numbers. Everything in this space works in the same way as in the space Rn of the previous example: all we have to do is replace real scalars by complex numbers. Sometimes we would like to make a statement for both spaces

    Rn and Cn. To avoid repetition, we use the letter F for R or C, or even a general field of

    scalars. We write Fn for the space of all n–tuples of scalars in F. Using the identification

    of F1 with F, we see that scalars can be regarded as vectors, if we wish.

    Example 1.3.3. Mmn(F) (the space of m× n matrices over F.)

    A vector in Fn is formed by arranging n scalars in a row. A variant of Fn is to form a

vector by arranging mn scalars in an m × n matrix. Denote by Mmn(F) the set of all m × n matrices with entries in F. With the usual addition and scalar multiplication, Mmn(F)

    is a vector space. A vector in this space is actually a matrix. In the future, to simplify

our notation, we will write Mmn for Mmn(F) when the field F we are working with is understood.

    Example 1.3.4. Direct product.

    This example tells us how to construct a new vector space from given vector spaces. Let

    V1, V2, . . . , Vn be vector spaces over the same field F. Consider the set V of all n-tuples

of the form v = (v1, v2, . . . , vn), where v1 is a vector in V1, v2 is a vector in V2, etc.

The addition and the scalar multiplication for V are defined in the same fashion as the

    corresponding operations of Rn described in Example 1.3.1: for u = (u1,u2, . . . ,un) and

    v = (v1,v2, . . . ,vn) in V , and for a ∈ F,

u + v = (u1 + v1, u2 + v2, . . . , un + vn),
au = (au1, au2, . . . , aun).

    This space V is a generalization of the previous example of Fn by replacing numbers in

the entries of n-tuples by vectors. If we take V1 = V2 = · · · = Vn = F, then we recover Fn. The vector space V constructed in this way is called the direct product of V1, V2, . . . , Vn, and the usual notation to express this is V = V1 × V2 × · · · × Vn, or V = \prod_{k=1}^{n} V_k.

    Example 1.3.5. Space of functions.

    This example is a space of functions with a common domain, say X (some nonempty set).

    A function f on X is just a way to assign to each point x in X a value denoted by f(x).

This value could be a scalar or a vector. Since scalars can be regarded as vectors, it

    is enough to consider the vector–valued case. Take any vector space V with F as its field

of scalars. Denote by F(X,V) the set of all functions f from X to V: f assigns to each point x in X a vector denoted by f(x) in V. Given "vectors" (which are functions in this case) f and g in F(X,V), the sum f + g and the scalar multiple af of f by a scalar a are formally defined as follows:

(f + g)(x) = f(x) + g(x), (af)(x) = a·f(x), x ∈ X.

    If we treat x as a variable symbol so that f is written as f(x), then the sum of “vectors”

    f(x) and g(x) in F(X,V ) is simply f(x) + g(x). In the special case with V = C and

    X = N = {0, 1, 2, 3, . . .},

    each f ∈ F(N,C) represents a sequence (of complex numbers):

    {f(n)}n≥0 ≡ (f(0), f(1), f(2), · · · )

Conversely, each sequence {an}n≥0 of complex numbers determines a function f on N, given by f(n) = an. Thus we can identify the space F(N,C) with the space of sequences, which will be denoted by S. For a = {an}n≥0 and b = {bn}n≥0 in S, we define the sum a + b and the scalar multiple λa of a by a complex number λ as follows:

    a+ b = (a0, a1, a2, . . . ) + (b0, b1, b2, . . . ) = (a0 + b0, a1 + b1, a2 + b2, . . . )

    λa = λ(a0, a1, a2, . . . ) = (λa0, λa1, λa2, . . . )

We will need the sequence space S to study difference equations and recursion relations.
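Aside: the pointwise operations on F(X,V) can be mimicked with closures; a minimal Python sketch (our own illustration) of (f + g)(x) = f(x) + g(x) and (af)(x) = a·f(x):

def fadd(f, g):
    # the sum of two functions is again a function
    return lambda x: f(x) + g(x)

def fscale(a, f):
    return lambda x: a * f(x)

f = lambda x: x * x
g = lambda x: 2 * x + 1
h = fadd(f, fscale(3.0, g))   # the "vector" f + 3g
print(h(2.0))                 # f(2) + 3*g(2) = 4 + 15 = 19.0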

    Example 1.3.6. The space of all polynomials.

    We specialize the previous example by taking X = C, the complex plane, and V = C,

considered as a complex vector space. In some sense, the space F(C,C) is too big. We should look at something smaller inside. Recall that a polynomial is a function p on C

    which can be written in the form

p(x) = a_0 + a_1x + a_2x^2 + a_3x^3 + · · · + a_nx^n,

where a_0, a_1, . . . , a_n are certain complex numbers and n is a certain positive integer. It is clear

    that the sum of two polynomials is also a polynomial, and scalar multiples of polynomials

are polynomials. Thus, if we denote by P the set of all polynomials, we can define the sum of two polynomials and a scalar multiple of a polynomial in the same way as we do

    for functions to give a linear structure to P.

1.4. A smaller vector space "sitting" inside a bigger one, such as P in F(C,C) in the last example, is a very common phenomenon. To describe this in formal language we

    introduce the following:

Definition 1.4.1. A (nonempty) subset M of a vector space V satisfying the following

    condition is called a subspace of V :

(S) For all u and v in M and for all scalars a and b, au + bv is in M.

According to this definition, P is a subspace of F(C,C). A subspace of some vector space is a vector space in its own right.
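Aside: condition (S) is easy to test numerically. A tiny Python sketch (our own illustration) for the subspace M = {(x, y) in R2 : x + y = 0}:

def in_M(w):
    # membership test for M = {(x, y) : x + y = 0}
    return abs(w[0] + w[1]) < 1e-12

u, v = (1.0, -1.0), (2.5, -2.5)                  # two vectors in M
a, b = 3.0, -7.0                                 # arbitrary scalars
w = (a * u[0] + b * v[0], a * u[1] + b * v[1])   # au + bv
print(in_M(w))   # True: au + bv stays in M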

    Example 1.4.2. Pn, the space of polynomials of degree ≤ n.

    Recall that a polynomial of degree n is a function of the form

p(x) = a_0 + a_1x + a_2x^2 + · · · + a_nx^n (P)

with a_n ≠ 0. If we drop the condition a_n ≠ 0 here, we cannot tell the exact degree of p(x) – all we can say is that the degree of p(x) is at most n. Denote by Pn the subset of

P consisting of polynomials of degree at most n, that is, polynomials of the form given in (P). It is clear that condition (S) in the definition of subspace above is satisfied for

    M = Pn and V = P. Therefore Pn is a subspace of P and hence itself is a vector space.

1.5. Now we consider the concept of linear mappings, or linear transformations.

    By a mapping (or simply a map) or a transformation we usually mean a way of sending

objects from one space into another. A transformation T from one vector space V to

    another W (over the same field) is linear if the following identity holds:

    T (αx + βy) = αTx + βTy (LT )

for all vectors x and y in V and all scalars α and β. (Notice that the author deliberately uses letters x, y instead of u, v for vectors, and α, β instead of a, b for scalars, in order to broaden the scope of our notation. Also, following the usual custom in linear algebra, we omit the round brackets in T(x) and simply write T(x) as Tx.)

    The linearity condition (LT ) can be split into two:

    T (x + y) = Tx + Ty (LT1)

    for all x and y in V , and

    T (αx) = αTx (LT2)

    for all vectors x in V and for all scalars α. Identity (LT1) is a special case of (LT ) obtained

    by setting α = β = 1 in (LT ). It says: the transformation T preserves the operation of

    addition. The following figure helps to illuminate its meaning:

    Identity (LT2) is another special case of (LT ) obtained by setting β = 0. It says: T

    preserves the scalar multiplication. You should draw a figure of arrows similar to the

    above one to clarify its meaning. Condition (LT ) can be replaced by conditions (LT1) and

    (LT2) together, (although we don’t see any advantage of doing so). Indeed, we can check

    that (LT ) is a consequence of this pair of conditions:

    T (αx + βy) = T (αx) + T (βy) (by (LT1))

    = αTx + βTy (applying (LT2) twice.)

    By mathematical induction, we can establish the following extension of (LT ):

    T (α1x1+α2x2+ · · · +αnxn) = α1Tx1+α2Tx2+ · · · +αnTxn

We note that (LT) is just the case n = 2 written in a different way.

    1.6. Examples of linear transformations are so many that you can find them almost

    everywhere, almost any time. Here we consider a few.

    Example 1.6.1. Sampling functions

Let FS be a linear space of functions (allowed to take complex values) defined on a set S. Pick some points s1, s2, . . . , sn in S as "observation sites". The observed values of a

function f in FS are arranged in a row as a vector in Cn, denoted by Tf:

    Tf = (f(s1), f(s2), . . . , f(sn)).

Then the transformation T sending f (from FS) to Tf (in Cn) is linear. To show the linearity of T, we need to check the identity T(af + bg) = aTf + bTg for all f, g in FS and scalars a, b. The left hand side of this identity is, according to the definition of T,

    T (af + bg) = ((af + bg)(s1), (af + bg)(s2), . . . , (af + bg)(sn))

    = (af(s1) + bg(s1), af(s2) + bg(s2), . . . , af(sn) + bg(sn))

    = a(f(s1), f(s2), . . . , f(sn)) + b(g(s1), g(s2), . . . , g(sn))

    which is aTf + bTg, that is, the right hand side. This proves the linearity of T .
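Aside: the sampling transformation is short enough to run. A Python sketch (our own illustration) that checks T(af + bg) = aTf + bTg at three sample sites:

sites = [0.0, 0.5, 1.0]          # the "observation sites" s1, s2, s3

def T(f):
    return tuple(f(s) for s in sites)

f = lambda x: x ** 2
g = lambda x: 1.0 - x
a, b = 2.0, -3.0
lhs = T(lambda x: a * f(x) + b * g(x))                      # T(af + bg)
rhs = tuple(a * fs + b * gs for fs, gs in zip(T(f), T(g)))  # aTf + bTg
print(lhs == rhs)   # True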

    Example 1.6.2. Tax = x · a

    Recall that the inner product (or the scalar product, or the dot product) of two vectors

    x = (x1, x2, · · · , xn) and y = (y1, y2, · · · , yn) in Rn is given by

x · y = \sum_{k=1}^{n} x_ky_k (= x_1y_1 + x_2y_2 + · · · + x_ny_n).

    Fix an arbitrary vector a in Rn. Then the transformation Ta from Rn to R sending a vector

x in Rn to x · a is linear. To prove this, we have to check Ta(αx + βy) = αTax + βTay. This can be done as follows:

Ta(αx + βy) = (αx + βy) · a = \sum_{k=1}^{n} (αx_k + βy_k)a_k
= α \sum_{k=1}^{n} x_ka_k + β \sum_{k=1}^{n} y_ka_k = α x · a + β y · a = αTax + βTay.

    Aside: As you can see, checking things like this is rather routine. The important thing to

    learn is: how to present it neatly and correctly.

    Example 1.6.3. Let a be a fixed vector in R3. Then the transformation Ca from R3

into R3 itself given by Ca(x) = x × a (the cross product of x and a) is linear. The linearity of Ca can be proved in the same fashion as that of the previous examples.

Example 1.6.4. A differential operator

Recall that Pn stands for the space of all polynomials of degree at most n. For p ∈ Pn, let Dp = dp/dx, the derivative of p. Then D : Pn → Pn is linear. Why? Well, the linearity

of D is manifested by the identity D(ap + bq) = aDp + bDq, where p, q ∈ Pn and a, b are scalars. But this identity is nothing but another way to write down the well-known

equality

d/dx (ap(x) + bq(x)) = a d/dx p(x) + b d/dx q(x).
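Aside: on coefficient lists [a0, a1, . . . , an] (so that p(x) = a0 + a1x + · · · + anx^n), the operator D has a one-line implementation. A Python sketch (our own illustration):

def D(coeffs):
    # d/dx sends a_k x^k to k * a_k x^(k-1); pad with 0 to stay in P_n
    return [k * coeffs[k] for k in range(1, len(coeffs))] + [0.0]

p = [1.0, 1.0, 0.0, 1.0]   # p(x) = 1 + x + x^3
print(D(p))                # [1.0, 0.0, 3.0, 0.0], i.e. p'(x) = 1 + 3x^2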

    Example 1.6.5. Translation

    Same space as the previous example: Pn. Let h be a fixed real number. Consider the

transformation T : Pn → Pn sending a polynomial p(x) ∈ Pn to the polynomial p(x + h). Here p(x + h) is of course obtained by replacing x in p(x) by x + h. In other words, it is p

evaluated at x + h. For instance, if p(x) is 1 + x + x^3 and h = 1, then T(p) is the polynomial

1 + (x + 1) + (x + 1)^3 = 1 + x + 1 + (x^3 + 3x^2 + 3x + 1) = 3 + 4x + 3x^2 + x^3. We claim that

    T is linear. To prove this, we have to verify T (ap+ bq) = aTp+ bTq. What is T (ap+ bq)?

    It is ap + bq evaluated at x + h, namely, ap(x + h) + bq(x + h). Now Tp and Tq are the

    polynomials p(x+h) and q(x+h) respectively, so aTp+ bTq is also ap(x+h) + bq(x+h).

    Hence T is linear.

    Example 1.6.6. Shift S

    This is the operator denoted by S on the space S of sequences, defined by

    S(a0, a1, a2, . . . ) = (a1, a2, a3, . . . ).

If we write a = (a0, a1, a2, . . . ), then (Sa)_n = a_{n+1}. In many books on difference equations, the last equality is written as Sa_n = a_{n+1}. Strictly speaking, this is incorrect. But we accept this and interpret it as the correct one, namely (Sa)_n = a_{n+1}. It is not

    hard to check that S is indeed linear.
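Aside: on finitely many stored terms the shift operator is just a slice. A Python sketch (our own illustration):

def shift(a):
    # S(a0, a1, a2, ...) = (a1, a2, a3, ...)
    return a[1:]

a = [5, 8, 13, 21, 34]
print(shift(a))   # [8, 13, 21, 34]; indeed (Sa)_n = a_{n+1}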

"Nonexample" 1.6.7. Let V = P, the space of all polynomials, and let T be the transformation from V into itself, sending a polynomial to its square: Tp = p^2. Then T is

not linear. One way to see this is by noticing T(−x) = (−x)^2 = x^2. If T were linear, then T(−x) should be −T(x) = −x^2, instead of x^2. Aside: This is by no means the only way to prove that the T given here is nonlinear. You may also argue that T(2 · 1) = T(2) = 2^2 = 4, which is not the same as 2T(1) = 2. Another argument: T(1 + x) = (1 + x)^2 = 1 + 2x + x^2, which is not the same as T(1) + T(x) = 1 + x^2. There are many ways to tell that T is not linear. One is as good as another. Don't waste your time by presenting more than one.

We should mention a term widely used in scientific literature: linear operator. By

    a linear operator, or simply an operator, on a vector space V we mean a linear transfor-

    mation from V into V itself. Thus Examples 1.6.3, 1.6.4, 1.6.5 and 1.6.6 above are linear

operators. Another term also widely used: linear functional (or covector in physics

    and engineering literature). If V is a vector space over F, by a linear functional of V we

mean a linear transformation from V into F1 ≡ F. Thus a linear functional of V is a function φ : V → F such that

    φ(a1v1 + a2v2) = a1φ(v1) + a2φ(v2)

for all v1, v2 ∈ V and a1, a2 ∈ F. The mapping Ta in Example 1.6.2 is an example of a linear functional of Rn.

1.7. Given vector spaces U and V over the same field, we denote by L(U, V) the set of all linear mappings from U to V. For S, T in L(U, V) and a scalar a, we define the sum S + T and scalar multiple aS by putting

(S + T)x = Sx + Tx, (aS)x = aSx, x ∈ U. (1.7.1)

We check that both S + T and aS are linear. To simplify our presentation, we let R = aS + bT and check its linearity. Take u1, u2 ∈ U and scalars c1 and c2. We have to show R(c1u1 + c2u2) = c1Ru1 + c2Ru2. Indeed,

    R(c1u1 + c2u2) = (aS + bT )(c1u1 + c2u2)

    = a S(c1u1 + c2u2) + b T (c1u1 + c2u2) [because of (1.7.1)]

    = a(c1Su1 + c2Su2) + b(c1Tu1 + c2Tu2) [because S, T are linear]

    = c1(aSu1 + bTu1) + c2(aSu2 + bTu2) = c1Ru1 + c2Ru2.

    With addition and scalar multiplication defined here, L (U, V ) becomes a vector space.

    According to our notation introduced above, given a vector space V , the symbol

L(V, V) stands for the set of all linear operators on V. This symbol looks a bit clumsy and hence we will rewrite it simply as L(V). On the other hand, L(V,F), the set of all linear functionals, will be denoted by V′ (or V∗ in some books), called the dual space of V. We

    summarize our notation here:

L(U, V) = the set of all linear transformations from U to V.
L(V) = the set of all linear operators on V.
V′ = the set of all linear functionals of V = the dual space of V.

We have seen that L(U, V) is a vector space under some natural way to define addition and scalar multiplication. A fortiori, L(V) and V′ are vector spaces. Actually, L(V) is more than just a vector space. Besides addition and scalar multiplication, it has a third

operation: composition. The composite, or the product ST, of operators S, T ∈ L(V), is defined by putting

    (ST )v = S(Tv). (1.7.2)

    Aside: To apply ST to a vector v, we apply T to v first to get Tv, followed by applying

    S. Writing (1.7.2) as (ST )(v) = S(v)T (v) is a horrendous mistake.

To give quick examples: let V = P, the space of all polynomials, and let D, M, T on V be given by D(p(x)) = p′(x) ≡ d/dx p(x), M(p(x)) = xp(x) and T(p(x)) = p(x + 1).

    Then (MD)(p(x)) = M(p′(x)) = xp′(x), (DM)(p(x)) = D(xp(x)) = p(x) + xp′(x),

    (TM)(p(x)) = T (xp(x)) = (x + 1)p(x + 1), (MT )(p(x)) = M(p(x + 1)) = xp(x + 1),

    (DT )(p(x)) = D(p(x+ 1)) = p′(x+ 1) and (TD)(p(x)) = T (p′(x)) = p′(x+ 1).

As you can see, MD is not the same as DM (also MT ≠ TM). Thus, in general, the products ST and TS are different. In case ST and TS are the same, i.e. ST = TS, then

    we say T, S commute, or T commutes with S. For example, the operators D and T

    given above commute.
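Aside: the non-commutativity of M and D can be watched numerically on coefficient lists (a Python sketch of our own; polynomials are stored as [a0, a1, . . . ]):

def D(c):
    # differentiation: a_k x^k -> k a_k x^(k-1)
    return [k * c[k] for k in range(1, len(c))]

def M(c):
    # multiplication by x shifts every coefficient up one slot
    return [0.0] + list(c)

p = [1.0, 2.0, 3.0]   # p(x) = 1 + 2x + 3x^2
print(M(D(p)))        # (MD)p = x p'(x) = 2x + 6x^2 -> [0.0, 2.0, 6.0]
print(D(M(p)))        # (DM)p = p(x) + x p'(x) = 1 + 4x + 9x^2 -> [1.0, 4.0, 9.0]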

To justify our definition of the product ST for S, T ∈ L(V) given above, we must check its linearity. Again, it is a routine matter. For v1, v2 ∈ V and a1, a2 ∈ F,

    (ST )(a1v1 + a2v2) = S(T (a1v1 + a2v2)) = S(a1Tv1 + a2Tv2)

    = a1S(Tv1) + a2S(Tv2) = a1(ST )v1 + a2(ST )v2.

    We have the following elementary properties concerning three operations (addition, scalar

    multiplication, composition) among operators: for R, S, T ∈ L (V ) and scalar a,

    (RS)T = R(ST )

    R(S + T ) = RS +RT

    (R+ S)T = RT + ST

    a(ST ) = (aS)T = S(aT ).

They are verified in a routine manner, e.g., to show (R + S)T = RT + ST, we compute:

    ((R+ S)T )x = (R+ S)(Tx) = R(Tx) + S(Tx)

    = RTx + STx = (RT + ST )x.

There are two special linear operators on V worth mentioning: the zero operator O and the

    identity operator I: O sends every vector to the zero vector and I sends every vector to

    itself, that is, for all v ∈ V , Ov = 0 and Iv = v.

    We say that a linear mapping T from a vector space V to a vector space W is invertible

if there is a linear mapping S from W to V such that ST = I_V and TS = I_W. Here I_V stands for the identity operator on V. In the future we simply write I for I_V so that the identities ST = I_V and TS = I_W become ST = I and TS = I. The linear map S in this

    case is uniquely determined by T and will be denoted by T−1. Thus ST = I and TS = I

    become T−1T = I and TT−1 = I. To give a quick example, consider the linear operator

T on the space P of all polynomials given by T(p(x)) = p(x + h), where h is a constant. Then T is invertible and its inverse T^{-1} is given by T^{-1}(p(x)) = p(x − h). The following fact is basic:

    Proposition 1.7.1. If linear operators S and T on a vector space V are invertible,

    then ST is also invertible with (ST )−1 = T−1S−1.

    To prove this, we check directly that T−1S−1 is the inverse of ST :

    (ST )(T−1S−1) = STT−1S−1 = SIS−1 = SS−1 = I,

    (T−1S−1)(ST ) = T−1S−1ST = T−1IT = T−1T = I.

The reversal of the order of S and T in the identity (ST)^{-1} = T^{-1}S^{-1} has the following analogy, attributed to the famous mathematician Hermann Weyl: we put on socks first before

    putting on shoes; but taking them off, we remove shoes first.
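Aside: Proposition 1.7.1 is easy to check numerically for matrix operators. A NumPy sketch (our own illustration, with two arbitrary invertible 2 × 2 matrices):

import numpy as np

S = np.array([[2.0, 1.0], [1.0, 1.0]])
T = np.array([[0.0, 1.0], [-1.0, 3.0]])
lhs = np.linalg.inv(S @ T)                 # (ST)^{-1}
rhs = np.linalg.inv(T) @ np.linalg.inv(S)  # T^{-1} S^{-1}
print(np.allclose(lhs, rhs))               # True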

EXERCISE SET I.1.

    Review Questions: Do I grasp the basic concepts of vector spaces and linear mappings?

    Am I able to write down properly the formal definitions of linear mappings, linear operators

    and linear functionals (as a professional mathematician does, at which I would not be

    embarrassed if it were in print)? What are the major examples of vector spaces and linear

    maps described in this section? Do I recognize the following symbols and understand

    perfectly well what they stand for?

R3, C2, F5, M4,5, P3, f + g, 2f − 3g,
L(V,W), S + T, ST, ST − TS, L(V), V′

    Why do we need the abstract conceptual framework of vector spaces and linear mappings?

    What would we miss if we restrict ourselves to the standard spaces Rn and Cn instead of

    working with this abstract notion?

    Drills

    1. Write down the general form of a vector in each of the following vector spaces:

    (a) F4 (F = R or C), (b) P2, (c) P4, (d) R3 × P1.

2. Find u + (−2)v, where u, v ∈ V, in each of the following cases:
(a) V = C2, u = (2 + 3i, 3 − 2i) and v = (1 + i, 1 − i).
(b) V = P2; u and v are the polynomials 1 − 2x + x^2 and 1 + x − x^2 respectively.
(c) V = F(R,C); u and v are the functions cos x + i sin x and cos x − i sin x respectively.

    3. True or false:

(a) A set {0} consisting of a single element 0, with addition and scalar multiplication defined in the following way, is a vector space: 0 + 0 = 0, a·0 = 0.

(b) All polynomials of the form a + (1 + a)x + bx^2 form a vector space under the usual

    addition and scalar multiplication.

    (c) P2 is a subspace of P3.

    (d) If U is a subspace of V and if V is a subspace of U , then U = V .

    (e) If U is a subspace of V and if W is a subspace of U , then W is a subspace of V .

4. Let p(x) = x^2 − x + 2 and q(x) = 2x^2 + 1. Find: p(x + 1), p(1), p(q(x)), xp(x), p(x)^2, p(x^2), (p + q)(x), q(x + p(x)).

    5. In each of the following cases, is the given transformation from one vector space into

    another linear? Why? (Questions like this should be answered in the following way:

if your answer is "Yes", then you have to prove it. If your answer is "No", you should

disprove it by pointing out one instance where the linearity fails; (only one is needed — don't waste your time finding or writing down another.)

(a) T : P → P given by T(p(x)) = p(x^2).
(b) T : P → P given by T(p(x)) = p(x)^2.
(c) T : P → R given by T(p(x)) = p(1); (here, of course, p(1) stands for the value of

    p at 1.)

(d) T : P → R given by T(p(x)) = p(1) + 1.
(e) The map T : M2,2 → M3,3 (Reminder: Mm,n stands for the vector space of all m × n matrices) given by T(X) = AXB, where A and B are fixed matrices of sizes 3 × 2 and 2 × 3 respectively.

(f) The map T : M2,2 → M2,2 given by T(X) = X + A, where A is a fixed 2 × 2 matrix. (♠ Caution: Be careful about the way you put down your answer.)

(g) T : M2,2 → M2,2 given by T(X) = X^2.
(h) V is a vector space and v is a fixed vector in V; T : L(V) → V is given by T(X) = X(v); (here, of course, X ∈ L(V), i.e. X is an operator on V, and X(v) is the vector in V obtained by applying X to v.)

    Exercises

    1. Let R, S, and T be linear operators on a vector space V . Simplify the following

    expressions:

(a) R(S + T) − S(T + R) − (R − S)T.
(b) S(S^{-1}T + TS^{-1}) + (S^{-1}T + TS^{-1})S + S^{-1}(ST − TS) − (ST − TS)S^{-1}. (S is

    assumed to be invertible.)

    (c) [[R, S], T ] + [[S, T ], R] + [[T,R], S]; (for linear operators P , Q on V , their commu-

    tator [P,Q] is defined to be the linear operator PQ−QP .)

2. On the linear space P of all polynomials, define the operators M, D and U by putting M(p(x)) = xp(x), D(p(x)) = p′(x) (the derivative of p(x)), U(p(x)) = p(x + 1).

    Compute

(a) [M,D] (= MD − DM).
(b) UMU^{-1} − M; (notice that U^{-1}(p(x)) = p(x − 1)).

    3. Let S and T be two linear operators satisfying the relation STS = S. (In this situation

    we may call T a generalized inverse of S.) Let P = ST , Q = TS and T0 = QTP .

Verify that (a) P^2 = P, (b) Q^2 = Q, (c) PS = S, (d) SQ = S, (e) ST_0S = S, (f) T_0 = TST and (g) T_0ST_0 = T_0.

4. In each of the following cases, find S^2, T^2, ST and TS for S, T ∈ L(V).
(a) V = R2; S((x1, x2)) = (x2, x1), T((x1, x2)) = (1/2)(x1 + x2, x1 + x2).

    (b) V = P1; S(p(x)) = p′(x), T (p(x)) = p(0).

(c) V = M2,2 (the space of all 2 × 2 matrices); S(X) = AX, T(X) = XB, where

A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, B = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}.

    Problems

    1*. Prove that, for linear operators S, T on a vector space V , if S, T and S + T are

    invertible, then S−1 + T−1 is also invertible.

    2*. (This is a very hard problem) Let S and T be linear operators on a vector space V .

    Prove that I − ST is invertible if and only if I − TS is invertible.

§2. Linear Transformations and Matrices

    2.1. The main goal of the present section is to study the relation between linear

mappings and matrices. We show that every m × n matrix A naturally gives rise to a linear mapping MA : Fn → Fm and vice versa. Then we show that, given coordinate systems in vector spaces V and W, every linear mapping T : V → W can be represented by a matrix. The device set up here tells us that matrices can be regarded as linear mappings

    and problems about linear mappings can often be reduced to problems in matrices.

    To understand linear transformations better, let us look at the simplest situation in

which both the domain and the range are the one-dimensional space F1 ≡ F; here F is either R or C. Let T : F → F be linear. Then, for every x ∈ F, Tx = T(x·1) = xT1. Certainly T1 ∈ F, i.e. T1 is just a scalar. Why don't we give it a name—let's call it a. Thus Tx = ax. On the other hand, it is easy to check that a transformation T from

    F to F of the form Tx = ax is linear, where a is a fixed scalar. We have shown that

    linear transformations from F to F are exactly those of the form T (x) = ax. This gives a

thorough description of such transformations. Nothing more can be said!

    Once we succeed in tackling a special case, we should move to a more interesting

    general situation in which the domain of a given linear transformation is Fn and its range

    is in Fm. The problem now is to describe all such transformations. As you will see,

    the answer turns out to be similar to the one for our special case: the real number a is

    replaced by an m × n matrix, say A, and x is replaced by an n× 1 column vector x andthe “outcome” Tx is the m×1 column vector Ax. Thus linear transformations from Fn toFm are exactly those T which can be put in the form Tx = Ax, where A is a fixed m× nmatrix.

    2.2. But the above answer seems to have a serious conflict with the previous conven-

tion: a vector in Fn used to be a row, i.e. something like x = (x1, x2, . . . , xn) instead of

    a column:

x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}. (2.2.1)

    The rationale of our previous convention is clear: the column in (2.2.1) is awkward to write.

    Worse: its outstanding look draws unwarranted attention! This prompts us to adopt the

    following rule:

    Something in a row surrounded by the round brackets “(” and “)” is the same thing

    as the transpose of this row (which becomes a column) surrounded by the square

brackets "[" and "]", such as

(cow, pig, dog, cat) = \begin{bmatrix} cow \\ pig \\ dog \\ cat \end{bmatrix}.

    In order to work under this rule, we have to be very careful about the brackets. This

    rule works well, but is too “brutal”. We try to avoid applying it directly as much as

    possible. Certainly we may regard the column (2.2.1) as the transpose of a row and put

x = [x1 x2 · · · xn]⊤ or x = [x1, x2, . . . , xn]⊤ with commas for clarity, and sometimes we shall do so in the future. But this still looks rather clumsy.

    2.3. We proclaim:

Every matrix is associated with a God-given linear transformation.

In detail, given an m × n matrix A with entries in the field F, we define a linear transformation MA from Fn to Fm simply by putting

    MAx = Ax (M)

    for all x = [x1, x2, . . . , xn]⊤ ∈ Fn. The transformation MA defined in this way may be

    called the multiplication by A. (That is why we choose the symbol MA to denote it.) The

    linearity of MA follows immediately from some well-known properties of matrix algebra:

    for all u and v in Fn and a, b ∈ F, we have

    MA(au + bv) = A(au + bv) = aAu + bAv = aMAu + bMAv.

To give a quick example, suppose A = \begin{bmatrix} 2 & 3 \\ 4 & 5 \end{bmatrix}. Then

MAx = MA \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 4 & 5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2x_1 + 3x_2 \\ 4x_1 + 5x_2 \end{bmatrix}.

    Hence MA is a linear operator on R2 sending (x1, x2) to (2x1 + 3x2, 4x1 + 5x2).
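Aside: this computation is a one-liner in NumPy (our own illustration):

import numpy as np

A = np.array([[2.0, 3.0], [4.0, 5.0]])
x = np.array([1.0, 1.0])
print(A @ x)   # [5. 9.], i.e. M_A sends (1, 1) to (2 + 3, 4 + 5)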

    The converse of the above proclamation is also true:

Theorem 2.3.1. Every linear transformation T from Fn to Fm is of the form MA for some m × n matrix A, i.e. there exists an m × n matrix A such that Tx = Ax for all x ∈ Fn. Furthermore, the matrix A here is uniquely determined by T.

This theorem tells us that there is a one-to-one correspondence between the set L(Fn, Fm) of all linear transformations from Fn to Fm and the set Mm,n(F) of all m × n matrices with entries in F:

A ∈ Mm,n(F) ←→ T = MA ∈ L(Fn, Fm).

    Before we embark on the proof of this theorem, let us take a look at the linear trans-

    formations Ta : Rn → R and Ca : R3 → R3 of Examples 1.6.2 and 1.6.3 in the previous

section, defined by Ta(x) = a · x and Ca(x) = a × x respectively. According to the above theorem, there are matrices A and B of sizes 1 × n and 3 × 3 respectively such that Tax = Ax and Cax = Bx. What are A and B? Well, let us write

Tax = a_1x_1 + a_2x_2 + · · · + a_nx_n = [a_1 a_2 · · · a_n] \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix},

Cax = \begin{bmatrix} a_2x_3 − a_3x_2 \\ a_3x_1 − a_1x_3 \\ a_1x_2 − a_2x_1 \end{bmatrix} = \begin{bmatrix} 0 & −a_3 & a_2 \\ a_3 & 0 & −a_1 \\ −a_2 & a_1 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}.

Therefore A = [a_1 a_2 · · · a_n], while B is the skew-symmetric matrix given as

B = \begin{bmatrix} 0 & −a_3 & a_2 \\ a_3 & 0 & −a_1 \\ −a_2 & a_1 & 0 \end{bmatrix}.

    Notice that B⊤ = −B; (here B⊤ is the transpose of B).

    2.4. Now we turn to the proof of Theorem 2.3.1 above. There are two parts in the

    conclusion of the theorem: first, there is a matrix A with the property that Tx = Ax, and

    second, such a matrix is completely determined by T . The first part concerns the existence

    of A and the second part its uniqueness. To prove the existence part, we have to find A

    (which seems to be hiding somewhere.) To prove the uniqueness part, it is enough to find

out in what way T determines A. Normally we prove the existence part first, because

    normally we think: if it didn’t exist, what would be the point of proving (or even talking

    about) its uniqueness? However, logically they are independent entities and it doesn’t

    matter which comes first. Here we prove the uniqueness part first—this strategy of proof

    is at odds with our normal thinking. There are two reasons for taking this strategy. First,

the uniqueness part is easier. Second, the proof of the uniqueness part actually points out a

    way to find A. It helps us to figure out how to prove the existence part! (You may not be

used to this unusual way of thinking. But people usually become smart when they begin

    to think in an unusual way.)

    Proof of Theorem 2.3.1. Suppose that Tx = Ax and we have to prove that A is

    uniquely determined by T . Let A1, A2, . . . , An be columns of A so that

    A = [A1 A2 · · · An].

It is easy to check that, for x = (x1, x2, · · · , xn) ≡ [x1 x2 · · · xn]⊤ ∈ Fn, T(x) is given by

Ax = [A_1 A_2 · · · A_n] \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = A_1x_1 + A_2x_2 + · · · + A_nx_n. (2.4.1)

    Let us find the result of T acting on the standard basis vectors:

    e1 = (1, 0, . . . , 0), e2 = (0, 1, . . . , 0), . . . , en = (0, 0, . . . , 1).

    Putting x = e1 = (1, 0, 0, . . . , 0) in (2.4.1), i.e. x1 = 1, x2 = 0, x3 = 0 etc., we obtain

    Te1 = A1. In the same way we can obtain Te2 = A2 etc. Now T completely determines

    Te1 = A1, Te2 = A2, . . . , Ten = An

    which in turn determines the matrix A = [A1 A2 · · · An].

Next we prove the "existence part". Let T : Fn → Fm be a linear transformation. We have to find a matrix A such that Tx = Ax for all x ∈ Fn. Let A be the m × n matrix with Tek as its kth column (1 ≤ k ≤ n). In other words, A = [A1 A2 · · · An], where Ak = Tek for 1 ≤ k ≤ n. We have to check Tx = Ax in order to show that A obtained in this way will do the job. Let us "recycle" the computation given in (2.4.1) and write

    Ax = A1x1 +A2x2 + · · · +Anxn := x1A1 + x2A2 + · · · + xnAn,

(x1A1 is the usual way to write a scalar multiple of a column vector, and A1x1 is the correct way when x1 is regarded as a 1 × 1 matrix), from which it follows that

Ax = x1Te1 + x2Te2 + · · · + xnTen = T(x1e1 + x2e2 + · · · + xnen) = Tx.

    Here we have used the linearity of T and the following elementary manipulation

x1e1 + x2e2 + · · · + xnen = x1(1, 0, . . . , 0) + x2(0, 1, . . . , 0) + · · · + xn(0, 0, . . . , 1) = (x1, x2, . . . , xn) = x.

Hence T = MA. The proof of Theorem 2.3.1 is complete. From the proof here, we see that

Fact. The columns of an m × n matrix A are MAe1, MAe2, . . . , MAen, where e1, e2, . . . , en are the standard basis vectors.
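Aside: the Fact gives a recipe for recovering the matrix of a linear map T : Fn → Fm in code: feed in the standard basis vectors and stack the outputs as columns. A NumPy sketch (our own illustration; matrix_of is a made-up name):

import numpy as np

def matrix_of(T, n):
    # column k of the matrix is T(e_k)
    return np.column_stack([T(np.eye(n)[:, k]) for k in range(n)])

T = lambda x: np.array([x[0] + x[1], x[0] - x[1], 2 * x[1]])   # a linear map R^2 -> R^3
print(matrix_of(T, 2))
# [[ 1.  1.]
#  [ 1. -1.]
#  [ 0.  2.]]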

2.5. Now we use the fact stated at the end of the last subsection to determine the matrices of some linear operators on vector spaces of the form Fn.

    Example 2.5.1. Permutation matrices

Let π be a permutation of the set [n] ≡ {1, 2, . . . , n}; in other words, π is a bijection of [n]. Consider the map Tπ : Fn → Fn given by

    Tπ(x1, x2, . . . , xn) = (xπ(1), xπ(2), . . . , xπ(n)).

Then it is routine to check that Tπ is linear. Hence, by Theorem 2.3.1, there is a unique n × n matrix P such that Tπx = Px, called the permutation matrix associated with π. Now

    we want to give a specific description of P . According to the fact stated above, the first

    column of P is given by Tπe1. For x = e1, we have x1 = 1, x2 = 0, x3 = 0 etc. Its image

    is Tπx = (xπ(1), xπ(2), . . . , xπ(n)), where the kth entry xπ(k) is 1 precisely when π(k) = 1,

or k = π^{-1}(1). Thus, except for the π^{-1}(1) entry, which is 1, all of the other entries of Tπe1 are 0. Hence Tπe1 = e_{π^{-1}(1)}. In the same way we have Tπe2 = e_{π^{-1}(2)}, Tπe3 = e_{π^{-1}(3)}, etc.

    So we have

P = [e_{π^{-1}(1)} e_{π^{-1}(2)} · · · e_{π^{-1}(n)}].

    To give a specific example, let n = 3 and let π be given by π(1) = 2, π(2) = 3 and π(3) = 1.

Then π^{-1}(1) = 3, π^{-1}(2) = 1, and π^{-1}(3) = 2. Thus Tπ(x1, x2, x3) = (x2, x3, x1) and the

    permutation matrix for Tπ is

P = [e_{π^{-1}(1)} e_{π^{-1}(2)} e_{π^{-1}(3)}] = [e_3 e_1 e_2] = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}.

Notice that each row and each column has exactly one entry equal to 1 and the remaining entries are zeros. (Remark: Permutation matrices play an important role in the theory of doubly stochastic

    matrices, which are useful for establishing some matrix inequalities. It turns out that all

doubly stochastic matrices form a convex set and the permutation matrices are exactly the so-called "extreme points" of this convex set.)
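Aside: a NumPy sketch (our own illustration) that builds P for this π and checks Tπx = Px:

import numpy as np

pi = {1: 2, 2: 3, 3: 1}          # pi(1) = 2, pi(2) = 3, pi(3) = 1
n = len(pi)
P = np.zeros((n, n))
for i in range(1, n + 1):
    P[i - 1, pi[i] - 1] = 1.0    # row i has its 1 in column pi(i), so column k is e_{pi^{-1}(k)}

x = np.array([10.0, 20.0, 30.0])
print(P)         # [[0 1 0], [0 0 1], [1 0 0]]
print(P @ x)     # [20. 30. 10.] = (x_2, x_3, x_1), as required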

    Example 2.5.2. Rotations

    For a fixed real number θ, let T ≡ Tθ be the operator on R2 sending an arbitrary vector

v to Tθv, which is obtained by turning v through the angle θ in the anticlockwise direction:

    It is not too hard to see that Tθ is a linear operator on R2. The question is: what is the

    2 × 2 matrix inducing this linear operator? Let

A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}

be the matrix inducing T ≡ Tθ: Tv = Av for all v ∈ R2. Letting e1 = (1, 0) (= [1, 0]⊤) and e2 = (0, 1), we have Te1 = (a, c) and Te2 = (b, d). On the other hand, from the

    following figure, we see that Te1 = (cos θ, sin θ), Te2 = (− sin θ, cos θ):

    We conclude that the matrix which induces Tθ is

Aθ ≡ \begin{bmatrix} \cos θ & −\sin θ \\ \sin θ & \cos θ \end{bmatrix}. (2.5.1)

    It is a priori clear that the result of rotating a vector by an angle α, followed by a rotation

    through an angle β, is the same as a single rotation with angle α + β. Putting this in

mathematical symbols, we have TαTβ = T_{α+β}. The matrices inducing the operators in this identity obey the same relation: A_{α+β} = AαAβ. From this relation and (2.5.1) above, it follows immediately that

cos(α + β) = cos α cos β − sin α sin β,
sin(α + β) = cos α sin β + sin α cos β,

    which are well-known (but by no means obvious) identities in trigonometry.
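Aside: the identity A_{α+β} = AαAβ can be confirmed numerically (a NumPy sketch of our own):

import numpy as np

def A(theta):
    # the rotation matrix (2.5.1)
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

alpha, beta = 0.7, 1.1
print(np.allclose(A(alpha + beta), A(alpha) @ A(beta)))   # True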

2.6. In this subsection we discuss a book-keeping device to label vectors by columns of

    numbers and to represent linear transformations by matrices, relative to some coordinate

    system, or more precisely, a basis. By means of such a device, a problem about vectors

    or linear transformations is converted to the corresponding problem about columns and

    matrices, which is in general easier to manipulate. Bases in linear algebra serve the same

    purpose as frames of reference in physics.

    Definition 2.6.1. An ordered set of vectors b1,b2, . . . ,bn in a vector space V is

    called a basis if each vector v in V can be written in a unique way as

    v = v1b1 + v2b2 + . . .+ vnbn (2.6.1)

    for some scalars v1, v2, . . . , vn. (Remark: Notice that there are two ingredients in this

    definition: the possibility to write (2.6.1) and the uniqueness of such an expression.)

With a fixed basis B = {b1, b2, . . . , bn} in a vector space V, each vector v in V determines (in a unique manner) the scalars v1, v2, . . . , vn, via (2.6.1) above. These scalars are called the coordinates of v relative to the basis B. We will arrange them into a matrix with a single column, denoted by [v]B, and call it the column representation or the coordinate

    vector of v relative to B:

    [v]B = (v1, v2, . . . , vn) ≡ [v1 v2 . . . vn]⊤.

    The coordinates (or the column representation) of a vector depend on our choice of basis

at the outset. The subscript B in [v]B emphasizes this dependence. For convenience, sometimes we drop this subscript and write [v] if our choice of basis is understood.

    Example 2.6.2. Standard examples of bases:

    1. The standard basis for Fn. The vectors

    e1 = (1, 0, 0, 0, . . . , 0), e2 = (0, 1, 0, 0, . . . , 0), . . . , en = (0, 0, 0, 0, . . . , 1)

    form a basis of Fn, called the standard basis (or the natural basis, or the usual basis)

of Fn. The kth vector ek in this basis has 1 in the kth entry and 0 elsewhere. With

    respect to this basis, the column representation of a vector v = (v1, v2, . . . , vn) in Fn is

    [v] = [v1, v2, . . . , vn]⊤; (convince yourself this is the case.)

    2. The standard basis for Pn. The monomials

1, x, x^2, . . . , x^n

form a basis of Pn. With respect to this basis, the column representing a polynomial

p(x) = a_0 + a_1x + a_2x^2 + · · · + a_nx^n in Pn is given by [p] = [a_0, a_1, a_2, . . . , a_n]⊤.

    Example 2.6.3. (a) Find the column representation of (1, 3) in R2 relative to the

basis B = {(1, 1), (1, −1)}. (b) Find the column representation of x^3 relative to the basis E = {1, x − 1, (x − 1)^2, (x − 1)^3} in P3.

    Solution. (a) Suppose [(1, 3)]B = [a, b]⊤. Then we have (1, 3) = a(1, 1) + b(1,−1),

which gives 1 = a + b, 3 = a − b. Thus a = 2, b = −1. So the answer is [(1, 3)]B = [2, −1]⊤.

(b) Let [x^3]_E = (a_0, a_1, a_2, a_3). Then

x^3 = a_0 + a_1(x − 1) + a_2(x − 1)^2 + a_3(x − 1)^3.

We have to find a_0 to a_3. There are several ways to do so. Here is one. Let y = x − 1. Then x = y + 1 and (y + 1)^3 = a_0 + a_1y + a_2y^2 + a_3y^3. Now (y + 1)^3 = 1 + 3y + 3y^2 + y^3. Thus 1 + 3y + 3y^2 + y^3 = a_0 + a_1y + a_2y^2 + a_3y^3. So, by comparing coefficients of powers of y, we have a_0 = 1, a_1 = 3, a_2 = 3, a_3 = 1. Thus [x^3]_E = (1, 3, 3, 1), which is our answer.
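Aside: part (a) is a 2 × 2 linear system whose coefficient matrix has the basis vectors as columns; NumPy solves it directly (our own illustration):

import numpy as np

B = np.column_stack([(1.0, 1.0), (1.0, -1.0)])   # basis vectors as columns
v = np.array([1.0, 3.0])
print(np.linalg.solve(B, v))   # [ 2. -1.], i.e. [(1, 3)]_B = [2, -1]^T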

Once we have a basis B = (b1, b2, . . . , bn) in a vector space V over the field F, we can define a linear mapping T from V to Fn (or simply write T : V → Fn) by putting Tv = [v]B. The linearity of T means that the identity

    [au + bv] = a[u] + b[v] (2.6.2)

    holds for all vectors u, v in V and all scalars a, b in F; (for simplicity, we write [v]

    for [v]B). To see this, write [u] = (u1, u2, . . . , un) and [v] = (v1, v2, . . . , vn). Then

    u = u1b1 + u2b2 + · · · + unbn and v = v1b1 + v2b2 + · · · + vnbn. Hence

au + bv = a(u1b1 + u2b2 + · · · + unbn) + b(v1b1 + v2b2 + · · · + vnbn)
= (au1 + bv1)b1 + (au2 + bv2)b2 + · · · + (aun + bvn)bn.

    and hence

    [au + bv] = (au1 + bv1, au2 + bv2, . . . , aun + bvn)

= a(u1, u2, . . . , un) + b(v1, v2, . . . , vn) = a[u] + b[v].

    Notice that T is invertible. Its inverse T−1 simply sends any (u1, u2, . . . , un) in Fn to

    u = u1b1 + u2b2 + · · · + unbn in V .

    2.7. We have seen that, by introducing a basis to a vector space, we can “label” a

    vector in this space by a bunch of numbers arranged in a column, giving us the column

representation of this vector. Next we explain how to "label" a linear mapping by a bunch

    of numbers arranged into a rectangular array — of course here I mean a matrix. Let us

    start with a linear transformation T from a vector space V to a vector space W . Suppose

that V = {v1, v2, . . . , vn} is a basis of V and W = {w1, w2, . . . , wm} is a basis of W. We can use a matrix to represent T. This matrix is called the representing matrix of T, or the matrix representing T, or just the matrix of T, relative to the bases V and W, and is denoted by

[T]^V_W, or simply [T].

(The subscript W and the superscript V in [T]^V_W emphasize the dependence of this matrix on these two bases; when their presence is understood, we simply write [T] for this matrix.) The matrix [T] is constructed column by column in the following way. To find its first

column, we apply T to the first basis vector v1 in V to get Tv1. As Tv1 is a vector in W, we can express it in a unique way as a linear combination of basis vectors in W, say

    Tv1 = t11w1 + t21w2 + · · · + tm1wm.

    The coefficients of this linear combination, namely, t11, t21 etc. will fill up the first column

    of [T ]. To find the second column of [T ], we apply T to v2 to get Tv2 in W and express

    it as a linear combination of vectors in W , say

    Tv2 = t12w1 + t22w2 + · · · + tm2wm.

    Then fill up the second column by coefficients of this linear combination. The other columns

of [T] are obtained in a similar fashion. Thus, we come up with

[T] = \begin{bmatrix} t_{11} & t_{12} & \cdots & t_{1n} \\ t_{21} & t_{22} & \cdots & t_{2n} \\ \vdots & \vdots & & \vdots \\ t_{m1} & t_{m2} & \cdots & t_{mn} \end{bmatrix},

    where the jth column of [T] (consisting of t1j , t2j , . . . , tmj) comes from

    Tvj = t1jw1 + t2jw2 + · · · + tmjwm.

In other words, the jth column is just [Tvj]_W, the column representing Tvj relative to the basis W in W. In case V = W and V = W, T is a linear operator on V and the matrix [T] ≡ [T]^V_V is a square matrix. In this case we will write [T]_V instead of [T]^V_V.

    Example 2.7.1. (a) Let T be the operator on V = P2 sending p(x) to p(x+ 1). Find

    the matrix [T ] of this operator relative to the standard basis B = {1, x, x2}. (b) Find the

matrix of the differential operator D defined on P2 by D(p) = p′ (the derivative of p), relative to the standard basis B. (c) Consider the linear map M from P1 to P2 defined by the recipe M(p(x)) = xp(x). In P1 we take the standard basis B = {1, x} but in P2 we take the basis C = {1, 1 + x, (1 + x)^2}. Find the matrix representation relative to these bases.

Solution: (a) and (b) Since T(1) = 1, T(x) = x + 1 = 1 + x, T(x^2) = (x + 1)^2 = 1 + 2x + x^2 and D(1) = 0, D(x) = 1, D(x^2) = 2x, we have

[T]_B = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}, [D]_B = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{bmatrix}.

(c) It is easy to get M(1) = x, M(x) = x^2. But, in order to get the matrix [M]^B_C, we have to write M(1) = x and M(x) = x^2 as linear combinations of 1, 1 + x, (1 + x)^2; in other words, we have to find a_0, a_1, a_2 and b_0, b_1, b_2 in the following identities: x = a_0 + a_1(1 + x) + a_2(1 + x)^2, x^2 = b_0 + b_1(1 + x) + b_2(1 + x)^2. Let s = 1 + x so that x = s − 1. Then x = −1 + s = −1 + (1 + x) and x^2 = (s − 1)^2 = 1 − 2s + s^2 = 1 − 2(1 + x) + (1 + x)^2. The matrix [M] ≡ [M]^B_C can be read off from these identities:

[M] = \begin{bmatrix} −1 & 1 \\ 1 & −2 \\ 0 & 1 \end{bmatrix}.
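Aside: the column-by-column recipe is mechanical enough to code. A Python sketch (our own illustration) that rebuilds [T]_B for T(p(x)) = p(x + 1) on P2 from the expansion (x + 1)^k = Σ_j C(k, j) x^j:

from math import comb
import numpy as np

n = 2
# entry (j, k) is the coefficient of x^j in T(x^k) = (x + 1)^k
T_B = np.array([[comb(k, j) for k in range(n + 1)] for j in range(n + 1)])
print(T_B)
# [[1 1 1]
#  [0 1 2]
#  [0 0 1]]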

    Relative to a basis of V , for S, T ∈ L (V ), a, b ∈ F and v ∈ V , we have

    [aS + bT ] = a[S] + b[T ], [ST ] = [S][T ], [Sv] = [S][v]. (2.7.1)

    This tells us that the representation by matrices captures the essence of operators. But

we defer the proof of this fact to Chapter 3, where the summation symbol ∑ will be systematically used.

EXERCISE SET I.2.

    Review Questions: What is the meaning of “the linear transformation induced by a

    matrix”? What is the basic fact concerning such induced transformations? How to find

    the matrix which induces a given linear transformation (in principle)? Do I understand

    each of the following concepts?

    basis, column representation of vectors, matrix representation of operators

    Can I describe the procedure of finding column representation of vectors and matrix rep-

    resentation of linear transformations to any first year student?

    Drills

    1. Write down the matrices which induce the following operators on F3:

(a) T, sending (x1, x2, x3) to (x1 + x2 − x3, x1 − x2, x2 + x3).
(b) D, sending (x1, x2, x3) to (−x1, 2x2, x3).
(c) R, sending (x1, x2, x3) to (x1 cos θ + x3 sin θ, x2, −x1 sin θ + x3 cos θ).
(d) (with F = R) Q, sending (x1, x2, x3) to (1, 2, 3) × (x1, x2, x3).

2. Find the 2 × 2 real matrix A such that its induced operator MA projects each vector v in R2 orthogonally to the line passing through the origin and (1, 1), as indicated in

    the left figure below:

3. Find the 3 × 3 matrix A such that MA is the 120° rotation in R3 about the axis through the origin and the point (1, 1, 1), indicated in the right figure above.

4. In each of the following cases, write down the matrix A of the operator T ≡ MA on R2 satisfying the given conditions:
(a) Te1 = (1, 1) and T^2 = O.
(b) Te1 = (2, 2) and T^2 = T.
(c) Te1 = (cos θ, sin θ) and T^2 = I, where the given angle θ satisfies 0 < θ < π/2.

6. Give a basis for each of the following vector spaces:

    (a) P3, (b) R2, (c) C3, (d) P1 × P2, (e) M2,2.

    7. In each of the following cases, write down a basis for the kernel ker(T ) and the range

    T (V ) of the given linear transformation T :

(a) T : P2 → P2, Tp = p′, the derivative of p.
(b) T : P2 → P3, T(p(x)) = xp(x).

(c) T : F2 → F2, T = MA, where A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}.

(d) T : R2 → P2, T([a, b]⊤) = a + bx^2.

(e) T : M2,2 → M2,2, T(X) = BX, where B = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}.

(f) T : M2,2 → M2,2, T(X) = XB, where B is the same as the one in (e) above.
(g) T : M2,2 → M2,2, T(X) = BX − XB, where B is the same as the one in (e).

    8. In each of the following cases, find the column representation of the vector v relative

to the given basis B in V. (Justifying that B is a basis is not required.)
(a) V = R2, v = (2, 3), B = {(1, 0), (1, 1)}.
(b) V = C2, v = (2 + 2i, 0), B = {b1, b2} with b1 = (1 + i, 1 − i), b2 = (1 − i, 1 + i).
(c) V = P1, v is the polynomial 2 + 3x and B = {1 + x, 1 − x}.
(d) V = P2, v is 1 + x + x^2, B = {1 + x, 1 + x^2, x + x^2}.

(e) V = M2,2, v is \begin{bmatrix} 5 & 2 \\ 0 & 1 \end{bmatrix} and B = { \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & −1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} }.

(d) V, W, B and C are the same as those in (c), T(p(x)) = p(x^2).

(e) V = W = M2,2, T(X) = AX − XA with A = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}; B and C are the standard basis of M2,2:

B = C = { \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} }.

    Exercises

    1. Use Theorem 2.3.1 to describe (a) the correspondence between the vector spaces

L(Fn, F) and M1,n, and (b) the correspondence between the vector spaces L(F, Fn) and Mn,1.

2*. Show that a 2 × 2 matrix A induces a rotation of R2 if and only if it is of the form

A = \begin{bmatrix} a & −b \\ b & a \end{bmatrix} with det(A) ≡ a^2 + b^2 = 1.

Check that A^{-1} = A⊤.

    3*. According to special relativity, a boost is a linear operator on R2 induced by a matrix

of the form

\begin{bmatrix} a & b \\ b & a \end{bmatrix} with \begin{vmatrix} a & b \\ b & a \end{vmatrix} ≡ a^2 − b^2 = 1 and a > 0.

    Show that the product of two boosts is a boost.

    4*. Let v = (a, b) be a fixed nonzero vector in R2. Find the matrix A which induces the

    projection P onto the one-dimensional subspace spanned by v, as indicated in the left

    figure below:

    5*. Let Lα be the line through the origin in the plane R2, so that the angle between

Lα and the horizontal axis is α. (a) Find the 2 × 2 matrix which induces the mirror reflection Tα about Lα indicated in the right figure above. (b) Show that the product

    TαTβ of two such reflections is a rotation Rθ. Find the angle θ of rotation in terms of

    α and β.

6*. Let n = (n1, n2, n3) be a unit vector in R3 (|n|^2 ≡ n_1^2 + n_2^2 + n_3^2 = 1) and let H be

    the plane in R3 through the origin and perpendicular to n. Let Tn be the mirror

    reflection with respect to H, indicated in the following figure:

(a) Show that Tnv = v − 2(v · n)n for all v ∈ R3. (b) Write down the matrix which induces this reflection. (c) Suppose that m = (m1, m2, m3) is another unit vector and

Tm is the reflection defined in a similar fashion. Show that the vector w ≡ n × m is invariant for TmTn, that is, TmTnw = w.

7*. Find the 3 × 3 matrix A which induces the 60° rotation T in R3 about the axis through the origin and the point (1, 1, 1). Compute A^2. Explain why your answer for A^2 is

    the expected one.

§3. Linear Equations

3.1. Let V and W be vector spaces over the field F of scalars and let T : V → W be a linear mapping. Let b be a vector in W. Consider the equation

    Tx = b (3.1.1)

    We say that a vector v in V is a solution to this equation if Tv = b.

    Example 3.1.1. System of linear equations

    Let A be an m× n matrix over F and let b be a vector in Fm, say

A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}, b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}.

    Let T = MA, that is, the linear map T : Fn → Fm given by Tx = Ax. Then the

    equation (3.1.1), that is, Tx = b, is the following system of linear equations

a_{11}x_1 + a_{12}x_2 + · · · + a_{1n}x_n = b_1
a_{21}x_1 + a_{22}x_2 + · · · + a_{2n}x_n = b_2
· · · · · · · · ·
a_{m1}x_1 + a_{m2}x_2 + · · · + a_{mn}x_n = b_m        (3.1.2)

    The reader is assumed to be familiar with some general method to solve this system.

    When the spaces V and W are finite dimensional, using coordinate systems, it is possible

to convert a general equation (3.1.1) into this form. Thus, (3.1.1) can in principle be solved.
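Aside: for a square invertible A, solving (3.1.2) is one library call (a NumPy sketch of our own):

import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])
x = np.linalg.solve(A, b)
print(x)                       # [1. 3.]
print(np.allclose(A @ x, b))   # True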

    Example 3.1.2. Interpolation Problem

Let us briefly recall Example 1.6.1. Let FS be a linear space of functions (allowed to take complex values) defined on a set S and let s1, s2, . . . , sn be some selected points in S.

    Consider the linear transformation T : FS → Cn defined by

    Tf = (f(s1), f(s2), . . . , f(sn)).

Given b = (b1, b2, . . . , bn), a solution to the equation Tf = b is a function f in FS satisfying f(s1) = b1, f(s2) = b2, . . . , f(sn) = bn. Finding such a solution is called an

    interpolation problem.
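Aside: a concrete interpolation instance, solved with NumPy's polynomial fitting (our own illustration): find p of degree ≤ 2 with p(0) = 1, p(1) = 3, p(2) = 7.

import numpy as np

s = np.array([0.0, 1.0, 2.0])                 # observation sites
b = np.array([1.0, 3.0, 7.0])                 # prescribed values
coeffs = np.polyfit(s, b, deg=len(s) - 1)     # degree n-1 fit through n points is exact
print(np.round(coeffs, 6))                    # [1. 1. 1.] -> p(x) = x^2 + x + 1
print(np.allclose(np.polyval(coeffs, s), b))  # True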

Example 3.1.3. Linear ordinary differential equations

Consider the space V = F(R,C) of differentiable functions and the operator T on V given by Ty = dy/dt − ay, where a is a fixed complex number and t is the variable of the

    “unknown” function y. For any f in V , the equation Ty = f (which has the same form as

    (3.1.1)) is a linear ODE (ordinary differential equation):

dy/dt − ay = f(t). (3.1.3)

    To solve this, we take two steps. First, consider the corresponding homogeneous equation

dy/dt − ay = 0. (3.1.4)

    We can solve this equation by the method called separation of variables as follows. Multiply

(3.1.4) by dt to rewrite it as dy − ay dt = 0, or dy = ay dt. Next, divide both sides by y to get dy/y = a dt. Now, put an integral sign on each side to arrive at ∫ dy/y = ∫ a dt. Thus we have ln y = at + c, where c is a constant, or y = e^{at+c} = Ce^{at}, where C = e^c

    is also a constant. The expression

y = Ce^{at} (3.1.5)

    is the general solution of the homogeneous equation (3.1.4). The second step we take is

    to look for a special solution of (3.1.3) by using a method called variation of constants

(or variation of parameters, according to some books). Following this method, we seek a solution of (3.1.3) of the form y = u(t)e^{at}, which is obtained from (3.1.5) by changing the constant C to a function u(t). For y = u(t)e^{at}, we have

dy/dt − ay = u′(t)e^{at} + u(t)(ae^{at}) − au(t)e^{at} = u′(t)e^{at}.

    Thus, (3.1.3) becomes u′(t) eat = f(t), or u′(t) = e−atf(t), which gives u(t) =∫

    e−atf(t) dt.

    A particular solution of (3.1.3) is y(t) =(∫ t

    0e−asf(s) ds

    )

    eat =∫ t

    0ea(t−s)f(s) ds. The

    expression y = Ceat +∫ t

    0ea(t−s)f(s) ds is the general solution of (3.1.3), according to a

    general principle explained in the next subsection. Although this is a closed formula for

    solving (3.1.3), finding the integral on its right hand side may still cause some difficulty.

    In §3.5 we will describe a practical method for solving inhomogeneous equations.
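As a quick sanity check (an addition to the text, with the sample choices a = 2 and f(t) = e^t), a computer algebra system can confirm that the variation-of-constants formula really solves (3.1.3):

```python
import sympy as sp

t, s = sp.symbols('t s')
a = 2                # sample value of the constant a
f = sp.exp(t)        # sample forcing function f(t)

# Particular solution from the formula y(t) = integral_0^t e^{a(t-s)} f(s) ds.
y_p = sp.integrate(sp.exp(a*(t - s)) * f.subs(t, s), (s, 0, t))

# Verify that y' - a*y - f(t) vanishes identically.
print(sp.simplify(sp.diff(y_p, t) - a*y_p - f))   # 0
```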

Example 3.1.4. Systems of linear ordinary differential equations

We may extend equation (3.1.3) by changing a into a matrix A and y into a vector-valued function y. Let A = [a_{ij}] be an n × n matrix over C. Consider the vector space V = F(R, C^n) of (smooth) functions defined on the real line taking values in C^n. Let f ∈ V be given. For convenience, we use the letter t as the real variable. Thus a “vector” in V can be written as y(t) = (y_1(t), y_2(t), . . . , y_n(t)). For each j, the jth component of y(t) is y_j(t), which is a function of the real variable t. Write y′_j(t) for the derivative of y_j(t) and let y′(t) = (y′_1(t), y′_2(t), . . . , y′_n(t)). Define a linear operator T on V by putting Ty = y′ − Ay. Given f(t) = (f_1(t), f_2(t), . . . , f_n(t)), the equation Ty = f becomes y′ − Ay = f, or

    y′(t) = Ay(t) + f(t).        (3.1.6)

This is the “vector form” of the general system of linear ordinary differential equations with constant coefficients. If we spell out all components of this equation, we have

    y′_1 = a_{11}y_1 + a_{12}y_2 + · · · + a_{1n}y_n + f_1(t)
    y′_2 = a_{21}y_1 + a_{22}y_2 + · · · + a_{2n}y_n + f_2(t)
        ⋮
    y′_n = a_{n1}y_1 + a_{n2}y_2 + · · · + a_{nn}y_n + f_n(t)        (3.1.7)

We may consider the more general case, where the matrix entries a_{ij} (also called coefficients) are nonconstant, that is, they are allowed to be functions of t: a_{ij} = a_{ij}(t). In this case A = [a_{ij}] is a matrix-valued function of t and, to emphasize this, we write A = A(t) = [a_{ij}(t)]. Equation (3.1.6) then becomes y′(t) = A(t)y(t) + f(t). When the coefficients a_{ij} are nonconstant, the basic theory of linear ordinary differential equations changes little, but the solutions usually can no longer be expressed explicitly in terms of elementary functions.
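When a closed form is out of reach, (3.1.6) can be integrated numerically. Below is a minimal sketch (an addition to the text; the matrix A, the forcing f, the initial value, and the step size are all made up) using the forward Euler method:

```python
import numpy as np

# A made-up 2 x 2 system y'(t) = A y(t) + f(t).
A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])
f = lambda t: np.array([0.0, np.sin(t)])

y = np.array([1.0, 0.0])          # initial value y(0)
t, h = 0.0, 1e-4                  # time and step size
while t < 1.0:
    y = y + h * (A @ y + f(t))    # one forward Euler step
    t += h

print(y)                          # crude approximation to y(1)
```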

Example 3.1.5. Higher order linear ordinary differential equations

In practice we often deal with higher order equations of the form

    y^(n) + a_{n−1}y^(n−1) + · · · + a_2 y^(2) + a_1 y′ + a_0 y = f(t)        (3.1.8)

where y^(k) stands for the kth order derivative of y. We can rewrite (3.1.8) in the form Ty = f as follows. Let D be the operator of taking the derivative (we can write D = d/dt if we wish). Introduce the polynomial p(λ) = λ^n + a_{n−1}λ^{n−1} + · · · + a_1 λ + a_0, where the coefficients a_k come from (3.1.8). Let

    T = p(D) ≡ D^n + a_{n−1}D^{n−1} + · · · + a_1 D + a_0 I.        (3.1.9)

Then (3.1.8) can be rewritten as Ty = f. Equation (3.1.8) can also be reduced to a system of first order equations; thus, in theory, equations of the form (3.1.8) are covered by systems of the form (3.1.7). This can be seen by introducing the new functions y_1 = y, y_2 = y′, . . . , y_n = y^(n−1). We give an example to show how this is done; the general case is given in Appendix C. Consider the equation

    D³y − 2D²y + 3Dy − 5y = (t − 2)e^t.        (3.1.10)

Introduce new functions y_1 = y, y_2 = Dy, y_3 = D²y. Then Dy_3 = D³y = 5y − 3Dy + 2D²y + (t − 2)e^t. Hence

    Dy_1 = y_2,   Dy_2 = y_3,   and   Dy_3 = 5y_1 − 3y_2 + 2y_3 + (t − 2)e^t.

We can rewrite (3.1.10) as y′ = Ay + f, where

    y = [ y_1 ]        A = [ 0    1    0 ]        f = [     0      ]
        [ y_2 ] ,          [ 0    0    1 ] ,          [     0      ]
        [ y_3 ]            [ 5   −3    2 ]            [ (t − 2)e^t ] .

Example 3.1.6. Linear difference equations

A general linear difference equation of order N can be written as

    a_0 y_k + a_1 y_{k+1} + a_2 y_{k+2} + · · · + a_N y_{k+N} = b_k        (3.1.11)

Recall from Example 1.6.6 that on the space of all sequences y = (y_0, y_1, y_2, . . .), the shift operator S is defined by (Sa)_k = a_{k+1}. Notice that (Iy)_k = y_k, (Sy)_k = y_{k+1}, (S²y)_k = (S(Sy))_k = (Sy)_{k+1} = y_{k+2}, (S³y)_k = (S²(Sy))_k = (Sy)_{k+2} = y_{k+3}, etc. In general,

    (S^m y)_k = y_{k+m}.        (3.1.12)

Let p(λ) = a_0 + a_1 λ + a_2 λ² + · · · + a_N λ^N (here we use λ as the variable) and

    T = p(S) = a_0 I + a_1 S + a_2 S² + · · · + a_N S^N.        (3.1.13)

Then, in view of (3.1.12), we have

    (Ty)_k = a_0 (Iy)_k + a_1 (Sy)_k + a_2 (S²y)_k + · · · + a_N (S^N y)_k
           = a_0 y_k + a_1 y_{k+1} + a_2 y_{k+2} + · · · + a_N y_{k+N},

which is the left-hand side of (3.1.11). Thus we have (Ty)_k = b_k for all k, and therefore Ty = b, with y = (y_0, y_1, . . .) and b = (b_0, b_1, . . .). We have shown that (3.1.11) can be written as Ty = b, where T is given in (3.1.13).
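To make the operator T = p(S) concrete, here is a small sketch (an addition to the text) that applies p(S) to a finite window y_0, . . . , y_{M−1} of a sequence; the choice p(x) = x² − x − 2 and the sequence y_k = 2^k are made up for the illustration.

```python
import numpy as np

def apply_p_of_S(coeffs, y):
    """Apply T = a_0 I + a_1 S + ... + a_N S^N to a finite window y.

    coeffs = [a_0, ..., a_N]; the result has len(y) - N entries, with
    (Ty)_k = a_0 y_k + a_1 y_{k+1} + ... + a_N y_{k+N}, as in (3.1.12).
    """
    N = len(coeffs) - 1
    M = len(y) - N
    return sum(a * y[j:j + M] for j, a in enumerate(coeffs))

y = np.array([2.0**k for k in range(8)])    # the sequence y_k = 2^k
print(apply_p_of_S([-2.0, -1.0, 1.0], y))   # p(x) = x^2 - x - 2; all zeros,
# so y_k = 2^k solves y_{k+2} - y_{k+1} - 2 y_k = 0 (cf. Example 3.4.4 below)
```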

We mention that there are many other important types of linear equations which can be put in the form (3.1.1) but are not treated here, such as linear partial differential equations.


3.2. Now we begin with some elementary theory of the equation

    Tx = b. (3.2.1)

Here T is a linear mapping from V to W. If we replace the right-hand side b by the zero vector 0, we get the corresponding homogeneous equation

    Tx = 0.        (3.2.2)

All solutions to (3.2.2) form a vector space, called the kernel of T and denoted by ker T.

Indeed, if u, v are in ker T, then Tu = 0 and Tv = 0, and hence

    T(au + bv) = aTu + bTv = 0

(for any scalars a, b), which entails au + bv ∈ ker T. In mathematical symbols, we write

    ker T = {x ∈ V : Tx = 0}.

Now suppose that v_0 ∈ V is a solution to (3.2.1), that is, Tv_0 = b. We claim: the solution set of (3.2.1) is v_0 + ker T ≡ {v_0 + x : x ∈ ker T}; in other words, the general solution to (3.2.1) is the sum of the particular solution v_0 and the general solution to the homogeneous equation (3.2.2). Indeed, if x ∈ ker T, then

    T(v_0 + x) = Tv_0 + Tx = b + 0 = b,

showing that v_0 + x is indeed a solution to (3.2.1). On the other hand, if v is another solution of (3.2.1), then, letting x = v − v_0, we have v = v_0 + x and

    Tx = T(v − v_0) = Tv − Tv_0 = b − b = 0,

telling us that v = v_0 + x with x in ker T. This explains why sometimes (but not always) we solve (3.2.1) in two steps: step one, find the general solution to the corresponding homogeneous equation; step two, find a particular solution to (3.2.1).
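For a matrix equation this structure is easy to see on the computer. In the sketch below (an addition to the text, with a made-up rank-deficient matrix A), SymPy produces a basis of the kernel, and every vector v_0 + x with x ∈ ker A solves Ax = b:

```python
import sympy as sp

# A made-up rank-one system Ax = b (second row = 2 * first row).
A = sp.Matrix([[1, 2, 3],
               [2, 4, 6]])
b = sp.Matrix([6, 12])

v0 = sp.Matrix([1, 1, 1])            # one particular solution: A*v0 = b
print(A * v0 == b)                   # True

kernel = A.nullspace()               # basis of ker A (two vectors here)
x = v0 + 5*kernel[0] - 2*kernel[1]   # any element of v0 + ker A ...
print(A * x == b)                    # ... is again a solution: True
```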

Example 3.2.1. Consider the differential equation y′ − y = x. A particular solution to this equation is y_p = −1 − x, as we can check directly (a method for finding such a particular solution will be discussed in §3.5 below). The general solution to the homogeneous equation y′ − y = 0 can be found by the method of separation of variables: rewrite this equation as dy/y = dx and integrate: ∫ dy/y = ∫ dx, which gives ln y = x + c, or y = Ce^x, where C = e^c. From the above discussion we see that y = Ce^x − 1 − x is the general solution to y′ − y = x.

Whether equation (3.2.1) has a solution depends on the vector b on the right-hand side. We say that b is in the range of T if (3.2.1) has a solution. Thus the range of T is the set of all vectors b in W for which (3.2.1) has a solution, that is, for which there exists v in V such that Tv = b. We denote the range of T by T(V). Thus, in mathematical symbols,

    T(V) = {y ∈ W : there exists v ∈ V such that Tv = y} = {Tv : v ∈ V}.

Notice that T(V) is a subspace of W. To see this, take y_1, y_2 in T(V). Then there exist v_1 and v_2 such that Tv_1 = y_1 and Tv_2 = y_2. For any scalars a_1, a_2, we have

    a_1 y_1 + a_2 y_2 = a_1 Tv_1 + a_2 Tv_2 = T(a_1 v_1 + a_2 v_2),

showing that a_1 y_1 + a_2 y_2 is indeed in T(V).

In the rest of the present section, we discuss methods of solving linear equations of the types mentioned in the examples of §3.1.

3.3. Denote by P the space of all polynomials and let s_1, s_2, . . . , s_n be distinct points in the complex plane. Let T : P → C^n be the map given by

    Tp = (p(s_1), p(s_2), . . . , p(s_n)).

We are asked to solve Tp = b for given b = (b_1, b_2, . . . , b_n). The corresponding homogeneous equation is Tp = 0, or (p(s_1), p(s_2), . . . , p(s_n)) = 0. Thus p is a solution to Tp = 0 if and only if s_1, s_2, . . . , s_n are roots of p; in other words,

    Q(x) ≡ (x − s_1)(x − s_2) · · · (x − s_n)

is a factor of p(x). Thus the general solution to Tp = 0 is p(x) = Q(x)f(x), where f(x) is any polynomial. Denote by Q_k(x) the polynomial obtained from Q(x) by deleting the factor x − s_k; in other words, Q_k(x) is the polynomial such that the identity

    Q(x) = (x − s_k)Q_k(x)

holds.

    holds. Notice that Qk(sj) = 0 for all j �= k and Qk(sk) �= 0. Let Lk(x) = Qk(x)/Qk(sk).Then we have Lk(sj) = 0 for j �= k and Lk(sk) = 1. (Using the Kronecker delta δjk,we can write Lk(sj) = δjk.) Recall the standard basis of C

    n:

    e1 = (1, 0, 0, . . . , 0, 0), e2 = (0, 1, 0, . . . , 0, 0), . . . en = (0, 0, 0, . . . , 0, 1).

We have proved that TL_k = e_k for all k. Thus,

    b = ∑_{k=1}^n b_k e_k = ∑_{k=1}^n b_k TL_k = T( ∑_{k=1}^n b_k L_k ),

showing that the Lagrange polynomial L(x) = ∑_{k=1}^n b_k L_k(x) is a special solution to the interpolation problem Tp = b. The general solution to this problem can be written as p(x) = L(x) + Q(x)f(x), where f(x) is an arbitrary polynomial and L(x), Q(x) are the polynomials given above.

Example 3.3.1. Solve the following Lagrange interpolation problem: find a polynomial p of degree 2 such that p(1) = 3, p(2) = 6, p(3) = 13.

Solution. Using the above notation, we have Tp = (p(1), p(2), p(3)) and b = (3, 6, 13). We are asked to solve Tp = b. Now Q(x) = (x − 1)(x − 2)(x − 3), Q_1(x) = (x − 2)(x − 3), Q_2(x) = (x − 1)(x − 3) and Q_3(x) = (x − 1)(x − 2), with Q_1(1) = 2, Q_2(2) = −1 and Q_3(3) = 2. So L_1(x) = (1/2)(x − 2)(x − 3), L_2(x) = −(x − 1)(x − 3) and L_3(x) = (1/2)(x − 1)(x − 2). The Lagrange polynomial is

    L(x) = (3/2)(x − 2)(x − 3) − 6(x − 1)(x − 3) + (13/2)(x − 1)(x − 2) = 2x² − 3x + 4.

So the general solution of our problem is 2x² − 3x + 4 + (x − 1)(x − 2)(x − 3)f(x), where f(x) is any polynomial.
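The computation is easy to double-check mechanically. The sketch below (an addition to the text) rebuilds L(x) from the data of Example 3.3.1 with SymPy:

```python
import sympy as sp

x = sp.symbols('x')
points = [(1, 3), (2, 6), (3, 13)]    # the data (s_k, b_k) of Example 3.3.1

L = 0
for k, (sk, bk) in enumerate(points):
    # Q_k(x): product of (x - s_j) over j != k; then L_k = Q_k / Q_k(s_k).
    Qk = sp.Mul(*[(x - sj) for j, (sj, _) in enumerate(points) if j != k])
    L += bk * Qk / Qk.subs(x, sk)

print(sp.expand(L))                   # 2*x**2 - 3*x + 4, as found above
```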

3.4. A common situation in a homogeneous equation Tx = 0 is that T = p(S), where p is a polynomial and S is an operator considerably simpler than T. Example 3.1.5 with the operator in (3.1.9) and Example 3.1.6 with the operator in (3.1.13) are typical of this situation. Let λ_1, λ_2, . . . , λ_r be all the roots of the polynomial p, with multiplicities m_1, m_2, . . . , m_r respectively. Then we have

    p(x) = a(x − λ_1)^{m_1} (x − λ_2)^{m_2} · · · (x − λ_r)^{m_r}

and hence T = p(S) = a(S − λ_1 I)^{m_1} (S − λ_2 I)^{m_2} · · · (S − λ_r I)^{m_r}. We claim that solving

    p(S)x = 0        (3.4.1)

can be reduced to solving each of

    (S − λ_1 I)^{m_1} x = 0,   (S − λ_2 I)^{m_2} x = 0,   . . . ,   (S − λ_r I)^{m_r} x = 0.        (3.4.2)

    More precisely, we have

Proposition 3.4.1. With the above notation, the general solution to p(S)x = 0 can be written as x = x_1 + x_2 + · · · + x_r, where x_k (1 ≤ k ≤ r) is the general solution to the kth equation in (3.4.2), namely (S − λ_k I)^{m_k} x = 0.


Before proving the above proposition, we give some examples to illustrate it.

Example 3.4.2. Find the general solution to the differential equation y′′ − y′ − 2y = 0.

Solution. Rewrite the equation as p(D)y = 0, where p(x) = x² − x − 2 = (x − 2)(x + 1) and D is the differentiation operator given by Dy = y′. Reduce p(D)y = 0 to the two equations (D − 2I)y = 0 and (D + I)y = 0, that is, y′ − 2y = 0 and y′ + y = 0, which have general solutions y = Ce^{2t} and y = Ce^{−t} respectively (here we use t for the variable of the function y). Hence the general solution to y′′ − y′ − 2y = 0 is y = C_1 e^{2t} + C_2 e^{−t}.

Example 3.4.3. Find the general solution to the differential equation y′′ + y = 0.

Solution. Rewrite the equation as p(D)y = 0, where p(x) = x² + 1 = (x − i)(x + i) and D is the differentiation operator given by Dy = y′. Reduce p(D)y = 0 to the two equations (D − iI)y = 0 and (D + iI)y = 0, which have general solutions y = Ce^{it} and y = Ce^{−it} respectively. The general solution to y′′ + y = 0 is y = C_1 e^{it} + C_2 e^{−it}. Using Euler's formula e^{it} = cos t + i sin t and e^{−it} = cos t − i sin t, we can put it in another form: y = A cos t + B sin t.

Example 3.4.4. Solve the difference equation y_{n+2} − y_{n+1} − 2y_n = 0.

Solution. Rewrite the equation as p(S)y = 0, where p(x) = x² − x − 2 = (x − 2)(x + 1) and S is the shift operator given by (Sy)_n = y_{n+1}. Reduce p(S)y = 0 to the two equations (S − 2I)y = 0 and (S + I)y = 0, that is, y_{n+1} − 2y_n = 0 and y_{n+1} + y_n = 0, which have general solutions y_n = a·2^n and y_n = b(−1)^n respectively. Hence the general solution to y_{n+2} − y_{n+1} − 2y_n = 0 is y_n = a·2^n + b(−1)^n.
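SymPy can verify this answer directly (an addition to the text; rsolve names its arbitrary constants C0 and C1):

```python
import sympy as sp

n = sp.symbols('n', integer=True)
y = sp.Function('y')

# Solve y_{n+2} - y_{n+1} - 2 y_n = 0.
print(sp.rsolve(y(n + 2) - y(n + 1) - 2*y(n), y(n)))
# C0*(-1)**n + C1*2**n, i.e. a*2^n + b*(-1)^n as above
```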

Now we return to the proof of Proposition 3.4.1. Since (S − λ_k I)^{m_k} is a factor of T = p(S), solutions to (S − λ_k I)^{m_k} x = 0 are also solutions to p(S)x = 0. Hence expressions of the form x = x_1 + x_2 + · · · + x_r described in the proposition are solutions to Tx = 0. Next we prove that all solutions to Tx = 0 can be put in the form x = x_1 + x_2 + · · · + x_r as described in the proposition (this is the hard part). Let

    p_1(x) = (x − λ_1)^{m_1},   p_2(x) = (x − λ_2)^{m_2} · · · (x − λ_r)^{m_r}.

Then p_1(x) and p_2(x) are coprime (because they have no root in common) and hence there are polynomials q_1(x) and q_2(x) such that p_1(x)q_1(x) + p_2(x)q_2(x) = 1. Suppose that v is a solution to Tx = 0. Let v_1 = p_2(S)q_2(S)v and v_2 = p_1(S)q_1(S)v. Then

    v = Iv = (p_1(S)q_1(S) + p_2(S)q_2(S))v = v_2 + v_1 = v_1 + v_2

with p_1(S)v_1 = p_1(S)p_2(S)q_2(S)v = a^{−1} p(S)q_2(S)v = a^{−1} q_2(S)p(S)v = a^{−1} q_2(S)Tv = 0 (recall that p(S) = a·p_1(S)p_2(S)), and similarly p_2(S)v_2 = 0. Thus v = v_1 + v_2, where v_1 is a solution to p_1(S)x = 0 and v_2 is a solution to p_2(S)x = 0. Now the proof can be completed by induction on r.
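The key identity p_1 q_1 + p_2 q_2 = 1 can be produced by the extended Euclidean algorithm for polynomials. A small sketch (an addition to the text, with made-up factors p_1 = x − 2 and p_2 = (x + 1)²):

```python
import sympy as sp

x = sp.symbols('x')
p1 = x - 2          # (x - lambda_1)^{m_1} with lambda_1 = 2, m_1 = 1
p2 = (x + 1)**2     # the remaining factor, coprime to p1

q1, q2, g = sp.gcdex(p1, p2)       # extended Euclid: q1*p1 + q2*p2 = g
print(g)                           # 1, since p1 and p2 are coprime
print(sp.expand(q1*p1 + q2*p2))    # 1, the identity used in the proof
```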

3.5. For a linear operator T on a linear space V, we can sometimes find a particular solution of Tx = b by making a judicious choice of a subspace M such that b belongs to M and M is invariant under T, in the sense that T(M) ⊆ M, that is, for all v in M, Tv is also in M. This method will become clear by going through some examples. (Our choice of M in these examples will be illuminated in §1.3 of the next chapter.)

Example 3.5.1. Find the indefinite integral ∫ e^x sin x dx by using functions of the form a e^x sin x + b e^x cos x.

Solution. This integral is a solution to the equation Du = e^x sin x, where D = d/dx. Our choice of subspace M consists of functions of the form u = a e^x sin x + b e^x cos x. Now

    Du = D(a e^x sin x + b e^x cos x)
       = a(e^x sin x + e^x cos x) + b(e^x cos x − e^x sin x)
       = (a − b)e^x sin x + (a + b)e^x cos x.

Thus, if u is indeed a solution to Du = e^x sin x, then we set a − b = 1 and a + b = 0, which gives a = 1/2 and b = −1/2. So u = (1/2)e^x sin x − (1/2)e^x cos x is a solution to Du = e^x sin x, and

    ∫ e^x sin x dx = (e^x/2)(sin x − cos x) + C.

This is originally a question in calculus, usually answered by integration by parts. As you see, it is easier to answer by using linear algebra.
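Again the answer is easy to double-check by differentiation (an addition to the text):

```python
import sympy as sp

x = sp.symbols('x')
u = sp.exp(x)/2 * (sp.sin(x) - sp.cos(x))   # the antiderivative found above

# Du - e^x sin x should vanish identically.
print(sp.simplify(sp.diff(u, x) - sp.exp(x)*sp.sin(x)))   # 0
```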

Example 3.5.2. Find a particular solution to y′′ + y′ + y = x² by using the subspace P_2, which consists of functions of the form a + bx + cx².

Solution. Here our choice of the subspace is P_2. Suppose that y = a + bx + cx² is a particular solution. Then

    x² = y′′ + y′ + y
       = 2c + (b + 2cx) + (a + bx + cx²)
       = (a + b + 2c) + (b + 2c)x + cx²,

which gives a + b + 2c = 0, b + 2c = 0 and c = 1. So c = 1, b = −2 and a = 0. We conclude that y = x² − 2x is a particular solution.


The method described here is not limited to finding a particular solution to a nonhomogeneous equation, as shown in the following example.

Example 3.5.3. Find the general solution to y′′ − 2y′ + y = x².

Solution. Using the method described in the last example, we find a particular solution of this equation: y_p = x² + 4x + 6. Next we solve the homogeneous equation y′′ − 2y′ + y = 0, which can be rewritten as (D² − 2D + I)y = 0, or (D − I)²y = 0. Let z = (D − I)y. Then we have (D − I)z = 0, with z = ce^x as its general solution. Next we write (D − I)y = z as y′ − y = ce^x. Choose the subspace M of functions of the form a x e^x + b e^x. If y = a x e^x + b e^x is a solution to y′ − y = ce^x, then from y′ − y = (a e^x + a x e^x + b e^x) − (a x e^x + b e^x) = a e^x we get a e^x = ce^x and hence a = c. Thus y = a x e^x + b e^x (b arbitrary) is the general solution to y′′ − 2y′ + y = 0. Therefore the general solution to the original equation y′′ − 2y′ + y = x² is y = a x e^x + b e^x + x² + 4x + 6.

    Once we know in which subspace we should look for a solution, a complicated expres-

    sion for the operator T in the equation Tx = b does not intimidate us, as shown in the

    following example.

Example 3.5.4. Let T be the operator on the space of functions given by Tf(x) = x f′(x) + f(x + 1) + f(0)x². Find a (special) solution to Tf(x) = 1 − 2x + 5x².

Solution. Since the right-hand side is in P_2 and since, as we can check, P_2 is an invariant subspace of T, it is natural to look for a solution in P_2. So we set f(x) = a + bx + cx². Then

    Tf(x) = x(b + 2cx) + a + b(x + 1) + c(x + 1)² + ax²
          = (a + b + c) + (2b + 2c)x + (a + 3c)x².

In order to satisfy the equation Tf(x) = 1 − 2x + 5x², we set a + b + c = 1, 2b + 2c = −2 and a + 3c = 5, which gives a = 2, b = −2 and c = 1. Hence f(x) = 2 − 2x + x² is a solution to the equation Tf(x) = 1 − 2x + 5x².

In the next chapter we provide some clues for making the “right guess” when solving Tx = b in certain circumstances, based on dimensional considerations. In Chapter 3 we describe a general method of solving systems of equations called diagonalization.


EXERCISE SET I.3.

    Review Questions: What are linear equations and their solutions? What are the major

    examples of linear equations? What is the relation between solutions of Tx = b and solu-

    tions of its homogeneous equation Tx = 0? How do we use factorization of a polynomial

    p(x) to solve a higher order linear differential equation p(D)y = 0 or difference equation

    p(S)y = 0?

    Drills

1. Convert each of the following differential equations into a system of first order equations of the form y′ = Ay + f in vector notation.

   (a) y′′ + 2y′ − 3y = 1 + t.
   (b) y′′ + y = sin t.
   (c) y^(3) − 2y^(2) + 3y′ − y = 0.
   (d) y′′ + ty′ + (cos t)y = sin t.

2. Convert each of the following difference equations into a system of first order equations of the form y_{n+1} = Ay_n + f in vector notation.

   (a) y_{n+2} + 2y_{n+1} − 3y_n = 1 + n.
   (b) y_{n+2} + y_n = sin n.
   (c) y_{n+3} − 2y_{n+2} + 3y_{n+1} − y_n = 1.
   (d) y_{n+2} − 2n y_{n+1} + 3e^{−n} y_n = n².

3. Solve each of the following Lagrange interpolation problems.

   (a) p ∈ P_1, p(−1) = 3, p(2) = 9.
   (b) p ∈ P_1, p(−1) = 2 − i, p(i) = 4 − i.
   (c) p ∈ P_2, p(0) = 1, p(1) = 3, p(2) = 7.
   (d) p ∈ P_3, p(−1) = −4, p(0) = −1, p(1) = 0, p(2) = 5.

4. Solve each of the following homogeneous linear differential equations.

   (a) y′′ − 2y′ − 3y = 0.
   (b) y′′ − 2y′ + 2y = 0.
   (c) y′′ − 4y′ + 4y = 0.
   (d) y^(4) − y = 0.

5. Solve each of the following homogeneous linear difference equations.

   (a) y_{n+2} − 2y_{n+1} − 3y_n = 0.
   (b) y_{n+2} − 2y_{n+1} + 2y_n = 0.
   (c) y_{n+2} − 4y_{n+1} + 4y_n = 0.
   (d) y_{n+4} − y_n = 0.

6. Use the method of linear algebra suggested in the present section to find the following indefinite integrals.

   (a) ∫ (4 + x + 5x² − 2x³)e^x dx. (Hint: Use the linear space of functions which can be expressed in the form a e^x + b x e^x + c x²e^x + d x³e^x.)
   (b) ∫ (2 sin²x − cos²x + sin x cos x)e^x dx. (Hint: Use the linear space of functions which can be expressed in the form a e^x sin²x + b e^x cos²x + c e^x sin x cos x.)

7. Use the method suggested in the present section to solve each of the following linear differential equations.

   (a) y′′ − 2y′ − 3y = 8 − 4t − 3t². (Hint: Use the linear space of functions which can be expressed in the form a + bt + ct².)
   (b) y′′ − 2y′ + 2y = e^t. (Hint: Use the linear space of functions which can be expressed in the form ae^t.)
   (c) y′′ − 4y′ + 4y = te^t − 4e^t. (Hint: Use the linear space of functions which can be expressed in the form ae^t + bte^t.)
   (d) y^(4) − y = cos t − sin t. (Hint: Use the linear space of functions which can be expressed in the form a sin t + b cos t.)

8. Use the method suggested in the present section to solve each of the following linear difference equations.

   (a) y_{n+2} − 2y_{n+1} − 3y_n = 2 + 4n − 4n².
   (b) y_{n+2} − 2y_{n+1} + 2y_n = n + 1.
   (c) y_{n+2} − 4y_{n+1} + 4y_n = 2n + 2.
   (d) y_{n+4} − y_n = 1. (Hint: Try y_n = an + b.)


Appendices for Chapter I

    Appendix A*: Axioms for vector spaces

    By a vector space V over a field F we mean a set V with an algebraic structure endowed

    by the following two operations

    1. Addition, allowing us to add two vectors u and v to obtain their sum u + v.

    2. Scalar multiplication, allowing us to multiply a vector v by a scalar a to form av.

    such that the following 8 axioms are satisfied:

    (V1) u + v = v + u (for all u,v in V .)

    (V2) u + (v + w) = (u + v) + w, (for all u,v,w in V .)

    (V3) There is a unique object 0 called zero vector such that u + 0 = u for all u.

(V4) For each u in V, there is a v in V such that u + v = 0. (We will call v the negative of u and denote it by −u.)

(V5) (a + b)u = au + bu (for all u in V and for all scalars a, b.)

    (V6) a(u + v) = au + av (for all u,v in V and for all scalars a.)

    (V7) a(bu) = (ab)u (for all u in V and for all scalars a, b.)

    (V8) 1u = u (for all u in V .)

    There is a systematic way to read these rules governing the operations (addition and scalar

    multiplication) of a vector space. (V1) to (V4) concern addition only. (V1) and (V2) are

    commutative law and associative law respectively, same as those governing addition of

    numbers. (V3) and (V4) concern the existence of the zero vector and the “negative” to a

    vector. Any algebraic system with an addition operation satisfying (V1) to (V4) is called

an abelian group. (V5) and (V6) involve both operations; they resemble the distributive

    law. (V7) and (V8) concern scalar multiplication only. (V7) resembles the associative law

    and (V8) is a rule for normalizing this operation.

    We can derive elementary properties of vector space operations from (V1)–(V8) such as

    (1) a0 = 0. (Here 0 is the zero vector given in (V3).)

    (2) 0v = 0.

    (3) If av = 0, then either a = 0 or v = 0.

    (4) (−1)v = −v. (Here −v is the negative of v described in (V4).)


To prove (1), notice that from (V3) we have 0 + 0 = 0 (u + 0 = u holds for u = 0). Hence

    a0 = a(0 + 0) = a0 + a0

by (V6). Adding −(a0) to both sides and applying the associative law (V2) on the right-hand side, we obtain 0 = a0, giving (1). The proof of (2) goes in the same manner:

    0v = (0 + 0)v = 0v + 0v

by (V5); then add −(0v) to both sides. The proof of (3) is a bit more subtle. In case a = 0, there is nothing to prove. So we may (and do) assume that a ≠ 0. Thus it is legitimate to consider a⁻¹ and to multiply both sides of av = 0 by a⁻¹. Thus

    a⁻¹(av) = a⁻¹0 = 0

in view of (1), which we proved a minute ago. On the other hand, by (V7),

    a⁻¹(av) = (a⁻¹a)v = 1v,

which is v by (V8). Hence v = 0, as desired. Assertion (4) says that (−1)v is the negative of v. This means, according to (V4), that v + (−1)v = 0. The last identity can be checked as follows:

    v + (−1)v = 1v + (−1)v = (1 + (−1))v = 0v = 0.

Hence (4) is valid.

We only consider natural examples of vector spaces, but we should be aware that weird-looking vector spaces do exist. Here is an example of a weird space, suggested by special relativity. Let V consist of the real numbers v strictly between −1 and 1: −1 < v < 1. We use ⊕ and ⊙ to indicate the two basic operations in V, defined as follows, so as to distinguish them from the usual addition and multiplication of real numbers: for u, v ∈ V and a ∈ R,

    u ⊕ v = (u + v)/(1 + uv),   a ⊙ u = ((1 + u)^a − (1 − u)^a) / ((1 + u)^a + (1 − u)^a).

To check (V1) to (V8) by hand, you need a large piece of paper, small handwriting, and a lot of patience.
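A computer can at least spot-check the axioms numerically before one commits to the paper-and-pencil verification. A sketch (an addition to the text; the sample points and tolerances are arbitrary):

```python
import random

def oplus(u, v):                       # u (+) v = (u + v) / (1 + uv)
    return (u + v) / (1 + u*v)

def odot(a, u):                        # a (.) u from the formula above
    return ((1 + u)**a - (1 - u)**a) / ((1 + u)**a + (1 - u)**a)

random.seed(0)
for _ in range(100):
    u, v, w = (random.uniform(-0.99, 0.99) for _ in range(3))
    a, b = random.uniform(-3, 3), random.uniform(-3, 3)
    assert abs(oplus(u, v) - oplus(v, u)) < 1e-9                       # (V1)
    assert abs(oplus(u, oplus(v, w)) - oplus(oplus(u, v), w)) < 1e-9   # (V2)
    assert abs(odot(a + b, u) - oplus(odot(a, u), odot(b, u))) < 1e-9  # (V5)
    assert abs(odot(1, u) - u) < 1e-9                                  # (V8)
print("axioms (V1), (V2), (V5), (V8) spot-checked")
```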

    Appendix B*: Fields

    Besides R and C, we briefly mention other fields. First, the smallest field in R is the

    field of all rational numbers, denoted by Q. Recall that a rational number is a number


which can be written as a fraction of integers, that is, it can be expressed as m/n, where m and n are integers with n ≠ 0. Between Q and C there are many so-called algebraic number fields. A simple example of an algebraic number field is Q(√2), consisting of numbers of the form a + b√2, where a and b are rational numbers. Algebraic number theory is a big industry having many mathematicians as workers, so to speak. Beyond

    C there is the field of rational functions. Recall that a rational function is a function

    which can be expressed as p(x)/q(x), where p(x) and q(x) are polynomials. Going further

    are so called algebraic function fields. They can be defined by pure algebraic means via

    “field extensions” or by analytic means so that they appear as the field of “meromorphic

    functions on Riemann surfaces”.

Now we describe finite fields, which have found many practical uses in recent years, for example in coding and cryptography. The simplest finite field is F_2, which consists of two elements denoted by 0 and 1. The addition and multiplication in F_2 are given by 0 + 0 = 0, 0 + 1 = 1 + 0 = 1, 1 + 1 = 0, 1 × 0 = 0 × 1 = 0 × 0 = 0, 1 × 1 = 1. More generally, for any prime number p, we can define the finite field F_p, which has p elements. Every finite field can be written as F_q, where q = p^n for a prime number p and a positive integer n. An interesting aspect of the theory is that F_q can be regarded as a linear space over F_p, and the basic theory of linear algebra is a good guide to understanding the finite field F_q. Because of the multiplication operation of a finite field, each element of F_q is naturally associated with a linear operator on this vector space, and a large portion of the operator theory in linear algebra applies.
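A minimal illustration of arithmetic in F_p (an addition to the text; p = 7 is an arbitrary choice of prime):

```python
p = 7   # any prime p gives a field F_p

add = lambda a, b: (a + b) % p
mul = lambda a, b: (a * b) % p
inv = lambda a: pow(a, p - 2, p)   # Fermat: a^(p-2) is the inverse of a != 0

print(add(5, 4))        # 2, since 9 = 7 + 2
print(mul(3, inv(3)))   # 1, so every nonzero element is invertible
```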

A fancy way to do number theory is to introduce the so-called “valuation fields”, such as the field of p-adic numbers. Mathematically it is one of the most fascinating and challenging research areas. It seems to me that, though greatly respected, it is practically useless. However, some experts in number theory have begun to speculate about its links to some of the deepest mysteries of our universe.

    Appendix C*: Converting higher order linear ODEs to systems of first order ODEs

In Example 3.1.5 we considered higher order equations of the form

    y^(n) + a_{n−1}y^(n−1) + · · · + a_2 y^(2) + a_1 y′ + a_0 y = f(t)        (3.1.8)

Introduce the new functions y_1 = y, y_2 = y′, . . . , y_n = y^(n−1). Then we have

    y′_1 = y_2,   y′_2 = y_3,   . . . ,   y′_{n−1} = y_n,
    y′_n = −a_0 y_1 − a_1 y_2 − · · · − a_{n−1} y_n + f(t).


We can rewrite (3.1.8) as y′ = Ay + f, where A is the n × n matrix

    A = [   0     1     0     0   · · ·     0     ]
        [   0     0     1     0   · · ·     0     ]
        [   0     0     0     1   · · ·     0     ]
        [   ⋮                                ⋮    ]
        [   0     0     0     0   · · ·     1     ]
        [ −a_0  −a_1  −a_2  −a_3  · · ·  −a_{n−1} ]

with

    y = (y_1, y_2, . . . , y_n) = (y, y′, . . . , y^(n−1))   and   f = (0, . . . , 0, f),

both regarded as column vectors.

(As we shall see, the matrix A has many interesting properties.) In practice there is no apparent

    advantage of converting a higher order equation into a system of first order equations. But,

    in developing the theory, a system of first order equations is easier to deal with than a

    higher order equation.
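A short sketch (an addition to the text) that builds this companion-form matrix for given coefficients, checked against the matrix of Example 3.1.5:

```python
import numpy as np

def companion(coeffs):
    """A for y' = Ay, from y^(n) + a_{n-1} y^(n-1) + ... + a_0 y = 0,
    where coeffs = [a_0, a_1, ..., a_{n-1}]."""
    n = len(coeffs)
    A = np.zeros((n, n))
    A[:-1, 1:] = np.eye(n - 1)        # ones on the superdiagonal
    A[-1, :] = [-a for a in coeffs]   # last row: -a_0, ..., -a_{n-1}
    return A

# (3.1.10) reads y''' - 2y'' + 3y' - 5y = ..., so a_0 = -5, a_1 = 3, a_2 = -2.
print(companion([-5.0, 3.0, -2.0]))
# [[ 0.  1.  0.]
#  [ 0.  0.  1.]
#  [ 5. -3.  2.]]   (the matrix A of Example 3.1.5)
```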

    In the same vein we can convert a higher order linear difference equation to a system

    of first order linear difference equations.
