
Vector Algebra and Elements of Linear Algebra

Werner Stulpe


Contents

1 Vector Algebra and Geometry
  1.1 Points, Vectors, and Coordinate Systems
  1.2 The Scalar Product
  1.3 The Vector Product
  1.4 Straight Lines and Planes in Space
  1.5 Exercises

2 Elements of Linear Algebra
  2.1 Systems of Linear Equations I
  2.2 Vector Spaces
  2.3 Linear Independence, Bases, and Dimension
  2.4 Linear Maps and Matrices
  2.5 Kernel, Image, and Rank
  2.6 Systems of Linear Equations II
  2.7 Remarks on the Scalar Product
  2.8 Determinants
  2.9 Eigenvalue Problems
  2.10 Exercises

References


Chapter 1

Vector Algebra and Geometry

1.1 Points, Vectors, and Coordinate Systems

Since the elementary concept of vectors is closely related to geometry, we recall some fundamental geometrical notions intuitively. Everything happens somewhere in space; we denote the latter by P³. The space P³ consists of points P; set-theoretically speaking, the points are the elements of the set P³, P ∈ P³. Points have no extension and no dimension. A line segment PQ with the end points P and Q, P ≠ Q, is realized by the "shortest" junction of P and Q; we have PQ = QP ⊂ P³. A straight line L through the points P and Q is an infinite extension of the line segment PQ, PQ ⊂ L ⊂ P³. We assume that we know what the distance d ≥ 0 between two points P and Q is (resp., the length l > 0 of the line segment PQ where P ≠ Q). Moreover, we take the concept of the angle α between two straight lines, half-lines (rays), or line segments for granted, where 0° ≤ α ≤ 180°, respectively, 0 ≤ φ ≤ π. Finally, according to our experience from everyday life, we say that the space P³ is three-dimensional. The set P³ with the indicated structures is called the three-dimensional affine-Euclidean space. It is rather obvious that in P³ the Pythagoras theorem for right triangles holds; later we shall prove this characteristic statement on the basis of vectors.

A plane H ⊂ P³ is determined by two different intersecting or parallel straight lines or by three (different) points not lying on one straight line. In P³ two different non-parallel straight lines do in general not intersect. To study plane figures and curves like triangles, quadrilaterals, polygons, circles, discs, etc., it is sufficient to consider these as subsets of one plane that is kept fixed, distinguished by definition, and called the two-dimensional affine-Euclidean space P²; the corresponding aspect of geometry is called plane geometry. Spatial geometry moreover deals with the study of solids, (curved) surfaces, and spatial curves, like cubes, balls, spheres, helices, etc., where the spatial objects are considered as subsets of P³.

Definition 1.1

(a) A vector a = $\overrightarrow{PQ}$ is given by an ordered pair of points P, Q ∈ P, P denoting P³ or P². Two pairs of points represent the same vector, i.e.,

a = $\overrightarrow{PQ}$ = $\overrightarrow{RS}$,

if and only if the quadrilateral PQSR is a (possibly degenerated) parallelogram (Figure 1.1).

A vector can also be seen as a quantity determined by its length (magnitude) and its (oriented) direction. In every case a vector can be represented by an arrow.

(b) We denote the set of all vectors a acting in P³ by E³; thus, a ∈ E³. Analogously, if a acts in P², a ∈ E². The denotation E means E³ or E².


Figure 1.1: Two representatives of the same vector

(c) The vector 0 := $\overrightarrow{PP}$, P ∈ P, is called the zero vector, where in general the arrow over the zero vector is omitted, i.e., we simply write 0. The inverse of a ∈ E, a = $\overrightarrow{PQ}$, is defined to be the vector −a := $\overrightarrow{QP}$.

(d) The length of a vector a is denoted by |a|. A vector e is called a unit vector if |e| = 1.

(e) We have the following algebraic operations with vectors:

(i) The addition of two vectors a, b ∈ E results in the sum a + b ∈ E that is defined according to the parallelogram law, respectively, by

a + b = $\overrightarrow{PQ}$ + $\overrightarrow{QR}$ := $\overrightarrow{PR}$

where a = $\overrightarrow{PQ}$ and b = $\overrightarrow{QR}$ (Figure 1.2).

(ii) The (scalar) multiplication of a vector a by a real number λ ∈ R results in the product λa ∈ E that is defined to be the vector of length λ|a| in the direction of a if λ > 0, resp., the vector of length |λ||a| in the opposite direction of a if λ < 0, resp., the zero vector if λ = 0 (Figure 1.3).

(f) The sets E³ and E², equipped with the structures of addition of vectors, multiplication of vectors by numbers, length of a vector, and angle between two vectors, are called the three- and the two-dimensional Euclidean vector space, respectively.

Figure 1.2: Addition of two vectors

Figure 1.3: Multiplication of a vector by a number


The algebraic operations with vectors satisfy a number of rules, summarized in the following theorem.

Theorem 1.2 Let a, b, c ∈ E be vectors and λ, µ ∈ R real numbers. The following statements hold:

(a) Vector-space axioms:

(i) a + b = b + a (commutative law)
(ii) (a + b) + c = a + (b + c) (associative law)
(iii) a + 0 = a (zero vector)
(iv) a + (−a) = 0 (inverse vector)
(v) λ(a + b) = λa + λb (distributive law)
(vi) (λ + µ)a = λa + µa (distributive law)
(vii) λ(µa) = (λµ)a =: λµa (mixed associative law)
(viii) 1a = a

(b)

(i) a + x = b ⟺ x = b + (−a) =: b − a
(ii) λx = 0 ⟺ λ = 0 or x = 0
(iii) (−1)a = −a

(c)

(i) |a| ≥ 0
(ii) |a| = 0 ⟺ a = 0
(iii) |−a| = |a|
(iv) |λa| = |λ||a|
(v) |a/|a|| = 1, a ≠ 0.

All these rules are obvious consequences of our geometrical definition of vectors and their algebraic operations. The statements of (a) are called the axioms of vector space since they have a more general meaning in mathematics and are the basis of many important conclusions, as we shall see in Chapter 2.

To understand rule (v) of (c), we have to define the division of a vector by a number λ ≠ 0, namely, a/λ := (1/λ)a. Rule (v) then means that every nonzero vector a divided by its length gives a unit vector e in the direction of a, e = a/|a|; equivalently, a = |a|e. Note that the division by a vector is not defined.

Proof of 1.2: The vector-space axioms can be concluded only by means of geometrical evidence, whereas the statements of group (b) are implied by (a). The associative law is clear by Figure 1.4. The distributive law (v) follows from Figure 1.5; in fact, we have that $\overrightarrow{AC}$ = λa + λb as well as $\overrightarrow{AC}$ = λ(a + b), hence, λ(a + b) = λa + λb. All the other vector-space axioms are obvious. To show the first statement of (b), consider the equation

a + x = b (1.1)

and add −a to both sides:

−a + (a + x) = −a + b.


Figure 1.4: Associative law

Figure 1.5: Distributive law

Taking account of rules (ii), (iv), (iii), and (i) of (a), we obtain

x = −a + b = b + (−a) =: b − a. (1.2)

That is, Eq. (1.1) always has the unique solution (1.2) (Figure 1.6). Now we prove the second statement of (b). According to rules (viii) and (vi) of (a), we can write

x = 1x = (1 + 0)x = 1x + 0x,

i.e., x = x + 0x. It follows that 0x = x − x = x + (−x) = 0, i.e., 0x = 0. Similarly,

λa = λ(a + 0) = λa + λ0

which implies λ0 = λa − λa = 0, i.e., λ0 = 0. Conversely, let

λx = 0. (1.3)

If λ = 0, there is nothing to prove. If λ ≠ 0, we multiply both sides of (1.3) by 1/λ and obtain

(1/λ)(λx) = (1/λ)0

where we already know that the right-hand side is equal to 0. The left-hand side can be written as ((1/λ)λ)x = 1x = x; hence, x = 0. Finally, to show (iii) of (b), observe that

a + (−1)a = 1a + (−1)a = (1 + (−1))a = 0a = 0,

from which it follows that (−1)a = 0 + (−a) = −a. The statements of (c) are clear by definition; (v) is also a consequence of (iv). □

Example 1.3 Show that the diagonals of a parallelogram divide each other into two halves of equal length.

Consider a parallelogram in P² and denote its vertices counterclockwise by A, B, C, and D (Figure 1.7). Defining a = $\overrightarrow{AB}$ and b = $\overrightarrow{AD}$ as well as x = $\overrightarrow{AM}$ and y = $\overrightarrow{BM}$ where M is the intersection point of the diagonals of the parallelogram, we have that

x = λ(a + b) (1.4)
y = µ(b − a) (1.5)
x = a + y. (1.6)


Figure 1.6: Difference of two vectors

Figure 1.7: The diagonals of a parallelogram

We have to show that λ = 1/2 = µ. Inserting Eqs. (1.4) and (1.5) into (1.6), we obtain

λ(a + b) = a + µ(b − a),

resp.,

(λ − 1 + µ)a = (µ − λ)b.

Since the vectors a, b ≠ 0 neither have the same nor the opposite direction (otherwise they would not span a real parallelogram), it follows that

λ − 1 + µ = 0
µ − λ = 0.

Hence, λ = 1/2 = µ and

x = (1/2)(a + b)
y = (1/2)(b − a),

showing that the point M is in fact the midpoint of each of the two diagonals.

We remark that the system of the three equations (1.4)–(1.6) determines the four unknowns x, y, λ, and µ uniquely. The reason is that a vector of E² corresponds to two real numbers (namely, to its components w.r.t. a coordinate system, as we shall see next); therefore, the three vectorial equations are equivalent to a system of six real equations in six real unknowns. Although the solution presented in the example is quite instructive, the problem can be solved more easily. Namely, let z be the vector from the point B to the midpoint of the diagonal from A to C. Then

z = (1/2)(a + b) − a.

The latter implies that z = (1/2)b − (1/2)a = (1/2)(b − a), i.e., z is also the vector from B to the midpoint of the other diagonal.

Definition 1.4 A right-handed Cartesian coordinate system in the affine-Euclidean space P³ is given by a quadruple (O; e1, e2, e3) (Figure 1.8) where

(i) O ∈ P³ is a fixed point, called the origin of the coordinate system,

(ii) e1, e2, e3 ∈ E³ are mutually orthogonal unit vectors (i.e., e1, e2, e3 are unit vectors perpendicular to each other),

(iii) e1, e2, e3 constitute a right-handed system.

Similarly, a "right-handed" Cartesian coordinate system in P² is given by a triple (O; e1, e2) consisting of a fixed point O ∈ P² and a positively oriented system e1, e2 of two orthogonal vectors of E².


Figure 1.8: Two right-handed Cartesian coordinate systems in P³ and two positively oriented ones in P²

It is obvious that, w.r.t. a coordinate system, every vector a ∈ E³ can uniquely be decomposed according to

a = v1 + v2 + v3 = a1 e1 + a2 e2 + a3 e3 (1.7)

where the vectorial components v1, v2, v3 ∈ E³ of a are parallel to the coordinate axes and the (scalar) components a1, a2, a3 ∈ R of a are, up to the sign, just the respective lengths of v1, v2, v3 (Figure 1.9). For vectors a ∈ E², Eq. (1.7) reads

a = v1 + v2 = a1 e1 + a2 e2

(Figure 1.10). In the following of this chapter, we formulate our statements in most cases only for E³, resp., P³. Next we summarize the observation (1.7) and two further ones.

Figure 1.9: Components of a vector a ∈ E³

Figure 1.10: Components of a vector a ∈ E²

Observation/Definition 1.5 Let a fixed coordinate system (O; e1, e2, e3) in P³ be given.

(a) Every vector a ∈ E³ is uniquely characterized by its components a1, a2, a3 ∈ R:

$$\vec a = a_1\vec e_1 + a_2\vec e_2 + a_3\vec e_3 =: \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix}.$$

(b) Every point P ∈ P³ corresponds uniquely to a vector r ∈ E³ such that

r = $\overrightarrow{OP}$.

Such a vector r is called a position vector (Figure 1.11).

(c) For every point P ∈ P³, we have

$$\vec r = \overrightarrow{OP} = x_1\vec e_1 + x_2\vec e_2 + x_3\vec e_3 = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} =: \begin{pmatrix} x \\ y \\ z \end{pmatrix}.$$

The components x1 = x, x2 = y, x3 = z of the position vector r ∈ E³ are called the coordinates of the point P.


Figure 1.11: Position vector and coordinates of a point

Figure 1.12: The sum of the position vectors of two points depends on the origin

Again we emphasize the difference between points and vectors. A vector a ∈ E is represented by an ordered pair of points; given any point P ∈ P, there is exactly one point Q ∈ P such that a = $\overrightarrow{PQ}$. One can say that the vector a transforms the point P into Q; in this sense, the vectors of E act on the points of P as translations. Whereas vectors can be added, the addition of two points is not defined. There is no meaningful interpretation of the sum of the position vectors of two points; in particular, the result depends on the coordinate system (Figure 1.12).

Next we present several rules and formulas which in particular express the operations with vectors in terms of their components.

Observation 1.6 With respect to a given coordinate system, we have that

(i) $$\vec a + \vec b = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} + \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} = \begin{pmatrix} a_1 + b_1 \\ a_2 + b_2 \\ a_3 + b_3 \end{pmatrix}$$ where a, b ∈ E³,

(ii) $$\lambda\vec a = \lambda\begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} = \begin{pmatrix} \lambda a_1 \\ \lambda a_2 \\ \lambda a_3 \end{pmatrix}$$ where λ ∈ R and a ∈ E³,

(iii) the length of the vector a ∈ E³ with the components a1, a2, a3 is given by

$$|\vec a| = \sqrt{a_1^2 + a_2^2 + a_3^2},$$

(iv) the distance d of a point P ∈ P³ from the origin O of the coordinate system is

$$d = |\vec r| = \sqrt{x^2 + y^2 + z^2}$$

where r = $\overrightarrow{OP}$ has the components x, y, z,

(v) the distance d of any two points P1, P2 ∈ P³ is

$$d = |\vec r_1 - \vec r_2| = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}$$

where r1 = $\overrightarrow{OP_1}$ has the components x1, y1, z1 and r2 = $\overrightarrow{OP_2}$ has the components x2, y2, z2 (Figure 1.13).

Proof: The rule (i) follows according to

a + b = (a1 e1 + a2 e2 + a3 e3) + (b1 e1 + b2 e2 + b3 e3) = (a1 + b1) e1 + (a2 + b2) e2 + (a3 + b3) e3,

where we have used some of the rules stated in Theorem 1.2. Similarly we obtain

λa = λ(a1 e1 + a2 e2 + a3 e3) = λa1 e1 + λa2 e2 + λa3 e3.

The vector a = a1 e1 + a2 e2 + a3 e3 can be considered as one of the spatial diagonals of a rectangular box whose sides are given by the vectorial components a1 e1, a2 e2, and a3 e3 (Figure 1.9). The lengths of the sides are |a1|, |a2|, and |a3|, and a twofold application of the Pythagoras theorem yields

|a|² = (|a1|² + |a2|²) + |a3|² = a1² + a2² + a3²,

i.e., statement (iii). Statement (iv) follows from d = |r| and (iii), and statement (v) is implied by

$$d = |\vec r_1 - \vec r_2| = \left|\begin{pmatrix} x_1 \\ y_1 \\ z_1 \end{pmatrix} - \begin{pmatrix} x_2 \\ y_2 \\ z_2 \end{pmatrix}\right| = \left|\begin{pmatrix} x_1 - x_2 \\ y_1 - y_2 \\ z_1 - z_2 \end{pmatrix}\right|$$

and (iii) again; moreover, (iv) is a consequence of (v). □

Figure 1.13: Distance of two points

Figure 1.14: Midpoint of a line segment

W.r.t. a given coordinate system, we can identify the points with their position vectors; instead of using the precise formulation "the point with the position vector r," we say briefly "the point r".

Example 1.7 Determine the length of the line segment with the end points r1 = (1, 3, −1) and r2 = (−4, 5, 2) as well as the coordinates of its midpoint (Figure 1.14).


The length is

$$d = |\vec r_1 - \vec r_2| = \left|\begin{pmatrix} 1 \\ 3 \\ -1 \end{pmatrix} - \begin{pmatrix} -4 \\ 5 \\ 2 \end{pmatrix}\right| = \left|\begin{pmatrix} 5 \\ -2 \\ -3 \end{pmatrix}\right| = \sqrt{25 + 4 + 9} = \sqrt{38},$$

and the midpoint is given by

$$\vec r_M = \vec r_2 + \tfrac12(\vec r_1 - \vec r_2) = \tfrac12(\vec r_1 + \vec r_2) = \tfrac12\left[\begin{pmatrix} 1 \\ 3 \\ -1 \end{pmatrix} + \begin{pmatrix} -4 \\ 5 \\ 2 \end{pmatrix}\right] = \tfrac12\begin{pmatrix} -3 \\ 8 \\ 1 \end{pmatrix} = \begin{pmatrix} -3/2 \\ 4 \\ 1/2 \end{pmatrix}.$$
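Such component computations are easy to verify numerically. The following minimal sketch in Python (assuming NumPy is available; the notation is ours, chosen for illustration) reproduces the length and the midpoint:

```python
import numpy as np

r1 = np.array([1.0, 3.0, -1.0])
r2 = np.array([-4.0, 5.0, 2.0])

# length of the segment, Observation 1.6 (v): d = |r1 - r2|
d = np.linalg.norm(r1 - r2)   # sqrt(38) ≈ 6.164

# midpoint: r_M = (r1 + r2) / 2
r_M = (r1 + r2) / 2           # [-1.5, 4.0, 0.5]

print(d, r_M)
```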

The concepts point, vector, length, area, volume are defined without respect to a coordinate system, whereas the components of a vector, the position vector of a point, and the coordinates of a point depend on the chosen coordinate system; the components of a vector depend on the direction of the coordinate axes, the position vector of a point depends on the origin, and the coordinates of a point depend on the origin as well as on the direction of the axes. In physics, scalar quantities are those that are given by their magnitude, i.e., by a number and a unit, and do not depend on the coordinate system (e.g., length, area, volume, mass, temperature); vectorial quantities are determined by their magnitude and their direction and consequently do not depend on the coordinate system (e.g., velocity, force, momentum, electrical field strength), provided that their definition does not involve the position vector (e.g., torque or angular momentum). The components of a vectorial physical quantity or the coordinates of a point are given by their values with respect to a coordinate system, so these are nonscalar physical quantities. Finally, we remark that, although a vector is invariant under translation, in physics the beginning point is often essential (e.g., the point where a force applies).

1.2 The Scalar Product

In the remaining sections of this chapter, we need, besides the concept of vectors, only to know

(i) the elementary trigonometric definition of sin φ and cos φ in the context of a right triangle, 0 ≤ φ ≤ π/2,

(ii) sin φ = sin(π − φ) and cos φ = −cos(π − φ) where 0 ≤ φ ≤ π,

(iii) the definition of the angle φ between two vectors a, b ≠ 0; in particular, 0 ≤ φ ≤ π.

Definition 1.8 The scalar product (inner product, dot product) of two vectors a, b ∈ E is defined by

a · b := |a||b| cos φ

where φ is the angle between a and b. (Clearly, if a = 0 or b = 0, it is understood that a · b = 0 although the angle between the vectors is not uniquely defined.)

Figure 1.15: Scalar product of two vectors including an acute, resp., obtuse angle


Remark 1.9 We have that

(i) if 0 < φ < π/2, then a · b = |a||b| cos φ = |a| p where p = |b| cos φ is the length of the projection of b onto a (Figure 1.15),

(ii) if π/2 < φ < π, then a · b = |a||b| cos φ = −|a| p where p = |b| cos(π − φ) = −|b| cos φ is the length of the projection of b onto the straight line determined by a,

(iii) if φ = 0, then a · b = |a||b|,

(iv) if φ = π/2, then a · b = 0,

(v) if φ = π, then a · b = −|a||b|.

The following theorem summarizes the algebraic properties of the scalar product.

Theorem 1.10 Let a, b, c ∈ E and λ ∈ R. Then

(i) a · b = b · a (commutative law)

(ii) a · (b + c) = a · b + a · c and (a + b) · c = a · c + b · c (distributive laws)

(iii) a · (λb) = λ(a · b) = (λa) · b =: λ a · b

(iv) a · a = |a|² ≥ 0

(v) a · a = 0 ⟺ a = 0

(vi) a · b = 0 ⟺ a = 0 or b = 0 or (a, b ≠ 0 and a ⊥ b).

Proof: The commutative law is clear. To show the first distributive law, assume that a and b as well as a and c include an acute angle (Figure 1.16). Let p be the length of the projection of b onto a and q the length of the projection of c onto a. It is geometrically evident that the length of the projection of b + c is just the sum p + q. According to statement (i) of the preceding remark, it follows that

a · (b + c) = |a|(p + q) = |a| p + |a| q = a · b + a · c.

The cases where at least one of the considered angles is not acute are treated similarly. By the commutative law, the second distributive law is a consequence of the first.

If φ is the angle between a and b, we conclude that

$$\vec a \cdot (\lambda\vec b) = \begin{cases} |\vec a|\,\lambda|\vec b|\cos\varphi, & \lambda \ge 0 \\ |\vec a|\,|\lambda||\vec b|\cos(\pi - \varphi), & \lambda < 0 \end{cases} \;=\; \lambda|\vec a||\vec b|\cos\varphi = \lambda(\vec a \cdot \vec b),$$

Figure 1.16: Distributive law for the scalar product


i.e.,

a · (λb) = λ(a · b). (1.8)

The latter implies

(λa) · b = b · (λa) = λ(b · a) = λ(a · b); (1.9)

Eqs. (1.8) and (1.9) yield statement (iii) of the theorem. Finally, a · a = |a|² cos 0 = |a|² ≥ 0; a · a = 0 if and only if |a| = 0, i.e., a = 0; and a · b = 0 if and only if |a||b| cos φ = 0, i.e., a = 0 or b = 0 or φ = π/2. □

Next we look at the component representation of the scalar product.

Observation 1.11 If a has the components a1, a2, a3 and b has the components b1, b2, b3 w.r.t. a coordinate system, then

a · b = a1 b1 + a2 b2 + a3 b3.

Proof: Since e1, e2, e3 is an orthogonal system of unit vectors, we conclude that e1 · e1 = 1, e1 · e2 = 0, etc. Hence, a straightforward calculation yields

a · b = (a1 e1 + a2 e2 + a3 e3) · (b1 e1 + b2 e2 + b3 e3)
     = a1 b1 e1 · e1 + a1 b2 e1 · e2 + a1 b3 e1 · e3 + a2 b1 e2 · e1 + a2 b2 e2 · e2 + a2 b3 e2 · e3 + a3 b1 e3 · e1 + a3 b2 e3 · e2 + a3 b3 e3 · e3
     = a1 b1 + a2 b2 + a3 b3. □

Combining Definition 1.8 and Observation 1.11, we obtain

a · b = |a||b| cos φ = a1 b1 + a2 b2 + a3 b3, (1.10)

which allows one to calculate the angle between two vectors from their components. Moreover, setting b = a in (1.10), it follows that |a|² = a1² + a2² + a3², resp.,

$$|\vec a| = \sqrt{a_1^2 + a_2^2 + a_3^2}. \tag{1.11}$$

Setting b = e1 in (1.10), it follows that |a| cos α1 = a1 where α1 is the angle between a and the first coordinate axis. The choices b = e2, resp., b = e3 yield two analogous equations. Summarizing,

a1 = |a| cos α1
a2 = |a| cos α2 (1.12)
a3 = |a| cos α3.

Note that Eq. (1.11) was already obtained from the Pythagoras theorem as part (iii) of Observation 1.6 and that Eqs. (1.12) are also geometrically evident. Finally, (1.12) and (1.11) imply

cos² α1 + cos² α2 + cos² α3 = 1.


Example 1.12

(a) What is the angle between a = (1, 2, 3) and b = (−1, 0, 5)?

According to (1.10) and (1.11) we obtain

$$\cos\varphi = \frac{\vec a \cdot \vec b}{|\vec a||\vec b|} = \frac{a_1 b_1 + a_2 b_2 + a_3 b_3}{\sqrt{a_1^2 + a_2^2 + a_3^2}\,\sqrt{b_1^2 + b_2^2 + b_3^2}} \tag{1.13}$$

$$= \frac{-1 + 0 + 15}{\sqrt{1 + 4 + 9}\,\sqrt{1 + 25}} = \frac{14}{\sqrt{14}\sqrt{26}} = \sqrt{\frac{14}{26}} = \sqrt{\frac{7}{13}}.$$

Using a pocket calculator, we find φ ≈ 42.79°. The angle φ can also be calculated by means of the cosine theorem of elementary trigonometry. To that end, consider the triangle spanned by the vectors a and b (Figure 1.17). That is, two sides of this triangle are given by a and b including the angle φ, and the third is given by b − a or a − b. Now the cosine theorem yields

|b − a|² = |a|² + |b|² − 2|a||b| cos φ,

resp.,

$$\cos\varphi = \frac{|\vec a|^2 + |\vec b|^2 - |\vec a - \vec b|^2}{2|\vec a||\vec b|}. \tag{1.14}$$

The use of formula (1.14) requires more calculation work than the use of (1.13). More importantly, the cosine theorem itself is a consequence of vector algebra, as the next example shows.

(b) Proof of the cosine theorem

Let a, b, and c be the lengths of the sides of a triangle and let γ be the angle opposite the side of length c. Introduce vectors a, b, and c for the corresponding sides such that a + b = −c (Figure 1.18); note that the angle between a and b is π − γ. We conclude that

|c|² = c · c = (−c) · (−c) = (a + b) · (a + b)
     = a · a + a · b + b · a + b · b
     = |a|² + 2 a · b + |b|²
     = |a|² + |b|² + 2|a||b| cos(π − γ)
     = |a|² + |b|² − 2|a||b| cos γ,

i.e.,

c² = a² + b² − 2ab cos γ,

Figure 1.17: Angle between two vectors

Figure 1.18: Triangle


which is the cosine theorem. For γ = π/2, we obtain the Pythagoras theorem

c² = a² + b²

as a special case.

(c) Projection of b along a

The projection of b ∈ E along a ∈ E, a ≠ 0, is the vector

p = |b| cos φ e

where φ is the angle between b and a and e is the unit vector satisfying a = |a| e (Figure 1.19). It follows that

$$\vec p = (\vec e \cdot \vec b)\,\vec e = \left(\frac{\vec a}{|\vec a|} \cdot \vec b\right)\frac{\vec a}{|\vec a|};$$

an expression like (e · b) e is sometimes written as e · b e. The result for p is

$$\vec p = \frac{\vec a \cdot \vec b}{|\vec a|^2}\,\vec a = \frac{\vec a \cdot \vec b}{\vec a \cdot \vec a}\,\vec a.$$

According to the decomposition b = p + q, the vector q = b − p must be orthogonal to a; in fact,

$$\vec q \cdot \vec a = \vec b \cdot \vec a - \vec p \cdot \vec a = \vec a \cdot \vec b - \frac{\vec a \cdot \vec b}{|\vec a|^2}\,\vec a \cdot \vec a = 0.$$

For instance, if a = (2, −1) and b = (−1, 1), we obtain p = −(3/5)(2, −1) and q = (1/5)(1, 2).

Figure 1.19: Projection of a vector along some other one
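The projection formulas of part (c) amount to two lines of code; a minimal NumPy sketch for the numbers above:

```python
import numpy as np

a = np.array([2.0, -1.0])
b = np.array([-1.0, 1.0])

# p = (a . b / a . a) a, the projection of b along a
p = (a @ b) / (a @ a) * a   # (-3/5) * [2, -1]
q = b - p                   # (1/5) * [1, 2]
print(p, q, q @ a)          # q @ a = 0: q is orthogonal to a
```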

We conclude this section with several remarks. First, a product of the kind a · b · c cannot be defined since, on the one hand, it should be equal to a(b · c) and, on the other hand, equal to (a · b)c; however, a multiple of the vector a is in general different from a multiple of the vector b. Second, the square of a vector can be understood according to a² := a · a = |a|²; we do not use this, and higher powers of a are not defined. Third, as already indicated, the product a · b c (without a second dot!) is defined, namely, a · b c := (a · b)c; it is, however, clearer to set the parentheses. Finally, one cannot divide a number λ by a vector a since the equation a · x = λ, a ≠ 0, always has infinitely many solutions. In fact, we have |x| cos φ = λ/|a|; that is, if λ > 0, every vector x that includes an acute angle with a and whose projection onto a is of length λ/|a| is a solution of a · x = λ.


1.3 The Vector Product

Definition 1.13 The vector product (cross product) of two vectors a, b ∈ E³ is defined to be that vector a × b ∈ E³ that is determined as follows (Figure 1.20):

(i) the length of a × b is equal to the area of the parallelogram spanned by a and b, i.e.,

|a × b| = |a||b| sin φ

where φ is the angle between a and b,

(ii) a × b is perpendicular to a as well as to b,

(iii) a, b, a × b constitute a right-handed system.

Figure 1.20: Vector product of two vectors

We remark that the vector product of two vectors is defined only in the (oriented) three-dimensional Euclidean vector space E³. There is no analog in the space E². The following theorem summarizes the algebraic properties of the vector product.

Theorem 1.14 Let a, b, c ∈ E³ and λ ∈ R. Then

(i) a × b = −(b × a) =: −b × a (anticommutative law)

(ii) a × (b + c) = a × b + a × c and (a + b) × c = a × c + b × c (distributive laws)

(iii) a × (λb) = λ(a × b) = (λa) × b =: λ a × b

(iv) a × a = 0

(v) a × b = 0 ⟺ a = 0 or b = 0 or a = µb, µ ∈ R ⟺ a = µb or b = νa, µ, ν ∈ R.

Proof: The anticommutative law is a direct consequence of the preceding definition, in particular, of the defining property (iii). To show the second distributive law, consider a plane perpendicular to c and project the vector a orthogonally onto that plane (see Figure 1.21). The projected vector has the length |a| sin φ where φ is the angle between a and c. Stretching the projected vector by the factor |c| and rotating it clockwise by a right angle around c, we obtain


Figure 1.21: Distributive law for the vector product

the vector a × c. The vectors b × c and (a + b) × c are constructed in the same manner. Looking at the parallelogram spanned by a × c and b × c, we conclude that

(a + b) × c = a × c + b × c.

The first distributive law is implied by the second and by (i) according to

a × (b + c) = −((b + c) × a) = −(b × a + c × a) = a × b + a × c.

Statement (iii) of the theorem is obvious. From |a × a| = |a||a| sin 0 = 0 we obtain a × a = 0. Finally, a × b = 0 is equivalent to |a × b| = 0, i.e., to |a||b| sin φ = 0. Hence, a = 0 or b = 0 or φ = 0 or φ = π; that is, a = 0 or b = 0 or a = µb with µ ≠ 0. Equivalently, a = µb or b = νa. □

Next we look at the component representation of the vector product.

Observation 1.15 If a has the components a1, a2, a3 and b has the components b1, b2, b3 w.r.t. a coordinate system, then

$$\vec a \times \vec b = \begin{pmatrix} a_2 b_3 - a_3 b_2 \\ a_3 b_1 - a_1 b_3 \\ a_1 b_2 - a_2 b_1 \end{pmatrix}.$$

Proof: Since e1, e2, e3 is a right-handed orthogonal system of unit vectors, it follows that e1 × e1 = 0, e1 × e2 = e3, e2 × e1 = −e3, etc. In particular, ei × ej = ek for any cyclic permutation of the indices i, j, k = 1, 2, 3. Hence, a straightforward calculation yields

a × b = (a1 e1 + a2 e2 + a3 e3) × (b1 e1 + b2 e2 + b3 e3)
     = a1 b1 e1 × e1 + a1 b2 e1 × e2 + a1 b3 e1 × e3 + a2 b1 e2 × e1 + a2 b2 e2 × e2 + a2 b3 e2 × e3 + a3 b1 e3 × e1 + a3 b2 e3 × e2 + a3 b3 e3 × e3
     = (a2 b3 − a3 b2) e1 + (a3 b1 − a1 b3) e2 + (a1 b2 − a2 b1) e3. □


Combining Definition 1.13 and Observation 1.15, we obtain

$$\vec a \times \vec b = |\vec a||\vec b|\sin\varphi\,\vec e = \begin{pmatrix} a_2 b_3 - a_3 b_2 \\ a_3 b_1 - a_1 b_3 \\ a_1 b_2 - a_2 b_1 \end{pmatrix}$$

where e is a unit vector in the direction of a × b. The relation

$$|\vec a \times \vec b| = |\vec a||\vec b|\sin\varphi = \left|\begin{pmatrix} a_2 b_3 - a_3 b_2 \\ a_3 b_1 - a_1 b_3 \\ a_1 b_2 - a_2 b_1 \end{pmatrix}\right| \tag{1.15}$$

enables the calculation of the area of a parallelogram spanned by the vectors a and b directly from the components of a and b.

Example 1.16

(a) What is the area of the parallelogram spanned by a = (1, 2, 3) and b = (−1, 0, 5)?

According to (1.15) we obtain

$$A = |\vec a \times \vec b| = \left|\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \times \begin{pmatrix} -1 \\ 0 \\ 5 \end{pmatrix}\right| = \left|\begin{pmatrix} 10 - 0 \\ -3 - 5 \\ 0 - (-2) \end{pmatrix}\right| = \left|\begin{pmatrix} 10 \\ -8 \\ 2 \end{pmatrix}\right| = \sqrt{100 + 64 + 4} = \sqrt{168}.$$
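NumPy's cross routine implements exactly the component formula of Observation 1.15, so the area can be checked as follows (a minimal sketch):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([-1.0, 0.0, 5.0])

n = np.cross(a, b)      # [10, -8, 2], the component formula of Observation 1.15
A = np.linalg.norm(n)   # sqrt(168) ≈ 12.96, area of the parallelogram
print(n, A)
```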

(b) Area of a parallelogram spanned by a = (a1, a2) and b = (b1, b2)

We identify the two-dimensional Euclidean plane P² with a plane of the three-dimensional Euclidean space P³ and extend the coordinate system (O; e1, e2) in P² to a coordinate system (O; e1, e2, e3) in P³. The parallelogram (Figure 1.22) is then spanned by the vectors A = (a1, a2, 0) and B = (b1, b2, 0), and its area is consequently

$$A = |\vec A \times \vec B| = \left|\begin{pmatrix} 0 \\ 0 \\ a_1 b_2 - a_2 b_1 \end{pmatrix}\right| = \sqrt{0 + 0 + (a_1 b_2 - a_2 b_1)^2},$$

i.e.,

A = |a1 b2 − a2 b1|. (1.16)

(c) Proof of the sine theorem

The area of the triangle of Figure 1.18 is

$$A = \tfrac12|\vec a \times \vec b| = \tfrac12|\vec b \times \vec c| = \tfrac12|\vec c \times \vec a|,$$

from which it follows that

$$\tfrac12|\vec a||\vec b|\sin(\pi - \gamma) = \tfrac12|\vec b||\vec c|\sin(\pi - \alpha) = \tfrac12|\vec c||\vec a|\sin(\pi - \beta).$$

Denoting the lengths of the sides of the triangle simply by a, b, and c, we obtain

ab sin γ = bc sin α = ac sin β

or, equivalently,

$$\frac{a}{c} = \frac{\sin\alpha}{\sin\gamma}, \qquad \frac{b}{c} = \frac{\sin\beta}{\sin\gamma}, \qquad \frac{a}{b} = \frac{\sin\alpha}{\sin\beta},$$

which is known as the sine theorem.


Figure 1.22: Area of a parallelogram in the plane P²

Remark 1.17 A real 2 × 2 matrix is a quadratic scheme of four real numbers, e.g., $\begin{pmatrix} a_1 & b_1 \\ a_2 & b_2 \end{pmatrix}$. The determinant of such a matrix is denoted and defined by

$$\det\begin{pmatrix} a_1 & b_1 \\ a_2 & b_2 \end{pmatrix} = \begin{vmatrix} a_1 & b_1 \\ a_2 & b_2 \end{vmatrix} := a_1 b_2 - a_2 b_1.$$

According to Eq. (1.16) the determinant of a 2 × 2 matrix is, up to the sign, just the area of the parallelogram spanned by its column vectors.

Observation/Definition 1.18

(a) The volume of a parallelepiped spanned by the vectors a, b, c ∈ E³ is given by

V = |(a × b) · c|;

in particular,

V = (a × b) · c

if the system a, b, c is right-handed. The number (a × b) · c is called the box product of a, b, c ∈ E³.

(b) The box product is positive if a, b, c is a right-handed system; it is negative if a, b, c is left-handed. The box product is zero if and only if the vectors a, b, and c lie in one plane, the latter including the case that a, b, or c is zero.

(c) The box product is invariant under cyclic permutation of its factors:

(a × b) · c = (b × c) · a = (c × a) · b.

(d) Representing the vectors a, b, c w.r.t. a coordinate system, we have

(a × b) · c = (a2 b3 − a3 b2)c1 + (a3 b1 − a1 b3)c2 + (a1 b2 − a2 b1)c3. (1.17)

Proof: Consider the parallelogram spanned by a and b as the base of the parallelepiped and let h be the corresponding height (Figure 1.23). If a, b, c is a right-handed system, the angle φ between c and a × b is acute, and we have h = |c| cos φ. In consequence,

V = Ah = |a × b||c| cos φ = (a × b) · c.


Figure 1.23: Volume of a parallelepiped

If a, b, c is left-handed, the angle φ between c and a × b is obtuse, and h = |c| cos(π − φ). Consequently,

V = Ah = |a × b||c| cos(π − φ) = −|a × b||c| cos φ = −(a × b) · c.

Hence, if the vectors a, b, and c span a real parallelepiped, i.e., if they do not lie in one plane, the box product is positive in the right-handed case and negative in the left-handed case; in both cases the volume of the parallelepiped is V = |(a × b) · c|. From

(a × b) · c = |a||b| sin θ |c| cos φ,

where θ is the angle between a and b, it follows that the box product is zero if and only if the three vectors lie in one plane.

Since the systems a, b, c and b, c, a as well as c, a, b have the same orientation and their vectors span the same (possibly degenerated) parallelepiped, statement (c) is implied by (a) and (b). Finally, a straightforward calculation involving Observations 1.11 and 1.15 yields

$$(\vec a \times \vec b) \cdot \vec c = \begin{pmatrix} a_2 b_3 - a_3 b_2 \\ a_3 b_1 - a_1 b_3 \\ a_1 b_2 - a_2 b_1 \end{pmatrix} \cdot \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} = (a_2 b_3 - a_3 b_2)c_1 + (a_3 b_1 - a_1 b_3)c_2 + (a_1 b_2 - a_2 b_1)c_3. \;\Box$$

Remark 1.19

(a) A real 3 × 3 matrix is a quadratic scheme of nine real numbers, e.g., $\begin{pmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{pmatrix}$. The determinant of such a matrix is denoted and defined by

$$\det\begin{pmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{pmatrix} = \begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix} := c_1\begin{vmatrix} a_2 & b_2 \\ a_3 & b_3 \end{vmatrix} - c_2\begin{vmatrix} a_1 & b_1 \\ a_3 & b_3 \end{vmatrix} + c_3\begin{vmatrix} a_1 & b_1 \\ a_2 & b_2 \end{vmatrix} \tag{1.18}$$

$$= c_1(a_2 b_3 - a_3 b_2) - c_2(a_1 b_3 - a_3 b_1) + c_3(a_1 b_2 - a_2 b_1) = c_1(a_2 b_3 - a_3 b_2) + c_2(a_3 b_1 - a_1 b_3) + c_3(a_1 b_2 - a_2 b_1)$$


(cf. Remark 1.17). Comparing the last expression with the right-hand side of Eq. (1.17), we see that the box product can be represented as a determinant:

$$(\vec a \times \vec b) \cdot \vec c = \begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix}.$$

In consequence, the determinant of a 3 × 3 matrix is, up to the sign, just the volume of the parallelepiped spanned by its column vectors.

(b) From

(a × b) · c = −(b × a) · c

it follows that

$$\det\begin{pmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{pmatrix} = -\det\begin{pmatrix} b_1 & a_1 & c_1 \\ b_2 & a_2 & c_2 \\ b_3 & a_3 & c_3 \end{pmatrix}.$$

More generally, using the invariance of the box product under cyclic permutation of its factors as well as the anticommutative law of the vector product, one can easily show that a 3 × 3 determinant changes its sign if any two columns are interchanged.

(c) The following determinant contains the unit vectors e1, e2, e3 and is not a real determinant. Applying the definition (1.18) to this formal determinant, we obtain

$$\begin{vmatrix} a_1 & b_1 & \vec e_1 \\ a_2 & b_2 & \vec e_2 \\ a_3 & b_3 & \vec e_3 \end{vmatrix} = \vec e_1\begin{vmatrix} a_2 & b_2 \\ a_3 & b_3 \end{vmatrix} - \vec e_2\begin{vmatrix} a_1 & b_1 \\ a_3 & b_3 \end{vmatrix} + \vec e_3\begin{vmatrix} a_1 & b_1 \\ a_2 & b_2 \end{vmatrix} = (a_2 b_3 - a_3 b_2)\vec e_1 + (a_3 b_1 - a_1 b_3)\vec e_2 + (a_1 b_2 - a_2 b_1)\vec e_3 = \vec a \times \vec b.$$

The formal result

$$\vec a \times \vec b = \begin{vmatrix} a_1 & b_1 & \vec e_1 \\ a_2 & b_2 & \vec e_2 \\ a_3 & b_3 & \vec e_3 \end{vmatrix}$$

is often used to memorize the component representation of the vector product.

Example 1.20 What is the volume of the parallelepiped spanned by a = (1, 2, 3), b = (−1, 0, 5), and c = (1, 1, −2)?

According to part (a) of the preceding remark, we obtain

$$V = \left|\det\begin{pmatrix} 1 & -1 & 1 \\ 2 & 0 & 1 \\ 3 & 5 & -2 \end{pmatrix}\right| = \left|1\begin{vmatrix} 2 & 0 \\ 3 & 5 \end{vmatrix} - 1\begin{vmatrix} 1 & -1 \\ 3 & 5 \end{vmatrix} - 2\begin{vmatrix} 1 & -1 \\ 2 & 0 \end{vmatrix}\right| = |10 - 8 - 4| = 2.$$
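The same value can be obtained either from the box product or from a determinant routine; a minimal NumPy sketch:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([-1.0, 0.0, 5.0])
c = np.array([1.0, 1.0, -2.0])

box = np.cross(a, b) @ c         # box product (a x b) . c = -2 (left-handed system)
M = np.column_stack([a, b, c])   # matrix with columns a, b, c
V = abs(np.linalg.det(M))        # volume = |det M| = 2
print(box, V)
```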

Finally, we prove the so-called "bac-cab" rule for twofold vector products, which is often used in physics.


Theorem 1.21 (bac-cab rule) For a, b, c ∈ E³, we have

a × (b × c) = (a · c)b − (a · b)c = b(a · c) − c(a · b). (1.19)

Proof: We calculate the first component of a × (b × c) w.r.t. a coordinate system and denote it by (a × (b × c))1:

(a × (b × c))1 = a2(b × c)3 − a3(b × c)2
              = a2(b1c2 − b2c1) − a3(b3c1 − b1c3)
              = b1(a2c2 + a3c3) − c1(a2b2 + a3b3).

Adding a1b1c1 to the last expression and subtracting a1b1c1 from it, we obtain

(a × (b × c))1 = b1(a2c2 + a3c3 + a1c1) − c1(a2b2 + a3b3 + a1b1) = (a · c)b1 − (a · b)c1.

Hence, the first component of the left-hand side of (1.19) is equal to the first component of the right-hand side of (1.19). The analogous calculations for the other two components then prove Eq. (1.19). □

1.4 Straight Lines and Planes in Space

We look at some ideas concerning the description of geometrical objects in terms of equations involving vectors. A straight line L in the two- or three-dimensional space P is determined by one of its points and by its direction, i.e., if a coordinate system or at least an origin is given, by the position vector r0 of that point and a vector v lying in the line (cf. Figure 1.24). Denoting the position vector of any point of L by r, we have that

r = r0 + tv (1.20)

where t ∈ R. Eq. (1.20) is called a parametric equation of the straight line L. If r0′ is the position vector of any other given point of L and v′ any other direction vector, we obtain a second parametric equation of the same straight line, namely,

r = r0′ + sv′, (1.21)

s ∈ R. According to (1.20) and (1.21), the same point r (again notice the abuse of language) is characterized by different values of the parameters t and s.

Introducing the coordinates of the points and the components of the vectors, r = (x, y, z), r0 = (x0, y0, z0), v = (v1, v2, v3), the vectorial equation (1.20) of a line L in the three-dimensional space P³ is equivalent to the three equations

x = x0 + v1 t
y = y0 + v2 t
z = z0 + v3 t.

In the two-dimensional space P², we have only two equations:

x = x0 + v1 t (1.22)
y = y0 + v2 t. (1.23)

Using (1.22) to eliminate the parameter t, we obtain t = (x − x0)/v1 and

y − y0 = (v2/v1)(x − x0), (1.24)


provided that v1 ≠ 0. Eq. (1.24) can be written in the usual form

y = ax + b (1.25)

where a = v2/v1 is the slope of the line. One can also eliminate t by means of (1.23), yielding

x − x0 = (v1/v2)(y − y0),

provided that v2 ≠ 0.

Similarly, a plane H in the three-dimensional space P³ is determined by the position vector r0 of one of its points and two nonzero vectors u and v lying in the plane, u and v having neither the same nor the opposite direction, i.e., v ≠ λu (Figure 1.25). Any point r of the plane can be represented according to

r = r0 + su + tv (1.26)

where s, t ∈ R. Eq. (1.26) is called a parametric equation of the plane H. Again, r0, u, and v are not uniquely determined by the plane.

The fact that a plane H is also determined by a fixed point r0 and a normal vector n (Figure 1.26) leads to the normal equation of H; by a normal vector of H we understand a nonzero vector perpendicular to the plane, not necessarily a unit vector. Namely, if r is any point of H, the vector r − r0 lies in the plane and consequently

n · (r − r0) = 0. (1.27)

Writing this equation as n · r − n · r0 = 0 and introducing p := n · r0, we obtain

n · r = p (1.28)

which is called the normal equation of H. Eq. (1.28) is a consequence of (1.27). Conversely, (1.28) implies (1.27). In fact, since a given point r0 of H satisfies (1.28), it follows that p = n · r0 and thus n · (r − r0) = 0. Representing the vectors by their components, n = (n1, n2, n3), r = (x, y, z), Eq. (1.28) reads

n1 x + n2 y + n3 z = p. (1.29)

If n3 ≠ 0, (1.29) can be written in the form

z = −(n1/n3)x − (n2/n3)y + p/n3 (1.30)

or, with suitable abbreviations,

z = ax + by + c. (1.31)

Eq. (1.31) for a plane in P³ is the analog of the common representation (1.25) of a straight line in the two-dimensional space P².

Next we discuss the geometrical meaning of the constant p in (1.28). Let r1 be the position vector of that point of the plane that is closest to the origin of the coordinate system (Figure 1.27). If the plane does not pass through the origin and the normal vector n is not directed towards the origin, we have that

n · r = p = n · r1 = |n||r1| cos 0 = |n||r1| > 0.

Therefore, p > 0 and the distance of the plane H from the origin O is d = |r1| = p/|n| = |p|/|n|. If H does not pass through O and n is directed towards O, then

n · r = p = n · r1 = |n||r1| cos π = −|n||r1| < 0.


In this case, p < 0 and the distance of H from O is d = |r1| = −p/|n| = |p|/|n|. Finally, if the plane passes through the origin,

n · r = p = n · 0 = 0

holds because O is a point of H and the corresponding position vector 0 satisfies (1.28). Hence, d = 0 as well as p = 0.

Summarizing, if p > 0, the normal vector n is not directed towards the origin; if p < 0, n is directed towards the origin; and if p = 0, the plane passes through O. In each case,

d = |p|/|n| (1.32)

is the distance of the plane from the origin. If n is a unit vector, then d = |p|.

For vectors n, r ∈ E² and a number p ∈ R, n · r = p is the equation of a straight line in P² with normal vector n and distance |p|/|n| from the origin; in terms of components the equation reads n1 x + n2 y = p or, if n2 ≠ 0, y = −(n1/n2)x + p/n2 (cf. Eqs. (1.24), (1.25), and (1.30)).

Now we consider several typical problems of analytical geometry:

(cf. Eqs. (1.24), (1.25), and (1.30)).Now we consider several typical problems of  analytical  geometry:

1. Line through two points r1, r2

Choosing r0 = r1 and v = r2 − r1 in Eq. (1.20), we obtain

r = r1 + t(r2 − r1).

2. Plane through three points r1, r2, r3

Eq. (1.26) and the choice r0 = r1, u = r2 − r1, and v = r3 − r1 yield

r = r1 + s(r2 − r1) + t(r3 − r1).

3. Normal equation of a plane from its parametric equation

Multiplying each side of (1.26) in the sense of the scalar product by u × v, it follows that

(u × v) · r = (u × v) · r0.

Defining n := u ×v and p := (u ×v) ·r0 where n is a normal vector of the plane, we obtain

n · r = p.

4. Distance d of a point  R from a plane n · r = p

Let r1 be that point of the plane that is closest to  R. Since the vector  R−r1 is perpendicularto the plane, we have that

n · r1 = p (1.33)

 R − r1 = λn (1.34)

where λ is some real number. The scalar equation (1.33) and the vectorial equation (1.34)are equivalent to a system of four real equations in four real unknowns. Multiplying (1.34)in the sense of the dot product by n and taking account of (1.33), we can eliminate r1.Thus,

n ·  R − p = λ|n|2,

i.e.,

λ =n

· R

− p

|n|2 . (1.35)


Since the sought distance is just d = |R − r1|, it follows from (1.34) that d = |λ||n|. The result (1.35) now implies

d = |n · R − p| / |n|. (1.36)

For R = 0, formula (1.36) reduces to (1.32). To calculate the point r1, use (1.34) and insert the value of λ according to (1.35).
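In components, Eqs. (1.34)–(1.36) amount to a few lines of code; a minimal NumPy sketch using the plane 4x − 3y + 5z = 0 and the point (0, 4, 1) that appear in Exercise 1.6 a):

```python
import numpy as np

n = np.array([4.0, -3.0, 5.0])   # normal vector of the plane n . r = p
p = 0.0
R = np.array([0.0, 4.0, 1.0])    # the point R

lam = (n @ R - p) / (n @ n)              # Eq. (1.35)
r1 = R - lam * n                         # foot of the perpendicular, from (1.34)
d = abs(n @ R - p) / np.linalg.norm(n)   # Eq. (1.36)
print(d, r1)
```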

5. Distance d of a point R from a straight line r = r0 + tv

Let r1 be that point of the straight line that is closest to R. Since the vector R − r1 is perpendicular to the line, we have that

r1 = r0 + t1 v (1.37)
(R − r1) · v = 0 (1.38)

where t1 is the parameter value corresponding to r1. Replacing r1 in (1.38) by the right-hand side of (1.37), we obtain an equation to determine t1. Inserting the solution t1 into (1.37), we can calculate r1 and then d = |R − r1|. If we are only interested in d and not in the result for r1, we can proceed differently. Taking only (1.37) into account and multiplying

R − r1 = R − r0 − t1 v

in the sense of the cross product by v, it follows that

(R − r1) × v = (R − r0) × v. (1.39)

Since |(R − r1) × v| = |R − r1||v| sin(π/2) = |R − r1||v|, (1.39) implies

|R − r1||v| = |(R − r0) × v|.

From this and |R − r1| = d we conclude that

d = |(R − r0) × v| / |v|. (1.40)
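Formula (1.40) likewise reduces to a single expression; a minimal sketch (the point R and the line below are hypothetical data chosen for illustration):

```python
import numpy as np

r0 = np.array([0.0, 4.0, 1.0])    # a point on the line (hypothetical data)
v  = np.array([1.0, -2.0, -3.0])  # direction vector of the line
R  = np.array([1.0, 0.0, -1.0])   # the point R (hypothetical data)

# Eq. (1.40): d = |(R - r0) x v| / |v|
d = np.linalg.norm(np.cross(R - r0, v)) / np.linalg.norm(v)
print(d)
```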

6. Distance d of two skew straight lines r = r0 + tv and r = R0 + tw

Let r1 be that point of the first straight line and r2 be that point of the second straight line such that the vector r1 − r2 is perpendicular to both lines. Then d = |r1 − r2| and

r1 = r0 + t1 v (1.41)
r2 = R0 + t2 w (1.42)
r1 − r2 = λ v × w (1.43)

where λ is some real number. Eqs. (1.41)–(1.43) constitute a system of nine real equations in nine real unknowns. To conclude a formula for the distance d, we proceed similarly as in the context of (1.40). Eliminating r1 and r2 in (1.43) by means of (1.41) and (1.42), we obtain

r0 − R0 + t1 v − t2 w = λ v × w.

The multiplication of both sides of this equation by v × w in the sense of the dot product yields

(r0 − R0) · (v × w) = λ|v × w|²,

i.e.,

λ = (r0 − R0) · (v × w) / |v × w|². (1.44)


Note that |v × w| ≠ 0 since we have supposed that the lines are not parallel. From (1.43) it follows that d = |r1 − r2| = |λ||v × w|. Hence, by (1.44),

d = |(r0 − R0) · (v × w)| / |v × w|.

7. Intersection of a straight line r = r0 + tv and a plane n · r = p

An intersection point r1 satisfies the equations

r1 = r0 + t1 v (1.45)
n · r1 = p (1.46)

with some parameter value t1. Eliminating r1 in (1.46) by means of (1.45), we obtain

n · r0 + t1 n · v = p

and in consequence

t1 = (p − n · r0) / (n · v),

provided that n · v ≠ 0. Inserting t1 into (1.45), one can calculate r1. The case n · v = 0 means that the line is parallel to the plane or lies in the plane.

1.5 Exercises

1.1 The center of gravity of n points in space described by the position vectors r1, ..., rn is defined by

$$\vec r_G := \frac{\vec r_1 + \ldots + \vec r_n}{n} = \frac{1}{n}\sum_{i=1}^{n}\vec r_i.$$

a) Consider the straight lines through the vertices of a triangle and the midpoints of the opposite sides. Prove that these lines intersect in one point which divides the line segments between the vertices and the midpoints in ratio 2 : 1 and which is the center of gravity of the vertices, resp., the center of gravity of the triangle.

b) Consider the straight lines through the vertices of a triangle being perpendicular to the opposite sides. Show that these lines intersect in one point.

c) Consider a tetrahedron in three-dimensional space and the lines through the vertices and the centers of gravity of the opposite faces. Show that these lines intersect in the center of gravity of the tetrahedron, dividing the line segments between the vertices and the face centers in ratio 3 : 1.

1.2

a) Prove that the midpoints of the sides of an arbitrary quadrilateral are the vertices of a parallelogram.

b) Show that the spatial diagonals of a parallelepiped intersect in one point which divides each diagonal into two halves of equal length.

1.3 Determine the lengths of the sides and the angles of a triangle whose vertices are

r1 = (2, −1, 1), r2 = (1, −3, −5), r3 = (3, −4, −4).


1.4 Prove that

a) |a ± b|² = |a|² ± 2a · b + |b|²

b) |a ± b| ≤ |a| + |b| (triangle inequality)

c) |r1 − r2| ≤ |r1 − r3| + |r2 − r3| (triangle inequality)

d) |a + b|² + |a − b|² = 2|a|² + 2|b|² (parallelogram identity)

e) a · b = (1/4)(|a + b|² − |a − b|²).

1.5 Let the points with the position vectors

r1 = (1, 0, −1), r2 = (2, 1, −3), r3 = (−1, 2, 1), r4 = (0, −2, 1)

be given. Find

a) the parametric as well as the normal equation of the plane through the points r1, r2, and r3,

b) the distance of the point r4 from that plane as well as the point of the plane nearest to r4,

c) the distance of the point r4 from the straight line through r1 and r2 as well as that point of the line nearest to r4,

d) the distance of the line through r1 and r2 from the line through r3 and r4.

1.6 Determine

a) the intersection of the straight line

r = (0, 4, 1) + t(1, −2, −3)

and the plane

4x − 3y + 5z = 0

as well as the angle between these,

b) the intersection of the two planes

2x − y + z = 0 and x + 2y − z = 1

as well as the angle between these,

c) the distances of the planes

−x + 3y + z = 2 and −3x + 9y + 3z = −1

from the origin and from each other.

1.7

a) Show that the volume of the tetrahedron spanned by three vectors a, b, and c not lying in one plane is

V = (1/6)|(a × b) · c|.

b) Calculate the volume of the tetrahedron whose vertices are the four points of Exercise 1.5.


Chapter 2

Elements of Linear Algebra

2.1 Systems of Linear Equations I

By way of preparation, we begin our discussion of linear algebra with a preliminary investigation of simultaneous systems of linear equations.

Definition 2.1 A system of m linear equations in n unknowns x1, ..., xn ∈ R is given by

a11 x1 + a12 x2 + ... + a1n xn = b1
a21 x1 + a22 x2 + ... + a2n xn = b2
...
am1 x1 + am2 x2 + ... + amn xn = bm,

briefly,

$$\sum_{j=1}^{n} a_{ij} x_j = b_i, \qquad i = 1, \ldots, m,$$

where aij ∈ R and bi ∈ R are given numbers. A solution is an n-tuple x = (x1, ..., xn) whose components satisfy the system.

If all bi are zero, we have a homogeneous system:

$$\sum_{j=1}^{n} a_{ij} x_j = 0, \qquad i = 1, \ldots, m.$$

Example 2.2

(a) The system

x1 + x2 = 2
−x1 + 2x2 = 1

has the only solution x2 = 1, x1 = 1, i.e., x = (1, 1).

(b) Consider the system

x1 + x2 + x3 = 1 (2.1)
x1 + x2 − x3 = 0 (2.2)


of two linear equations in three unknowns. The subtraction of the two equations yields x3 = 1/2. Inserting this value into (2.1), we obtain

x1 + x2 = 1/2; (2.3)

Eq. (2.2) implies the same. In consequence, x1 = 1/2 − x2 where x2 can take any real value t, so x2 = t and x1 = 1/2 − t. The solution vectors are

$$x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} \tfrac12 - t \\ t \\ \tfrac12 \end{pmatrix} = \begin{pmatrix} \tfrac12 \\ 0 \\ \tfrac12 \end{pmatrix} + \begin{pmatrix} -t \\ t \\ 0 \end{pmatrix},$$

that is,

$$x = \begin{pmatrix} \tfrac12 \\ 0 \\ \tfrac12 \end{pmatrix} + t\begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix} \tag{2.4}$$

where t ∈ R. From (2.3) we can also conclude that x2 = 1/2 − x1 where x1 can take any real value s; therefore,

$$x = \begin{pmatrix} 0 \\ \tfrac12 \\ \tfrac12 \end{pmatrix} + s\begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}. \tag{2.5}$$

Eqs. (2.4) and (2.5) are different representations of the same set of solutions. In fact, setting t = −s + 1/2, (2.4) transforms into (2.5). Geometrically, (2.4) and (2.5) can be interpreted as two different parametric equations of the same straight line. We emphasize that, e.g., a system of three linear equations in three unknowns can also have infinitely many solutions.

(c) The system

2x1 + 5x2 = 2
−6x1 − 15x2 = −1

obviously has no solution; in fact, the second equation is equivalent to 2x1 + 5x2 = 1/3 and thus contradicts the first.
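Cases (a) and (c) can be checked numerically. The rank comparison used for (c) below anticipates notions treated later in this chapter, but it makes the inconsistency visible; a minimal NumPy sketch:

```python
import numpy as np

# system (a): two equations, two unknowns, unique solution
A = np.array([[1.0, 1.0], [-1.0, 2.0]])
b = np.array([2.0, 1.0])
print(np.linalg.solve(A, b))   # [1. 1.]

# system (c): inconsistent; the coefficient matrix has rank 1,
# while the augmented matrix [A | b] has rank 2, so no solution exists
A = np.array([[2.0, 5.0], [-6.0, -15.0]])
b = np.array([2.0, -1.0])
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(np.column_stack([A, b])))
```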

Summarizing, a system of linear equations can have exactly one solution, many solutions, or none. In the second case there are always infinitely many solutions which, geometrically interpreted, form a straight line or some plane in an n-dimensional space, as we shall see later. It is one central topic in linear algebra to give statements on the existence and structure of the solutions of systems of linear equations. The results are of far-reaching significance, for instance, in the theory of linear differential equations. The following theorem is our first essential step into the realm of linear algebra; the statement, however, is intuitively clear.

Theorem 2.3 The homogeneous system

a11 x1 + ... + a1n xn = 0
... (2.6)
am1 x1 + ... + amn xn = 0

always has the trivial solution x = 0 = (0, ..., 0). If n > m (i.e., more unknowns than equations), there exist nontrivial solutions x ≠ 0.


Proof: It is obvious that a homogeneous system always has the trivial solution. By induction over m, we prove that a system of m equations in n > m unknowns also has nontrivial solutions. For m = 1, there is only one equation:

a1 x1 + ... + an xn = 0. (2.7)

If a1 = 0, there is a solution x ≠ 0, e.g., x = (1, 0, ..., 0). If a1 ≠ 0, it follows that x1 = −(1/a1)(a2 x2 + a3 x3 + ... + an xn) with no restriction for x2, x3, ..., xn. Choosing x2 = 1 and x3 = ... = xn = 0, we obtain the solution x = (−a2/a1, 1, 0, ..., 0) of (2.7).

Now assume that the statement is true for a system of m − 1 equations in more than m − 1 unknowns. We have to show that the system (2.6) of m equations in more than m unknowns has a solution x ≠ 0. If, on the one hand, all first coefficients of the equations (2.6) are zero, i.e., if a11 = a21 = ... = am1 = 0, then again x = (1, 0, ..., 0) is a nontrivial solution. If, on the other hand, at least one of these coefficients is not zero, say ak1 ≠ 0, we can solve the k-th equation of (2.6) for x1:

ak1 x1 = −(ak2 x2 + ... + akn xn),

$$x_1 = -\frac{1}{a_{k1}}\sum_{j=2}^{n} a_{kj} x_j. \tag{2.8}$$

Inserting (2.8) into the other equations of (2.6),

ai1 x1 + ai2 x2 + ... + ain xn = 0, i = 1, ..., m, i ≠ k,

we obtain

$$-\frac{a_{i1}}{a_{k1}}\sum_{j=2}^{n} a_{kj} x_j + \sum_{j=2}^{n} a_{ij} x_j = 0$$

or, equivalently,

$$\sum_{j=2}^{n}\left(a_{ij} - \frac{a_{i1}}{a_{k1}}\,a_{kj}\right) x_j = 0 \tag{2.9}$$

where i = 1, ..., m, i ≠ k. Eqs. (2.8) and (2.9) are equivalent to the system (2.6). By induction hypothesis, the homogeneous system (2.9) of m − 1 linear equations in the unknowns x2, ..., xn does have a nontrivial solution, say, (x2, ..., xn) ≠ 0. Supplementing this solution by x1 according to (2.8), we have constructed a nontrivial solution

x = (x1, x2, ..., xn) ≠ 0

of the system (2.6). Thus, the proof of the theorem is finished. □


2.2 Vector Spaces

Linear algebra is founded on the famous vector-space axioms which are stated in the next definition. The concept of vector space refers to a common structure of different concrete mathematical objects.

Definition 2.4 A vector space V over the real numbers R is a set of elements for which an addition and a multiplication by numbers are defined, i.e.,

x + y ∈ V for any x, y ∈ V,
λx ∈ V for any λ ∈ R and any x ∈ V,

such that the following rules hold:

I. (i) for all x, y ∈ V, x + y = y + x (commutative law)

(ii) for all x, y, z ∈ V, (x + y) + z = x + (y + z) (associative law)

(iii) there exists a uniquely determined zero element 0 ∈ V such that for all x ∈ V, x + 0 = x

(iv) for each x ∈ V, there exists a uniquely determined inverse element −x ∈ V such that x + (−x) = 0

II. (i) for all λ ∈ R and all x, y ∈ V, λ(x + y) = λx + λy (first distributive law)

(ii) for all λ, µ ∈ R and all x ∈ V, (λ + µ)x = λx + µx (second distributive law)

(iii) for all λ, µ ∈ R and all x ∈ V, λ(µx) = (λµ)x (mixed associative law)

(iv) for all x ∈ V, 1x = x.

The elements of V are called vectors and, in this context, the numbers are called scalars. The multiplication of the vectors by numbers is often called the scalar multiplication.

The sum of more than two vectors is successively defined according to

x1 + x2 + x3 := (x1 + x2) + x3 = x1 + (x2 + x3),

etc. As in the case of numbers, one writes

$$x_1 + x_2 + \ldots + x_n =: \sum_{i=1}^{n} x_i.$$

From the vector-space axioms listed under I it follows that, for given vectors a, b ∈ V, the equation

a + x = b (2.10)

always has the unique solution

x = b + (−a) =: b − a.

Namely, adding −a to both sides of (2.10), we obtain

(−a) + (a + x) = (−a) + b. (2.11)

The application of the axioms (ii), (i), (iv), and (iii) of I to the left-hand side of (2.11) yields

(−a) + (a + x) = ((−a) + a) + x = (a + (−a)) + x = 0 + x = x + 0 = x;

hence, by (2.11), x = b + (−a) which is, by definition, the difference b − a. Moreover, the following statements are valid.


Theorem 2.5

(a) Let λ ∈ R and x ∈ V. Then

λx = 0 ⟺ λ = 0 or x = 0.

(b) For all x ∈ V,

(−1)x = −x.

Proof: If λ = 0, we have that λx = 0x = (0 + 0)x = 0x + 0x and so 0x = 0x + 0x. Adding −(0x) to both sides of the latter equation, we obtain 0 = 0x, i.e., 0x = 0. Similarly, if x = 0, λx = λ0 = λ(0 + 0) = λ0 + λ0, so λ0 = λ0 + λ0 and hence λ0 = 0.

Conversely, let

λx = 0. (2.12)

If λ = 0, there is nothing to prove. If λ ≠ 0, the multiplication of (2.12) by 1/λ yields

(1/λ)(λx) = (1/λ)0. (2.13)

We already know that the right-hand side of (2.13) is the zero vector. Applying axioms (iii) and (iv) of II to the left-hand side, we obtain (1/λ)(λx) = ((1/λ)λ)x = 1x = x. Hence, by (2.13), x = 0.

To show statement (b), consider the equality chain 0 = 0x = (1 + (−1))x = 1x + (−1)x = x + (−1)x which implies x + (−1)x = 0. Hence, by axiom I.(iv), (−1)x = −x. □

Example 2.6

(a) Consider the set of all n-tupels of real numbers, i.e.,

Rn :=

x =

x1...

xn

x1, . . . , xn ∈ R

.

For any x, y ∈ Rn and any λ ∈ R, define

x + y = (x1, . . . , xn)^T + (y1, . . . , yn)^T := (x1 + y1, . . . , xn + yn)^T ∈ Rn

and

λx = λ(x1, . . . , xn)^T := (λx1, . . . , λxn)^T ∈ Rn.

To show that Rn equipped with this addition and this scalar multiplication is a vector space, we have to verify the vector-space axioms. The commutative law and the associative law are simple consequences of the corresponding laws for numbers:

x + y = (x1 + y1, . . . , xn + yn)^T = (y1 + x1, . . . , yn + xn)^T = y + x,

(x + y) + z = (x1 + y1, . . . , xn + yn)^T + (z1, . . . , zn)^T = ((x1 + y1) + z1, . . . , (xn + yn) + zn)^T
= (x1 + (y1 + z1), . . . , xn + (yn + zn))^T = x + (y + z).


The vectors

0 := (0, . . . , 0)^T, −x := (−x1, . . . , −xn)^T

are the zero vector and the inverse of x ∈ Rn according to axioms (iii) and (iv) of I. The axioms of II are again inherited from the corresponding laws for numbers.
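The componentwise operations of Rn are easy to exercise numerically. The following minimal sketch (in Python with NumPy; the use of NumPy and the variable names are our own choices, not part of the text) spot-checks the vector-space axioms for random vectors of R5:

```python
import numpy as np

rng = np.random.default_rng(0)
x, y, z = rng.normal(size=(3, 5))   # three vectors of R^5
lam, mu = 2.0, -3.5                 # two scalars

# componentwise addition and scalar multiplication as in Example 2.6 (a)
assert np.allclose(x + y, y + x)                       # I (i)  commutative law
assert np.allclose((x + y) + z, x + (y + z))           # I (ii) associative law
assert np.allclose(x + np.zeros(5), x)                 # I (iii) zero element
assert np.allclose(x + (-x), np.zeros(5))              # I (iv) inverse element
assert np.allclose(lam * (x + y), lam * x + lam * y)   # II (i)  first distributive law
assert np.allclose((lam + mu) * x, lam * x + mu * x)   # II (ii) second distributive law
assert np.allclose(lam * (mu * x), (lam * mu) * x)     # II (iii) mixed associative law
assert np.allclose(1.0 * x, x)                         # II (iv)
print("all vector-space axioms hold componentwise")
```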

(b) The three- or two-dimensional Euclidean vector space E (E = E3, resp., E = E2) introduced geometrically in Chapter 1 is a vector space in the sense of Definition 2.4, as Theorem 1.2 shows. The vectors of E are defined as equivalence classes of ordered pairs of points and are usually represented by arrows (cf. Definition 1.1). The dimensions of E2 and E3 are intuitively clear; in the next section we shall give a precise definition of the concept of dimension. The attribute Euclidean refers to the fact that in the vector space E we know, again intuitively, what the length of a vector and the angle between two vectors is; correspondingly, we can think about a vector of E as a quantity determined by its length and its direction. It is essential to understand that in a general vector space the concepts length and angle are not defined.

Given the three basis vectors e1, e2, e3 of a coordinate system, a vector x ∈ E3 can be characterized by its components, and these can be summarized in terms of a column vector x ∈ R3:

x = x1e1 + x2e2 + x3e3 ←→ x = (x1, x2, x3)^T. (2.14)

In other words, there is a one-one correspondence between the vectors of E and the column vectors of R3. With respect to a second coordinate system with basis vectors e′1, e′2, e′3, we have

x = x′1e′1 + x′2e′2 + x′3e′3 ←→ x′ = (x′1, x′2, x′3)^T. (2.15)

Note that in (2.14) and (2.15) x ∈ E3 is the same vector whereas x ∈ R3 and x′ ∈ R3 are different column vectors! Accordingly, (2.14) and (2.15) represent two different bijective maps between the vector spaces E3 and R3.

In Chapter 1, we considered a coordinate system as given and kept it fixed. Thus, we were allowed to identify the vectors x ∈ E3 with the column vectors of R3; in this chapter, however, we distinguish between these two kinds of vectors and consider E3 and R3 as different vector spaces.

Finally, with respect to a given coordinate system (O; e1, e2, e3) in the Euclidean point space P3 we can represent the vectors by position vectors, thus obtaining a one-one correspondence between points and vectors. Hence, there is also a one-one correspondence between the points P ∈ P3 and the column vectors x = (x1, x2, x3)^T ∈ R3; in fact, x1, x2, x3 are just the coordinates of P. The three bijective maps P ↔ x (point and vector), x ↔ x (vector and column vector), and P ↔ x (point and column vector) depend on the coordinate system.

(c) This example shows the generality of the vector-space concept and teaches the student that objects familiar from school are vector spaces he or she would never expect to be. Consider functions f : R → R, x → y = f(x); remember that f denotes the function as a rule uniquely assigning numbers y to numbers x whereas f(x) denotes the assigned number y, the value of the function at x. If g is a second such function with domain R, we can define a third function h according to h(x) := f(x) + g(x) for all x ∈ R. This function h is denoted by f + g; f + g is that function that associates each number x with the sum


f(x) + g(x) of the values of f and g at x. Similarly, if λ is a constant real number, the function λf associates x with the product λf(x).

In consequence, the set of the continuous functions on R, for instance,

C 0(R) := {f : R → R | f continuous},

becomes a vector space by the pointwise defined sum of two functions,

(f  + g)(x) := f (x) + g(x), (2.16)

and the pointwise defined product of a function by a number λ ∈ R,

(λf )(x) := λf (x). (2.17)

In fact, if f and g are continuous, then f + g and λf are continuous, and the vector-space axioms are satisfied. We verify some of them explicitly, e.g., the associative law. From

((f + g) + h)(x) = (f + g)(x) + h(x) = (f(x) + g(x)) + h(x)

= f(x) + (g(x) + h(x))

= f(x) + (g + h)(x)

= (f + (g + h))(x)

for all x ∈ R it follows that (f + g) + h = f + (g + h), i.e., the associative law. The zero vector is the function 0 that vanishes identically, i.e., 0(x) := 0 for all x ∈ R. The function −f defined by (−f)(x) := −f(x) for all x ∈ R is inverse to f with respect to the addition of functions. Finally, to show the first distributive law, consider

(λ(f  + g))(x) = λ(f  + g)(x) = λ(f (x) + g(x))

= λf (x) + λg(x)

= (λf )(x) + (λg)(x)

= (λf  + λg)(x)

for all x ∈ R, which implies λ(f  + g) = λf  + λg.
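The pointwise operations (2.16) and (2.17) translate directly into code; a minimal sketch, in which add and scale are hypothetical helper names of ours for the pointwise sum and scalar multiple:

```python
import math

def add(f, g):
    return lambda x: f(x) + g(x)      # (f + g)(x) := f(x) + g(x), Eq. (2.16)

def scale(lam, f):
    return lambda x: lam * f(x)       # (lam f)(x) := lam f(x), Eq. (2.17)

f, g, h = math.sin, math.cos, math.exp

# associative law (f + g) + h = f + (g + h), checked at a few sample points
for x in [-1.0, 0.0, 0.5, 2.0]:
    assert abs(add(add(f, g), h)(x) - add(f, add(g, h))(x)) < 1e-12
```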

(d) Let P (R) be the set of all real polynomials on R, i.e., p ∈ P (R) is a function of the form

p(x) = a_n x^n + a_{n−1} x^{n−1} + . . . + a_2 x^2 + a_1 x + a_0

where n is any natural number and a0, a1, . . . , an are real constants. Defining the sum p1 + p2 of two polynomials and the product λp of a polynomial by a number pointwise as in (2.16) and (2.17), P(R) becomes a vector space where the vector-space axioms are verified the same way as in the preceding example.

Definition 2.7 A nonempty subset S of a vector space V is called a subspace of  V if 

(i) for all x, y ∈ S, x + y ∈ S

(ii) for all λ ∈ R and all x ∈ S, λx ∈ S.

The definition says that a subspace S ⊆ V is closed under the addition and scalar multiplication defined in the vector space V so that S itself is equipped with an addition and a multiplication by numbers. Since S is supposed to be nonempty, it contains an element x and, as a consequence of the defining property (ii), the zero vector 0 = 0x. As a further consequence of (ii), S contains, with each x ∈ S, also the inverse vector −x = (−1)x. Thus, the vector-space axioms (iii) and (iv) of I are satisfied, and all the other vector-space axioms hold in S because they hold in V. Hence, a subspace is a vector space itself.


Example 2.8

(a) In the Euclidean vector space E3, all vectors that are parallel to a fixed plane constitute a subspace S. Representing all vectors of E3 by position vectors w.r.t. the origin of a coordinate system and interpreting the position vectors as points, S becomes a plane through the origin.

In fact, besides the trivial cases {0} and E3, the subspaces of E3 can be imagined as the planes and straight lines through the origin of a coordinate system. A straight line or a plane not passing through the origin does not represent a subspace since, for instance, the sum of the position vectors of two points of such a plane corresponds to a point outside the plane. Another example of a subset of E3 that is not a subspace is the set of all vectors with a length less than or equal to 1; this set corresponds to a ball of radius 1 centered at the origin.

(b) The set S of all solutions x ∈ Rn of the homogeneous system (2.6) of linear equations is a subspace of Rn. Namely, if x = (x1, . . . , xn)^T is one solution of (2.6) and y = (y1, . . . , yn)^T is another one, then the addition of the i-th equation of (2.6) and the i-th equation of (2.6) with y instead of x yields, for all i = 1, . . . , m,

a11(x1 + y1) + . . . + a1n(xn + yn) = 0
...
am1(x1 + y1) + . . . + amn(xn + yn) = 0;

that is, with x and y being solutions, x + y is also a solution. Furthermore, multiplying the equations (2.6) by λ ∈ R, we obtain

a11(λx1) + . . . + a1n(λxn) = 0
...
am1(λx1) + . . . + amn(λxn) = 0;

that is, λx is a solution if x is. Since the homogeneous system (2.6) always has the trivial solution x = 0, S is in particular not empty. Hence, the solution set of a homogeneous system of linear equations in n unknowns is a subspace of Rn.

(c) Consider one homogeneous linear equation in three unknowns,

a1x1 + a2x2 + a3x3 = 0, (2.18)

which is a particular case of the system (2.6). The solution vectors of (2.18) constitute a subspace of R3. Assuming that at least one of the coefficients a1, a2, a3 is not zero and interpreting the solution vectors x ∈ R3 w.r.t. a coordinate system as points, (2.18) is the equation of a plane through the origin which, as we already know, corresponds to a subspace of E3.

The inhomogeneous equation

a1x1 + a2x2 + a3x3 = b (2.19)

where b ≠ 0 and not all coefficients are zero, has infinitely many solutions that do not constitute a subspace of R3. Namely, the requirements (i) and (ii) of Definition 2.7 are not fulfilled; for instance, if x, y ∈ R3 satisfy (2.19), x + y satisfies the equation

a1(x1 + y1) + a2(x2 + y2) + a3(x3 + y3) = 2b

which is different from (2.19). Geometrically speaking, the plane described by Eq. (2.19) does not pass through the origin and does consequently not represent a subspace of E3.


(d) The vector space P(R) of all real polynomials is a subspace of the space C0(R) of all continuous functions on R.

The next theorem gives a method to construct subspaces. Let v1, . . . , vm be a system of vectors of a vector space V. A vector x ∈ V is called a linear combination of v1, . . . , vm if x = ∑_{i=1}^m λi vi for some numbers λ1, . . . , λm ∈ R.

Theorem 2.9 Let V be a real vector space and let v1, . . . , vm ∈ V. The set of all linear combinations

x = ∑_{i=1}^m λi vi

where λ1, . . . , λm are any real numbers, is a subspace S of V.

Proof: Since, e.g., v1 = 1v1 + 0v2 + . . . + 0vm, v1 is also a linear combination of v1, . . . , vm and thus v1 ∈ S. Consequently, S ≠ ∅, i.e., the set S is not empty. If x, y ∈ S, then, according to

x + y = ∑_{i=1}^m λi vi + ∑_{i=1}^m µi vi = ∑_{i=1}^m (λi + µi) vi,

x + y ∈ S. Further, if λ ∈ R and x ∈ S, then

λx = ∑_{i=1}^m (λλi) vi,

i.e., λx ∈ S. Hence, S is a subspace. □

The subspace S of the theorem is called the subspace generated by  v1, . . . , vm or spanned by v1, . . . , vm.

Example 2.10 Two nonzero vectors x, y ∈ E3 where y ≠ λx for all λ ∈ R span a subspace corresponding to a plane through the origin. If y is some multiple of x ≠ 0, the spanned subspace corresponds to a straight line through the origin, and if x and y are zero, the spanned subspace is the trivial subspace {0}.

2.3 Linear Independence, Bases, and Dimension

All our further investigations in linear algebra are based on the fundamental concept of linear dependence of vectors, respectively, linear independence.

Definition 2.11 A system v1, . . . , vm of vectors of a vector space V is called linearly independent if

∑_{i=1}^m λi vi = 0

is possible only if λ1 = λ2 = . . . = λm = 0. The system v1, . . . , vm ∈ V is linearly dependent if there exist numbers λ1, . . . , λm, not all being zero, such that

∑_{i=1}^m λi vi = 0.


Remark 2.12

(a) For two linearly dependent vectors v1, v2 ∈ V we have

λ1v1 + λ2v2 = 0

for a nontrivial choice of the two coefficients. Assuming λ1 ≠ 0, we obtain v1 = −(λ2/λ1)v2. That is, in the Euclidean vector space E two vectors v1, v2 ≠ 0 are linearly dependent if and only if they have the same or opposite direction.

(b) For three linearly dependent vectors v1, v2, v3 ∈ V we have

λ1v1 + λ2v2 + λ3v3 = 0

for a nontrivial choice of the three coefficients. Assuming λ1 ≠ 0, we obtain v1 = −(λ2/λ1)v2 − (λ3/λ1)v3. That is, in the Euclidean vector space E3 three vectors v1, v2, v3 are linearly dependent if and only if they can be represented by arrows lying in one plane. In particular, in E2 any three vectors are linearly dependent.

(c) If one of the vectors v1, . . . , vm is zero, the system is linearly dependent. If v1, . . . , vm is linearly dependent, every larger system v1, . . . , vm, vm+1, . . . , vn is also linearly dependent.

Example 2.13

(a) The vectors (0, 1, 1)^T, (0, 2, 1)^T, (1, 5, 3)^T ∈ R3 are linearly independent. Namely,

λ(0, 1, 1)^T + µ(0, 2, 1)^T + ν(1, 5, 3)^T = 0

is equivalent to

(ν, λ + 2µ + 5ν, λ + µ + 3ν)^T = (0, 0, 0)^T,

resp.,

ν = 0
λ + 2µ + 5ν = 0
λ + µ + 3ν = 0.

That is,

ν = 0
λ + 2µ = 0
λ + µ = 0,

resp., ν = 0, µ = 0, and λ = 0.

(b) Are the vectors (1, 1, 1)^T, (0, −1, 2)^T, and (3, 4, 1)^T ∈ R3 linearly independent or not? The vectorial equation

λ(1, 1, 1)^T + µ(0, −1, 2)^T + ν(3, 4, 1)^T = 0

yields

λ + 3ν = 0
λ − µ + 4ν = 0
λ + 2µ + ν = 0.

The latter system is equivalent to

λ + 3ν = 0
−µ + ν = 0
2µ − 2ν = 0,

resp.,

λ + 3ν = 0
µ = ν.

Hence, we have nontrivial solutions; one is, e.g., ν = 1, µ = 1, λ = −3. So the vectors are linearly dependent.
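Both calculations can be cross-checked numerically: the vectors are linearly independent exactly when the matrix having them as columns has as many linearly independent columns as there are vectors (this number is the rank of the matrix, defined precisely in Section 2.5). A sketch, with a helper name of our own choosing:

```python
import numpy as np

def independent(*vectors):
    """True iff the given column vectors are linearly independent, i.e.,
    the matrix built from them has as many independent columns as vectors."""
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == len(vectors)

# Example 2.13 (a): independent
print(independent([0, 1, 1], [0, 2, 1], [1, 5, 3]))    # True

# Example 2.13 (b): dependent, e.g. -3*v1 + 1*v2 + 1*v3 = 0
print(independent([1, 1, 1], [0, -1, 2], [3, 4, 1]))   # False
```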

(c) Consider V = Rn and let

e1 := (1, 0, . . . , 0)^T, e2 := (0, 1, 0, . . . , 0)^T, . . . , en := (0, . . . , 0, 1)^T. (2.20)

The system e1, . . . , en is linearly independent:

0 = ∑_{i=1}^n λi ei = (λ1, λ2, . . . , λn)^T ⟹ λ1 = λ2 = . . . = λn = 0.

(d) Consider the vector space P(R) of the real polynomials and let qi ∈ P(R) be defined by

qi(x) := x^i

where x ∈ R and i = 0, 1, 2, . . .. Note that every polynomial p ∈ P(R) is a linear combination of the monomials qi:

p(x) = a_n x^n + a_{n−1} x^{n−1} + . . . + a_1 x + a_0
= a_n qn(x) + a_{n−1} qn−1(x) + . . . + a_1 q1(x) + a_0 q0(x)
= (a_n qn + a_{n−1} qn−1 + . . . + a_1 q1 + a_0 q0)(x),

i.e.,

p = a_n qn + a_{n−1} qn−1 + . . . + a_1 q1 + a_0 q0 = ∑_{i=0}^n ai qi.

The polynomials q0, q1, . . . , qn are linearly independent. In fact,

∑_{i=0}^n λi qi = 0

means

0 = (∑_{i=0}^n λi qi)(x) = ∑_{i=0}^n λi qi(x) = ∑_{i=0}^n λi x^i

for all x ∈ R; hence, λ0 = λ1 = . . . = λn = 0.


The next two definitions and Theorem 2.18 are crucial.

Definition 2.14 A vector space V is called n-dimensional (briefly, dim V = n) if

(i) there exists a system of n linearly independent vectors v1, . . . , vn ∈ V

(ii) every system of n + 1 vectors w1, . . . , wn+1 ∈ V is linearly dependent.

A vector space V is called infinite-dimensional if, for every n ∈ N, there exists a linearly independent system v1, . . . , vn ∈ V. The trivial vector space {0} has the dimension 0.

In other words, n = dim V is the maximal number of linearly independent vectors of V .

Example 2.15

(a) We prove that the dimension of Rn is n. By part (c) of Example 2.13 we already know thatthere are n linearly independent vectors in Rn, namely, the vectors e1, . . . , en according to(2.20). We have to show that any n + 1 vectors u1, . . . , un+1 ∈ Rn are linearly dependent.The equation

λ1u1 + . . . + λn+1un+1 = 0is equivalent to

λ1

u11...

u1n

+ λ2

u21...

u2n

+ . . . + λn+1

un+1,1...

un+1,n

=

0...0

,

resp.,

λ1u11 + . . . + λn+1un+1,1 = 0

...

λ1u1n + . . . + λn+1un+1,n = 0

where uij are the components of the column vectors ui. According to Theorem 2.3, thelatter homogeneous system of n linear equations in the n +1 unknowns λ j has a nontrivial

solution

λ1...

λn+1

= 0. Hence, u1, . . . , un+1 are linearly dependent.

(b) For the space of the polynomials, we have that dim P(R) = ∞. In fact, according to part (d) of Example 2.13, the n + 1 monomials q0, . . . , qn are linearly independent, for every n ∈ N.

Definition 2.16 A system v1, . . . , vn ∈ V is called a basis of V (or a basis in V) if every vector x ∈ V has a representation of the form

x = ∑_{i=1}^n ξi vi

where the coefficients ξi ∈ R are uniquely determined. The numbers ξi are the components of x w.r.t. the basis v1, . . . , vn.


Example 2.17

(a) In the two-dimensional Euclidean vector space E 2, two unit vectors e1, e2 being perpen-dicular to each other form a basis since every vector x ∈ E 2 can uniquely be representedaccording to x = x1e2 + x2e2. The vectors v1 = e1 + e2 and v2 = −e1 + e2 obviouslyconstitute a new basis in E 2 (v1 and v2 are perpendicular to each other as well, but theyare not unit vectors). We calculate the components ξ1 and ξ2 of a vector x w.r.t. to the

new basis from those w.r.t. the old basis. From

x = x1e1 + x2e2 = ξ1v1 + ξ2v2

= ξ1(e1 + e2) + ξ2(e2 − e1)

= (ξ1 − ξ2)e1 + (ξ1 + ξ2)e2

it follows that x1 = ξ1 − ξ2 and x2 = ξ1 + ξ2, i.e.,

ξ1 =1

2(x1 + x2)

ξ2 =1

2

(x2

−x1).

(2.21)

The three vectors e1, e2, v1, for instance, are not a basis since the representation of anyvector x ∈ E 2 as a linear combination of these is not unique, as the equality

x = x1e1 + x2e2 + 0v1 =1

2x1e1 + (x2 − 1

2x1)e2 +

1

2x1(e1 + e2)

=1

2x1e1 + (x2 − 1

2x1)e2 +

1

2x1v1

shows. The vectors v1, v2 are perpendicular to each other, but are not unit vectors. Noticethat any two linearly independent vectors of  E 2 form a basis, neither they have to beorthogonal nor unit vectors.

(b) In the three-dimensional Euclidean vector space E3, three orthogonal unit vectors e1, e2, e3 are a basis; every vector x ∈ E3 has the unique representation x = x1e1 + x2e2 + x3e3. Any three linearly independent vectors of E3 obviously constitute a basis, whereas two vectors, three vectors lying in one plane, or four vectors do not form a basis.

The following theorem states how the fundamental concepts of linear independence, dimension, and basis are related.

Theorem 2.18 A system  v1, . . . , vn ∈ V is a basis of  V if and only if 

(i) v1, . . . , vn is linearly independent 

(ii) n = dim V (in particular, dim V < ∞).

Proof: The proof consists of two parts. First, we prove that the conditions (i) and (ii) imply that v1, . . . , vn is a basis.

Suppose the system v1, . . . , vn satisfies (i) and (ii). For any vector x ∈ V, the system v1, . . . , vn, x is, according to (ii), linearly dependent. Let

∑_{i=1}^n λi vi + λx = 0 (2.22)

where not all of the coefficients λ1, . . . , λn, λ are zero. Assume λ = 0. It follows that ∑_{i=1}^n λi vi = 0 and, by (i), λ1 = . . . = λn = 0. In consequence, λ1 = . . . = λn = λ = 0 which contradicts our


nontrivial choice of the coefficients in (2.22). Hence, λ ≠ 0, and we can solve Eq. (2.22) for x, obtaining

x = −(1/λ) ∑_{i=1}^n λi vi = ∑_{i=1}^n (−λi/λ) vi = ∑_{i=1}^n ξi vi

where ξi := −λi/λ. That is, every vector x is a linear combination of the vectors vi. It remains to show the uniqueness of the coefficients ξi. The equality

x = ∑_{i=1}^n ξi vi = ∑_{i=1}^n ηi vi

implies ∑_{i=1}^n (ξi − ηi) vi = 0; consequently, by (i) again, ξ1 = η1, . . . , ξn = ηn. Hence, the system v1, . . . , vn is a basis of V.

Second, we have to prove that a basis v1, . . . , vn has the properties (i) and (ii).

Suppose the system v1, . . . , vn is a basis. In particular, the zero vector has the unique representation

0 = ∑_{i=1}^n ξi vi,

so all the coefficients ξi must be zero. Hence, the n vectors v1, . . . , vn are linearly independent. It remains to show that dim V = n. To that end, we prove that any system w1, . . . , wm ∈ V, m > n, is linearly dependent. Inserting the representation

wi = ∑_{j=1}^n aij vj,

i = 1, . . . , m, into

∑_{i=1}^m λi wi = 0, (2.23)

we obtain that

0 = ∑_{i=1}^m λi ∑_{j=1}^n aij vj = ∑_{i=1}^m ∑_{j=1}^n λi aij vj = ∑_{j=1}^n (∑_{i=1}^m λi aij) vj.

Since we already know that the vectors v1, . . . , vn are linearly independent, we conclude that

∑_{i=1}^m aij λi = 0 (2.24)

for all j = 1, . . . , n. According to Theorem 2.3, the homogeneous system (2.24) of n linear equations in the m > n unknowns λi has a nontrivial solution (λ1, . . . , λm)^T ≠ 0. Hence, by the equivalence of (2.23) and (2.24), w1, . . . , wm are linearly dependent; consequently, n = dim V. □

Remark 2.19 If  v1, . . . , vn and w1, . . . , wm are bases of V , then n = m = dim V .

Example 2.20

(a) From Example 2.15, part (a), we know that dim Rn = n. Hence, the linearly independent

vectors e1, . . . , en introduced in part (c) of Example 2.13 are a basis of Rn. This can also


be seen directly. Namely, every x ∈ Rn is a unique linear combination of the vectors ei; in fact,

x = (x1, . . . , xn)^T = ∑_{i=1}^n ξi ei

if and only if ξi = xi. Among all the bases of Rn, the basis e1, . . . , en is distinguished by the fact that the components of every vector x ∈ Rn are just the entries of the column; e1, . . . , en is called the canonical or the standard basis of Rn.

(b) As an example of another basis of Rn for n = 4, consider the four linearly independent vectors

(1, 1, 1, 1)^T, (1, 1, 1, 0)^T, (1, 1, 0, 0)^T, (1, 0, 0, 0)^T

of R4 which, according to the preceding theorem, are a basis of R4.

(c) The first two of the four vectors

e1 = (1, 0)^T, e2 = (0, 1)^T; v1 := (1, 1)^T, v2 := (−1, 1)^T

of R2 constitute the canonical basis of R2, the second ones are linearly independent and form a second basis. For any x ∈ R2 we have

x = (x1, x2)^T = x1e1 + x2e2 = ξ1v1 + ξ2v2.

From

(x1, x2)^T = ξ1v1 + ξ2v2 = ξ1(1, 1)^T + ξ2(−1, 1)^T = (ξ1 − ξ2, ξ1 + ξ2)^T

it follows x1 = ξ1 − ξ2, x2 = ξ1 + ξ2, respectively, Eqs. (2.21). In fact, the current example is analogous to Example 2.17, part (a).

Clearly, if S is a subspace of a finite-dimensional vector space V, then dim S ≤ dim V, and dim S = dim V if and only if S = V. Moreover, the following statement holds.

Theorem 2.21 Let S be a (nontrivial) subspace of a vector space V and dim S = m < n = dim V. Then every basis v1, . . . , vm of S can be supplemented to a basis

v1, . . . , vm, vm+1, . . . , vn

of V (in particular, v1, . . . , vm ∈ S, vm+1, . . . , vn ∈ V \ S).

Proof: Take any basis v1, . . . , vm of S and choose any vector vm+1 ∈ V \ S (such a vector exists since S is a proper subspace of V because of dim S < dim V). The system v1, . . . , vm, vm+1 is linearly independent. Namely, the assumption λm+1 ≠ 0 in

∑_{i=1}^m λi vi + λm+1 vm+1 = 0

implies

vm+1 = −(1/λm+1) ∑_{i=1}^m λi vi.


Since S is a subspace, the right-hand side of the latter equation is a vector of S, whereas vm+1 ∉ S. Because of this contradiction we obtain λm+1 = 0, and because of the linear independence of v1, . . . , vm we conclude that λ1 = . . . = λm = 0. Hence, the vectors v1, . . . , vm, vm+1 are linearly independent.

If m + 1 = n, the theorem has been proved. If m + 1 < n, consider the subspace S1 generated by v1, . . . , vm, vm+1; S1 is a proper subspace of V. Choose a vector vm+2 ∈ V \ S1 and show as above that the system v1, . . . , vm, vm+1, vm+2 is linearly independent. Thus, after n − m steps of this kind, we obtain a basis v1, . . . , vm, vm+1, . . . , vn of V. □

2.4 Linear Maps and Matrices

Besides the concepts of vector space and linear independence, the concept of a linear mapping is the most fundamental one. We motivate this important concept by the next example.

Example 2.22 Consider the rotation of the vectors of the two-dimensional Euclidean space E2 by an angle φ in the positive sense. Represent all vectors by arrows with the same beginning point, say, as position vectors w.r.t. the origin of a coordinate system, and rotate the position vectors counterclockwise around the origin by φ. For x ∈ E2, call the rotated vector L(x); that is, L : E2 → E2 is a map transforming the vectors into the rotated ones. It is evident that the sum x + y of two vectors coincides after rotation with the sum of the rotated vectors L(x) and L(y), i.e., L(x + y) = L(x) + L(y). Furthermore, L(λx) = λL(x) for any real number λ.

Definition 2.23 Let V and W be real vector spaces. A map

L : V → W, x → L(x),

assigning a vector L(x) ∈ W to each vector x ∈ V, is called linear if

(i) for all x, y ∈ V, L(x + y) = L(x) + L(y)

(ii) for all λ ∈ R and all x ∈ V, L(λx) = λL(x).

A linear map L : V → V is also called a linear transformation and a linear map l : V → R a linear function.

The rotation L of Example 2.22 is a particular linear transformation with additional properties; for instance, it preserves the lengths of vectors (i.e., |L(x)| = |x|) and the angles between vectors. As follows from Definition 2.23, every linear map L : V → W preserves sums and linear combinations, e.g., L(x + y + z) = L(x) + L(y) + L(z), L(∑_{i=1}^m xi) = ∑_{i=1}^m L(xi), L(λx + µy) = λL(x) + µL(y), and L(x − y) = L(x + (−1)y) = L(x) + (−1)L(y) = L(x) − L(y). The general statement reads

L(∑_{i=1}^m λi xi) = ∑_{i=1}^m λi L(xi)

where λi ∈ R and xi ∈ V. We consider some further examples of linear maps.

Example 2.24

(a) Let x = (x1, x2, x3)^T ∈ R3. According to

L(x) := (3x1 + 2x2 − 4x3, x1 − x2 + 2x3)^T ∈ R2,


a map L : R3 → R2 is defined. By an easy calculation we obtain

L(x + y) = (3(x1 + y1) + 2(x2 + y2) − 4(x3 + y3), (x1 + y1) − (x2 + y2) + 2(x3 + y3))^T = L(x) + L(y)

and

L(λx) = (3λx1 + 2λx2 − 4λx3, λx1 − λx2 + 2λx3)^T = λL(x),

that is, L is linear.

(b) The (orthogonal) projection p of a vector x ∈ E along a unit vector u ∈ E is

p = (|x| cos φ) u = (u · x) u

where φ is the angle between x and u (cf. Example 1.12, part (c)). Keeping the unit vector u fixed, a map L : E → E is defined by x → p =: L(x). The projection p depends linearly on x, i.e., L is a linear map:

L(x + y) = (u · (x + y))u = (u · x)u + (u · y)u = L(x) + L(y),
L(λx) = (u · (λx))u = λ(u · x)u = λL(x).

The linear map L is called the orthogonal projection onto the one-dimensional subspace spanned by u.

(c) Let x ∈ R4. According to

l(x) := x1 + x2 + 4x3 − 2x4 ∈ R,

a linear function l : R4 → R is defined.

(d) The definite integral ∫_a^b f(x) dx of a continuous function f : [a, b] → R is a real number. Hence, we can define a linear function l : C0([a, b]) → R on the vector space of the continuous functions on [a, b] by

l(f) := ∫_a^b f(x) dx.

Since l acts on vectors that are functions, l is also called a linear functional.

Linear maps acting between finite-dimensional vector spaces can be represented by matrices, as we are going to show. Let V be a vector space of dimension n, W a vector space of dimension m, and L : V → W be a linear map. Choose a basis v1, . . . , vn in V and a basis w1, . . . , wm in W. The images of the basis vectors vj under L can be decomposed with respect to the basis in W,

L(vj) = ∑_{i=1}^m aij wi, j = 1, . . . , n, (2.25)

and the coefficients aij can be summarized by an m × n matrix A:

A :=
( a11 . . . a1n )
(  ⋮         ⋮  )
( am1 . . . amn ). (2.26)

Note that the first index of the entries aij counts the rows and the second index the columns. The j-th column of A consists of the components of the vector L(vj); A is called the matrix of L w.r.t. v1, . . . , vn and w1, . . . , wm. By means of A, one can calculate the image y = L(x) of an arbitrary vector x ∈ V.


Proof: Because of the preceding discussion, it only remains to show the converse statement of the theorem. But it is obvious that the map L defined by (2.31) is linear. □

We emphasize that the matrix of a linear map depends on the bases chosen in V and W, like the components of a vector depend on the basis. In Eq. (2.31) a matrix A and bases are used to define a linear map L, but the vectors x, y = L(x), and the resulting map L do not depend on any bases. The comparison with (2.25) or (2.27) shows that the matrix of the linear map defined by (2.31) is again A.

In the case of a linear map L : Rn → Rm, i.e., V = Rn and W = Rm, one commonly works with the canonical bases e1, . . . , en of Rn and e′1, . . . , e′m of Rm:

e1 = (1, 0, . . . , 0)^T, . . . , en = (0, . . . , 0, 1)^T; e′1 = (1, 0, . . . , 0)^T, . . . , e′m = (0, . . . , 0, 1)^T.

Note that the column vectors ej ∈ Rn have n entries whereas e′i ∈ Rm have m entries. Since the entries of a column vector x ∈ Rn coincide with its components w.r.t. the basis e1, . . . , en, x coincides with the column vector X introduced in (2.28); analogously, y = L(x) = Y. Hence, when working with column vectors and the canonical bases, it follows that y = L(x) = Ax. The columns of the matrix A of L are just the vectors L(e1), . . . , L(en).
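This observation gives a direct recipe for computing the matrix of a linear map on Rn: apply L to the canonical basis vectors and use the results as columns. A sketch using the map of Example 2.24, part (a):

```python
import numpy as np

# The linear map of Example 2.24 (a), L : R^3 -> R^2
def L(x):
    return np.array([3*x[0] + 2*x[1] - 4*x[2],
                     x[0] - x[1] + 2*x[2]])

# Its matrix w.r.t. the canonical bases: column j is L(e_j)
A = np.column_stack([L(e) for e in np.eye(3)])
print(A)                          # [[ 3.  2. -4.] [ 1. -1.  2.]]

x = np.array([1.0, 2.0, 3.0])
assert np.allclose(L(x), A @ x)   # y = L(x) = Ax
```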

Example 2.26

(a) Consider the rotation L : E2 → E2 of Example 2.22 and choose a positively oriented orthonormal basis e1, e2 in V = W = E2. (An orthonormal basis in E2 consists of two orthogonal unit vectors. Any basis v1, v2 in E2 is positively oriented if the vector v2 follows v1 counterclockwise.) It is geometrically evident that the rotated basis vectors are given by

L(e1) = cos φ e1 + sin φ e2
L(e2) = − sin φ e1 + cos φ e2.

Therefore, according to (2.25) and (2.26),

A = ( cos φ  − sin φ )
    ( sin φ    cos φ ) (2.32)

is the matrix of the rotation w.r.t. the basis e1, e2. Consequently, for an arbitrary vector x = x1e1 + x2e2, the components of the rotated vector y = L(x) can be calculated by Eq. (2.30), yielding

y1 = x1 cos φ − x2 sin φ
y2 = x1 sin φ + x2 cos φ.

Note that all vectors are referred to the same basis. W.r.t. a basis that is not orthonormal and positively oriented, the matrix of a rotation is different from (2.32).
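A brief numerical illustration of the rotation matrix (2.32); the check that lengths are preserved is our addition to the example:

```python
import numpy as np

def rotation_matrix(phi):
    # the matrix (2.32) of the rotation w.r.t. a positively oriented
    # orthonormal basis
    return np.array([[np.cos(phi), -np.sin(phi)],
                     [np.sin(phi),  np.cos(phi)]])

A = rotation_matrix(np.pi / 2)    # rotation by 90 degrees
x = np.array([1.0, 0.0])
print(A @ x)                      # ~[0. 1.]: e1 is rotated onto e2 (up to rounding)

# a rotation preserves lengths: |L(x)| = |x|
x = np.array([3.0, 4.0])
assert np.isclose(np.linalg.norm(rotation_matrix(0.7) @ x), np.linalg.norm(x))
```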

(b) We are going to calculate the matrix of the projection map of Example 2.24, part (b), w.r.t. an orthonormal basis e1, e2, e3 in E3. (An orthonormal basis in E3 consists of three orthogonal unit vectors.) From the decompositions

x = x1e1 + x2e2 + x3e3
u = u1e1 + u2e2 + u3e3
p = p1e1 + p2e2 + p3e3

and p = L(x) = (u · x)u it follows that

p1 = (u · x)u1 = (u1x1 + u2x2 + u3x3)u1 = u1²x1 + u1u2x2 + u1u3x3

and analogously

p2 = u2u1x1 + u2²x2 + u2u3x3
p3 = u3u1x1 + u3u2x2 + u3²x3.

In matrix denotation we can write

( p1 )   ( u1²   u1u2  u1u3 ) ( x1 )
( p2 ) = ( u2u1  u2²   u2u3 ) ( x2 )
( p3 )   ( u3u1  u3u2  u3²  ) ( x3 )

or, briefly, P = AX where

A = ( u1²   u1u2  u1u3 )
    ( u2u1  u2²   u2u3 )
    ( u3u1  u3u2  u3²  ). (2.33)

Note that the matrix A of the projection map L is symmetric and that the entries are products of the components of the fixed unit vector u. Since the components of u depend on the chosen orthonormal basis, A depends on that. For a basis that is not orthonormal, the matrix A looks different from (2.33), i.e., the entries of A are no longer simply aij = uiuj.
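Since aij = uiuj, the matrix (2.33) is the outer product of u with itself, which NumPy computes directly; the last check below (applying A twice equals applying it once, i.e., projecting twice changes nothing) is not discussed in the text but follows from |u| = 1:

```python
import numpy as np

u = np.array([2.0, -1.0, 2.0])
u = u / np.linalg.norm(u)                 # a unit vector

A = np.outer(u, u)                        # the matrix (2.33): a_ij = u_i * u_j

x = np.array([1.0, 2.0, 3.0])
p = A @ x
assert np.allclose(p, np.dot(u, x) * u)   # p = (u . x) u
assert np.allclose(A, A.T)                # A is symmetric
assert np.allclose(A @ A, A)              # projecting twice changes nothing
```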

(c) The linear map L : R3 → R2 of Example 2.24, part (a), can directly be written in matrix form:

L(x) = (3x1 + 2x2 − 4x3, x1 − x2 + 2x3)^T =
( 3  2 −4 )
( 1 −1  2 ) (x1, x2, x3)^T.

According to the remark preceding these examples, the matrix

A := ( 3  2 −4 )
     ( 1 −1  2 )

is just the matrix of L w.r.t. the canonical bases of R3 and R2. This corresponds to the fact that L(e1), L(e2), and L(e3) are the columns of A.

(d) Let L : R2 → R2 be the linear map defined by

L(x) := (x1 + 2x2, 3x1 + 4x2)^T =
( 1 2 )
( 3 4 ) x = Ax. (2.34)

Clearly, A is the matrix of L w.r.t. the canonical basis of R2. What is the matrix A′ of L w.r.t. the basis v1 = (1, 1)^T, v2 = (−1, 1)^T?

We have to look at the equation

Y′ = A′X′

where the entries of the column vectors X′ and Y′ are the components of x = (x1, x2)^T = ξ1v1 + ξ2v2 and y = L(x) = (y1, y2)^T = η1v1 + η2v2 w.r.t. the basis v1, v2. According to Example 2.20, part (c), we have

x1 = ξ1 − ξ2
x2 = ξ1 + ξ2, (2.35)


ξ1 = (1/2)(x1 + x2)
ξ2 = (1/2)(x2 − x1),

as well as

η1 = (1/2)(y1 + y2)
η2 = (1/2)(y2 − y1). (2.36)

Using (2.36) and

y1 = x1 + 2x2
y2 = 3x1 + 4x2,

which is implied by Eq. (2.34), it follows that

Y′ = (η1, η2)^T = ((1/2)(y1 + y2), (1/2)(y2 − y1))^T = ((1/2)((x1 + 2x2) + (3x1 + 4x2)), (1/2)((3x1 + 4x2) − (x1 + 2x2)))^T,

i.e.,

Y′ = (2x1 + 3x2, x1 + x2)^T.

Replacing x1 and x2 by ξ1 and ξ2 according to (2.35), we obtain

Y′ = (2(ξ1 − ξ2) + 3(ξ1 + ξ2), (ξ1 − ξ2) + (ξ1 + ξ2))^T = (5ξ1 + ξ2, 2ξ1)^T =
( 5 1 )
( 2 0 ) (ξ1, ξ2)^T,

i.e., Y′ = A′X′ where

A′ = ( 5 1 )
     ( 2 0 ).
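The result can be cross-checked against the standard change-of-basis formula A′ = T⁻¹AT, where T is the matrix whose columns are v1 and v2; this formula is not derived in the text at this point and is used here only as an independent verification:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])       # matrix of L w.r.t. the canonical basis
T = np.array([[1.0, -1.0],
              [1.0,  1.0]])      # columns: v1, v2

A_prime = np.linalg.inv(T) @ A @ T
print(A_prime)                   # [[5. 1.] [2. 0.]], as derived above
```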

(e) Let P2(R) be the set of all polynomials of degree 2 or less. A polynomial p ∈ P2(R) is of the form

p(x) = a0 + a1x + a2x^2 (2.37)

where x ∈ R. The sum of two such polynomials and the product of such a polynomial and a number are polynomials of the same type. Since, moreover, the addition and the scalar multiplication in P2(R) satisfy the vector-space axioms, P2(R) is a vector space (and in fact a subspace of the vector space P(R) of all polynomials, cf. Example 2.6, part (d), Example 2.13, part (d), and Example 2.15, part (b)). Using the monomials q0, q1, q2 defined by q0(x) = 1, q1(x) = x, and q2(x) = x^2, it follows from (2.37) that p(x) = a0q0(x) + a1q1(x) + a2q2(x) or briefly

p = a0q0 + a1q1 + a2q2. (2.38)

Hence, q0, q1, q2 is a basis of P2(R) and the coefficients a0, a1, a2 of the polynomial p are the components of p, considered as a vector, w.r.t. this basis. In particular, P2(R) is a three-dimensional vector space.

Now we use the differentiation of polynomials to define a map that to each polynomial p ∈ P2(R) assigns its derivative p′:

d/dx : P2(R) → P2(R), p → p′ = (d/dx)p.


Here we understand the symbol d/dx as a denotation for the mapping p → p′. Since d/dx is obviously a linear map, it can be represented by a matrix D w.r.t. the basis q0, q1, q2. From (2.37) it follows that p′(x) = a1 + 2a2x, i.e.,

p′ = a1q0 + 2a2q1. (2.39)

According to (2.38) and (2.39), we associate the polynomials p and p′ with the column vectors P = (a0, a1, a2)^T and P′ = (a1, 2a2, 0)^T. From P′ = DP and the obvious equality

( a1  )   ( 0 1 0 ) ( a0 )
( 2a2 ) = ( 0 0 2 ) ( a1 )
( 0   )   ( 0 0 0 ) ( a2 )

we conclude that the matrix of d/dx w.r.t. the given basis is

D = ( 0 1 0 )
    ( 0 0 2 )
    ( 0 0 0 ).
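Applying D to the coefficient column of a concrete polynomial reproduces the differentiation; a sketch with a sample polynomial of our own choosing:

```python
import numpy as np

D = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0],
              [0.0, 0.0, 0.0]])   # matrix of d/dx w.r.t. the basis q0, q1, q2

P = np.array([5.0, -3.0, 2.0])    # p(x) = 5 - 3x + 2x^2
P_prime = D @ P
print(P_prime)                    # [-3.  4.  0.], i.e., p'(x) = -3 + 4x
```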

The composition of linear maps is closely related to the multiplication of matrices. We begin this discussion with the definition of the latter.

Definition 2.27 Let A be an m × n matrix and B an n × p matrix. The product C = AB is the m × p matrix with the entries

cik = ∑_{j=1}^n aij bjk, i = 1, . . . , m, k = 1, . . . , p. (2.40)

Note that the matrix product AB is only defined when the number of columns of A coincides with the number of rows of B. The entry cik of C is some kind of scalar product of the i-th row of A and the k-th column of B.

Schematically: in the product C = AB of the m × n matrix A and the n × p matrix B, the i-th row (ai1, . . . , ain) of A is combined with the k-th column (b1k, . . . , bnk)^T of B to produce the entry cik of the m × p matrix C.

Notice also that, since a column vector X ∈ Rn is an n × 1 matrix, the definition of the product of an m × n matrix A by X ∈ Rn is a particular case of Definition 2.27. That is, Eq. (2.29) is a particular case of (2.40).

Example 2.28 We calculate the product of a 3 × 3 matrix A and a 3 × 2 matrix B, the result being a 3 × 2 matrix:

AB = ( 1  0  2 ) (  1  1 )   ( 5  7 )
     ( 3 −1  0 ) ( −1  2 ) = ( 4  1 )
     ( 2  5  3 ) (  2  3 )   ( 3 21 ).
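Eq. (2.40) translates literally into a triple loop; the following sketch (the helper name matmul is ours) reproduces the product above and compares it with NumPy's built-in multiplication:

```python
import numpy as np

def matmul(A, B):
    """Matrix product via Eq. (2.40): c_ik = sum_j a_ij * b_jk."""
    m, n = A.shape
    n2, p = B.shape
    assert n == n2, "columns of A must match rows of B"
    C = np.zeros((m, p))
    for i in range(m):
        for k in range(p):
            C[i, k] = sum(A[i, j] * B[j, k] for j in range(n))
    return C

A = np.array([[1, 0, 2], [3, -1, 0], [2, 5, 3]], dtype=float)
B = np.array([[1, 1], [-1, 2], [2, 3]], dtype=float)
print(matmul(A, B))                  # [[ 5.  7.] [ 4.  1.] [ 3. 21.]]
assert np.allclose(matmul(A, B), A @ B)
```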

Theorem 2.29 Let L : V → W and K : W → U be linear maps. Then the composite map K ◦ L : V → U is linear and, if L is represented by the m × n matrix B and K by the l × m matrix A, K ◦ L is represented by the l × n matrix C = AB.


Proof: The linearity of L and K implies the linearity of K ◦ L according to

(K ◦ L)(x + y) = K(L(x + y)) = K(L(x) + L(y)) = K(L(x)) + K(L(y)) = (K ◦ L)(x) + (K ◦ L)(y)

and

(K ◦ L)(λx) = K(L(λx)) = K(λL(x)) = λK(L(x)) = λ(K ◦ L)(x)

where x, y ∈ V and λ ∈ R.

Now let v1, . . . , vn ∈ V, w1, . . . , wm ∈ W, and u1, . . . , ul ∈ U be bases of the respective vector spaces, let x be any vector of V, and consider the following decompositions:

x = ∑_{k=1}^n ξk vk

y := L(x) = ∑_{j=1}^m ηj wj

z := (K ◦ L)(x) = K(L(x)) = K(y) = ∑_{i=1}^l ζi ui.

If, w.r.t. the given bases, the linear maps K, L, and K ◦ L are represented by the matrices A, B, and C, then, according to Theorem 2.25,

ηj = ∑_{k=1}^n bjk ξk, j = 1, . . . , m (2.41)

ζi = ∑_{j=1}^m aij ηj, i = 1, . . . , l (2.42)

ζi = ∑_{k=1}^n cik ξk, i = 1, . . . , l. (2.43)

Inserting (2.41) into (2.42), we obtain

ζi = ∑_{j=1}^m aij ∑_{k=1}^n bjk ξk = ∑_{k=1}^n (∑_{j=1}^m aij bjk) ξk. (2.44)

Since x ∈ V is arbitrary, its components ξk can take all real values. Therefore, the comparison of Eqs. (2.43) and (2.44) yields

cik = ∑_{j=1}^m aij bjk,

i.e., C = AB. □

The first part of the following concluding remark is addressed to readers with strong interest in mathematics whereas the second part is important for everyone.

Remark 2.30

(a) Consider the set of all linear maps between the same two vector spaces V and W:

L(V, W) := {L : V → W | L linear}.

The sum of two linear maps K : V → W and L : V → W is defined according to

(K + L)(x) := K(x) + L(x)

and the product of K by a real number is given by

(λK)(x) := λK(x).

It is easy to see that the maps K + L : V → W and λL : V → W are again linear; that is, if K, L ∈ L(V, W) and λ ∈ R, then K + L ∈ L(V, W) and λL ∈ L(V, W). Moreover, one easily verifies the validity of the vector-space axioms for L(V, W); hence, L(V, W) is a vector space of linear maps.

Assuming that V and W are finite-dimensional spaces and choosing fixed bases in V and W, each linear map L ∈ L(V, W) corresponds to an m × n matrix B; denote this one-one correspondence by L ↔ B. The set of all m × n matrices is also a vector space, denoted by Mmn. From K ↔ A, L ↔ B, and λ ∈ R it follows that K + L ↔ A + B and λL ↔ λB. That is, the vector spaces L(V, W) and Mmn have completely the same structure, they are isomorphic. In particular, they have the same dimension; consequently, L(V, W) is an mn-dimensional vector space.

(b) For the multiplication of matrices, the following rules hold (Mmn denoting the vector space of the real m × n matrices):

(i) for A ∈ Mmn and B ∈ Mnm, AB ≠ BA in general (i.e., the matrix multiplication is not commutative)

(ii) for all A ∈ Mmn, all B ∈ Mnp, and all C ∈ Mpq, (AB)C = A(BC) =: ABC (associative law)

(iii) for all A, B ∈ Mmn and all C ∈ Mnp, (A + B)C = AC + BC (first distributive law)

(iv) for all A ∈ Mmn and all B, C ∈ Mnp, A(B + C) = AB + AC (second distributive law)

(v) for all numbers λ ∈ R, all A ∈ Mmn, and all B ∈ Mnp, (λA)B = λ(AB) = A(λB) =: λAB (mixed associative law)

(vi) for all A ∈ Mmn and all B ∈ Mnp, (AB)^T = B^T A^T (the superscript T denoting transposition).

The proof of the rules of part (b) of the remark is left to the reader as an exercise which the concerned reader, meanwhile equipped with mathematical experience, will find quite simple.
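A few of the rules are easy to check numerically; the sketch below illustrates the non-commutativity (i), the associative law (ii), and the transposition rule (vi) for randomly chosen matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 2))
C = rng.normal(size=(2, 2))

# (i) the product is not commutative: AB is 2x2 while BA is 3x3,
# and even two square matrices generically fail to commute
assert (A @ B).shape == (2, 2) and (B @ A).shape == (3, 3)
assert not np.allclose(C @ (A @ B), (A @ B) @ C)

# (ii) associative law and (vi) transposition rule
assert np.allclose((A @ B) @ C, A @ (B @ C))
assert np.allclose((A @ B).T, B.T @ A.T)
```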

2.5 Kernel, Image, and Rank

The next concepts are useful to investigate the properties of linear maps. They have applications in the context of linear equations and the invertibility of matrices, the latter being defined later in this section.

Definition 2.31

(a) Let L : V → W be a linear map. The kernel and the image of L are defined by

Ker L := {x ∈ V | L(x) = 0}
Im L := {y ∈ W | y = L(x) for some x ∈ V}.

The image of L is the same as its range, Im L = RL.

(b) The rank of an m × n matrix A, briefly rank A, is the maximal number of linearly independent columns of A (where the columns are considered as vectors of Rm, resp., as m × 1 matrices).


Remember that a (not necessarily linear) map L : V → W is called injective (or one-one) if L(x1) = L(x2) implies x1 = x2 (i.e., x1 ≠ x2 implies L(x1) ≠ L(x2)). The map L is called surjective (or a map onto W) if, for each y ∈ W, y = L(x) for some x ∈ V. A map that is both injective and surjective is bijective.

Theorem 2.32 Let  L : V → W be a linear map.

(a) The set Ker L ⊆ V is a subspace of V.

(b) The set Im L ⊆ W is a subspace of W.

(c) The linear map L is injective if and only if Ker L = {0}.

(d) The map L is surjective if and only if Im L = W.

(e) If the vector spaces V and W are finite-dimensional and the linear map L is represented by a matrix A w.r.t. some bases, then

rank A = dim Im L.

In particular, different matrices representing the same linear map w.r.t. different bases have the same rank.

Proof: Since L(0) = L(x − x) = L(x) − L(x) = 0, 0 ∈ Ker L, and Ker L is not empty. If u, v ∈ Ker L, then L(u) = L(v) = 0 and consequently L(u + v) = L(u) + L(v) = 0, so u + v ∈ Ker L. Similarly, λ ∈ R and u ∈ Ker L implies L(λu) = λL(u) = 0, i.e., λu ∈ Ker L. Hence, Ker L is a subspace of V.

From L(0) = 0 we further conclude that 0 ∈ Im L, so Im L ≠ ∅. If y, z ∈ Im L, then y = L(u) and z = L(v) for some u, v ∈ V; consequently, y + z = L(u) + L(v) = L(u + v), so y + z ∈ Im L. Similarly, λ ∈ R and y ∈ Im L implies λy = λL(u) = L(λu), i.e., λy ∈ Im L. Hence, Im L is a subspace of W.

To show statement (c), suppose L is injective. The equation L(x) = 0 has the solution x = 0, and this is, since L is injective, the only solution. Hence, Ker L = {0}. Conversely, suppose Ker L = {0}. The equality L(u) = L(v) implies L(u − v) = 0 and consequently, because of the supposition, u − v = 0, i.e., u = v. Hence, L is injective.

Statement (d) is just a reformulation of the definition of a surjective map. In fact, L is surjective if and only if every y ∈ W is of the form y = L(x), x ∈ V. Equivalently, W = Im L.

To prove statement (e), let v1, . . . , vn be a basis of V and w1, . . . , wm a basis of W. The matrix A of L w.r.t. these bases is defined according to

L(vj) = ∑_{i=1}^m aij wi, j = 1, . . . , n

(cf. Eqs. (2.25) and (2.26)). Consider the equation

∑_{j=1}^n λj L(vj) = 0 (2.45)

where λ1, . . . , λn ∈ R. Because of

∑_{j=1}^n λj L(vj) = ∑_{j=1}^n λj ∑_{i=1}^m aij wi = ∑_{i=1}^m (∑_{j=1}^n aij λj) wi,


(2.45) is equivalent to ∑_{j=1}^n λj aij = 0 for all i = 1, . . . , m. The latter statement means

∑_{j=1}^n λj (a1j, . . . , amj)^T = 0. (2.46)

Therefore, Eq. (2.45) can be satisfied only for λ1 = . . . = λn = 0 if and only if Eq. (2.46) can be satisfied only for λ1 = . . . = λn = 0, and the vectors L(v1), . . . , L(vn) are linearly independent if and only if the columns of the matrix A are linearly independent. Moreover, by the same argumentation, any subsystem of p ≤ n vectors of L(v1), . . . , L(vn) is linearly independent (resp., dependent) if and only if the corresponding columns are linearly independent (resp., dependent). Hence,

rank A := maximal number of linearly independent columns of A
= maximal number of linearly independent vectors of L(v1), . . . , L(vn). (2.47)

Now observe that every vector y ∈ Im L is a linear combination of the vectors L(vj); namely, from y = L(x) and x = ∑_{j=1}^n ξj vj it follows that y = L(∑_{j=1}^n ξj vj), i.e.,

y = ∑_{j=1}^n ξj L(vj). (2.48)

If rank A = n, the vectors L(v1), . . . , L(vn) are, according to (2.47), linearly independent, and the coefficients in (2.48) are uniquely determined. Hence, the vectors L(v1), . . . , L(vn) are a basis of Im L, and consequently dim Im L = n = rank A. If r := rank A < n, then, again according to (2.47), the system L(v1), . . . , L(vn) contains r linearly independent vectors whereas any r + 1 vectors of L(v1), . . . , L(vn) are linearly dependent. Without loss of generality, assume that just the first r vectors L(v1), . . . , L(vr) are linearly independent. The equation

∑_{j=1}^r λj L(vj) + λk L(vk) = 0 (2.49)

where k = r + 1, . . . , n can be satisfied for a nontrivial choice of the coefficients λ1, . . . , λr, λk, and the same argumentation as used in the context of Eq. (2.22) shows that in particular λk ≠ 0. In consequence, we can solve Eq. (2.49) for L(vk), obtaining that

L(vk) = ∑_{j=1}^r αkj L(vj) (2.50)

with some coefficients αkj, k = r + 1, . . . , n. From Eqs. (2.48) and (2.50) it follows that

with some coefficients αkj , k = r + 1, . . . , n. From Eqs. (2.48) and (2.50) it follows that

y =n

 j=1

ξ jL(v j) =r

 j=1

ξ jL(v j) +n

 j=r+1

ξ jL(v j) =r

 j=1

ξ jL(v j) +n

k=r+1

ξkL(vk)

=r

 j=1

ξ jL(v j) +n

k=r+1

ξk

r j=1

αkjL(v j)

=r

 j=1

ξ jL(v j) +r

 j=1

n

k=r+1

αkjξk

L(v j)

=r

 j=1

ξ j +

nk=r+1

αkjξk

L(v j).

That is, every y

∈Im L is a linear combination of the r linearly independent vectors L(v1), . . . , L(vr).

Hence, these vectors are a basis of Im L and dim Im L = r = rank A. 2


Example 2.33 We determine the kernel and the image of the linear map L : R2 → R2 given by

L(x) := (x1 + 2x2, 3x1 + 6x2)^T =
( 1 2 )
( 3 6 ) x = Ax.

The equation L(x) = 0 is equivalent to the homogeneous system

x1 + 2x2 = 0
3x1 + 6x2 = 0

with the solutions x = λ(−2, 1)^T; hence,

Ker L = { x ∈ R2 | x = λ(−2, 1)^T, λ ∈ R }.

The vectors L(x) can be written according to

L(x) = (x1 + 2x2, 3x1 + 6x2)^T = x1(1, 3)^T + x2(2, 6)^T = x1(1, 3)^T + 2x2(1, 3)^T = (x1 + 2x2)(1, 3)^T

where the real number x1 + 2x2 can take any value; hence,

Im L = { y ∈ R2 | y = µ(1, 3)^T, µ ∈ R }.

The linear map L is neither injective nor surjective, and rank A = 1 = dim Im L. Moreover,

dim Ker L + dim Im L = 1 + 1 = 2 = dim R2 (2.51)

where R2 is in particular the vector space on which L is defined.
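The same kernel, image, and rank data can be obtained numerically; a sketch for the matrix of this example:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 6.0]])

r = np.linalg.matrix_rank(A)
print(r)                          # 1 = dim Im L

# a basis vector of Ker L: A x = 0 for x = (-2, 1)^T
x = np.array([-2.0, 1.0])
assert np.allclose(A @ x, 0)

n = A.shape[1]
dim_ker = n - r                   # = 1, anticipating Theorem 2.34
print(dim_ker + r == n)           # True: dim Ker L + dim Im L = dim R^2
```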

The result (2.51) is a particular case of a general result which is in fact the central statement on linear maps between finite-dimensional vector spaces.

Theorem 2.34 Let  L : V → W be a linear map and let  dim V = n (and  dim W = m). Then 

dim Ker L + dim Im L = dim V = n.

Proof: If Im L = {0} (which is of course a trivial case), then L(x) = 0 for all x ∈ V and Ker L = V; hence, dim Ker L + dim Im L = n + 0 = n = dim V.

Assume Im L ≠ {0}. Consequently, r := dim Im L ≥ 1. Choose a basis w1, . . . , wr of Im L and write

w1 = L(v1), . . . , wr = L(vr) (2.52)

with some vectors v1, . . . , vr ∈ V. Furthermore, choose a basis u1, . . . , us of Ker L (provided that Ker L ≠ {0}) and consider the system of the vectors

u1, . . . , us, v1, . . . , vr (2.53)

(if Ker L = {0}, consider only v1, . . . , vr). We show now that the system (2.53) has two particular properties.

First, the vectors u1, . . . , us, v1, . . . , vr are linearly independent. Namely, let

λ1u1 + . . . + λsus + µ1v1 + . . . + µrvr = 0. (2.54)


Applying the linear map L to both sides of this equation, we obtain

λ1L(u1) + . . . + λsL(us) + µ1L(v1) + . . . + µrL(vr) = 0. (2.55)

Since u1, . . . , us ∈ Ker L, L(u1) = . . . = L(us) = 0. Because of (2.52), it follows from (2.55) that

µ1w1 + . . . + µrwr = 0.

This implies µ1 = . . . = µr = 0 since the vectors w1, . . . , wr are linearly independent because they are a basis of Im L. In consequence, Eq. (2.54) reduces to

λ1u1 + . . . + λsus = 0.

This implies λ1 = . . . = λs = 0 because of the basis property of u1, . . . , us. Hence, Eq. (2.54) can be satisfied only if λ1 = . . . = λs = µ1 = . . . = µr = 0, and the system (2.53) is linearly independent.

Second, the vectors u1, . . . , us, v1, . . . , vr generate the vector space V, i.e., every vector x ∈ V can be written as a linear combination of them. In fact, since w1, . . . , wr is a basis of Im L, it follows that

L(x) = η1w1 + . . . + ηrwr

where η1, . . . , ηr ∈ R; further, taking account of (2.52) and using the linearity of L,

L(x) = η1L(v1) + . . . + ηrL(vr) = L(η1v1 + . . . + ηrvr).

Again by linearity, we obtain

L(x − η1v1 − . . . − ηrvr) = 0;

in consequence, x − η1v1 − . . . − ηrvr ∈ Ker L and

x − η1v1 − . . . − ηrvr = ξ1u1 + . . . + ξsus,

ξ1, . . . , ξs ∈ R. Hence,

x = ξ1u1 + . . . + ξsus + η1v1 + . . . + ηrvr,

that is, every vector x ∈ V is a linear combination of the system (2.53).

Because the system (2.53) is linearly independent, every vector x ∈ V is a unique linear combination of the system (2.53). Hence, the vectors u1, . . . , us, v1, . . . , vr constitute a basis of V. Since every basis of V consists of n vectors, we conclude that

dim Ker L + dim Im L = s + r = n = dim V. □

Remark 2.35

(a) In the proof of the theorem it has not been used that W is a finite-dimensional vector space. In fact, the statement of Theorem 2.34 holds for all linear maps L : V → W where V is of dimension n < ∞, but W can be infinite-dimensional. However, the range of L is finite-dimensional, namely, dim Im L = n − dim Ker L.

(b) Without proof, we mention the following interesting (and, at first sight, surprising) result. If A is an m × n matrix, then

rank A := maximal number of linearly independent columns of A
= maximal number of linearly independent rows of A.


(The rows can be considered as 1 × n matrices which form an n-dimensional vector space, so the linear independence of the rows of A is defined.) If the m × n matrix A represents a linear map L : V → W, then

rank A = column rank A = row rank A = dim Im L

(where the definition of the column rank, resp., the row rank is obvious).

We draw some simple, but important conclusions from Theorem 2.34.

Conclusion 2.36 Let L : V → W be a linear map, dim V = n, and dim W = m. Then

(a) if m < n, L cannot be injective

(b) if m > n, L cannot be surjective

(c) if m = n, L is injective if and only if L is surjective (i.e., if m = n, an injective or surjective linear map is automatically bijective)

(d) if m ≠ n, L cannot be bijective.

Proof: Let m < n. From dim Im L ≤ m < n and Theorem 2.34 it follows that dim Ker L ≥ 1. Hence, Ker L ≠ {0}, and, according to statement (c) of Theorem 2.32, L is not injective.

Let m > n. From Theorem 2.34 it follows that dim Im L ≤ n < m, and dim Im L < m = dim W implies that Im L is a proper subspace of W, i.e., Im L ⊆ W and Im L ≠ W (briefly, Im L ⊂ W). Hence, according to statement (d) of Theorem 2.32, L is not surjective.

Let m = n and L be injective. The latter is equivalent to Ker L = {0}. That is, dim Ker L = 0 or (again by Theorem 2.34), equivalently, dim Im L = n = m = dim W. The statement dim Im L = dim W is equivalent to Im L = W which means that L is surjective and consequently even bijective.

If m ≠ n, then, according to parts (a) and (b) of the conclusion, L is not injective or not surjective and consequently not bijective. □

The inverse of a bijective linear map is related to the inverse of a matrix. We begin this discussion with the definition of the unit matrix and the inverse of a matrix.

Definition 2.37 The n × n unit matrix is defined by

In := ( 1 0 . . . 0 )
      ( 0 1 . . . 0 )
      ( ⋮ ⋮  ⋱   ⋮ )
      ( 0 0 . . . 1 ),

also simply denoted by I. The entries of the unit matrix are denoted by the Kronecker symbol δij, i.e.,

δij = 1 if i = j, δij = 0 if i ≠ j.

For any n × m matrix A and any m × n matrix B we have that

InA = A
BIn = B.


Namely,

(InA)ik = ∑_{j=1}^n δij ajk = aik
(BIn)ik = ∑_{j=1}^n bij δjk = bik,

(InA)ik and (BIn)ik denoting the entries of the respective product matrices.

Next let A be a quadratic n × n matrix and assume there exists an n × n matrix B such that

BA = AB = In. (2.56)

As a preparation for the following definition, we show that B is uniquely determined. Namely, given A, let C be a second n × n matrix satisfying (2.56). From

BA = AB = In
CA = AC = In

it follows that

C = CIn = C(AB) = (CA)B = InB = B,

i.e., C = B.

Definition 2.38 An n × n matrix A is called invertible if there exists an n × n matrix B such that

BA = AB = In.

The uniquely determined matrix B is called the inverse of A, briefly, B =: A−1. Thus, A−1A = AA−1 = In.

We remark that an n × n matrix need not have an inverse, e.g.,

A = ( 1 0 )
    ( 0 0 ).

If A had an inverse

B = ( b11 b12 )
    ( b21 b22 ),

the equation AB = I, i.e., the equation

( 1 0 ) ( b11 b12 )   ( 1 0 )
( 0 0 ) ( b21 b22 ) = ( 0 1 ),

would imply

( b11 b12 )   ( 1 0 )
(  0   0  ) = ( 0 1 )

which is a contradiction. Hence, the inverse of A does not exist.

The next theorem states the relation between invertible linear maps and invertible matrices.

Theorem 2.39 Let L : V → W be linear, dim V = dim W = n, and let A be a corresponding n × n matrix. The following statements are then equivalent:

(i) A is invertible

(ii) L is bijective

(iii) Ker L = {0}

(iv) the homogeneous linear system AX = 0, X ∈ Rn, has only the trivial solution X = 0

(v) rank A = n.


Moreover, the inverse matrix  A−1 corresponds to the inverse map L−1 : W → V  which is alsolinear.

Proof: Assume L is bijective. First of all, we show that L−1 is also linear. Let y, z ∈ W .Since L is bijective, there exists uniquely determined vectors x, u ∈ V such that y = L(x) and

z = L(u). In consequence, y + z = L(x + u) and

L−1(y + z) = x + u = L−1(y) + L−1(z). (2.57)

Now let y ∈ W and λ ∈ R. Then y = L(x), λy = L(λx), and consequently

L−1(λy) = λx = λL−1(y). (2.58)

Hence, by (2.57) and (2.58), L−1 is linear.

Let A be the matrix of L and B the matrix of L−1 w.r.t. a basis in V and a basis in W. According to Theorems 2.25 and 2.29, the equations

L−1(L(x)) = x
L(L−1(y)) = y, (2.59)

resp.,

(L−1 ◦ L)(x) = x
(L ◦ L−1)(y) = y,

read in matrix representation

BAX = X
ABY = Y

where the column vectors X, Y ∈ Rn represent x ∈ V and y ∈ W. The last two equations can be written as

BAX = InX
ABY = InY. (2.60)

Since Eqs. (2.59) hold for all x ∈ V and all y ∈ W, Eqs. (2.60) hold for all X, Y ∈ Rn; consequently, BA = In as well as AB = In. Hence, the inverse A−1 exists, and A−1 = B corresponds to the linear map L−1.

Now assume the matrix A is invertible. Let y ∈ W be arbitrary and consider the equation

y = L(x) (2.61)

which is equivalent to Y = AX. For every Y ∈ Rn, the latter equation is uniquely solved by X = A−1Y. Therefore, (2.61) always has a unique solution x ∈ V, and the linear map L is bijective.

It remains to show the equivalence of the statements (ii)–(v). Because of dim V = dim W and Conclusion 2.36, part (c), the linear map L is bijective if and only if it is injective; the latter is, by Theorem 2.32, part (c), equivalent to Ker L = {0}. Statement (iii) means that the equation L(x) = 0 has only the trivial solution x = 0, i.e., AX = 0 has only the trivial solution X = 0. Finally, Ker L = {0} if and only if dim Ker L = 0, that is, according to Theorem 2.32, part (e), and Theorem 2.34, rank A = dim Im L = n − dim Ker L = n. □

The first part of the following remark completes the rules for calculations with matrices whereas the second part is again addressed to readers with strong interests in mathematics.

Remark 2.40


(a) We supplement the rules (i)–(vi) of part (b) of Remark 2.30 by two further rules:

(vii) for any two invertible matrices A, B ∈ Mnn, (AB)−1 = B−1A−1

(viii) for any invertible matrix A ∈ Mnn, (A−1)−1 = A.

In fact, from

(B−1A−1)(AB) = B−1(A−1(AB)) = B−1((A−1A)B) = B−1(InB) = In

and

(AB)(B−1A−1) = In

it follows that (AB)−1 exists and (AB)−1 = B−1A−1. From

AA−1 = A−1A = In

it is obvious that (A−1)−1 exists and (A−1)−1 = A.

(b) In Definition 2.38 the two conditions BA = In and AB = In have been used to define the invertibility of the n × n matrix A. In fact, one of the two conditions is sufficient, i.e., one implies the other one. For instance, if, for an n × n matrix A, there is an n × n matrix B satisfying

BA = In, (2.62)

then B = A−1. To prove this statement, define a linear map K : Rn → Rn according to K(x) := Ax and a linear map L : Rn → Rn according to L(x) := Bx. Eq. (2.62) is equivalent to BAx = x for all x ∈ Rn. The latter means (L ◦ K)(x) = x for all x, i.e.,

L(K(x)) = x (2.63)

for all x. From Eq. (2.63) it follows that the map K is injective. Namely, K(x1) = K(x2) implies L(K(x1)) = L(K(x2)) and, by (2.63), x1 = x2. Hence, K is injective and, by part (c) of Conclusion 2.36, even bijective. Eq. (2.63) then states that L = K−1; consequently, the matrix A is invertible and B = A−1.

Finally, we discuss how the inverse of a matrix can be calculated. The most convenient method is a version of Gauss–Jordan elimination. According to part (b) of the preceding remark, the inverse of an invertible n × n matrix A is uniquely determined by the matrix equation

AA−1 = In. (2.64)

Denoting the entries of A by aij and the entries of A−1 by xij, (2.64) is equivalent to the n² linear equations

∑_{j=1}^n aij xjk = δik, (2.65)

i, k = 1, . . . , n. For k = 1, these equations read explicitly

a11x11 + . . . + a1nxn1 = 1
a21x11 + . . . + a2nxn1 = 0
...
an1x11 + . . . + annxn1 = 0

and constitute a system of n linear equations in the n unknowns x11, . . . , xn1. We can write this system as the vector equation

AX1 = e1


where X1 is the first column of A−1 and e1 the first vector of the canonical basis of Rn. For k = 2, . . . , n, we obtain analogous equations; in fact, the equations (2.65) are equivalent to the n vector equations

AX1 = e1, AX2 = e2, . . . , AXn = en (2.66)

involving the columns of A−1 and the canonical basis of Rn. The n systems (2.66) of the respective n linear equations can be solved simultaneously by Gauss–Jordan elimination. The corresponding augmented matrix is

( A | e1 . . . en ) = ( A | In ) =
( a11 . . . a1n | 1 . . . 0 )
(  ⋮         ⋮  | ⋮  ⋱   ⋮ )
( an1 . . . ann | 0 . . . 1 ). (2.67)

Since (2.64)–(2.67) are equivalent, the elimination procedure yields a unique result, namely

1 . . . 0...

. . ....

0 . . . 1

x11 . . . x1n...

...xn1 . . . xnn

=

I n

A−1

,

i.e.,

A−1 =

x11 . . . x1n...

...xn1 . . . xnn

.

Example 2.41 Determine the inverse of the matrix

A =

1 2 30 −3 −6

−1 2 0

.

Solving the homogeneous system Ax = 0 (e.g., by Gauss-Jordan elimination), one verifies thatx = 0 is the only solution. Hence, according to Theorem 2.39, A−1 exists. The correspondingGauss-Jordan elimination procedure yields 1 2 3

0 −3 −6−1 2 0

1 0 00 1 00 0 1

⇐⇒ 1 2 3

0 −3 −60 4 3

1 0 00 1 01 0 1

⇐⇒

1 2 30 1 20 1 3

4

1 0 00 −1

3 014 0 1

4

⇐⇒

1 2 30 1 2

0 0 − 54

1 0 00 − 1

3 014

13

14

⇐⇒

1 2 30 1 2

0 0 1

1 0 00 −1

3 0

−15 − 4

15 −15

⇐⇒ 1 2 0

0 1 0

0 0 1

85

45

35

25

15

25

−15 − 4

15 −15

⇐⇒

1 0 0

0 1 0

0 0 1

45

25 − 1

525

15

25

−15 − 4

15 − 15

,

i.e.,

A−1

=

1

5 4 2 −1

2 1 2−1 − 43 −1

.

58

8/8/2019 Linear Algebra 03

http://slidepdf.com/reader/full/linear-algebra-03 61/86

2.6 Systems of Linear Equations II

We now draw important conclusions for simultaneous systems of linear equations where we shallessentially use the results of the preceding section. We write a system of  m linear equations inn unknowns,

a11x1 + a12x2 + . . . + a1nxn = b1

a21x1 + a22x2 + . . . + a2nxn = b2

...

am1x1 + am2x2 + . . . + amnxn = bm,

briefly in matrix form, i.e.,Ax = b, x ∈ Rn, b ∈ Rm (2.68)

where A is an m × n matrix. Introducing the linear map L : Rn → Rm, L(x) := Ax, (2.68) canbe rewritten as

L(x) = b.

The first step in our discussion of the solutions of a system of linear equations is the followingsimple, but important theorem.

Theorem 2.42 The general solution of  Ax = b is given by the sum of a fixed particular solution x0 (if there is any) and any homogeneous solution  xh, i.e.,

x = x0 + xh

where Ax0 = b and  Axh = 0. (If there is no particular solution, then there is no solution of Ax = b.)

Briefly, if  Ax0 = b, x0 fixed, then 

{x ∈ Rn

| Ax = b} = {x ∈ Rn

| x = x0 + xh, Axh = 0} =: x0 + Ker L.

Since L(x) = Ax, the kernel of L just consists of the solutions xh of the homogeneous systemAx = 0 of linear equations. The denotation x0 +Ker L is a short writing for the set of all vectorsof the form x = x0 + x1 with x0 fixed and x1 ∈ Ker L, i.e., for the set of all vectors x = x0 + xh.

Proof of 2.42: Adding the two equations

Ax0 = b

Axh = 0,

it follows that A(x0 + xh) = b, i.e., x = x0 + xh is a solution of the the inhomogeneous system

Ax = b. Conversely, let x and x0 be solutions of the inhomogeneous system. Subtracting

Ax = b

Ax0 = 0,

we obtain A(x − x0) = 0, i.e., xh := x − x0 is a solution of the homogeneous system. In conse-quence, any solution of  Ax = b is of the form x = x0 + xh. 2

By the definition of the rank of a matrix (Definition 2.31, part (b)), it is clear that r :=rank A ≤ n. By part (e) of Theorem 2.32, we know that r = rank A = dimIm L. FromIm L

⊆R

m it follows that r = rank A = dimIm L

≤dimR

m = m; thus, r = rank A

≤m (this

is also implied by part (b) of Remark 2.35). Furthermore, according to Theorem 2.34, we havethat dim Ker L + dim Im L = n. Hence, r ≤ n, m, r = n − dim Ker L, and dimKer L = n − r.

59

8/8/2019 Linear Algebra 03

http://slidepdf.com/reader/full/linear-algebra-03 62/86

Choosing a basis v1, . . . , vn−r of Ker L, we can write every solution of  Ax = b as

x = x0 + xh = x0 + t1v1 + . . . + tn−rvn−r

where t1, . . . , tn−r ∈ R. This representation of the solutions is similar to the parametric equationof a straight line or a plane (cf. Eqs. (2.20) and (2.26)). Thus, we have proved the followingresult.

Conclusion 2.43 The solutions of a system of linear equations of rank  r in  n unknowns form an  (n − r)-dimensional plane in Rn through x0 (provided that there are solutions). If  r = n, thisplane is degenerated to a point.

We investigate the question when a system of linear equations does have a solution (thismeans at least one). The system Ax = b has a solution if and only if there exists an x ∈ R

n

such that b = Ax = L(x), i.e., if and only if  b ∈ Im L. If  r < m, i.e., if dim Im L < m, then Im L

is a proper subset of Rm; in this case it can happen that b ∈ Im L and there is no solution. If r = m, then dimIm L = m and Im L = R

m; consequently, b ∈ Im L, and Ax = b has a solution.Moreover, we have the following criterion.

Theorem 2.44 The system  Ax = b has a solution if and only if 

rank A = rank(A | b)

where

(A | b) =

a11 . . . x1n b1...

......

xn1 . . . xnn bn

is the augmented matrix of the system.

Proof: First assume that r = rank A = rank(A|

b). Then there are r linearly independentcolumns of  A, say, C i1, . . . , C  ir , but the r + 1 columns C i1, . . . , C  ir , b are linearly dependent. So

λ1C i1 + . . . + λrC ir + λb = 0

where not all coefficients are zero, in particular, λ = 0. Therefore, we can solve the equation forb:

b = −λ1

λC i1 − . . . − λr

λC ir .

Observing that C ij = Aeij where j = 1, . . . , r and eij is the i j-th canonical basis vector, weobtain

b =

λ1

λ

Aei1

−. . .

λr

λ

Aeir = Aλ1

λ

ei1

−. . .

λr

λ

eir .

Hence, we have constructed a solution of  Ax = b, namely, x0 := λ1λ

ei1 − . . . − λrλ

eir .Conversely, assume that Ax = b has a solution, say x =

ni=1 xiei. Then

b = A

n

i=1

xiei

=

ni=1

xiAei =n

i=1

xiC i. (2.69)

Again, let C i1, . . . , C  ir a system of  r = rank A linearly independent columns. Since every largersystem C i1, . . . , C  ir , C k of columns of  A is linearly dependent, every column C k, k = 1, . . . , m,k = i1, . . . , ir, is a linear combination of  C i1, . . . , C  ir . According to (2.69), b can also be writtenas a linear combination of  C i1, . . . , C  ir (in fact, these columns form a basis of Im L). Hence, the

system C i1 , . . . , C  ir , b is linearly dependent and consequently rank (A | b) = rank A. 2

60

8/8/2019 Linear Algebra 03

http://slidepdf.com/reader/full/linear-algebra-03 63/86

We remark that either rank (A | b) = rank A or rank (A | b) = rank A + 1. We now give asummarizing discussion of the different cases that can occur in the context of the solution of asystem of linear equations.

Case 1: The rank of  A coincides with the number of rows, i.e., r = rank A = m. Thisis possible only if  n ≥ m because r ≤ m, n. Since the matrix (A | b) has also m rows, itfollows that rank (A | b) ≤ m. Therefore, m = rank A ≤ rank(A | b) ≤ m, which implies thatrank A = rank (A

|b). Hence, according to the preceding theorem, Ax = b has a solution. This

result can also be concluded from dim Im L = rank A = m, as indicated above after Conclusion2.43. Furthermore, the general solution of  Ax = b has n − r parameters and is unique if  r = n.A matrix satisfying m = r = n is invertible, so the unique solution is

x = A−1b. (2.70)

Case 2: The rank of  A is smaller than the number of rows, i.e., r = rank A < m. Then twosubcases are possible. First, rank A = rank(A | b). Then Ax = b has a solution, and the generalsolution involves n − r parameters. If  r = n, the solution is unique, but cannot be representedby (2.70) since n = r < m and A−1 is not defined. Moreover, because m > r = rank (A | b) andr is also the maximal number of linearly independent rows of  A (cf. Remark 2.35, part (b)),

m − r rows of the augmented matrix (A | b) can be represented by r linearly independent ones.Since each row of the augmented matrix corresponds to an equation of  Ax = b, the system of the m linear equations is equivalent to a system of  r linear equations and m − r equations areunnecessary.—The second subcase is r = rank A < rank(A | b) = r + 1. In this case m − (r + 1)equations are unnecessary; however, neither Ax = b nor the reduced system has a solution.

Example 2.45 Let A be a 7 × 4 matrix of maximally possible rank, i.e., r = 4, and considerthe system Ax = b, b ∈ R7. If rank(A | b) = 4, then there is exactly one solution of the system,and three equations are unnecessary. If rank (A | b) = 5, then the system of the seven equationscan be reduced to an equivalent system of five equations, but there is no solution.

Summarizing, if  r = m, Ax = b has a solution. If  r < m, Ax = b has a solution if andonly if  r = rank(A | b). Whenever there exists a solution, the general solution contains n − r

parameters (and is unique if  n = r).

2.7 Remarks on the Scalar Product

In the Euclidean vector space E (the symbol E means E 3 or E 2, cf. Definition 1.1), we definedthe scalar product of two vectors by x ·  y := |x|| y| cos φ (Definition 1.8). According to Theorem1.10, the scalar product has the following properties:

(i) symmetry : x · y = y · x

(ii) linearity in the second argument : x · (y + z) = x ·  y + x · z, x · (λy) = λ(x · y); linearity in the first argument : (x + y) · z = x · z +  y · z, (λx) · y = λ(x · y)

(iii) positive definiteness: x · x ≥ 0, x · x = 0 if and only if  x = 0.

In abstract linear algebra, these properties are used to define a scalar product in a general vectorspace V ; we do not consider this general definition.

According to the discussion following Definition 1.4, three orthogonal unit vectors e1, e2, e3

constitute a basis of  E 3; then, by Theorem 2.18, three orthogonal unit vectors are linearlydependent, and the dimension of  E 3 is three. Moreover, again by Theorem 2.18, any linearlyindependent system v1, v2, v3 in E 3 is a basis of E 3. More than three vectors of E 3 are necessarily

linearly dependent, but does there exist an orthogonal system of more than three nonzero vectors

61

8/8/2019 Linear Algebra 03

http://slidepdf.com/reader/full/linear-algebra-03 64/86

of  E 3? Obviously not; we can prove this as follows. Let v1, . . . , vm be an orthogonal  system of nonzero vectors of E 3, i.e.,

vi = 0, vi · v j = 0 (2.71)

where i, j = 1, . . . , m and i = j. The equation

λ1v1 + . . . + λmvm =  0

implies thatvi · (λ1v1 + . . . + λmvm) = 0.

Using (2.71), it follows that λivi · vi = 0, i.e., λi = 0 for all i = 1, . . . , m. Hence, the systemv1, . . . , vm is linearly independent and thus m ≤ 3.

Definition 2.46 A system of vectors e1, e2, e3 ∈ E 3 satisfying

ei · e j = δij =

0, i = j

1, i = j

is called an orthonormal basis of (in) E 3.

The orthonormal bases of  E 3 constitute a distinguished class of bases in E 3 and are usedfor convenience. Let e1, e2, e3 be an orthonormal basis in E 3 and let x be any vector of  E 3.Multiplying the equation

x = x1e1 + x2e2 + x3e3.

in the sense of the scalar product by ei, we obtain ei · x = xi, i.e.,

xi = ei · x = |x| cos αi (2.72)

where αi is the angle between x and ei. The scalar product of two vectors x, y ∈ E 3 reads interms of components

x

·y = (x1e1 + x2e2 + x3e3)

·(y1e1 + y2e2 + y3e3) = x1y1 + x2y2 + x3y3, (2.73)

the length of  x is given by

|x| =√

x · x = 

x21 + x2

2 + x23, (2.74)

and the distance d of two points with position vectors x and y is

d = |x − y| = 

(x1 − y1)2 + (x2 − y2)2 + (x3 − y3)2. (2.75)

For a nonorthonormal basis v1, v2, v3, formulas (2.72)–(2.75) become more complicated. Theanalog of (2.73), for instance, is

y = 3

i=1

ξivi ·3

 j=1

η jv j =3

i,j=1

ξiη j vi

·v j =

3

i,j=1

gijξiη j

where gij := v1 · v j .It is clear how Definition 2.46 and the results presented for E 3 read in the case E 2.—Next we

introduce a scalar product in Rn.

Definition 2.47 The scalar product in Rn is defined according to

x · y := x1y1 + . . . + xnyn =n

i=1

xiyi = xT y

where x, y ∈ Rn and xT y is the product of the matrices xT  = (x1 . . . xn) and y = y1

..

.yn

.

62

8/8/2019 Linear Algebra 03

http://slidepdf.com/reader/full/linear-algebra-03 65/86

It easy to show that the scalar product in Rn satisfies the same rules as the scalar productin E 3; in particular, it has the following properties:

(i) symmetry : x · y = y · x

(ii) bilinearity : x·(y +z) = x·y +x·z, x·(λy) = λ(x·y); (x+y)·z = x·z +y ·z, (λx) ·y = λ(x·y)

(iii) positive definiteness: x·

x≥

0, x·

x = 0 if and only if  x = 0.

The following definition is analogous to statements (2.74) and (2.75) as well as to Definition2.46.

Definition 2.48 One defines

(i) the length (Euclidean norm) of  x ∈ Rn by

|x| :=√

x · x = 

x21 + . . . + x2

n

(ii) the distance of two points x, y

∈R

n by

d := |x − y| :=√

x · x = 

(x1 − y1)2 + . . . + (xn − yn)2

(iii) a system of vectors a1, . . . , an ∈ Rn being an orthonormal basis of Rn if 

ai · a j = δij .

The concepts of scalar product, length, and distance in R3 are closely related to their coun-terparts in E 3. Let e1, e2, e3 and e1

, e2, e3

be two orthonormal bases of  E 3 and let x be anyvector of E 3. According to

x = x1e1 + x2e2 + x3e3 = x1e1 + x2e2 + x3e3

the vector x can, w.r.t. the basis e1, e2, e3, be represented by the column vector x :=

x1x2x3

∈ R3

and, w.r.t. the other basis e1, e2

, e3, by the column vector x :=

x

1

x

2

x

3

∈ R

3. Note that

x = x = x = x, but |x| = |x| = |x|. Moreover, if 

y = y1e1 + y2e2 + y3e3 = y1e1 + y2e2

+ y3e3

is a second vector of 

E 3 and y :=

y1y2

y3 , y := y

1

y

2

y

3 , then

x · y = x1y1 + x2y2 + x3y3 = x1y1 + x

2y2 + x3y3 = x · y = x · y.

Example 2.49

(a) The vectors v1 := 1√5

12

and v2 := 1√

5

2−1

satisfy v1·v1 = 1, v2·v2 = 1, and v1·v2 = 0,

so they form an orthonormal basis of R2. What are the components of  x :=

11

w.r.t.

v1, v2?

From x = ξ1v1 + ξ2v2 it follows that

11 = ξ1 1√

5 1

2+ ξ2 1√

5 2−1

,

63

8/8/2019 Linear Algebra 03

http://slidepdf.com/reader/full/linear-algebra-03 66/86

that is,

1 =1√

5(ξ1 + 2ξ2)

1 =1√

5(2ξ1 − ξ2).

From these equations we obtain ξ1 = 3√5 and ξ2 = 1√5 . Using the fact that v1, v2 is anorthonormal  basis, we can can find this result easier:

ξ1 = v1 · x =1√

5

12

·

11

=

3√5

ξ2 = v2 · x =1√

5

2

−1

·

11

=

1√5

.

(ii) The canonical basis e1, . . . , en of Rn is an orthonormal basis.

A nontrivial vector space has infinitely many bases, and in general no basis is distinguished.

In a vector space in which a scalar product is defined (in our context, E 2, E 3, and Rn), there is adistinguished class of bases, namely, the orthonormal bases. In the Euclidean vector spaces E 2and E 3, there is, among the orthonormal bases, again no distinguished basis; however, the vectorspace Rn has, due to its structure, a distinguished (orthonormal) basis, namely, its canonicalbasis. If one chooses an orthonormal basis e1, e2, e3 in E 3 and refers every vector x ∈ E 3 to this

 fixed  basis, x = x1e1 + x2e2 + x3e3, then x can be identified with its representative x ∈ R3,

x =

x1x2x3

; that is, E 3 and R3 can be considered as the same vector spaces. The identification

of  E 3 and R3 (w.r.t. an orthonormal basis of E 3!) makes sense because

(i) for any two vectors x, y ∈ E 3 with representatives x, y ∈ R3, x + y corresponds to x + y,

(ii) for any vector x with representative x and any number λ ∈ R, λx corresponds to λx,

(iii) for any two vectors x and  y with representatives x and y, x ·  y = x1y1 + x2y2 + x3y3 = x · y.

In Chapter 1 on elementary vector algebra, we identified E 3 andR3. Moreover, if a fixed  Cartesiancoordinate system (O; e1, e2, e3) (not necessarily right-handed) in the three-dimensional affine-Euclidean space P 3 of points is given, then every point P  ∈ P 3 can be identified with its position

vector x = OP -

. Since x can be identified with x, the point P  can finally be identified with the

column vector x =

x1x2x3

.

Conversely, by means of a three-dimensional coordinate system, every column vector x

∈R

3,

x = x1x2x3

, can be interpreted as a vector x ∈ E 3 with components x1, x2, x3 or as a point

P  ∈ P 3 with coordinates x1, x2, x3.—We emphasize that in general it neither makes sense toidentify the vector spaces E 3 and R

3 nor to identify E 3 and P 3. We see this clearly by ourExercises 2.26 and 2.28, for instance, and by physics: The laws of physics cannot depend on thechoice of the coordinate system, so a coordinate-free formulation of the laws is necessary. Forthis reason vectorial physical quantities are described by vectors of  E 3, and not  of R3. Linearrelations between vectorial physical quantities are linear transformations L : E 3 → E 3 in the senseof linear algebra, in physics these are often called tensors (although in mathematics the conceptof tensor is somehow more general); w.r.t. a coordinate system, such a tensor is represented bya 3

×3 matrix, but one has to distinguish between a tensor and a matrix (a matrix is a trivial

concept, a tensor or a linear transformation is not).

64

8/8/2019 Linear Algebra 03

http://slidepdf.com/reader/full/linear-algebra-03 67/86

2.8 Determinants

To motivate determinants, we consider the solution of the system

a11x1 + a12x2 = b1 (2.76)

a21x1 + a22x2 = b2 (2.77)

of two linear equations. Multiplying Eq. (2.76) by a22 and (2.77) by a12, we obtain

a11a22x1 + a12a22x2 = a22b1

a12a21x1 + a12a22x2 = a12b2.

The subtraction of the equations yields

x1 =a22b1 − a12b2

a11a22 − a12a21, (2.78)

provided that the denominator is not zero. Similarly, we find

x2 = a11b2 − a21b1a11a22 − a12a21

; (2.79)

the uniquely determined values of the unknowns give the unique solution x =

x1x2

. If a11a22 −

a12a21 = 0, Eqs. (2.78) and (2.79) do not make sense. In fact, a11a22 − a12a21 = 0 implies thata11a12

= a21a22

=: λ (provided that a12 and a22 are not zero), and the matrix A of the coefficients of the system (2.76,2.77) reads

A =

a11 a12

a21 a22

=

λa12 a12

λa22 a22

;

that is, the rank of  A is one, and the system of the two linear equation has no solution orinfinitely many ones. If  a12 = 0 or a22 = 0, then rank A = 1 also.The number a11a22 − a12a21 obviously plays an important role; it is called the determinant 

of the 2 × 2 matrix  A (cf. Remark 1.17) and is written as

det A =

a11 a12

a12 a22

:= a11a22 − a12a21. (2.80)

By means of this definition, formulas (2.78) and (2.79) can be rewritten according to

x1 =

b1 a12

b2 a22

a11 a12

a12 a22

, x1 =

a11 b1

a21 b2

a11 a12

a12 a22

. (2.81)

We investigate the essential properties of 2 × 2 determinants. From Definition (2.80) it followsthat

(i)

det

a + a b

c + c d

= (a + a)d − b(c + c) = ad − bc + ad − bc

= deta c

c d + deta b

c d 65

8/8/2019 Linear Algebra 03

http://slidepdf.com/reader/full/linear-algebra-03 68/86

(ii)

det

λa b

λc d

= λad − λbc = λ(ad − bc) = λ det

a b

c d

(iii)

det a bc d

= ad − bc = −(bc − ad) = − det b ad c

(iv)

det I 2 = det

1 00 1

= 1.

Properties (i) and (ii) say that det A is linear in the first column , property (ii) says that det A

is alternating w.r.t. the columns, and property (iv) is a normalization property . However, det A

is not linear in A; in fact, in general det(A + B)

= det A + det B and det λA = λ2 det A.

Requiring properties (i)–(iv) for general n × n determinants, we obtain, by means of thefollowing theorem, a very aesthetic definition of general n × n determinants.

Theorem/Definition 2.50 To each  n × n matrix  A = (C 1, . . . , C  n) =

R1

.

.

.

Rn

, C i denoting 

the columns and  Ri the rows, i = 1, . . . , n, one can assign a number  det A uniquely such that 

(i) det A is linear in the first column, i.e.,

det(C 1 +

C 1, C 2, . . . , C  n) = det(C 1, C 2, . . . , C  n) + det(

C 1, C 2, . . . , C  n)

det(λC 1, C 2, . . . , C  n) = λ det(C 1, C 2, . . . , C  n)

(ii) the interchange of two columns of  A changes the sign of  det A, i.e.,

det(C 1, . . . , C  i, . . . , C   j , . . . , C  n) = − det(C 1, . . . , C   j , . . . , C  i, . . . , C  n)

(iii) det I n = 1.

The number

det A =

a11 . . . a1n...

...an1 . . . ann

is called the determinant of  A.

We do not prove this theorem and the statements on determinants we present now; however,we indicate some reasons why the rules are as they are.

Statements, Properties, and Rules

1. It is not very hard to show that an association A → det A satisfying the conditions (i)–(iii)stated in the theorem, is necessarily given by

det A =

n

i1,...,in=1

(i1, . . . , in) ai11ai22 . . . ainn (2.82)

66

8/8/2019 Linear Algebra 03

http://slidepdf.com/reader/full/linear-algebra-03 69/86

where

(i1, . . . , in) :=

0 if any two of the indices i1, . . . , in are equal1 if  i1, . . . , in is an even permutation of 1, . . . , n

−1 if  i1, . . . , in is an odd permutation of 1, . . . , n .

Conversely, from (2.82) follow the conditions (i)–(iii) stated in the theorem, thus provingthe existence of an association A

→det A satisfying these conditions. Hence, formula

(2.82) is equivalent to the statement of Theorem 2.50 and is often used as the definitionof an n × n determinant.

An arrangement i1, . . . , in of the numbers 1, . . . , n is called an even (odd) permutation of 1, . . . , n if  i1, . . . , in can be obtained from 1, . . . , n by interchanging two numbers an even(odd) number of times. As an example, we calculate a 3 × 3 determinant. From (2.82)and the table

i1 i2 i3 (i1, i2, i3)

1 2 3 11 3 2 −12 1 3

−1

2 3 1 13 1 2 13 2 1 −1

it follows thata11 a12 a13

a21 a22 a23

a31 a32 a33

=

(2.83)

a11a22a33

−a11a32a23

−a21a12a33 + a21a32a13 + a31a12a23

−a31a22a13.

Formula (2.80) for 2 × 2 determinants is of course the particular case of (2.82) for n =2. One can use (2.82) to calculate any n × n determinant, but for n > 3 this requiresmuch work. There are more suitable methods which are consequences of (2.82) and arementioned below. The main meaning of (2.82) is that it is the key to prove many resultson determinants.

2. The determinant of a matrix is also given by

det A =n

i1,...,in=1

(i1, . . . , in) a1i1a2i2 . . . anin . (2.84)

The comparison of (2.82) and (2.84) shows that

det A = det AT, (2.85)

i.e., the determinant of a matrix does not change under transposition.

Applying (2.84) to a 3 × 3 determinant, we obtaina11 a12 a13

a21 a22 a23

a31 a32 a33

=

(2.86)

a11a22a33

−a11a23a32

−a12a21a33 + a12a23a31 + a13a21a32

−a13a22a31

which in fact coincides with the result (2.83).

67

8/8/2019 Linear Algebra 03

http://slidepdf.com/reader/full/linear-algebra-03 70/86

3. The following so-called Sarrus’ rule applies only  to the determinants of 3 × 3 matrices.Copy the first column of the matrix as a fourth column and the second as a fifth, as writtenin (2.87). Multiply the three entries in each diagonal indicated by and add the threeproducts, then multiply the three entries in each diagonal indicated by and subtractthese three products from the sum calculated first; the result is the 3 × 3 determinant, asone sees by comparison with (2.86).

+ + +a11 a12 a13 a11 a12

a21 a22 a23 a21 a22

a31 a32 a33 a31 a32

− − −

(2.87)

4. The association A → det A is linear w.r.t. every column or row of  A. The linearity w.r.t.every column follows from statements (i) and (ii) of Theorem 2.50, the linearity w.r.t. therows is then a consequence of (2.85).

5. The association A → det A is alternating  w.r.t. the columns as well as w.r.t. the rows of A, i.e., the interchange of any two columns or any two rows (but not the interchange of acolumn with a row) changes the sign of det A.

6. For an n × n matrix, the following statements are equivalent:

(i) det A = 0

(ii) the columns of  A are linearly dependent

(iii) the rows of A are linearly dependent.

We prove the equivalence of the three statements and show first that (ii) implies (i). If 

the columns C 1, . . . , C  n are linearly dependent, then the equation ni=1 λiC i = 0 can besatisfied for a nontrivial choice of the coefficients. Without loss of generality, assumeλ1 = 0. Then

C 1 = − 1

λ1

ni=2

λiC i =n

i=2

− λi

λ1

C i;

therefore, using the linearity of det A in the first column,

det A = det(C 1, . . . , C  n) = det

n

i=2

− λi

λ1

C i, C 2, . . . , C  n

=

n

i=2

−λi

λ1 det(C i, C 2, . . . , C  i, . . . , C  n)

= 0.

The last step in this equality chain follows from the fact that the determinant of matrix iszero if two columns coincide. This is a consequence of the alternating property; we have,for instance,

det(C i, C 2, . . . , C  i, . . . , C  n) = − det(C i, C 2, . . . , C  i, . . . , C  n)

where the first and the i-th column have been interchanged, both being equal to C i. Sodet(C i, C 2, . . . , C  i, . . . , C  n) = 0.

Now let det A = 0 and assume that the columns C 1, . . . , C  n are linearly independent.Then the columns C 1, . . . , C  n constitute a basis of  Rn and the columns D1, . . . , Dn of 

68

8/8/2019 Linear Algebra 03

http://slidepdf.com/reader/full/linear-algebra-03 71/86

any other matrix B are linear combinations of the C i. By linearity, det B is a linearcombination of the determinants of the matrices (C i1, . . . , C  in); by the alternating property,the determinant of each (C i1 , . . . , C  in) is equal to det A, − det A, or 0. Since det A = 0,the determinant of all matrices (C i1, . . . , C  in) is zero; consequently, det B = 0, whichis a contradiction because B is arbitrary. Hence, the columns C 1, . . . , C  n are linearlydependent.

By means of (2.85), the equivalence of statements (i) and (ii) implies the equivalence of (i) and (iii).

7. For an n × n matrix, the following statements are equivalent:

(i) det A = 0

(ii) rank A = n

(iii) A−1 exists

(iv) Ax = 0 has only the trivial solution

(v) Ax = b has a unique solution for every b ∈ Rn.

According to point 6, det A = 0 is equivalent to the linear independence of the columnsthe matrix A, i.e., equivalent to rank A = n. The rest follows essentially from Theorem2.39.

8. The determinant of a matrix is not changed by adding a multiple of any column to anyother column or a multiple of any row to any other row. For instance,

det A =

a11 . . . a1n...

...an1 . . . ann

=

a11 . . . a1n...

...an1 . . . ann

+

a11 λa11 a13 . . . a1n...

......

...an1 λan1 an3 . . . ann

=

a11 a12 + λa11 a13 . . . a1n

... ... ... ...an1 an2 + λan1 an3 . . . ann

;

in fact, what has been added to det A is λ det(C 1, C 1, C 3, . . . , C  n) = 0.

9. From (2.86) we obtain

det A =

a11 a12 a13

a21 a22 a23

a31 a32 a33

= a11(a22a33 − a23a32) − a12(a21a33 − a23a31) + a13(a21a32 − a22a31)

= a11 a22 a23a32 a33

− a12 a21 a23a31 a33

+ a13 a21 a22a31 a32

(cf. Eq. (1.18)). The 3 × 3 determinant det A has been expanded w.r.t. the first row . Thisis a particular case of  Laplace’s expansion theorem , stating that, for an n × n matrix,

det A =n

 j=1

(−1)i+ jaij det Aij , i = 1, . . . , n , (2.88)

det A =n

i=1

(−1)i+ jaij det Aij , j = 1, . . . , n , (2.89)

where Aij is the (n − 1) × (n − 1) matrix obtained from A by deleting the i-th row and the j-th column. Choosing any i and keeping it fixed, formula (2.88) can be used to reduce the

69

8/8/2019 Linear Algebra 03

http://slidepdf.com/reader/full/linear-algebra-03 72/86

calculation of det A to the calculation of n (n−1)×(n−1) determinants; det A is expanded w.r.t. the i-th row . According to (2.89), det A can also be expanded w.r.t. the j-th column .The number det Aij is called the subdeterminant w.r.t. (i, j); the sign of  aij det A is givenby the chess-board rule.

An upper (lower) triangular matrix  is a matrix where all entries above (below) the diagonalare zero. The determinant of a triangular matrix is the product of the diagonal entries,

as we show for a lower triangular matrix. Namely, expanding w.r.t. the first column, weobtain

a11 a12 a13 . . . a1n

0 a22 a23 . . . a2n

0 0 a33 . . . a3n...

......

. . ....

0 0 0 . . . ann

= a11

a22 a23 . . . a2n

0 a33 . . . a3n...

.... . .

...0 0 . . . ann

= a11a22

a33 . . . a3n

.... . .

...0 . . . ann

= a11a22a33 . . . ann.

10. For the calculation of 3 × 3 determinants, the expansion in terms of 2 × 2 determinants is

suitable. For the numerical calculation of larger determinants, again Gauß elimination  isuseful:

a11 . . . a1n

.... . .

...an1 . . . ann

=

a11 a12 . . . a1n

0 b22 . . . b2n...

.... . .

...0 bn2 . . . ann

=

a11 a12 a13 . . . a1n

0 b22 b23 . . . b2n

0 0 c33 . . . c3n...

......

. . ....

0 0 0 . . . xnn

= a11b22c33 . . . xnn.

11. The determinant-multiplication theorem  states that

det AB = det A det B

where A and B are n × n matrices.

12. If the inverse of an n×n matrix A exists, then it follows from the determinant-multiplicationtheorem that det A det A−1 = det AA−1 = det I n = 1. In consequence, det A = 0 and

det A−1 =1

det A.

It is already clear by point 7 that the existence of  A−1 implies that det A = 0; in addition,point 7 says that det A = 0 also sufficient for the existence of  A−1.

13. If det A = 0, then, again according to point 7, the system Ax = b of linear equationshas a unique solution, namely, x = A−1b. Cramer’s rule now states that the solution

x =

x1...xn

is given by

xi =

a11 . . . a1,i−1 b1 a1,i+1 . . . a1n

......

......

...an1 . . . an,i−1 bn an,i+1 . . . ann

det A

, i = 1, . . . , n;

(2.81) is a particular case of this rule. Cramer’s rule is mainly of theoretical interest since,beginning with n ≥ 3, its application to solving systems of linear equations requires too

70

8/8/2019 Linear Algebra 03

http://slidepdf.com/reader/full/linear-algebra-03 73/86

many steps of calculation; besides this, it applies only to systems of  n linear equations inn unknowns with a unique solution. Gauß-Jordan elimination requires many fewer stepsof calculation and applies to every system of linear equations.

14. The inverse of a matrix can also be represented in terms of determinants. If det A = 0,then

A−1 = transpose of the matrix with the entries (

−1)i+ j det Aij

det A

(2.90)

where det Aij is the subdeterminant w.r.t. (i, j); (−1)i+ j detAij

detA is called the cofactor w.r.t.(i, j). The representation (2.90) is closely related to Cramer’s rule, and it is also mainly of theoretical interest. Beginning with n ≥ 3, the calculation of the inverse of a 3 × 3 matrixaccording to (2.90) is tedious; the application of Gauß-Jordan elimination is again muchmore suitable for numerical purposes. The case n = 2 is simple; in fact, we find

A−1 =1

det A

a22 −a12

−a21 a11

.

15. Finally, determinants also have a geometrical meaning. Namely, if a, b, c is a right-handed

system of vectors of E 3, e1, e2, e3 a right-handed orthonormal basis of E 3, and

a = a1e1 + a2e2 + a3e3

 b = b1e1 + b2e2 + b3e3

c = c1e1 + c2e2 + c3e3,

then, according to Remark 1.19,

a1 b1 c1

a2 b2 c2

a3 a3 c3

is the volume of the parallelepiped spanned by the vectors a,  b, and c (cf. also Remark

1.17). By definition, an n × n determinant is, up to the sign, the volume of  n-dimensionalparallelotop.

2.9 Eigenvalue Problems

We finish our study of linear algebra with the so-called eigenvalue problems of linear transforma-tions which play an important part in such different fields like geometry, differential equations,mechanics, and quantum mechanics.

Definition 2.51 Let

V be a real vector space and L :

V → V , x

→y = L(x), be a linear

transformation. A number λ ∈ R is called an eigenvalue of  L if there exists a vector u = 0 suchthat

L(u) = λu. (2.91)

The vector u = 0 is called an eigenvector belonging to the eigenvalue λ. The set S λ of alleigenvectors belonging to λ together with the zero vector, obviously being a subspace of  V , iscalled the eigenspace belonging to λ.

Now let dim V = n and let Y  = AX  be the matrix representation of  y = L(x) w.r.t. a basisof  V . In matrix form the eigenvalue equation  (2.91) reads

AU  = λU  (2.92)

where λ is also called an eigenvalue of  A and U  ∈ Rn, U  = 0, an eigenvector of  A.

71

8/8/2019 Linear Algebra 03

http://slidepdf.com/reader/full/linear-algebra-03 74/86

Without the requirement u = 0, every real number λ would be an eigenvalue since Eq. (2.91)is always satisfied for u = 0. So, for an eigenvalue λ, it makes sense to call only the nontrivialsolutions u of (2.91) eigenvectors. However, the number 0 can be an eigenvalue; λ = 0 is aneigenvalue if there exists a vector u = 0 such that L(u) = 0u, i.e., if 

L(u) = 0. (2.93)

By definition, the eigenspace S λ belonging to an eigenvalue λ consists of all correspondingeigenvectors and u = 0 because without the zero vector, S λ would not be a subspace. We quicklyshow that S λ is a subspace; we have to verify the conditions of Definition 2.7. Since S λ containsan eigenvector as well as the zero vector, S λ is not empty. If  u1, u2 ∈ S λ, then L(u1) = λu1 andL(u2) = λu2. From these two equations it follows that

L(u1 + u2) = L(u1) + L(u2) = λu1 + λu2 = λ(u1 + u2),

i.e., u1 + u2 ∈ S λ. Similarly, if  µ ∈ R and u ∈ S λ, then

L(µu) = µL(u) = µλu = λ(µu),

i.e., µu ∈ S λ. Hence, S λ is a subspace of  V .—If  λ = 0 is an eigenvalue of  L, then the vectors of the corresponding eigenspace satisfy Eq. (2.93), which means that the eigenspace coincides withKer L (cf. Definition 2.31).

Example 2.52

(a) Let L : R3 → R3 be the linear transformation defined by the matrix

A =

2 1 00 1 −10 2 4

according to L(x) := Ax. The eigenvalue problem of  L, L(u) = λu, reads

Au = λu (2.94)

(compared with the general situation and Eq. (2.92), we have in this example u = U ). Eq.(2.94) can be rewritten as

(A − λI )u = 0. (2.95)

The number λ is an eigenvalue if this equation has nontrivial solutions u = 0. Accordingto Theorem 2.39 and statement 7 of Section 2.8, (2.95) has nontrivial solutions if and onlyif the matrix A − λI  is not invertible, i.e., if and only if det(A − λI ) = 0. Hence, we can

find the eigenvalues as the solutions of the equation det(A−λI ) = 0. For the given matrixwe obtain

det(A − λI ) =

2 − λ 0

0 1 − λ −10 2 4 − λ

= (2 − λ)

1 − λ −12 4 − λ

= (2 − λ)((1 − λ)(4 − λ) + 2)

= (2 − λ)(λ2 − 5λ + 6)

= 0.

The eigenvalues are the roots of a cubic equation, resp., the zeros of the cubic polynomial

 p given by p(λ) := (2 − λ)(λ2

− 5λ + 6). The zeros are λ1 = 2 and λ2 = 3; correspondingto p(λ) = −(λ − 2)2(λ − 3), λ1 = 2 is a twofold zero.

72

8/8/2019 Linear Algebra 03

http://slidepdf.com/reader/full/linear-algebra-03 75/86

For each of the two eigenvalues we calculate the corresponding eigenvectors according toEq. (2.94) or (2.95). For λ1, (2.95) reads (A − 2I )u = 0; that is, 0 1 0

0 −1 −10 2 2

ξ1

ξ2

ξ3

= 0 (2.96)

where u = ξ1ξ2ξ3

. Eq. (2.96) is equivalent to

ξ2 = 0

−ξ2 − ξ3 = 0 (2.97)

2ξ2 + 2ξ3 = 0.

This system of three equations reduces to a system of two equations, which is not anaccident but related to the fact that (2.96) must have nontrivial solutions. From Eqs. (2.97)

we obtain ξ2 = 0, ξ3 = 0, and ξ1 = t where t is a parameter. Hence, u =

t

00

= t

100

,

and the eigenspace corresponding to λ1 is

S 1 =

u ∈ R3

u = t

001

, t ∈ R ;

S 1 is one-dimensional although λ1 is a twofold zero of the above polynomial p.

For λ2, (2.95) reads (A − 3I )u = 0; that is,

−1 1 00 −2 −10 2 1

ξ1

ξ2

ξ3

= 0.

resp.,

−ξ1 + ξ2 = 0

−2ξ2 − ξ3 = 0

2ξ2 + ξ3 = 0.

Setting ξ3 = t, we obtain ξ2 = − t2 , and ξ1 = − t

2 . Hence, u =

− t2

− t2

t

= t

− 1

2

− 1

2

1

=

− t2

11

−2

= s

11

−2

, and the eigenspace corresponding to λ2 is

S 2 =

u ∈ R3

u = s

11

−2

, s ∈ R .

(b) Consider the projection map L : E 3 → E 3 of Example 2.24, part (b), L(x) = (a · x)a wherea is a unit vector. We solve the eigenvalue problem of  L. First, let u ∈ E 3 be any vectorsatisfying a · u = 0, u =  0. It follows that

L(u) = (a · u)a =  0 = 0u,

i.e., L(u) = 0u. Hence, λ = λ1 = 0 is an eigenvalue and u an eigenvector. The correspond-

ing eigenspace isS 1 = {u ∈ E 3 |a · u = 0}. (2.98)

73

8/8/2019 Linear Algebra 03

http://slidepdf.com/reader/full/linear-algebra-03 76/86

Assume now that there is an eigenvector u satisfying a · u = 0. The eigenvalue equationL(u) = λu reads

(a · u)a = λu. (2.99)

The dot multiplication of (2.99) by a yields

(a · u)|a|2 = λu · a

which implies, since a is a unit vector, λ = λ2 = 1. From (2.99) and λ = 1 we obtainu = (a · u)a, i.e., u is a multiple of a. Conversely, every vector u = ta where t ∈ R, t = 0,satisfies

L(u) = L(ta) = (a · ta)a = ta = u,

i.e., L(u) = u. Hence, u is an eigenvector to the eigenvalue λ2 = 1. The correspondingeigenspace is

S 2 = {u ∈ E 3 | u = ta, t ∈ R}. (2.100)

Note that this result is geometrically evident: The vectors that have the same or oppositedirection as a are not changed by the projection, thus being eigenvectors to the eigenvalue1; the vectors that are perpendicular to a are annihilated, thus being eigenvectors to theeigenvalue 0. It is also evident that the subspace S 1 is one-dimensional and the subspaceS 2 is two-dimensional. Moreover, we have that S 1 = Ker L and S 2 = Im L; the first is ageneral property of every linear transformation with eigenvalue 0, whereas the second is aparticular property of this example.

Although the solution of the eigenvalue problem of the considered projection map is veryobvious, it is instructive to solve the eigenvalue problem in matrix representation. Thematrix representation of our projection map L was discussed in Example 2.26, part (b).With reference to an orthonormal basis, the matrix of  L is

A = a2

1 a1a2 a1a3

a1a2 a

2

2 a2a3a1a3 a2a3 a2

3 (2.101)

where a1, a2, and a3 are the components of the unit vector a, a = a1e1 + a2e2 + a3e3 (cf.Eq. (2.33)). To determine the eigenvalues of this matrix, we again rewrite Eq. (2.92) as(A − λI )U  = 0 and use the fact that the latter equation has nontrivial solutions if andonly if det(A − λI ) = 0. We obtain that

det(A − λI ) =

a21 − λ a1a2 a1a3

a1a2 a22 − λ a2a3

a1a3 a2a3 a23 − λ

= (

a2

1 −λ

)[(a2

2 −λ

)(a2

3 −λ

) −a2

2a2

3] −a

1a

2(a

1a

2(a2

3 −λ

) −a

1a

2a2

3)+ a1a3(a1a2

2a3 − a1a3(a22 − λ))

= (a21 − λ)(λ2 − (a2

2 + a23)λ) + a2

1a22λ + a2

1a23λ

= a21λ2λ3 − a2

1(a22 + a2

3)λ + (a22 + a2

3)λ2 + a21(a2

2 + a23)λ

= (a21 + a2

2 + a23)λ2 − λ3

and, since a is a unit vector,

 p(λ) := det(A − λI ) = λ2 − λ3 = λ2(1 − λ).

From p(λ) = det(A − λI ) = 0 it follows that λ = λ1 = 0 and λ = λ2 = 1 where λ1 = 0 is

a twofold zero of the cubic polynomial p. The zeros of  p are the eigenvalues of the matrixA and thus of the linear transformation L.

74

8/8/2019 Linear Algebra 03

http://slidepdf.com/reader/full/linear-algebra-03 77/86

The eigenvectors of the matrix A belonging to the eigenvalue λ1 = 0 are the nontrivialsolutions of  AU  = 0 which can explicitly be written as

a21ξ1 + a1a2ξ2 + a1a3ξ3 = 0

a1a2ξ1 + a22ξ2 + a2a3ξ3 = 0 (2.102)

a1a3ξ1 + a2a3ξ2 + a23ξ3 = 0

where U  = ξ1ξ2ξ3

and u = ξ1e1 + ξ2e2 + ξ3e3. Since a is a unit vector, at least one of its

components is not zero; without loss of generality, let us assume that a1 = 0. Dividing thefirst equation of (2.102) by a1, we obtain

a1ξ1 + a2ξ2 + a3ξ3 = 0. (2.103)

The other two equations of (2.102) are equivalent to (2.103), as the multiplication of thelatter by a2, resp., a3 shows. Eq. (2.103) means a · u = 0 which implies our former result(2.98).—Solving (2.103) for ξ1 and setting ξ2 = s and ξ3 = t, we obtain

U  = ξ1

ξ2ξ3 =

−a2a1

s − a3a1

t

st

= s−a2

a1

10

+ t−a3

a1

01

,

i.e., u = s(−a2a1

e1 + e2) + t(−a3a1

e1 + e3) = sv1 + tv2 where v1 := −a2a1

e1 + e2 and v2 :=−a3

a1e1 + e3. One easily verifies that a · v1 = 0 and a · v2 = 0, from which it follows again

that a ·u = 0; the two linearly independent vectors v1 and v2 form a basis in the eigenspaceS 1.

The eigenvectors of the matrix A belonging to the eigenvalue λ2 = 1 are the nontrivialsolutions of (A − I )U  = 0 which reads explicitly

(a21 − 1)ξ1 + a1a2ξ2 + a1a3ξ3 = 0

a1

a2

ξ1

+ (a2

2 −1)ξ

2+ a

2a

3= 0 (2.104)

a1a3ξ1 + a2a3ξ2 + (a23 − 1)ξ3 = 0.

Taking account of  a21 + a2

2 + a23 = 1, one can verify that the system (2.104) reduces

to a system of two equations with the solution U  =

ξ1ξ2ξ3

= t

a1a2a3

. Hence, u =

t(a1e1 + a2e2 + a3e3) = ta, which is our former result (2.100).

The matrix solution of the eigenvalue problem of the projection map L simplifies essentiallyby the choice of an orthonormal basis that is adapted to the situation. Choosing e1, e2, e3

such that, for instance, e1 = a, we obtain a1 = 1 and a2 = a3 = 0. According to (2.101),the matrix A then takes the simple form

A = 1 0 0

0 0 00 0 0

.

This immediately implies that p(λ) = det(A − λI ) = λ2(1 − λ) and that the eigenvalues

are λ1 = 0 and λ2 = 1. The eigenvectors of  A belonging to λ1 are U  =

0s

t

=

s

010

+

001

which means, for the eigenvectors of  L, u = se2 + te3, the latter implying

again a · u = e1 · u = 0 and hence (2.98). The eigenvectors of  A belonging to λ2 are

U =

t

00 =

t100 which means, for the eigenvectors of 

L,

u=

te1 =

ta, the latter

implying again (2.100).

75

8/8/2019 Linear Algebra 03

http://slidepdf.com/reader/full/linear-algebra-03 78/86

We now summarize the basic statements on eigenvalue problems.

Theorem 2.53 Let  L : V → V be a linear transformation and  A a representing  n × n matrix.

(a) The eigenvalues of  L (resp., of  A) are the solutions of the characteristic equation 

 p(λ) := det(A − λI ) = 0

where p is a polynomial of degree n, the characteristic polynomial.

(b) There are at most  n eigenvalues of  L (resp., of  A) and possibly no (real) eigenvalue. If  n

is odd, then there exists at least one (real) eigenvalue.

(c) Let  λ1, . . . , λm be the different eigenvalues of  L and  S 1, . . . , S m the corresponding eigen-spaces. Eigenvectors u1, . . . , um belonging to these different eigenvalues (i.e., u1 ∈ S 1, . . . ,

um ∈ S m, u1, . . . , um = 0) are linearly independent. Moreover,

dim S 1 + . . . + dim S m ≤ dim V = n.

Proof:

(a) For n = 2, we have

 p(λ) = det(A − λI ) =

a11 − λ a12

a21 a22 − λ

= (a11 − λ)(a22 − λ) − a12a21

where p is a poynomial of degree 2. For n = 3, we obtain

 p(λ) = det(A − λI ) =

a11 − λ a12 a13

a21 a22 − λ a23

a31 a32 a33 − λ

= (

a11 −

λ) a22

−λ a23

a32 a33 − λ − a21 a12 a13

a32 a33 − λ +a

31 a12 a13

a22 − λ a23 where p is obviously a polynomial of degree 3. By induction one can show that, for ann × n matrix, λ →  p(λ) is a polynomial of degree n.

An eigenvector U  of the matrix A is a nontrivial solution of the equation AU  = λU ,resp., of the homogeneous linear system (A − λI )U  = 0. According to Theorem 2.39and statement 7 of Section 2.8, the latter system has nontrivial solutions if and only if rank(A− λI ) < n, i.e., if and only if det(A−λI ) = 0. Hence, the eigenvalues are the zerosof the characteristic polynomial p.

(b) Since a polynomial of degree n has at most n zeros and possibly no real zero, an n × n

matrix A can have at most n eigenvalues and possibly none. However, a real polynomialof odd degree has at least one real zero, so A has at least one eigenvalue if  n is odd.

(c) Consider two different eigenvalues λ1, λ2 with two corresponding eigenvectors u1, u2,

L(u1) = λ1u1

L(u2) = λ2u2,(2.105)

u1, u2 = 0. Letµ1u1 + µ2u2 = 0 (2.106)

where µ1, µ2 ∈ R. Applying the linear transformation L to both sides of (2.106), we obtainµ1L(u1) + µ2L(u2) = 0; i.e., by (2.105),

µ1λ1u1 + µ2λ2u2 = 0. (2.107)

76

8/8/2019 Linear Algebra 03

http://slidepdf.com/reader/full/linear-algebra-03 79/86

Multiplying Eq. (2.106) by λ1, we obtain

λ1µ1u1 + λ1µ2u2 = 0. (2.108)

The subtraction of Eq. (2.108) from (2.107) yields (λ1 − λ2)µ2u2 = 0. Since λ1 = λ2 andu2 = 0, it follows that µ2 = 0 and, by (2.106) and u1 = 0, µ1 = 0. Hence, u1 and u2 arelinearly independent. By induction, this procedure can be generalized to m > 2, i.e., to

the case of more than two eigenvalues.To prove the statement on the dimensions, choose a basis in each eigenspace:

basis of  S 1 : u(1)1 , . . . , u

(n1)1

basis of  S 2 : u(1)2 , . . . , u

(n2)2

...

basis of  S m : u(1)m , . . . , u(nm)

m .

We show that the system of the vectors

u(1)

1 , . . . , u(n1)

1 , u(1)

2 , . . . , u(n2)

2 , . . . , u(1)m , . . . , u

(nm)m (2.109)

is linearly independent. Letmi=1

ni j=1

λ( j)i u

( j)i = 0. (2.110)

Defining

wi :=

ni j=1

λ( j)i u

( j)i , (2.111)

Eq. (2.110) can be written as

w1 + . . . + wi + . . . + wm = 0. (2.112)

Note that wi ∈ S i for i = 1, . . . , m since u( j)i ∈ S i and S i is a subspace. Assume that not

all wi are zero; denote the non-zero vectors of (2.112) by wi1 , . . . , wir . Eq. (2.112) thenimplies that

wi1 + . . . + wir = 0. (2.113)

The vectors wij ∈ S ij , wij = 0, j = 1, . . . , r, are eigenvectors of  L. According to Eq.(2.113), these vectors are linearly dependent (each wij has the coefficient 1), which is acontradiction. Hence, our assumption is wrong and all vectors w1, . . . , wm in (2.112) arezero. From (2.111) it then follows that

ni

 j=1

λ( j)i u

( j)i = 0

for all i = 1, . . . , m. Since, for each i, the vectors u(1)i , . . . , u

(ni)i are linearly independent,

we conclude that λ( j)i = 0 for all j = 1, . . . , ni and all i = 1, . . . , m. Hence, by (2.110),

the system (2.109) is linearly independent.—The number of the vectors of the linearlyindependent system (2.109) is n1 + . . . + nm. In consequence,

n1 + . . . + nm ≤ dim V = n,

and from n1 = dim

S 1, . . . , nm = dim

S m we finally obtain the statement on the dimensions

of the eigenspaces. 2

77

8/8/2019 Linear Algebra 03

http://slidepdf.com/reader/full/linear-algebra-03 80/86

Finally, we summarize some consequences of parts (b) and (c) of the preceding theorem.One can distinguish the following three cases.

Case 1: The linear transformation L has n (different) eigenvalues λ1, . . . , λn (n = dim V ).Then

(i) corresponding eigenvectors u1, . . . , un are linearly independent (according to part (c) of Theorem 2.53)

(ii) each eigenspace is one-dimensional (by the dimension statement of part (c) of the theorem)

(iii) u1, . . . , un is a basis of V consisting of eigenvectors of  L (since dim V = n).

Case 2: The linear map L has less than n eigenvalues, namely, λ1, . . . , λm (m < n) witheigenspaces S 1, . . . , S m, but

dim S 1 + . . . + dim S m = n = dim V .Then, as in the proof of the theorem, we can choose respective bases of S 1, . . . , S m and join them,thus obtaining a system of the type (2.109) of linearly independent vectors. Since n1 +. . .+nm =n, this system is a basis of 

V consisting of eigenvectors of  L.

Case 3: The map L has less than n eigenvalues, λ1, . . . , λm (m < n) with eigenspacesS 1, . . . , S m and

dim S 1 + . . . + dim S m < n = dim V .Then there is no basis of  V consisting entirely of eigenvectors.

In the first two cases, we can draw some further important conclusions. For the reason of asimpler formulation, we consider only Case 1. That is,

L(ui) = λiui, ui = 0, i = 1, . . . , n .

Let v1, . . . , vn be any basis of V ; u1, . . . , un is a basis of eigenvectors. For every x ∈ V and everyy := L(x), we have

x =n

i=1

ξivi =n

i=1

αiui

y =n

i=1

ηivi =n

i=1

β iui

and

η1...

ηn

= A

ξ1...

ξn

,

β 1...

β n

= A

α1...

αn

(2.114)

where A is the matrix of  L w.r.t. the basis v1, . . . , vn and A the matrix of  L w.r.t. the basisu1, . . . , un. By Exercise 2.25, we know that these matrices are related according to

A = BAB−1

where B is the matrix of the basis transformation. However, we can determine A more easily.Using the eigenvector basis u1, . . . , un, we obtain

y = L(x) = L

n

i=1

αiui

=

ni=1

αiL(ui) =n

i=1

αiλiui.

Comparing the last expression for y with y = ni=1

β iui, we conclude that

β i = λiαi

78

8/8/2019 Linear Algebra 03

http://slidepdf.com/reader/full/linear-algebra-03 81/86

where i = 1, . . . , n. Since the numbers β i and αi are related by the matrix A according to(2.114), it follows that

A =

λ1 0 . . . 00 λ2 . . . 0...

. . ....

0 0 . . . λn

.

In fact, we have proved the following result.

Theorem 2.54 Let  L be a linear transformation satisfying the condition of Case 1. Then thematrix of L w.r.t. a basis consisting of eigenvectors is diagonal, the eigenvalues being the diagonal entries.

2.10 Exercises

2.1 Show that

(i) the set

Mmn of all real m

×n matrices with the usual addition of matrices and the usual

multiplication by real numbers is a real vector space

(ii) a subset of m × n matrices with the entries 0 at the same places is a subspace of Mmn

(iii) the subset S n of the symmetric n × n matrices is a subspace of  Mnn

(iv) the subset An of the antisymmetric n × n matrices is a subspace of  Mnn.

Give an example of a subset of  Mmn that is not a subspace.

2.2 Let C 0([a, b]) be the space of the real-valued continuous functions on the interval [ a, b],a < b. Investigate which of the following subsets are subspaces of  C 0([a, b]):

(i) the set of all functions being differentiable on [a, b]

(ii) the set of all functions f  being continuous on [a, b] and satisfying f (a) = 0

(iii) the set of all functions f  being continuous on [a, b] and satisfying f (x) ≥ 0 for all x ∈ [a, b]

(iv) the set of all continuous functions satisfying ba

f (x)dx = 0

(v) the set of all continuous functions satisfying ba

f (x)dx = 1.

2.3 Verify that the set R+ of all strictly positive real numbers with the operations

x ⊕ y := xy and λ ◦ x := xλ

where x,y > 0 and λ ∈ R, is a vector space.

2.4 Determine whether the following systems of vectors of Rn are linearly dependent or inde-pendent.

a)

4−1

2

,

−410

2

b) −2

01, 3

25, 6

−11, 7

02

79

8/8/2019 Linear Algebra 03

http://slidepdf.com/reader/full/linear-algebra-03 82/86

c)

0022

,

3300

,

110

−1

2.5 Determine whether the following systems of functions f , g , h ∈ C 0(R) are linearly depen-

dent or independent.

a) f (x) := 1, g(x) := x, h(x) := ex

b) f (x) := sin x, g(x) := cos x

c) f (x) := 6, g(x) := sin2 x, h(x) := cos2 x

2.6 Let S n be the subspace of the symmetric matrices of  Mnn and An the subspace of theantisymmetric matrices (cf. Problem 6.1). What are the dimensions of S n and An?

2.7 What is the dimension of the vector space of Problem 6.3?

2.8 Applying the method of  Gauß elimination , solve the following system of linear equations:x + y + 2z = 9

2x + 4y − 3z = 1

3x + 6y − 5z = 0.

Represent the system by a matrix of row-echelon type. Furthermore, solve the system accordingto Gauß-Jordan elimination  and determine the corresponding matrix of reduced row-echelon type.

2.9 Solve the equation Ax = b, i.e., the corresponding system of linear equations, where

a) A = 4 2 −2−3 1 0

1 −4 2

and b = b1 = −2

6−9

, resp., b = b2 = 26

122

b) A =

1 −3 52 −2 1

−3 5 −6

and b =

−26

−9

c) A =

1 2 −7 24 7 −26 9

−3 −5 19 −7

and b = b1 =

−26

−9

, resp.,

b = b2 = −3

−107

.

2.10 Let e1, e2 be an orthonormal basis in E 2 and x ∈ E 2 any vector. Dilate x w.r.t. thedirection of e1 by the factor 2 and then reflect the dilated vector at e2, thus obtaining a vector y.

(a) Calculate y from x and show that the transformation x → y defines a linear mapL : E 2 → E 2.

(b) What is the matrix of  L w.r.t. the basis e1, e2?

(c) Introducing the new basis v1 := e1 − e2, v2 := e1 + 2e2, find the relation between thecomponents of  x w.r.t. the two bases.

80

8/8/2019 Linear Algebra 03

http://slidepdf.com/reader/full/linear-algebra-03 83/86

(d) Determine the matrix of L w.r.t. the new basis.

2.11 Let e1, . . . , en be the standard basis of Rn and let b1, . . . , bn be fixed vectors of Rm. Showthat the map L : Rn → R

m defined by

L(x) = L

n

i=1

xiei

:=

n

i=1

xibi

is linear and determine its matrix A with respect to the standard basis.

2.12 Consider the linear map L : R2 → R2 defined by

L(x) :=

x1 + 2x2

3x1 + 4x2

.

Determine the matrices of  L with respect to the standard basis e1,e2 as well as with respect to

the basis v1 :=

11

, v2 :=

−11

.

2.13

a) Let A be an m × n matrix and b ∈ Rm. Is the map x → L(x) := Ax + b, x ∈ Rn, linear?

b) What are the linear maps from R1 = R into itself?

2.14 Describing a parallelogram in P 2 in terms of position vectors r ∈ E 2, show that its imageunder an affine map r → L(r) + b where L is a linear transformation of E 2 and  b ∈ E 2 a constantvector, is again a parallelogram or is degenerated. The corresponding statement is true forparallelograms and parallelepipeds in three-dimensional space P 3.

2.15 Show that the multiplication of matrices is associative, but not commutative. Moreover,the multiplication is distributive with respect to the addition of matrices and mixed-associative

with respect to the multiplication of matrices by numbers. Finally, if the matrix A can bemultiplied by B, then (AB)T  = BT AT .

2.16 For the quadratic matrices

A =

3 0 −24 1 −30 5 6

, B =

0 −2 43 1 21 3 5

,

calculate AB, BA, (A + B)2, (A − B)2, and (A + B)(A − B), and for

C 1 = 1 0

0 0 , C 2 = 0 1

0 0 , C 3 = 1 0

0 −1 , C 4 = 0 1

−1 0 ,

calculate C 2i , i = 1, 2, 3, 4.

2.17 Consider the three-dimensional vector space P 3 of all real polynomials of degree smalleror equal than two. With respect to the canonical basis of this space, the linear operator of differentiation, d

dx, has the matrix

D =

0 1 00 0 20 0 0

.

Show that D3 = 0 and explain why this is so. Furthermore, construct matrices A and B suchthat A3 = 0, A4 = 0 and B4 = 0, B5 = 0.

81

8/8/2019 Linear Algebra 03

http://slidepdf.com/reader/full/linear-algebra-03 84/86

2.18 Let e1, e2, e3 be a right-handed orthonormal basis in the Euclidean space E 3.

a) Descibe positively oriented rotations of vectors around the basis vectors by an angle α > 0by matrices.

b) The vector x = e1 + e2 + 3e3 is rotated first around e1 in the positive sense by α1 = 30◦

and then around e2 in the positive sense by β  = 60◦. What is the vector obtained this

way?c) The same question for α = β  = 90◦.

d) The same question for α = β  = 90◦ and the converse order of rotations.

e) Find the matrix that describes the operation of a positively oriented rotation around e1

by α followed by such a rotation around e2 by β .

2.19 Consider the orthogonal projection    p of a vector  x ∈ E 3 onto a plane with normal vectorn where |n| = 1. Let e1, e2, e3 be an orthonormal basis.

a) Represent   p in terms of  x and n.

b) Choose an orthonormal basis v1, v2 in the subspace given by n · r =  0 and represent   p interms of  x and v1, v2.

c) Show that the map x →   p =: L(x) is linear.

d) Determine the matrix A of L w.r.t. e1, e2, e3 as well as the matrix A w.r.t. the orthonormalbasis v1, v2, n.

e) What are Ker L and Im L?

f) For the particular case n = 1√3

(e1 + e2 + e3), choose some v1, v2 and calculate A, A, as

well as the projection   p of  x = 2e1

−6e2 + 3e3.

2.20 An m × n matrix A defines a linear map L : Rn → Rm according to L(x) := Ax. For

the following matrices, determine dim Ker L and rank A = dim Im L as well as a basis of Ker L,resp., Im L.

A = A1 =

1 1 2 0−3 2 0 1

8 −2 −2 2

, A = A2 =

5 2 −1 0

19 −4 31 −188 −1 8 −42 4 −16 10

2.21

a) What is the dimension of the subspace of Rn consisting of those vectors x that satisfy onehomogeneous equation in n unknowns?

b) What is the dimension of the subspace of the space of the n × n matrices that consists of those matrices satisfying

tr A :=n

i=1

aii = 0?

2.22 Show that the following matrices are regular and determine their inverses:

A = 3 1 4

1 2 00 1 −2

, B = 4 5 −1

2 0 13 1 0

.

82

8/8/2019 Linear Algebra 03

http://slidepdf.com/reader/full/linear-algebra-03 85/86

2.23 Consider two bases v1, . . . , vn and v1, . . . , vn of a vector space V of dimension n. Accordingto

v j =n

i=1

β ijvi, v j =n

i=1

γ ijvi,

 j = 1, . . . , n, introduce two matrices B and C  with the entries β ij and γ ij .

a) Show that C  = B−1.

b) Show that the components of a vector x ∈ V , x =n

i=1 ξivi =n

i=1 ξivi, transform

according to

ξi =n

 j=1

β ijξ j , ξi =n

 j=1

γ ijξ j ,

 j = 1, . . . , n.

The matrix B is called the matrix of the basis transformation .

2.24 Consider the Euclidean vector space E 3.

a) Show that, for two orthonormal bases e1, e2, e3 and e1, e2

, e3, the matrix B of the basis

transformation is orthogonal , i.e., B satisfies B−1 = BT . What is the geometrical meaningof the entries β ij of  B?

b) Conversely, if  B is an orthogonal matrix and e1, e2, e3 an orthonormal basis, a new or-thonormal basis is defined by e j

:=3

i=1 γ ijei =3

i=1 β  jiei where γ ij are the entries of B−1.

2.25 Let A be the matrix of a linear transformation L : V → V w.r.t. the basis v1, . . . , vn andA the matrix of  L w.r.t. the basis v1, . . . , vn. Show that

A = BAB−1

where B is the matrix of the basis transformation. That is, the matrix of a linear transformationtransforms according to a similarity transformation .

2.26 Consider a right-handed orthonormal basis e1, e2, e3 in the Euclidean vector space E 3.Using the results of the preceding problems, find the matrix of a positively oriented rotation byan angle α around the axis given by the unit vector n = 1√

2(e2 + e3).

2.27 Let L be a linear transformation acting in E 3 such that its matrix w.r.t. one orthonormalbasis is symmetric. Show that then the matrix of  L w.r.t. any orthonormal basis is symmetric.In other words, L is a symmetric tensor in E 3.

2.28 A homogeneous elastic cylindrical body whose center lies in the origin of a Cartesian

coordinate system and whose axis has the direction 1√2 (e2 + e3), is stretched by suitable forces

acting at the end surfaces. The length of the cylinder increases by a factor 1 + α whereas anydiameter perpendicular to its axis decreases by a factor 1 − β , α and β  being small positivenumbers. Assume that the position vector r = L(r) of a material point of the body afterdeformation depends linearly on its position r before deformation. What are the componentsof the (symmetric) tensor L w.r.t. the given coordinate system? (Hint: Introduce a secondcoordinate system adapted to the situation.)

2.29 Calculate the determinant

2 5 1 4−5 3 0 0

1 7 0 −39 3 4 5

.

83

8/8/2019 Linear Algebra 03

http://slidepdf.com/reader/full/linear-algebra-03 86/86

2.30 Find the eigenvalues, eigenvectors, and eigenspaces of the following matrices, and if pos-sible, give a basis of eigenvectors.

A =

4 0 1−2 1 0−2 0 1

, B =

1 −3 33 −5 36 −6 4

, C  =

1 1 00 1 10 0 1

, D =

1 −11 1

2.31 W.r.t. a Cartesian coordinate system in P 2, consider all points satisfying the equation

x21 − x1x2 + x2

2 = 1.

Writing this equation in the form r · L(r) = 1 where r is a position vector and L a suitable lineartransformation, show that the considered curve is an ellipse and determine the direction as wellas the length of its half-axes.

2.32 By means of a suitable coordinate transformation, diagonalize the symmetric matrixoccurring in the following equation representing a plane curve:

x21 + 2√2x1x2 = 1, resp., x1

x2 · 1√

2

√2 0 x1

x2 = 1.

2.33 Rotate the curves given by

x21

4+ x2

2 = 1, x1x2 = 1

in the positive sense by an angle of 45◦ and determine the equations of the rotated curves.

2.34 Again, transformation to principal axes. Let A be a fixed real symmetric 3 × 3 matrix

and X  =

x1x2x3

. Show that the equation

X  · AX  = 1, resp., X T AX  = 1

represents

a) an ellipsoid if  A =

2 −1 0−1 2 −1

0 −1 2

b) an elliptic cylinder if  A =

1 −1 0−1 2 −1

0−

1 1

) f A

1 1 1