
Matrix Theory: From Generalized Inverses to Jordan Form

PURE AND APPLIED MATHEMATICS

A Program of Monographs, Textbooks, and Lecture Notes

EXECUTIVE EDITORS

Earl J. Taft, Rutgers University, Piscataway, New Jersey

Zuhair Nashed, University of Central Florida, Orlando, Florida

EDITORIAL BOARD

M. S. Baouendi, University of California, San Diego

Jane Cronin, Rutgers University

Jack K. Hale, Georgia Institute of Technology

S. Kobayashi, University of California, Berkeley

Marvin Marcus, University of California, Santa Barbara

W. S. Massey, Yale University

Anil Nerode, Cornell University

Freddy van Oystaeyen, University of Antwerp, Belgium

Donald Passman, University of Wisconsin, Madison

Fred S. Roberts, Rutgers University

David L. Russell, Virginia Polytechnic Institute and State University

Walter Schempp, Universität Siegen

MONOGRAPHS AND TEXTBOOKS IN PURE AND APPLIED MATHEMATICS

Recent Titles

W. J. Wickless, A First Graduate Course in Abstract Algebra (2004)

R. P. Agarwal, M. Bohner, and W-T Li, Nonoscillation and Oscillation Theory for Functional Differential Equations (2004)

J. Galambos and I. Simonelli, Products of Random Variables: Applications to Problems of Physics and to Arithmetical Functions (2004)

Walter Ferrer and Alvaro Rittatore, Actions and Invariants of Algebraic Groups (2005)

Christof Eck, Jiri Jarusek, and Miroslav Krbec, Unilateral Contact Problems: Variational Methods and Existence Theorems (2005)

M. M. Rao, Conditional Measures and Applications, Second Edition (2005)

A. B. Kharazishvili, Strange Functions in Real Analysis, Second Edition (2006)

Vincenzo Ancona and Bernard Gaveau, Differential Forms on Singular Varieties: De Rham and Hodge Theory Simplified (2005)

Santiago Alves Tavares, Generation of Multivariate Hermite Interpolating Polynomials (2005)

Sergio Macias, Topics on Continua (2005)

Mircea Sofonea, Weimin Han, and Meir Shillor, Analysis and Approximation of Contact Problems with Adhesion or Damage (2006)

Marwan Moubachir and Jean-Paul Zolesio, Moving Shape Analysis and Control: Applications to Fluid Structure Interactions (2006)

Alfred Geroldinger and Franz Halter-Koch, Non-Unique Factorizations: Algebraic, Combinatorial and Analytic Theory (2006)

Kevin J. Hastings, Introduction to the Mathematics of Operations Research with Mathematica, Second Edition (2006)

Robert Carlson, A Concrete Introduction to Real Analysis (2006)

John Dauns and Yiqiang Zhou, Classes of Modules (2006)

N. K. Govil, H. N. Mhaskar, Ram N. Mohapatra, Zuhair Nashed, and J. Szabados, Frontiers in Interpolation and Approximation (2006)

Luca Lorenzi and Marcello Bertoldi, Analytical Methods for Markov Semigroups (2006)

M. A. Al-Gwaiz and S. A. Elsanousi, Elements of Real Analysis (2006)

Theodore G. Faticoni, Direct Sum Decompositions of Torsion-free Finite Rank Groups (2006)

R. Sivaramakrishnan, Certain Number-Theoretic Episodes in Algebra (2006)

Aderemi Kuku, Representation Theory and Higher Algebraic K-Theory (2006)

Robert Piziak and P. L. Odell, Matrix Theory: From Generalized Inverses to Jordan Form (2007)

Norman L. Johnson, Vikram Jha, and Mauro Biliotti, Handbook of Finite Translation Planes (2007)

Matrix Theory: From Generalized Inverses to Jordan Form

Robert Piziak, Baylor University, Texas, U.S.A.

P. L. Odell, Baylor University, Texas, U.S.A.

Chapman & Hall/CRC, Taylor & Francis Group
Boca Raton   London   New York

Chapman & Hall/CRC is an imprint of the Taylor & Francis Group, an informa business

Chapman & Hall/CRC, Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2007 by Taylor & Francis Group, LLC
Chapman & Hall/CRC is an imprint of Taylor & Francis Group, an informa business

No claim to original U.S. Government works
Printed in Canada on acid-free paper
10 9 8 7 6 5 4 3 2

International Standard Book Number-10: 1-58488-625-0 (Hardcover)
International Standard Book Number-13: 978-1-58488-625-9 (Hardcover)

This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

No part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data

Piziak, Robert.
Matrix theory : from generalized inverses to Jordan form / Robert Piziak and P.L. Odell.
p. cm. -- (Pure and applied mathematics)
Includes bibliographical references and index.
ISBN-13: 978-1-58488-625-9 (acid-free paper)
1. Matrices--Textbooks. 2. Algebras, Linear--Textbooks. 3. Matrix inversion--Textbooks. I. Odell, Patrick L., 1930- II. Title. III. Series.

QA188.P59 2006
512.9'434--dc22    2006025707

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com

and the CRC Press Web site at http://www.crcpress.com


Dedication

Dedicated to the love and support of our spouses


Preface

This text is designed for a second course in matrix theory and linear algebra accessible to advanced undergraduates and beginning graduate students. Many concepts from an introductory linear algebra class are revisited and pursued to a deeper level. Also, material designed to prepare the student to read more advanced treatises and journals in this area is developed. A key feature of the book is the idea of the "generalized inverse" of a matrix, especially the Moore-Penrose inverse. The concept of "full rank factorization" is used repeatedly throughout the book. The approach is always "constructive" in the mathematician's sense.

The important ideas needed to prepare the reader to tackle the literature in matrix theory included in this book are the Henderson and Searle formulas, Schur complements, the Sherman-Morrison-Woodbury formula, the LU factorization, the adjugate, the characteristic and minimal polynomials, the Frame algorithm and the Cayley-Hamilton theorem, Sylvester's rank formula, the fundamental subspaces of a matrix, direct sums and idempotents, index and the Core-Nilpotent factorization, nilpotent matrices, Hermite echelon form, full rank factorization, the Moore-Penrose inverse and other generalized inverses, norms, inner products and the QR factorization, orthogonal projections, the spectral theorem, Schur's triangularization theorem, the singular value decomposition, Jordan canonical form, Smith normal form, and tensor products.

This material has been class tested and has been successful with students of mathematics, undergraduate and graduate, as well as graduate students in statistics and physics. It can serve well as a "bridge course" to a more advanced study of abstract algebra and to reading more advanced texts in matrix theory.


Introduction

In 1990, a National Science Foundation (NSF) meeting on the undergraduate linear algebra curriculum recommended that at least one "second" course in linear algebra should be a high priority for every mathematics curriculum. This text is designed for a second course in linear algebra and matrix theory taught at the senior undergraduate and beginning graduate level. It has evolved from notes we have developed over the years teaching MTH 4316/5316 at Baylor University. This text and that course presuppose a semester of sophomore level introductory linear algebra. Even so, recognizing that certain basic ideas need review and reinforcement, we offer a number of appendixes that can be used in the classroom or assigned for independent reading. More and more schools are seeing the need for a second semester of linear algebra that goes more deeply into ideas introduced at that level and delves into topics not typically covered in a sophomore level class. One purpose we have for this course is to act as a bridge to our abstract algebra courses. Even so, the topics were chosen to appeal to a broad audience, and we have attracted students in statistics and physics. There is more material in the text than can be covered in one semester. This gives the instructor some flexibility in the choice of topics. We require our students to write a paper on some topic in matrix theory as part of our course requirements, and the material we omit from this book is often a good starting point for such a project.

Our course begins by setting the stage for the central problem in linear algebra, solving systems of linear equations. In Chapter 1, we present three views of this problem: the geometric, the vector, and the matrix view. One of the main goals of this book is to develop the concept of "generalized inverses," especially the Moore-Penrose inverse. We therefore first present a careful treatment of ordinary invertible matrices, including a connection to the minimal polynomial. We develop the Henderson-Searle formulas for the inverse of a sum of matrices, the idea of a Schur complement, and the Sherman-Morrison-Woodbury formula. In Chapter 2, we discuss the LU factorization after reviewing Gauss elimination. Next comes the adjugate of a matrix and the Frame algorithm for computing the coefficients of the characteristic polynomial, from which the Cayley-Hamilton theorem results.

In Chapter 3, we recall the subspaces one associates with a matrix and review the concepts of rank and nullity. We derive Sylvester's rank formula and reap the many consequences that follow from this result. Next come direct sum decompositions and the connection with idempotent matrices. Then the idea of the index of a matrix is introduced and the Core-Nilpotent factorization is proved. Nilpotent matrices are then characterized in what is probably the most challenging material in the book. Left and right inverses are introduced in part to prepare the way for generalized inverses.

In Chapter 4, after reviewing row reduced echelon form, matrix equivalence, and the Hermite echelon form, we introduce the all-important technique of full rank factorization. The Moore-Penrose inverse is then introduced using the concept of full rank factorization and is applied to the problem of solving a system of linear equations, giving a consistency condition and a description of all solutions when they exist. This naturally leads to the next chapter on other generalized inverses.

At this point, some choices need to be made. The instructor interested in pursuing more on generalized inverses can do so at the sacrifice of later material. Or, the material can be skipped and used as a source of projects. Some middle ground can also be chosen.

Chapter 6 concerns norms, vector and matrix. We cover this material in one day since it is really needed only to talk about minimum norm solutions and least squares solutions to systems of equations. The chapter on inner products comes next. The first new twist is that we are dealing with complex vector spaces, so we need a Hermitian inner product. The concept of orthogonality is reviewed and the QR factorization is developed. Kung's approach to finding the QR factorization is presented. Now we are in a position to deal with minimum norm and least squares solutions and the beautiful connection with the Moore-Penrose inverse. The material on orthogonal projections in Chapter 8 is needed to relate the Moore-Penrose inverse to the orthogonal projections onto the fundamental subspaces of a matrix and to our treatment of the spectral theorem. Unfortunately, sections 2 and 5 are often skipped due to lack of time. In Chapter 9, we prove the all-important spectral theorem.

The highlights of Chapter 10 are the primary decomposition theorem and Schur's triangularization theorem. Then, of course, we also discuss the singular value decomposition, on which you could spend an inordinate amount of time. The big finish to our semester is the Jordan canonical form theorem. To be honest, we have never had time to cover the last chapter on multilinear algebra, Chapter 12.

Now to summarize what we feel to be the most attractive features of the book:

1. The style is conversational and friendly but always mathematically correct. The book is meant to be read. Concrete examples are used to make abstract arguments more clear.

2. Routine proofs are left to the reader as exercises while we take the reader carefully through the more difficult arguments.

3. The book is flexible. The core of the course is complex matrix theory accessible to all students. Additional material is available at the discretion of the instructor, depending on the audience or the desire to assign individual projects.

4. The Moore-Penrose inverse is developed carefully and plays a central role, making this text excellent preparation for more advanced treatises in matrix theory, such as Horn and Johnson and Ben-Israel and Greville.

5. The book contains an abundance of homework problems. They are not graded by level of difficulty since life does not present problems that way.

6. Appendixes are available for review of basic linear algebra.

7. Since MATLAB seems to be the language of choice for dealing with matrices, we present MATLAB examples and exercises at appropriate places in the text.

8. Most sections include suggested further readings at the end.

9. Our approach is "constructive" in the mathematician's sense, and we do not extensively treat numerical issues. However, some "Numerical Notes" are included to be sure the reader is aware of issues that arise when computers are used to do matrix calculations.

There are many debts to acknowledge. This writing project was made possible and was launched when the first author was granted a sabbatical leave by Baylor University in the year 2000. We are grateful to our colleague Ron Stanke for his help in getting the sabbatical application approved.

We are indebted to many students who put up with these notes and with us for a number of years. They have taken our courses, have written master's theses under our direction, and have been of immeasurable help without even knowing it. A special thanks goes to Dr. Richard Greechie and his Math 405 linear algebra class at LA TECH, who worked through an earlier version of our notes in the spring of 2004 and who suggested many improvements. We must also mention Curt Kunkel, who took our matrix theory class from this book and who went way beyond the call of duty in reading and rereading our notes, finding many misprints, and offering many good ideas. Finally, we appreciate the useful comments and suggestions of our colleague Manfred Dugas, who taught a matrix theory course from a draft version of our book.

Having taught out of a large number of textbooks over the years, we have collected a large number of examples and proofs for our classes. We have made some attempt to acknowledge these sources, but many have likely been lost in the fog of the past. We apologize ahead of time to any source we have failed to properly acknowledge. Certainly, several authors have had a noticeable impact

on our work. G. Strang has developed a picture of "fundamental" subspaces that we have used many times. S. Axler has influenced us to be "determinant free" whenever possible, though we have no fervor against determinants and use them when they yield an easy proof. Of course, the wonderful book by C. D. Meyer, Jr., has changed our view on a number of sections and caused us to rewrite material to follow his insightful lead. The new edition of Ben-Israel and Greville clearly influences our treatment of generalized inverses.

We also wish to thank Roxie Ray and Margaret Salinas for their help in the typing of the manuscript. To these people and many others, we are truly grateful.

Robert Piziak

Patrick L. Odell

References

D. Carlson, C. R. Johnson, D. C. Lay, and A. D. Porter, The linear algebra curriculum study group recommendations for the first course in linear algebra, College Mathematics Journal, 24 (1993), 41-45.

J. G. Ianni, What's the best textbook?-Linear algebra, Focus, 24 (3) (March, 2004), 26.

An excellent source of historical references is available online at http://www-history.mcs.st-andrews.ac.uk/ thanks to the folks at St. Andrews, Scotland.

Contents

1 The Idea of Inverse
  1.1 Solving Systems of Linear Equations
    1.1.1 Numerical Note
      1.1.1.1 Floating Point Arithmetic
      1.1.1.2 Arithmetic Operations
      1.1.1.3 Loss of Significance
    1.1.2 MATLAB Moment
      1.1.2.1 Creating Matrices in MATLAB
  1.2 The Special Case of "Square" Systems
    1.2.1 The Henderson-Searle Formulas
    1.2.2 Schur Complements and the Sherman-Morrison-Woodbury Formula
    1.2.3 MATLAB Moment
      1.2.3.1 Computing Inverse Matrices
    1.2.4 Numerical Note
      1.2.4.1 Matrix Inversion
      1.2.4.2 Operation Counts

2 Generating Invertible Matrices
  2.1 A Brief Review of Gauss Elimination with Back Substitution
    2.1.1 MATLAB Moment
      2.1.1.1 Solving Systems of Linear Equations
  2.2 Elementary Matrices
    2.2.1 The Minimal Polynomial
  2.3 The LU and LDU Factorization
    2.3.1 MATLAB Moment
      2.3.1.1 The LU Factorization
  2.4 The Adjugate of a Matrix
  2.5 The Frame Algorithm and the Cayley-Hamilton Theorem
    2.5.1 Digression on Newton's Identities
    2.5.2 The Characteristic Polynomial and the Minimal Polynomial
    2.5.3 Numerical Note
      2.5.3.1 The Frame Algorithm
    2.5.4 MATLAB Moment
      2.5.4.1 Polynomials in MATLAB

3 Subspaces Associated to Matrices
  3.1 Fundamental Subspaces
    3.1.1 MATLAB Moment
      3.1.1.1 The Fundamental Subspaces
  3.2 A Deeper Look at Rank
  3.3 Direct Sums and Idempotents
  3.4 The Index of a Square Matrix
    3.4.1 MATLAB Moment
      3.4.1.1 The Standard Nilpotent Matrix
  3.5 Left and Right Inverses

4 The Moore-Penrose Inverse
  4.1 Row Reduced Echelon Form and Matrix Equivalence
    4.1.1 Matrix Equivalence
    4.1.2 MATLAB Moment
      4.1.2.1 Row Reduced Echelon Form
    4.1.3 Numerical Note
      4.1.3.1 Pivoting Strategies
      4.1.3.2 Operation Counts
  4.2 The Hermite Echelon Form
  4.3 Full Rank Factorization
    4.3.1 MATLAB Moment
      4.3.1.1 Full Rank Factorization
  4.4 The Moore-Penrose Inverse
    4.4.1 MATLAB Moment
      4.4.1.1 The Moore-Penrose Inverse
  4.5 Solving Systems of Linear Equations
  4.6 Schur Complements Again (optional)

5 Generalized Inverses
  5.1 The {1}-Inverse
  5.2 {1,2}-Inverses
  5.3 Constructing Other Generalized Inverses
  5.4 {2}-Inverses
  5.5 The Drazin Inverse
  5.6 The Group Inverse

6 Norms
  6.1 The Normed Linear Space C^n
  6.2 Matrix Norms
    6.2.1 MATLAB Moment
      6.2.1.1 Norms

7 Inner Products
  7.1 The Inner Product Space C^n
  7.2 Orthogonal Sets of Vectors in C^n
    7.2.1 MATLAB Moment
      7.2.1.1 The Gram-Schmidt Process
  7.3 QR Factorization
    7.3.1 Kung's Algorithm
    7.3.2 MATLAB Moment
      7.3.2.1 The QR Factorization
  7.4 A Fundamental Theorem of Linear Algebra
  7.5 Minimum Norm Solutions
  7.6 Least Squares

8 Projections
  8.1 Orthogonal Projections
  8.2 The Geometry of Subspaces and the Algebra of Projections
  8.3 The Fundamental Projections of a Matrix
    8.3.1 MATLAB Moment
      8.3.1.1 The Fundamental Projections
  8.4 Full Rank Factorizations of Projections
  8.5 Affine Projections
  8.6 Quotient Spaces (optional)

9 Spectral Theory
  9.1 Eigenstuff
    9.1.1 MATLAB Moment
      9.1.1.1 Eigenvalues and Eigenvectors in MATLAB
  9.2 The Spectral Theorem
  9.3 The Square Root and Polar Decomposition Theorems

10 Matrix Diagonalization
  10.1 Diagonalization with Respect to Equivalence
  10.2 Diagonalization with Respect to Similarity
  10.3 Diagonalization with Respect to a Unitary
    10.3.1 MATLAB Moment
      10.3.1.1 Schur Triangularization
  10.4 The Singular Value Decomposition
    10.4.1 MATLAB Moment
      10.4.1.1 The Singular Value Decomposition

11 Jordan Canonical Form
  11.1 Jordan Form and Generalized Eigenvectors
    11.1.1 Jordan Blocks
      11.1.1.1 MATLAB Moment
    11.1.2 Jordan Segments
      11.1.2.1 MATLAB Moment
    11.1.3 Jordan Matrices
      11.1.3.1 MATLAB Moment
    11.1.4 Jordan's Theorem
      11.1.4.1 Generalized Eigenvectors
  11.2 The Smith Normal Form (optional)

12 Multilinear Matters
  12.1 Bilinear Forms
  12.2 Matrices Associated to Bilinear Forms
  12.3 Orthogonality
  12.4 Symmetric Bilinear Forms
  12.5 Congruence and Symmetric Matrices
  12.6 Skew-Symmetric Bilinear Forms
  12.7 Tensor Products of Matrices
    12.7.1 MATLAB Moment
      12.7.1.1 Tensor Product of Matrices

Appendix A Complex Numbers
  A.1 What Is a Scalar?
  A.2 The System of Complex Numbers
  A.3 The Rules of Arithmetic in C
    A.3.1 Basic Rules of Arithmetic in C
      A.3.1.1 Associative Law of Addition
      A.3.1.2 Existence of a Zero
      A.3.1.3 Existence of Opposites
      A.3.1.4 Commutative Law of Addition
      A.3.1.5 Associative Law of Multiplication
      A.3.1.6 Distributive Laws
      A.3.1.7 Commutative Law for Multiplication
      A.3.1.8 Existence of Identity
      A.3.1.9 Existence of Inverses
  A.4 Complex Conjugation, Modulus, and Distance
    A.4.1 Basic Facts about Complex Conjugation
    A.4.2 Basic Facts about Magnitude
    A.4.3 Basic Properties of Distance
  A.5 The Polar Form of Complex Numbers
  A.6 Polynomials over C
  A.7 Postscript

Appendix B Basic Matrix Operations
  B.1 Introduction
  B.2 Matrix Addition
  B.3 Scalar Multiplication
  B.4 Matrix Multiplication
  B.5 Transpose
    B.5.1 MATLAB Moment
      B.5.1.1 Matrix Manipulations
  B.6 Submatrices
    B.6.1 MATLAB Moment
      B.6.1.1 Getting at Pieces of Matrices

Appendix C Determinants
  C.1 Motivation
  C.2 Defining Determinants
  C.3 Some Theorems about Determinants
    C.3.1 Minors
    C.3.2 The Cauchy-Binet Theorem
    C.3.3 The Laplace Expansion Theorem
  C.4 The Trace of a Square Matrix

Appendix D A Review of Basics
  D.1 Spanning
  D.2 Linear Independence
  D.3 Basis and Dimension
  D.4 Change of Basis

Index


Chapter 1

The Idea of Inverse

systems of linear equations, geometric view, vector view, matrix view

1.1 Solving Systems of Linear Equations

The central problem of linear algebra is the problem of solving a system of linear equations. References to solving simultaneous linear equations that derived from everyday practical problems can be traced back to the Chinese text Chiu Chang Suan Shu (Nine Chapters on the Mathematical Art), about 200 B.C. [Smoller, 2005]. Such systems arise naturally in modern applications such as economics, engineering, genetics, physics, and statistics. For example, an electrical engineer using Kirchhoff's law might be faced with solving for unknown currents x, y, z in the system of equations:

1.95x + 2.03y + 4.75z = 10.02
3.45x + 6.43y - 5.02z = 12.13
2.53x + 7.01y + 3.61z = 19.46
3.01x + 5.71y + 4.02z = 10.52

Here we have four linear equations (no squares or higher powers on the unknowns) in three unknowns x, y, and z. Generally, we can consider a system of m linear equations in n unknowns:

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2\\
a_{31}x_1 + a_{32}x_2 + \cdots + a_{3n}x_n &= b_3\\
&\;\;\vdots\\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m
\end{aligned}
\tag{1.1}
\]

The coefficients a_{ij} and constants b_k are all complex numbers. We use the symbol C to denote the collection of complex numbers. Of course, real numbers, denoted R, are just special kinds of complex numbers, so that case is automatically included in our discussion. If you have forgotten about complex numbers (remember i^2 = -1?) or never seen them, spend some time reviewing Appendix A, where we tell you all you need to know about complex numbers. Mathematicians allow the coefficients and constants in (1.1) to come from number domains more general than the complex numbers, but we need not be concerned about that at this point.

The concepts and theory derived to discuss solutions of a system of linear equations depend on at least three different points of view. First, we could view each linear equation individually as defining a hyperplane in the vector space of complex n-tuples C^n. Recall that a hyperplane is just the translation of an (n - 1)-dimensional subspace in C^n. This view is the geometric view (or row view). If n = 2 and we draw pictures in the familiar Cartesian coordinate plane R^2, a row represents a line; in R^3, a row represents a plane; and so on. From this point of view, solutions to (1.1) can be visualized as the intersection of lines or planes. For example, in

3x - y = 7
x + 2y = 7

the first row represents the line (= hyperplane in R^2) y = 3x through the origin translated to go through the point (0, -7) [see Figure 1.1], and the second row represents the line y = -x/2 translated to go through (0, 7/2). The solution to the system is the ordered pair of numbers (3, 2) which, geometrically, is the intersection of the two lines, as is illustrated in Figure 1.1 and as the reader may verify.

Figure 1.1: Geometric view.
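The geometric view is easy to reproduce in MATLAB. The following lines are our own minimal sketch (the variable names are ours, not the book's): the two rows are plotted as lines and the intersection (3, 2) is marked.

% Sketch (ours): plot the two rows of the system as lines in the plane
% and mark their intersection at (3, 2).
x = linspace(0, 5);                      % sample points for the horizontal axis
plot(x, 3*x - 7, x, (7 - x)/2, 3, 2, 'o')
legend('3x - y = 7', 'x + 2y = 7', 'intersection (3, 2)')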

Another view is to look "vertically," so to speak, and develop the vector (or column) view; that is, we view (1.1) written as

\[
x_1\begin{bmatrix} a_{11}\\ a_{21}\\ \vdots\\ a_{m1} \end{bmatrix}
+ x_2\begin{bmatrix} a_{12}\\ a_{22}\\ \vdots\\ a_{m2} \end{bmatrix}
+ \cdots
+ x_n\begin{bmatrix} a_{1n}\\ a_{2n}\\ \vdots\\ a_{mn} \end{bmatrix}
= \begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_m \end{bmatrix}
\tag{1.2}
\]

recalling the usual way of adding column vectors and multiplying them by scalars. Note that the x's really should be on the right of the columns to produce the system (1.1). However, complex numbers satisfy the commutative law of multiplication (ab = ba), so it does not matter on which side the x is written. The problem has not changed. We are still obliged to find the unknown x's given the a_{ij}'s and the b_k's. However, instead of many equations (the rows) we have just one vector equation. Regarding the columns as vectors (i.e., as m-tuples in C^m), the problem becomes: find a linear combination of the columns on the left to produce the vector on the right. That is, can you shrink or stretch the column vectors on the left in some way so they "resolve" to the vector on the right? And so the language of vector spaces quite naturally finds its way into the fundamental problem of linear algebra. We could phrase (1.2) by asking whether the vector (b_1, b_2, ..., b_m) in C^m is in the subspace spanned by (i.e., generated by) the vectors (a_11, a_21, ..., a_m1), (a_12, a_22, ..., a_m2), ..., (a_1n, a_2n, ..., a_mn). Do you remember all those words? If not, see Appendix D.

Our simple example in the vector view is

\[
x\begin{bmatrix} 3\\ 1 \end{bmatrix} + y\begin{bmatrix} -1\\ 2 \end{bmatrix} = \begin{bmatrix} 7\\ 7 \end{bmatrix}.
\]

Here the coefficients x = 3, y = 2 yield the solution. We can visualize the situation using arrows and adding vectors as scientists and engineers do, by the head-to-tail rule (see Figure 1.2).

Figure 1.2: Vector view.

The third view is the matrix view. For this view, we gather the coefficients a_{ij} of the system into an m-by-n matrix A, and put the x's and the b's into columns (n-by-1 and m-by-1, respectively). From the definition of matrix multiplication and equality, we can write (1.1) as

\[
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix}
=
\begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_m \end{bmatrix}
\tag{1.3}
\]

or, in the very convenient shorthand,

Ax = b    (1.4)

where

\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix},\quad
x = \begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix},\quad\text{and}\quad
b = \begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_m \end{bmatrix}.
\]

This point of view leads to the rules of matrix algebra, since we can view (1.1) as the problem of solving the matrix equation (1.4). The emphasis here is on "symbol pushing" according to certain rules. If you have forgotten the basics of manipulating matrices, adding them, multiplying them, transposing them, and so on, review Appendix B to refresh your memory. Our simple example can be expressed in matrix form as

\[
\begin{bmatrix} 3 & -1\\ 1 & 2 \end{bmatrix}\begin{bmatrix} x\\ y \end{bmatrix} = \begin{bmatrix} 7\\ 7 \end{bmatrix}.
\]

The matrix view can also be considered more abstractly as a mapping or function view. If we consider the function x ↦ Ax, we get a linear transformation from C^n to C^m if A is m-by-n. Then, asking if (1.1) has a solution is the same as asking if b lies in the range of this mapping. From (1.2), we see this is the same as asking if b lies in the column space of A, which we shall denote Col(A). Recall that the column space of A is the subspace of C^m generated by the columns of A considered as vectors in C^m. The connection between the vector view and the matrix view is a fundamental one. Concretely, in the 3-by-3 case,

\[
\begin{bmatrix} a & b & c\\ d & e & f\\ g & h & i \end{bmatrix}
\begin{bmatrix} x\\ y\\ z \end{bmatrix}
= x\begin{bmatrix} a\\ d\\ g \end{bmatrix}
+ y\begin{bmatrix} b\\ e\\ h \end{bmatrix}
+ z\begin{bmatrix} c\\ f\\ i \end{bmatrix}.
\]

More abstractly,

\[
Ax = \begin{bmatrix} \mathrm{Col}_1(A) & \mathrm{Col}_2(A) & \cdots & \mathrm{Col}_n(A) \end{bmatrix}
\begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix}
= x_1\,\mathrm{Col}_1(A) + x_2\,\mathrm{Col}_2(A) + \cdots + x_n\,\mathrm{Col}_n(A).
\]

Now it is clear that Ax = b has a solution if b is expressible as a linear combination of the columns of A (i.e., b lies in the column space of A). A row version of this fundamental connection to matrix multiplication can also be useful on occasion:

\[
\begin{bmatrix} x & y & z \end{bmatrix}
\begin{bmatrix} a & b & c\\ d & e & f\\ g & h & i \end{bmatrix}
= x\begin{bmatrix} a & b & c \end{bmatrix} + y\begin{bmatrix} d & e & f \end{bmatrix} + z\begin{bmatrix} g & h & i \end{bmatrix}.
\]

More abstractly,

\[
x^T A = \begin{bmatrix} x_1 & x_2 & \cdots & x_m \end{bmatrix}
\begin{bmatrix} \mathrm{Row}_1(A)\\ \mathrm{Row}_2(A)\\ \vdots\\ \mathrm{Row}_m(A) \end{bmatrix}
= x_1\,\mathrm{Row}_1(A) + x_2\,\mathrm{Row}_2(A) + \cdots + x_m\,\mathrm{Row}_m(A).
\]
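This column-space connection is easy to experiment with in MATLAB. The sketch below is our own illustration with an arbitrarily chosen matrix, not an example from the text:

% Sketch: A*x is the linear combination of the columns of A weighted by x.
A = [1 2 3; 4 5 6; 7 8 10];                  % an arbitrary 3-by-3 matrix
x = [2; -1; 3];
A*x                                          % the matrix view
x(1)*A(:,1) + x(2)*A(:,2) + x(3)*A(:,3)      % the column (vector) view: same answer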

You may remember (and easy pictures in R^2 will remind you) that three cases can occur when you try to solve (1.1).

CASE 1.1
A unique solution to (1.1) exists. That is to say, if A and b are given, there is one and only one x such that Ax = b. In this case, we would like to have an efficient algorithm for finding this unique solution.

CASE 1.2
More than one solution to (1.1) exists. In this case, infinitely many solutions are possible. Even so, we would like to have a meaningful way to describe all of them.

CASE 1.3
No solution to (1.1) exists. We are not content just to give up and turn our backs on this case. Indeed, "real life" situations demand "answers" to (1.1) even when a solution does not exist. So, we will seek a "best approximate solution" to (1.1), where we will make clear later what "best" means.

Thus, we shall now set some problems to solve. Our primary goal will be to give solutions to the following problems:

Problem 1.1
Let Ax = b where A and b are specified. Determine whether a solution for x exists (i.e., develop a "consistency condition") and, if so, describe the set of all solutions.

Problem 1.2
Let Ax = b where A and b are specified. Suppose it is determined that no solution for x exists. Then find the best approximate solution; that is, find x such that the vector Ax - b has minimal length.

Problem 1.3
Given a matrix A, determine the column space of A; that is, find all b such that Ax = b for some column vector x.

To prepare for what is to come, we remind the reader that complex numbers can be conjugated. That is, if z = a + bi, then the complex conjugate of z is z̄ = a - bi. This leads to some new possibilities for operations on matrices that we did not have with real matrices. If A = [a_{ij}], then Ā = [ā_{ij}] and A* is the transpose of Ā (i.e., the conjugate transpose of A). Thus if

\[
A = \begin{bmatrix} 2+3i & 4+5i\\ 7-5i & 6-3i \end{bmatrix},
\]

then

\[
\bar{A} = \begin{bmatrix} 2-3i & 4-5i\\ 7+5i & 6+3i \end{bmatrix}
\quad\text{and}\quad
A^* = \begin{bmatrix} 2-3i & 7+5i\\ 4-5i & 6+3i \end{bmatrix}.
\]

A matrix A is called self-adjoint or Hermitian if A* = A. Also, det(A) is our way of denoting the determinant of a square matrix A. (See Appendix C for a review on determinants.)

Exercise Set 1

1. Write and discuss the three views for the following systems of linear equations.

4x + 3y = 17
2x - y = 10

3x + 2y + z = 5
2x + 3y + z = 5
x + y + 4z = 6

2. Give the vector and matrix view of

(4 + 3i)z_1 + (7 - 7i)z_2 = -13i
(2 - i)z_1 + (5 + 13i)z_2 = 16 + 10i

3. Solve

(1 + 2i)z + (2 - 3i)w = 5 + 3i
(1 - i)z + (4i)w = 10 - 4i

4. Solve

3ix + 4iy + 5iz = 17i
2iy + 7iz = 16i
3iz = 6i

5. Consider two systems of linear equations that are related as follows:

a_11 x_1 + a_12 x_2 = k_1        b_11 y_1 + b_12 y_2 = x_1
a_21 x_1 + a_22 x_2 = k_2   and  b_21 y_1 + b_22 y_2 = x_2

Let A be the coefficient matrix of the first system and B be the coefficient matrix of the second. Substitute for the x's in the first system using the second system and produce a new system of linear equations in the y's. How does the coefficient matrix of the new system relate to A and B?

6. Argue that the n-tuple (s_1, s_2, ..., s_n) satisfies the two equations

a_i1 x_1 + ... + a_in x_n = b_i
a_j1 x_1 + ... + a_jn x_n = b_j

if and only if it satisfies the two equations

a_i1 x_1 + ... + a_in x_n = b_i
(a_j1 + c a_i1) x_1 + ... + (a_jn + c a_in) x_n = b_j + c b_i

for any constant c.

7. Consider the system of linear equations (1.1). Which of the following modifications of (1.1) will not disturb the solution set? Explain!

(i) Multiply one equation through by a scalar.
(ii) Multiply one equation through by a nonzero scalar.
(iii) Swap two equations.
(iv) Multiply the ith equation by a scalar and add the resulting equation to the jth equation, producing a new jth equation.
(v) Erase one of the equations.

8. Let S = {x | Ax = b} be the solution set of a system of linear equations where A is m-by-n, x is n-by-1, and b is m-by-1. Argue that S could be empty (give a simple example); S could contain exactly one point; but, if S contains two distinct points, it must contain infinitely many. In fact, the set S has a very special property. Prove that if x_1 and x_2 are in S, then λx_1 + (1 - λ)x_2 is also in S for any choice of λ in C. This says the set of solutions of a system of linear equations is an affine subspace of C^n.

9. Argue that if x_1 solves Ax = b and x_2 solves Ax = b, then x_1 - x_2 solves Ax = 0. Conversely, if z is any solution of Ax = 0 and x_p is a particular solution of Ax = b, then argue that x_p + z solves Ax = b.

10. You can learn much about mathematics by making up your own examples instead of relying on textbooks to do it for you. Create a 4-by-3 matrix A composed of zeros and ones where Ax = b has

(i) exactly one solution,
(ii) an infinite number of solutions,
(iii) no solution.

If you remember the concept of rank (we will get into it later in Section 3.1), what is the rank of A in your three examples?

11. Can a system of three linear equations in three unknowns over the real numbers have a complex nonreal solution? If not, explain why; if so, give an example of such a system.

Further Reading

[Atiyah, 2001] Michael Atiyah, Mathematics in the 20th Century, The American Mathematical Monthly, Vol. 108, No. 7, August-September, (2001), 654-666.

[D&H&H, 2005] Ian Doust, Michael D. Hirschhorn, and Jocelyn Ho, Trigonometric Identities, Linear Algebra, and Computer Algebra, The American Mathematical Monthly, Vol. 112, No. 2, February, (2005), 155-164.

[F-S, 1979] Desmond Fearnley-Sander, Hermann Grassmann and the Creation of Linear Algebra, The American Mathematical Monthly, Vol. 86, No. 10, December, (1979), 809-817.

[Forsythe, 1953] George Forsythe, Solving Linear Equations Can Be Interesting, Bulletin of the American Mathematical Society, Vol. 59, (1953), 299-329.

[Kolodner, 1964] Ignace I. Kolodner, A Note on Matrix Notation, The American Mathematical Monthly, Vol. 71, No. 9, November, (1964), 1031-1032.

[Rogers, 1997] Jack W. Rogers, Jr., Applications of Linear Algebra in Calculus, The American Mathematical Monthly, Vol. 104, No. 1, January, (1997), 20-26.

[Smoller, 2005] Laura Smoller, The History of Matrices, http://www.ualr.edu/lasmoller/matrices.html.

[Wyzkoski, 1987] Joan Wyzkoski, An Application of Matrices to Space Shuttle Technology, The UMAP Journal, Vol. 8, No. 3, (1987), 187-205.

1.1.1 Numerical Note

1.1.1.1 Floating Point Arithmetic

Computers, and even some modern handheld calculators, are very useful tools in reducing the drudgery that is inherent in doing numerical computations with matrices. However, calculations on these devices have limited precision, and blind dependence on machine calculations can lead to accepting nonsense answers. Of course, there are infinitely many real numbers but only a finite number of them can be represented on any given machine. The most common representation of a real number is floating point representation. If you have learned the scientific notation for representing a real number, the following description will not seem so strange.

A (normalized) floating point number is a real number x of the form

x = ±.d_1 d_2 d_3 ... d_t × b^e

where d_1 ≠ 0 (that's the normalized part), b is called the base, e is the exponent, and each d_i is an integer (digit) with 0 ≤ d_i < b. The number t is called the precision, and ±.d_1 d_2 d_3 ... d_t is called the mantissa. Humans typically prefer base b = 10; for computers, b = 2, but many other choices are available (e.g., b = 8, b = 16). Note that 1.6 × 10^2 is scientific notation, but .16 × 10^3 is the normalized floating point way to represent 160. The exponent e has limits that usually depend on the machine, say -k ≤ e ≤ K. If x = ±.d_1 d_2 d_3 ... d_t × b^e and e > K, we say x overflows; if e < -k, x is said to underflow. A number x that can be expressed as x = ±.d_1 d_2 d_3 ... d_t × b^e given b, t, k, and K is called a representable number. Let Rep(b, t, k, K) denote the set of all representable numbers given the four positive integers b, t, k, and K. The important thing to realize is that Rep(b, t, k, K) is a finite set. Perhaps it is a large finite set, but it is still finite. Note that zero, 0, is considered a special case and is always in Rep(b, t, k, K). Also note that there are bounds for the size of a nonzero number in Rep(b, t, k, K): if x ≠ 0 in Rep(b, t, k, K), then b^(-k-1) ≤ |x| ≤ b^K (1 - b^(-t)).

Before a machine can compute with a number, the number must be converted into a floating point representable number for that machine. We use the notation fl(x) to denote the floating point representation of x. That is,

fl(x) = x(1 + δ),

where δ measures the relative change made in converting x to its floating point version. In particular, x is representable if fl(x) = x. The difference between x and fl(x) is called roundoff error. The difference fl(x) - x is called the absolute error, and the ratio (fl(x) - x)/x is called the relative error. The maximum value for |δ| is called the unit roundoff error and is typically denoted by ε. For example, take x = 51,022. Say we wish to represent x in base 10, four-digit floating point representation. Then fl(51022) = .5102 × 10^5. The absolute error is -2 and the relative error is about -3.9 × 10^(-5).
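This example is easy to check numerically; the following MATLAB lines are our own sketch:

% Sketch: the four-digit, base-10 representation of 51022 and its errors.
x  = 51022;
xf = 0.5102e5;                   % fl(x) with t = 4 digits
abs_err = xf - x                 % returns -2
rel_err = (xf - x)/x             % about -3.9e-5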

There are two ways of converting a real number into a floating point number: rounding and truncating (or chopping). The rounded floating point version x_r of x is the t-digit number closest to x. The largest error in rounding occurs when a number is exactly halfway between two representable numbers. In the truncated version x_t, all digits of the mantissa beyond the last to be kept are thrown away. The reader may verify the following inequalities:

(i) |x_r - x| ≤ (1/2) b^(1-t) |x|,
(ii) |δ_r| ≤ (1/2) b^(1-t),
(iii) |x_t - x| ≤ b^(1-t) |x|,
(iv) |δ_t| ≤ b^(1-t).

The reader may also verify that the unit roundoff satisfies

ε = (1/2) b^(1-t) for rounding, and ε = b^(1-t) for truncating.

1.1.1.2 Arithmetic Operations

Errors can occur when data are entered into a computer due to the floating point representation. Also, errors can occur when the usual arithmetic operations are applied to floating point numbers. For x and y floating point numbers, we can say

(i) fl(x + y) = (x + y)(1 + δ),
(ii) fl(xy) = xy(1 + δ),
(iii) fl(x - y) = (x - y)(1 + δ),
(iv) fl(x ÷ y) = (x ÷ y)(1 + δ).

Unfortunately, the usual laws of arithmetic fail. For example, there exist representable numbers x, y, and z such that

(v) fl(fl(x + y) + z) ≠ fl(x + fl(y + z)),

that is, the associative law fails. It may also happen that

(vi) fl(x + y) ≠ fl(x) + fl(y),
(vii) fl(xy) ≠ fl(x) fl(y).

For some good news, the commutative law still works.

1.1.1.3 Loss of Significance

Another phenomenon that can have a huge impact on the outcome of floating point operations occurs when small numbers are computed from big ones, or when two nearly equal numbers are subtracted. This is called cancellation error. It can happen that the relative error in a difference is many orders of magnitude larger than the relative errors in the individual numbers.
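A minimal MATLAB sketch of cancellation (our own illustration, not from the text):

% Sketch: subtracting nearly equal numbers destroys significant digits.
x = 1e-15;
y = (1 + x) - 1;                  % mathematically equal to x
relative_error = abs(y - x)/x     % about 0.11, so only about one digit survives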

Exercises

1. How many representable numbers are in Rep(10, 1, 2, 1)? Plot them on the real number line. Are they equally spaced?

2. Give an example of two representable numbers whose sum is not representable.

3. Write the four-digit floating point representation of 34248, and compute the absolute and relative error.

Further Reading

[Dwyer, 1951] P. S. Dwyer, Linear Computations, John Wiley & Sons, New York, (1951).

[F,M&M, 1977] G. E. Forsythe, M. A. Malcolm, and C. B. Moler, Computer Methods for Mathematical Computations, Prentice-Hall, Englewood Cliffs, NJ, (1977).

[G&vanL, 1996] Gene H. Golub and Charles F. Van Loan, Matrix Computations, 3rd edition, Johns Hopkins Press, Baltimore, MD, (1996).

[Higham, 1996] Nicholas J. Higham, Accuracy and Stability of Numerical Algorithms, SIAM, Philadelphia, (1996).

[Householder, 1964] Alston S. Householder, The Theory of Matrices in Numerical Analysis, Dover Publications, Inc., New York, (1964).

[Kahan, 2005] W. Kahan, How Futile are Mindless Assessments of Roundoff in Floating-Point Computation?, http://www.cs.berkeley.edu/~wkahan/Mindless.pdf.

[Leon, 1998] Steven J. Leon, Linear Algebra with Applications, 5th edition, Prentice Hall, Upper Saddle River, NJ, (1998).

1.1.2 MATLAB Moment

1.1.2.1 Creating Matrices in MATLAB

MATLAB (short for MATrix LABoratory) is an interactive program first created by Cleve Moler in FORTRAN (1978) as a teaching tool for courses in linear algebra and matrix theory. Since then, the program has been written in C (1984) and improved over the years. It is now licensed by and is the registered trademark of The MathWorks, Inc., Cochituate Place, 24 Prime Park Way, Natick, MA. A student edition is available from Prentice Hall.

As its name implies, MATLAB is designed for the easy manipulation of matrices. Indeed, there is basically just one object it recognizes, and that is a rectangular matrix. Even variable names are considered to be matrices. We assume the reader will learn how to access MATLAB on his or her local computer system. You will know you are ready to begin when you see the command line prompt: >>. The first command to learn is "help". MATLAB has quite an extensive internal library to assist the user. You can be more specific and ask for help on a given topic. Experiment! Closely related is the "lookfor" command. Type "lookfor" and a keyword to find functions relating to that keyword.

This first exercise helps you learn how to create matrices in MATLAB. There are a number of convenient ways to do this.

1.1.2.1.1 Explicit Entry Suppose you want to enter the matrix

1 2 3 4

2 3 4 1

3 4 1 2

There are a couple of ways to do this. First

>> A = [1 2 3 4; 2 3 4 1; 3 4 1 2]

will enter this matrix and give it the variable name A. Note we have used spaces to separate elements in a row (commas could be used too) and semicolons to delimit the rows. Another way to enter A is one row at a time:

>> A = [1 2 3 4
2 3 4 1
3 4 1 2]

A row vector is just a matrix with one row:

>> a = [4 3 2 1]

A column vector can be created with the help of semicolons:

>> b = [1;2;3;4]

The colon notation is a really cool way to generate some vectors. m : n generates a row vector starting at m and ending at n. The implied step size is 1. Other step sizes can be used. For example, m : s : n generates a row vector starting at m, going as far as n, in steps of size s.

For example,

>> y = 2.0 : -0.5 : 0

returns

y = 2.0000 1.5000 1.0000 0.5000 0

Now try to create some matrices and vectors of your own.

1.1.2.1.2 Built-in Matrices MATLAB comes equipped with many built-in matrices:

1. The Zero Matrix: the command zeros(m, n) returns an m-by-n matrix filled with zeros. Also, zeros(n) returns an n-by-n matrix filled with zeros.

2. The Identity Matrix: the command eye(n) returns an n-by-n identity matrix (clever, aren't they?).

3. The Ones Matrix: the command ones(m, n) returns an m-by-n matrix filled with ones. Of course, ones(n) returns an n-by-n matrix filled with ones.

4. Diagonal Matrices: the function diag creates diagonal matrices. For example, diag([2 4 6]) creates the matrix

2 0 0
0 4 0
0 0 6

Now create some of the matrices described above. Try putting the matrix A above into diag. What happens? What is diag(diag(A))?

1.1.2.1.3 Randomly Generated Matrices MATLAB has two built-in random number generators. One is rand, which returns uniformly distributed numbers between zero and one, and the other is randn, which returns normally distributed numbers with mean zero and variance one. For example,

>> A = rand(3)

might create the 3-by-3 matrix

0.9501 0.4860 0.4565
0.2311 0.8913 0.0185
0.6068 0.7621 0.8214

We can generate a random 5-by-5 matrix with integers uniformly distributed between 0 and 10 by using

floor(11 * rand(5))

A random 5-by-5 complex matrix with real and imaginary parts being integers uniformly distributed between 0 and 10 could be generated with

floor(11 * rand(5)) + i * floor(11 * rand(5))

Experiment creating random matrices of various kinds.

1.1.2.1.4 Blocks A really neat way to build matrices is to build them up by blocks of smaller matrices to make a big one. The sizes, of course, must fit together.

For example,

>> A = [B 5*ones(2)
eye(2) 3*eye(2)]

returns

2 4 5 5

6 8 5 5

1 0 3 0

0 1 0 3

where B = [2 4; 6 8] has been previously created. The matrix A could also have been created by

>> A = [B 5*ones(2); eye(2) 3*eye(2)]

Very useful for later in the text is the ability to create block diagonal matrices. For example,

>> A = blkdiag(7*eye(3), 8*eye(2))

returns the matrix

7 0 0 0 0

0 7 0 0 0

0 0 7 0 0

0 0 0 8 0

0 0 0 0 8

Now create some matrices using blocks.

Further Reading

[H&H, 2000] Desmond J. Higham and Nicholas J. Higham, MATLAB Guide, SIAM, Philadelphia, (2000).

[L&H&F, 1996] Steven Leon, Eugene Herman, and Richard Faulkenberry, ATLAST Computer Exercises for Linear Algebra, Prentice Hall, Upper Saddle River, NJ, (1996).

[Sigmon, 1992] Kermit Sigmon, MATLAB Primer, 2nd edition, http://math.ucsd.edu/~driver/21d-s99/matlab-primer.html, (1992).

matrix inverse, uniqueness, basic facts, reversal law, Henderson and Searle formulas, Schur complement, Sherman-Morrison-Woodbury formula

1.2 The Special Case of "Square" Systems

In this section, we consider systems of linear equations (1.1) where the number of equations equals the number of unknowns; that is, m = n. In the form Ax = b, A becomes a square (i.e., n-by-n) matrix while x and b are n-by-1 column vectors. Thinking back to the good old high school days in Algebra I, we learned the complete story concerning the equation

ax = b (1.5)

over the real numbers R. If a = 0 and b = 0, any real number x provides a solution. If a = 0 but b ≠ 0, then no solution x exists. If a ≠ 0, then we have the unique solution x = b/a; that is, we divide both sides of (1.5) by a. Wouldn't it be great if we could solve the matrix equation

Ax = b (1.6)

the same way? However, as you well remember, dividing by a matrix was never an allowed operation, but matrix multiplication surely was. Another way to look at (1.5) is, rather than dividing by a, multiply both sides of (1.5) by the reciprocal (inverse) of a, 1/a. Then x = (1/a)b is the unique solution to (1.5). Now, the reciprocal of a in R is characterized as the unique number c such that ac = 1 = ca. Of course, we write c as 1/a or a^{-1}, so we can write x = a^{-1}b above. We can translate these ideas into the world of square matrices.

DEFINITION 1.1 (matrix inverse)An n-by-n matrix A is said to have an inverse if there exists another n-by-nmatrix C with AC = CA = I,,, where I is the n-by-n identity matrix. Anymatrix A that has an inverse is called invertible or nonsingular.

It is important to note that under this definition, only "square" (i.e., n-by-n) matrices can have inverses. It is easy to give examples of square matrices that are not invertible. For instance,

\[
\begin{bmatrix} 0 & 2\\ 0 & 2 \end{bmatrix}
\]

cannot be invertible, since multiplication by any 2-by-2 matrix yields

\[
\begin{bmatrix} 0 & 2\\ 0 & 2 \end{bmatrix}\begin{bmatrix} a & b\\ c & d \end{bmatrix} = \begin{bmatrix} 2c & 2d\\ 2c & 2d \end{bmatrix}.
\]

To get the identity matrix I_2 as an answer, 2c would have to be 1 and 2c would have to be 0, a contradiction. On the other hand,

\[
\begin{bmatrix} 1 & 1\\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & -1\\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & -1\\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1\\ 0 & 1 \end{bmatrix}.
\]

We see some square matrices have inverses and some do not. However, when a matrix is invertible, it can have only one inverse.

THEOREM 1.1 (uniqueness of inverse)
Suppose A is an n-by-n invertible matrix. Then only one matrix can act as an inverse matrix for A. As notation, we denote this unique matrix as A^{-1}. Thus AA^{-1} = A^{-1}A = I_n.

PROOF In typical mathematical fashion, to prove uniqueness, we assume there are two objects that satisfy the conditions and then show those two objects are equal. So let B and C be inverse matrices for A. Then C = IC = (BA)C = B(AC) = BI = B. You should, of course, be able to justify each equality in detail. We have used the neutral multiplication property of the identity, the definition of inverse, and the associative law of matrix multiplication. □

The uniqueness of the inverse matrix gives us a handy procedure for proving facts about inverses. If you can guess what the inverse should be, all you need to verify are the two equations in the definition. If they work out, you must have the inverse.

We next list some basic facts about inverses that, perhaps, you already know. It would be a good exercise to verify them.

THEOREM 1.2 (basic facts about inverses)
Suppose the matrices below are square and of the same size.

1. I_n is invertible for any n, and I_n^{-1} = I_n.

2. If k ≠ 0, then kI_n is invertible and (kI_n)^{-1} = (1/k) I_n.

3. If A is invertible, so is A^{-1}, and (A^{-1})^{-1} = A.

4. If A is invertible, so is A^n for any natural number n, and (A^n)^{-1} = (A^{-1})^n.

5. If A and B are invertible, so is AB, and (AB)^{-1} = B^{-1} A^{-1}. (This is called the "reversal law for inverses" by some.)

6. If k ≠ 0 and A is invertible, so is kA, and (kA)^{-1} = (1/k) A^{-1}.

7. If A is invertible, so are Ā, A^T, and A*; moreover, (A^T)^{-1} = (A^{-1})^T, (Ā)^{-1} equals the conjugate of A^{-1}, and (A*)^{-1} = (A^{-1})*.

8. If A^2 = I_n, then A = A^{-1}.

9. If A is invertible, so are AA^T, A^T A, AA*, and A*A.

10. If A and B are square matrices of the same size and the product AB is invertible, then A and B are both invertible.

PROOF This is an exercise for the reader. □

There are many equivalent ways to say that a square matrix has an inverse.Again, this should be familiar material, so we omit the details. For the squarematrix A, let the mapping from C" to C' given by x H Ax be denoted LA.

THEOREM 1.3 (equivalents to being invertible)For an n-by-n matrix A, the following statements are all equivalent (TA.E.):

1. A is invertible.

2. The equation Ax = ' has only the trivial solution x = .

3. Ax = b has a solution for every n-by-1 b.

4. Ax = b has a unique solution for each n-by- I b.

5. det(A) # 0.

6. The columns of A are linearly independent.

7. The rows of A are linearly independent.

8. The columns of A span C".

9. The rows of A span C'.

10. The columns of A form a basis of C'.

11. The rows of A form a basis of C".

12. A has rank n (A has full rank).

13. A has nullity 0.

14. The mapping LA is onto.

15. The mapping L A is one-to-one.

Page 41: Matrix Theory

20 The Idea of Inverse

PROOF Again the proofs are relegated to the exercises. 0

An important consequence of this theorem is the intimate connection betweenfinding a basis of C" and creating an invertible matrix by putting this basis ascolumns to form the matrix. We shall use this connection repeatedly. Now,returning to the question of solving a "square" system of linear equations, weget a very nice answer.

THEOREM 1.4 (solving a square system of full rank)Suppose Ax = b has an invertible coefficient matrix A of size n-by-n. Then

the system has a unique solution, namely x = A-' b.

PROOF Suppose Ax = b has the invertible coefficient matrix A. We claimxo = A-'b is a solution. Compute Axo = A(A-'b) = (AA-')b = lb = b.Next, if x, is some other solution of Ax = b, we must have Ax, = b. Butthen, A-' (Ax,) = A-' b, and so (A-' A)x, = A-' b. Of course, A-' A = I, soIx, = A-' b, and hence, x, = A` b = x0, proving uniqueness. 0

As an illustration, suppose we wish to solve

2x, + 7x2 + x3 = 2x,+4x2-x3 =2.x, + 3x2 = 4

As a matrix equation, we have

2 7 1 x, 2

4 -1 x2 = 2

I 3 0 x1 4

2 7 1

The inverse of I 4 -11 3 0

lution to the system is x2X1

X3

-3/2 -3/2is 1 /2 1 /2

1/2 -1/2-3/2 -3/21/2 1/21/2 -1/2

11/2-3/2 , so the so--1/211/2 2

-3/2 2 =-1/2 4

16

-4 , as you may verify.-2

Theorem 1.4 is hardly the end of the story. What happens if A is not invertible?We still have a long way to go. However, we note that the argument given in

Page 42: Matrix Theory

1.2 The Special Case of "Square" Systems 21

Theorem 1.4 did not need x and b to be n-by-1. Indeed, consider the matrixequation

AX = B

where A is n-by-n, X is n-by-p and, necessarily, B is n-by-p. By the sameargument as in Theorem 1.4, we conclude that if A is invertible, then thismatrix equation has the unique solution X = A-' B. So, if

[ 3 4) [ u

we get

yV

1

9 8[u v]-[3 4]-'[6 5

_ 12 -1110.5 9.5

1

2

1.51

8

5

Our definition of matrix inversion requires two equations to be satisfied,AC = I and CA = I. Actually, one is enough (see exercise set 2, problem 21).This cuts down on our labor substantially and we will use this fact in the sequel.

1.2.1 The Henderson Searle Formulas

Dealing with the inverse of a product of invertible matrices is fairly easyin light of the "reversal law," (AB)-' = B-'A-1. However, dealing with theinverse of a sum is more problematic. You may consider this surprising sincematrix addition seems so much more straightforward than matrix multiplication.There are some significant facts about sums we now pursue. We present theformulas found in Henderson and Searle [H&S, 1981 ]. This notable paper givessome historical perspective on why inverting sums of matrices is of interest.Applications come from the need to invert partitioned matrices, inversion of aslightly modified matrix (rank one update), and statistics. Henderson and Searlegive some general formulas from which many interesting special cases result.We present these general formulas in this section.

The following lemma is an important start to developing the Henderson-Searle formulas. Note that the matrices U and V below are included to increasethe generality of the statements. They are quite arbitrary, but of appropriate size.

LEMMA 1.1Suppose A is any n-by-n matrix with I, + A invertible. Then

1. (I, + A)-' = I - A(1n + A)-' = I, - (I + A)-'A.

Page 43: Matrix Theory

22 The Idea of Inverse

In particular

2. A(/ + A)-' = (1 + A)-' A.

PROOF Clearly 1, = (I + A) - A, so multiply by (I + A)-' from eitherside. 0

Another important lemma is given next.

LEMMA 1.2Suppose A, B, (B + VA-1U), (A + UB-' V) are all invertible with Vs-by-n,A n-by-n, U n-by-s, and B s-by-s. Then (B + V A-' U)-' V A-' = B-' V (A +UB-'V)-1.

PROOF Note VA-'(A+UB-'V)= V+VA-'UB-tV =(B+VA-'U)B-' V so V A-I (A + UB-' V) = (B + VA-' U)B-' V . Multiply both sidesby (A + UB-' V)-' from the right to get VA-' = (B + VA-' U)B-' V (A +UB-'V)-'. Multiply both sides by (B + VA-1U)-1 from the left to get(B+VA-IU)-IVA-1 =B-IV(A+UB-'V)-1. 0

A quick corollary to Lemma 1.2 is given below.

COROLLARY 1.1Suppose (I,, + VU) and (I,, + U V) are invertible. Then (1, + VU)-I V =V(I + UV)-'.

PROOF Take A = I and B = 1, in Lemma 1.2.

Finally, the Henderson-Searle formulas are given below.

0

THEOREM 1.5 (Henderson Searle formulas)Suppose A is n-by-n invertible, U is n-by-p, B is p-by-q, and V is q-by-n.

Suppose (A + U B V) ' exists. Then

1. (A+UBV)-' =A-'

2. (A + UBV)-' =A-'

3. (A+UBV)-' =A-' -A-'U(I,,+BVA-'U)-'BVA-',

4. (A + UBV)-' = A-' - A-1 UB(Iq + VA-'UB)-'VA-1,

Page 44: Matrix Theory

/.2 The Special Case of "Square" Systems 23

5. (A + UBV)-' = A-' - A-'UBV(In + A-'UBV)-'A-',

6. (A + UBV)-' = A-' - A-'UBVA-'(In + UBVA-I)-I.

PROOF First, note In + A-'UBV = A-'(A + UBV), which is a productof invertible matrices (by hypothesis) and hence In + A-' UBV is invertible.For(I)then,

(A + UBV)-' _ [A(1,, + A-'UBV)]-'

((In - (I + A-'UBV)-'A -'UBV ))A= A-' - (In + A-'UBV)-'A-UBVA-'

using Lemma 1.1. Using (2) of that lemma, we see (1 + A UBV)-' A-' U B V= A-' UBV (In + A-' UBV)-' so formula (5) follows.

Now for formula (2), note (In + UBVA-1) = (A + UBV)A-1, whichis a product of invertible matrices and so (1 + UBVA-1)-I exists. UsingLemma 1.1 again, wesee(A+UBV)-' = A-'(In+UBVA-')-' = A-'[1 - (In + UBVA-')-'UBVA-'] = A-' - A-'(In +UBVA-')-'UBVA-'. Again, using (2) of the lemma, formula (6) follows.Formula (3) follows from formula (2) and the corollary above if we knew(It, +BVA-' U)-' exists. But there is a determinant formula (see our appendixon determinants) that says det(1 + U(BVA-')) = det(I,, + (BVA-')U) sothat follows. Now using the corollary, A-' [(In +(U)(B V A-' ))-' U] B VA-' =A-'[U(I,, +(BVA-')U)-']BVA-', hence (3) follows from (2). Finally, (4)follows from (3) by a similar argument. 0

The next corollary gives formulas for the inverse of a sum of two matrices.

COROLLARY 1.2Suppose A, B E C", A invertible, and A + B invertible. Then

(A+ B)-' = A-' -(1n +A-'B)-'A-'BA-'= A-' - A-'(In + BA-')-' BA-'= A-' - A-' B(I,, + A'B'A'= A-' - A-'BA-'(In + BA-1)-1.

PROOF Take U = V = In in Theorem 1.5.

The next corollary is a form of the Sherman-Morrison-Woodbury formulawe will develop soon using a different approach.

Page 45: Matrix Theory

24 The Idea of Inverse

COROLLARY 1.3Suppose A, B and A + U B V are invertible. Let D = -B-' so B = Dthen (A - UD-'V)-l = A-' +A-'U(D - VA-'U)-'VA-'.

PROOF Using (3) from Theorem 1.5, (A - UD-' V)-' =A-' - A-'U[(1 - D-'VA-'U)-'(-D-')]VA-' =A-' - A-'U[(D-'D - D-'VA-'U)-'(-D-')]VA-' =A-' + A-' U(D - VA-1 U)-' VA-1 using distrihutivity and the reversal law

for inverses. 0

We have a special case involving column vectors.

COROLLARY 1.4Suppose A is n-by-n invertible, a is a scalar, A + auv* is invertible, 1 +av*A-' u # 0 , and u, v are n-by-1. Then

(A+auv*)-' = A-' - A 'uv*A-1

+av*A-lu'.

PROOF By (4) from Theorem 1.5 with B = a/, (A + uBv*)-' = (A +u(al)v*)-' = A-' - A-'u(al)(I +v*A-'u(al))-'v*A-' _

A_' - aA-'uv*A-' 0

1 +av*A-'u

We finish with a result for self-adjoint matrices.

COROLLARY 1.5Suppose A = A*, B = B*, A invertible, and A + UBU* invertible. Then(A + UBU*)-' = A-' - A-'UB(l + U*A-'UB)-U*A-'.

PROOF Use (4) from Theorem 1.5 and take V = U*.

1.2.2 Schur Complements and the Sherman-Morrison-WoodburyFormula

We end this section by introducing the idea of a Schur complement, namedafter the German mathematician Issai Schur (10 January 1875-10 January1941) and using it to prove the Sherman-Morrison-Woodbury formula. TheSchur complement has to do with invertible portions of a partitioned matrix.To motivate the idea of a Schur complement, we consider the 2-by-2 matrix

Page 46: Matrix Theory

1.2 The Special Case of "Square" Systems 25

M = [c

d E C2,,2. Suppose a # 0. Then

I

1 0 a b 1 -a-)b-ca ) I c d] 0 1

a b I -a-(b-ca-)a+c -ca-(b+d ] [ 0 1

a -as-'b +b a 00 d - ca-)b = 0 d - ca-(b

Since the matrices on either side of M are invertible, M is invertible iff

[ a 0 )

Jis invertible iff d -ca-) b A 0.Of course, all of this assumes

0 d - ca- bthat a j4 0. Note that det(M) = a(d - ca-)b).

Similarly, if we assumed # 0 instead of a 54 0, then I01 1

J[ a d

Jx

[ 0 1 1 = [ d c 1 is invertible if M is. With different letters, we apply1 0 b a J

the above argument and conclude M is invertible iff a - bd-) c # 0.Again, there is the underlying assumption that d i4 0. Now we extend these

ideas to larger matrices. Let M A'tx" B"xt E (r(n+s)x(n+t) Assume ACsxn Dsxt

is nonsingular. Can we mimic what we did above'? Let's try.

DEFINITION 1.2 (Schur complements)

Consider a matrix M partitioned as M = [ Anxn Bnxt 1 E C(n++)x(n+t)Cv xn D,xt J

Define the Schur complement of A in M by MIA = D - CA-'B E C'Sxr

assuming, of course, that A is invertible. If M is partitioned IA,

x tBs xn

JE

Cnxt DnxnC(n+s)x(n+t) where D is now invertible, define the Schur complement of D inM by M//D = A - BD-)C.

1 5 9 13

suppose M =To illustrate2 3 4

5where A is the up per,

6 7 8

7

- ,

5 -4 -3 2

left 2-by-2 block [2

3

1. Then M/A =

8 -72,- [ 5

6 7 1 5 9 13 0 -164,[23]-)[4

5,-[ 10 24[ g3

Page 47: Matrix Theory

26 The Idea of Inverse

The next theorem is due to the astronomer/mathematicianTadeusz Banachiewicz(1882-1954).

THEOREM 1.6 (Banacheiwicz inversion formula, 1937)Consider a matrix M of four blocks partitioned as

M=[ C D] where A is r-by-r, B is r-by-s, C is s-by-r, and D is s-by-s.

1. if A-' and (M/A)-'exist, then the matrix [C

BJ

is invertible and

[A B ' _ A-' + A-' BS-' CA-' -A-' BS-'C D ] = [ -S-'CA-' S' I

where S = M/A.

2. if D-' and (M//D)-'exist, then the matrix [ A D J is invertible and

A B ' _ T-' -T-'BD-I[ C D ] = [ -D-'CT-' D-' + D-'CT-'BD-' ]

where T = M//D.

PROOF We will prove part (I) and leave part (2) as an exercise. Suppose A isinvertible and S = D - CA-' B is invertible also. We verify the claimed inverseby direct computation, appealing to the uniqueness of the inverse. Compute

I

A B A-' +A-'BS-'CA-'C D -S-'CA-'

_ AA-' + AA-' BS-' CA-' - BS-'CA-1C[A-' + A-' BS-'CA-'] + D[-S-'CA-']

-A-' BS-' 1S-' J

-AA-' BS-' + BS-'-CA-' BS-' + DS-'

[ I + IBS-'CA-' - BS-'CA-' -IBS-' + BS-'CA-' + CA-' BS-'CA-' - DS-'CA-' -CA-'BS-1 + DS-'

A few things become clear. The first row is just what we hoped for. The firstblock is /, and the second is so we are on the way to producing the identitymatrix. Moreover, the lower right block

-CA-' BS-' + DS-' = (-CA-' B + D)S-' = SS-' = I.

Page 48: Matrix Theory

1.2 The Special Case of "Square" Systems 27

We are almost there! The last block is

CA-' +CA-'BS-'CA-' - DS-'CA-'= CA-' + [CA-'BS-' - DS-']CA-'= CA-' + [[CA-'B - D]S-']CA-' = CA-' - SS-'CA-' = 0

and we are done. 0

It is of interest to deduce special cases of the above theorem where B = 0or C = 0, or both B and C are 0. (See problem 30 of exercise set 2.) Note, inthe latter case, we get

L OD0

®' D®'

Statisticians have known an interesting result about the inverse of a certainsum of matrices since the late 1940s. A formula they discovered has uses inmany areas. We look at a somewhat generalized version next.

THEOREM 1.7 (Sherman-Morrison- Woodbury formula)Suppose A is n-by-n nonsingular and G is s-by-s nonsingular, where s <

n. Suppose C and D are n-by-s and otherwise completely arbitrary. ThenA + CGD* is invertible if G-' + D*A-'C is invertible, in which case

(A+CGD*)-' = A-' - A-'C(G-' + D*A-'C)-'D*A-'.

PROOF We will use a Schur complement to prove this result. First, supposethat A and G are invertible and S = G-' + D*A-'C is invertible. We must

show A + CGD* is invertible. We claimL

-D* G 'J

is invertible and

A C ' A-' - A-'CS-'D*A-' -A-'CS-1[ -D* G-' S-' D*A-' S-' Note the

Schur complement of A is G-' - (-D*)A-'C = G-' + D*A-'C, which isprecisely S. The theorem above applies; therefore

A-D*

CG-' ]

A-' + A-'CS-'(-D*)A-' -A-'CS-- [ -S-'(-D*)A-i S-'

'

- r A-' -A 'CS-'D*A-1 -A-' CS-1L S- D* A-' S-'

The claim has been established. But there is more to show. The next claim is

that 1 ® -CG1 is invertible. This is easy since

L

I C'J

is its inverse,

Page 49: Matrix Theory

28 The Idea of Inverse

as you can easily check. Also, IGD* 1

I

®J . Now consider

f

is invertible since its inverse is

/ -CG Ir A C

I

/

1 -D* G' GD* /

Lr

A + CGD* 00 G-'

1

Being the product of three invertible matrices, this matrix must be invertible. Butthen, the two nonzero diagonal block matrices must he invertible, so A +CGD*must he invertible. Moreover,

A + CGD*0

but also equals

0GD* I

]-1 [-AD*

0 (A + CGD*)-' ®1G

_ 0 A-' - A-' CS-' D*A-' -A-'CS-1 / CG-G/D* I][ S-1 D*A-' S 110 /

_ A-' - A-' CS-' D*A-' stuffstuff stuff

Now compare the upper left blocks.We leave the converse result as an easy exercise. 0

The strength of a theorem is measured by the consequences it has. There aremany useful corollaries to this theorem.

COROLLARY 1.6For matrices of the appropriate size, we have

1. If A, G, and A + G are invertible, (A + G)-' = A-' - A-'(G-1 +A-')-'A-'.

2. If A is invertible and n-by-n, C and D are n-by-s, and (I + D*A-'C)-'exists, then (A+ CD*)-' = A-' - A-'C(1 + D*A-'C)-' D*A-1.

Page 50: Matrix Theory

1.2 The Special Case of "Square" Systems 29

3. (1 + CD*)-' = I - C(1 + D*C)-I D*, if (I + D*C)-I exists.

4. Let c and d be n-by-1. Then (A +cd*)-' = A - I + cd c' providedd*A-'c # -1 and A-' exists.

cd*5. Let c and d be n-by-1. Then (I + cd*)-' = I - I + d*c ,

if I + d*c 00.

uv*6. If u and v are n-by-1, then (I - uv*)-1 = 1-

v*uifv*u 54 I.

- 1'

PROOF The proofs are left as exercises. a

We look at a special case of (2) above. Recall the "standard basis" vectorsfo1

ej = where the I appears in the jth position. Then we change the1

L 0 J(i, j) entry of a matrix A by adding a specified amount a to this entry. If A isinvertible and the perturbed matrix is still invertible, we have a formula for itsinverse. Some people say we are inverting a "rank one update" of A.

(A+ae;e')-' = A-' -a A-'e,e, A-' _ A-' -acol;(A-')rowj(A-')j I +ae*A-'e; I +aentji(A-1)

We illustrate in the exercises (see exercise 16 in exercise set 2).

Exercise Se t 2i 0 0 0 0 i

1. If A = 0 i 0 , what is A-'? What if A = 0 i 0 ?

0 0 i i 0 0

2+i 3+i 4+i2. Find A* and (A*)-i , if A = 5 + 3i 6 + 3i 6 + 2i

8+2i 8+3i 9-3i

Page 51: Matrix Theory

30 The Idea of Inverse

3. If A is an m-by-n matrix, what is Aej, e A, e, Ae1, where e, is the standardbasis vector? What is ej*e;? What is eke;*?

4. Suppose A and B are two m-by-n matrices and Ax = Bx for all n-by-Imatrices x. Is it true that A = B? Suppose Ax = 0 for all x. What canyou say about A?

I a b

5. Argue that 0 1 c is always invertible regardless of what a, b, and0 0 1

1 0 0c are and find a formula for its inverse. Do the same for a 1 0

b c I

6. Suppose A is n-by-n and A3 = I,,. Must A be invertible? Prove it is orprovide a counterexample.

7. Cancellation laws: Suppose B and C are m-by-n matrices. Prove

(i) if A is m-by-m and invertible and AB = AC, then B = C.(ii) if A is n-by-n and invertible and BA = CA, then B = C.

8. Suppose A and B are n-by-n and invertible. Argue that A-' + B-' _A-'(A + B)B-'. If A and B were scalars, how would you have beenled to this formula? (Hint: consider ( + h)). If (A + B)-' exists, whatis (A-' + B-')-' in view of your result above?

9. Suppose U is n-by-n and U2 = I. Argue that I + U is not invertibleunless U = I.

10. Suppose A is n-by-n, A* = A, and A is invertible. Prove that (A-')* _A-'.

11. Suppose A is n-by-n and invertible. Argue that AA* and A*A are invert-ible.

12. Prove or disprove the following claim: Any square matrix can be writtenas the sum of two invertible matrices.

13. Verify the claims made in Theorem 1.2.

14. Complete the proof of Theorem 1.5.

15. Fill in the details of the proof of Corollary 1.1 and Corollary 1.6.

Page 52: Matrix Theory

1.2 The Speci al Case of "Square" Systems 31

14 17 3

16. Let A = 17 26 5 . Argue that A is invertible and A- ' _3 5 1

1 -2 7 14 17 3

-2 5 -19 . Now consider the matrix Anew = 17 24 5

7 -19 75 3 5 1

Use the rank one update formula above to find (Anew)-I

17. Suppose A is invertible and suppose column j of A is replaced by anew column c, so that the new matrix A, is still invertible. Argue that

(A,)-' = A-' - (A-1c - ej)rowj(A-1)Continuing with the matrix in

rowj(A- )c1 17 3

exercise 16, compute the inverse of 1 26 5

1 5 1

18. Suppose we have a system of linear equations Ax = b, where A isinvertible. Suppose A is perturbed slightly to A + cd*. Consider thesystem of linear equations (A + cd*)y = b, where (A + cd*) is stillinvertible. Argue that the solution of the perturbed system is y = A-l b -A-icd*A-lbI +d*A-'c

19. Argue that for a small enough, A + ee;e* remains nonsingular if A is.

20. Suppose A E Cand B E C"x" are invertible. What can you say

about the matrixL

®B

l ? (Hint: Use Theorem 1.6.) How about the

converse? LJ

21. In our definition of inverse, we required that the inverse matrix workon both sides; that is, AC = I and CA = I. Actually, you can proveAC = I implies CA = I for square matrices. So, do it.

22. Prove the claims of Theorem 1.3. Feel free to consult your linear algebrabook as a review.

23. If A is invertible and A commutes with B, then does A-' commute withB as well? Recall we say "A commutes with B" when AB = BA.

24. If I+ AB is invertible, is I+ BA also invertible? If so, is there a formulafor the inverse of I + BA?

Page 53: Matrix Theory

32 The Idea of'Inverse

25. Suppose P P2. Argue that I - 2P is its own inverse. Generalize thisto (I - a P) ' for any a.

26. Suppose A is not square but AA* = 1. Does A*A = I also?

1 r s

27. Compute det -r I t . What, if anything, can you conclude-s -t 1

about the invertibility of this matrix?

28. Argue that A, a square matrix, can not be invertible if each row of A sumsto 0.

29. Argue that I ®J

is always an invertible matrix regardless of what

A is. Exhibit the inverse matrix.

30. Investigate the special cases of the Schur complements theorem whereB = 0, C = 0, or both are zero. Be clear what the hypotheses are.

1 - X 1 1

31. For which values of X is 0 2 - A 3 an invertible0 -3 2-

matrix'?

a b c

32. Is the matrix d e 0 invertible'? Do you need any conditions onf 0 0

the entries'?

33. Suppose Ax = Xx, where k is a nonzero scalar and x is a nonzero vector.Argue that if A is invertible, then A-'x = fx.

34. Can a skew symmetric matrix of odd order be invertible'?

1 I 1

35. Let A = n / + = n 1 + B. Is A invertible? If

I 1 1

so, what is A-''? What is the sum of all entries in A"? (Hint: What isB2?)

36. Suppose A and B are nonsingular. Argue that [(AB)-l ]T = [(AB)T ]-I.

Page 54: Matrix Theory

1.2 The Special Case of "Square" Systems 33

1 0 0 0 0

a 1 0 0 0

37. Let L5(a) = 0 a 1 0 0 . What is (L-5(a))-19 Can you

0 0 a 1 0

0 0 0 a 1

generalize this example to (L,, (a))-'?

38. Let A be a square and invertible matrix and let X be such that Ak+, X = Akfor some k in N. Argue that X = A-'.

1 0 0 01 0 0

1 0 -2 1 0 039. FindA-' ifA =

-2 11

-2 1 01 -2 0 0

1 -2 1

2 10 1

etc. Do you see a pattern?-

2 -1 0 02 -1 0

- 1 2 -1 040. Find A-' if A = -1 2 -1 , etc.

0 1

-2 1

-0 -1 20 0 -1 2

Do you see a pattern?

41. Let A = I

A A,2I , where A is k-by-k and invertible. Argue that

A2,

A22

A -L 1 ®1 All A,2

A2,Aii 1 J L 0 A22 - A21Ai,'A,2

r A l r x l b42. Consider

aI T

J

I J= , where A is n-by-n andLd a x,, l bn+,

invertible and a, b, d e C". Argue that if a -dTA-'a # 0, thenb,,+, -dTA-'b and x=A-'b-xn+,A-'a.

a - dTA-la

43. Argue that det(A + cd T) = det(A)(1 + dTA-Ic). In particular, deducethat det(1 + cdT) = I + dre.

44. Suppose v is an n-by-1, nonzero column vector. Argue that an invertiblematrix exists whose first column is v.

Page 55: Matrix Theory

34 The Idea of Inverse

1 1 I I

45. What is the inverse of - I I I ? How about the inverse

0 1 1 1

of 1 01 1 '? Do you notice anything interesting?

-1 -1 -1 0

46. Under what conditions isab

-bJ

invertible'? Determine the inversesa

in these cases.

A-' 0 1 [ A U I A'U47. Argue that i = i

-VA - I V D 0 D- V A- U

where, of course, A is invertible. Deduce that det(I A U

LL

V D= det(A)det(D - V A-' U) = det(D)det(A - U D-' V). As a corollary,conclude that det(I + V U) = det(I + UV).

48. Suppose A, D, A - BD-' C, and A - BD-' C are invertible. Prove thatA B (A - BD-'C)-'_' -(A - BD-'C)-'BD-1C

] [D -(D - CA-' B)-'CA D - CA-'B -'

(Hint: See Theorem 1.6.)

49. Suppose A, B, C, and D are all invertible, as are A - BD-'C, C -

DB-' A, B - AC-'D, and D - CA-' B. Argue thatA B

C D(A - BD-'C)-' (C - DB-'A)-'(B-AC-'D)-' (D - CA- B-' ]

50. Can you derive the Sherman-Morrison-Woodhury formula from theHenderson-Searle formulas'?

51. Argue that (D - CA-'B)-'CA-' = D-'C(A - BD-'C)-'. (Hint:Show CA-'(A - BD-'C) = (D - CA-'B)D-'C.) From this, deduceas a corollary that (1 + AB)-' A = A(1 + BA)-'.

52. Argue that finding the inverse of an n-by-n matrix is tantamount to solvingn systems of linear equations. Describe these systems explicitly.

53. Suppose A + B is nonsingular. Prove that A - A(A + B)-IA =B - B(A + B)-'B.

Page 56: Matrix Theory

1.2 The Special Case of "Square" Systems 35

54. Suppose A and B are invertible matrices of the same size. Prove thatA-' + B-' = A-'(A+ B)B-1.

55. Suppose A and B are invertible matrices of the same size. Suppose furtherthat A-' + B-' = (A + B)-1. Argue that AB-'A = BA-'B.

56. Suppose det(1,,, + AA*) is not zero. Prove that (1,,, + AA*)-' = I -A(I + A*A)-' A*. Here A is assumed to be m-by-n.

57. Suppose C = C* is n-by-n and invertible and A and B are arbitrary andn-by-n. Prove that (A - BC-')C(A - BC-')* - BC-' B* = ACA* -BA* - (BA*)*.

58. Refer to Theorem 1.6. Show that in case (1),I-'

L C D -[ 0I

-A1-'B

J L0

A ®'-CA-' I

59. Continuing our extension of Theorem 1.6, show thatA B

C D_ (A - BD-'C)-' -A-' B(D - CA-' B)-'

-D-'C (A - BD-'C)-' (D - CA-' B)-'

60. Use exercise 59 to derive the following identities:

(a) (A - BD-'C)-' BD-' = A-'B(D - CA-'B)-l(b) (A-' - BD-'C)-' BD-' = AB(D - CA-' B)-'(c) (A + BD-'C)-' BD-' = A-' B(D + CA-' B)-'(d) (D - CA-' B)-' = D-' + D-C (A - BD-'C)-' BD-'(e) (D - CA-' B)-' = D-' - D-'C (BD-'C - A)-' BD-'(I) (D + CA-' B)-' = D-' - D-'C (BD-'C + A)-' BD-'(g) (D-' + CA-' B)-' = D - DC (BDC + A)-' BD(h) (D - CAB)-' = D-' - D-'C (BD-'C - A-')-' BD-'(i) (D + CAB)-' = D-' - D-'C (BD-1C +A-')-' BD.

1-1

61. Apply Theorem 1.6 and rthe related 1problems above to the special case

of the partitioned matrixI.

B* D J . What formulas and identities can

you deduce?

62. Prove the Schur determinant formulas. Let M = A BC D

(i) If det(A) # 0, then det(M) = det(A)det(M/A).(ii) If det(D) 0 0, then det(M) = det(Ad)det(M//D).

Page 57: Matrix Theory

36 The Idea of inverse

Further Reading

[Agg&Lamo, 2002] RitaAggarwala and Michael P. Lamoureux, Invertingthe Pascal Matrix Plus One, The American Mathematical Monthly, Vol.109, No. 4, April, (2002), 371-377.

[Banachiewicz, 1937] T. Banachiewicz, Sur Berechnung der Determi-nanten, wie auch der Inversen, and zur darauf basierten aufiosung derSysteme Linearer Gleichungen, Acta Astronomica, Serie C, 3 (1937),41-67.

[B&R, 1986(2)] T. S. Blyth and E. F. Robertson, Matrices and VectorSpaces, Vol. 2, Chapman & Hall, New York, (1986).

[C&V 1993] G. S. Call and D. J. Velleman, Pascal's Matrices, The Amer-ican Mathematical Monthly, Vol. 100, (1993), 372-376.

[Greenspan, 1955] Donald Greenspan, Methods of Matrix Inversion, TheAmerican Mathematical Monthly, Vol. 62, May, (1955), 303-318.

[Hager, 1989] W. W. Hager, Updating the Inverse of a Matrix, SIAM Rev.,Vol. 31, (1989), 221-239.

[H&S, 19811 H. V. Henderson and S. R. Searle, On Deriving the Inverseof a Sum of Matrices, SIAM Rev., Vol. 23, (1981), 53-60.

[M&K, 2001 ] Tibor Mazuch and Jan Kozanek, New Recurrent Algorithmfor a Matrix Inversion, Journal of Computational and Applied Mathemat-ics, Vol. 136, No. 1-2, 1 November, (2001), 219-226.

[Meyer, 2000] Carl Meyer, Matrix Analysis and Applied Linear Algebra,SIAM, Philadelphia, (2000).

[vonN&G, 1947] John von Neumann and H. H. Goldstine, Numerical In-verting of Matrices of High Order, Bulletin of the American MathematicalSociety, Vol. 53, (1947), 1021-1099.

[Zhang, 2005] F. Zhang, The Schur Complement and Its Applications,Springer, New York, (2005).

Page 58: Matrix Theory

1.2 The Special Case of "Square" Systems 37

1.2.3 MATLAB Moment

1.2.3.1 Computing Inverse Matrices

If A is a square matrix, the command for returning the inverse, if the matrixis nonsingular, is

inv(A)

Of course, the answer is up to round off. Let's look at an example of a random4-by-4 complex matrix.

> >format rat

>>A = fix(l0 * rand(4)) + fix(l0 * rand(4)) * i

A=

Columns 1 through 3

9+9i 8 8+ li2+9i 7+3i 4+2i6+4i 4+8i 6+ Ii4+8i 0 7+6i

Column 4

9+2i7+li1

4+7i>>inv(A)ans =

Columns I through 3

1/2 - 1/6i -959/2524+516/2381i 291/4762+520/4451i-1/10 - 2/1 51 1072/6651 + 139/1767i 212/4359 - 235/26361

-7/10 + 7/3 01 1033/2317 - 714/1969 71/23810 - 879/509811/ 10 - 1 113

Column 4

0i 129/1379+365/1057i 31/11905 + 439/81811

-489/2381 + 407/2381 i399/7241 + 331/338 1 i683/1844 - 268/1043i-6/11905+81/601i

Page 59: Matrix Theory

38 The Idea of Inverse

We ask for the determinant of A which, of course, should not be zero

>>det(A)ans =

-396 + 1248i

> >det(inv(A))

ans =

-11/47620 - 26/35715i

> > det(A) * det(inv(A))

ans =

I- 1/1385722962267845i

Note that, theoretically, det(A) and det(inv(A)) should be inverses to oneanother. But after all, look how small the imaginary part of the answer is inMATLAB. We can also check our answer by multiplying A and inv(A). Again,we must interpret the answer presented.

>> A*inv(A)

ans =

Columns I through 3

*-* 1+* *-*

*-* *-* *-*

Column 4

We need to recognize the identity matrix up to round off. To get a moresatisfying result, try

>>round(A*inv(A)).

There may be occasions where you actually want a singular square matrix.Here is a way to construct a matrix that is not only singular but has a prescribeddependency relation among the columns. To illustrate let's say we want the thirdrow of a matrix to be two times the first row plus three times the second.

Page 60: Matrix Theory

1.2 The Special Case of "Square" Systems

> > A [III 1; 2222; 3333; 4444]

A=i l 1 I

2 2 2 2

3 3 3 3

4 4 4 4

> > A(:, 3) = A(:, 1 : 2) * [2, 3]'

A=1 1 5 1

2 2 10 2

3 3 15 3

4 4 20 4

>>det(A)ans =

0

39

1.2.4 Numerical Note

1.2.4.1 Matrix Inversion

Inverting a matrix in floating point arithmetic has its pitfalls. Consider the_ i l

simple example, A= I I I I n

Jwhere n is a positive integer. Exact

arithmetic gives A` _ n -nn IJ .

However, when n is large, the

machine may not distinguish -n + I from -n. Thus, the inverse returned

would be n -nJ ,

which has zero determinant and hence is singular.-n nUser beware!

1.2.4.2 Operation Counts

You probably learned in your linear algebra class how to compute the inverseof a small matrix using the algorithm [A 1 I] -> [1 I A-'] by using Gauss-Jordan elimination. If the matrix A is n-by-n, the number of multiplications andthe number of additions can be counted.

1. The number of additions is n3 - 2n2 + n.

2. The number of multiplications is n3.

For a very large matrix, n3 dominates and is used to approximate the numberof additions and multiplications.

Page 61: Matrix Theory
Page 62: Matrix Theory

Chapter 2

Generating Invertible Matrices

Gauss elimination, back substitution, free parameters, Type 1,Type 1!, and Type 111 operations; pivot

2.1 A Brief Review of Gauss Elimination with BackSubstitution

This is a good time to go back to your linear algebra book and review theprocedure of Gauss elimination with back substitution to obtain a solution to asystem of linear equations. We will give only a brief refresher here. First, werecall how nice "upper triangular" systems are.

Suppose we need to solve

2x,+x2-x3=5X2+x3=3.

3x3 = 6

There is hardly any challenge here. Obviously, the place to start is at thebottom. There is only one equation with one unknown; 3X3 = 6 so X3 = 2.Now knowing what x3 is, the second equation from the bottom has only oneequation with one unknown. So, X2 + 2 = 3, whence x2 = 1. Finally, since weknow x2 and x3, x, is easily deduced from the first equation; 2x, + 1 - 2 = 5, sox, = 3. Thus the unique solution to this system of linear equations is (3, 1, 2).Do you see why this process is called back substitution? When it works, it isgreat! However, consider

2X, + X2 - x3 = 5X3=2.

3X3 = 6

We can start at the bottom as before, but there appears to be no way to obtain avalue for X2. To do this, we would need the diagonal coefficients to be nonzero.

41

Page 63: Matrix Theory

42 Generating /nvertible Matrices

Generally then, if we have a triangular system

a11xl + a12X2 + ... + alnx, = bl

a22x2 + ... + a2nxn = b2

annx,, = bpi

and all the diagonal coefficients aii are nonzero, then back substitution willwork recursively. That is, symbolically,

xn = bnlann1

X11-1 = (bn-1 - an-Inxn)an-In-I

xi = 1 (bi - ai.i+l xi+l - ai.i+2xi+2 - - ainxn)aii

for i =n - 1,n -2,... ,2, 1.

What happens if the system is not square? Considerx1 + 2x2 + 3x3 + xa - x5 = 2

x3 + xa + x5 = -1xa - 2x5 = 4

{x l + 3x3 + xa = 2 - 2x2 + x5

X3 + X4 = -1 - X5xa=4+2x5

Now back substitution gives

xl = 13 - 2x2 + 8x5X3 = -5 - 3x5

xa=4+2x5.X2 = X2

X5 = X5

Clearly, x2 and z5 can be any number at all. These variables are called 'free."So we introduce free parameters, say x2 = s, x5 = t. Then, all the solutions tothis system can he described by the infinite set {(13 - 2s + 8t, s, -5 - 3t, 4 +2t, t) I s, t are arbitrary}.

Wouldn't it he great if all linear systems were (upper) triangular? Well, theyare not! The question is, if a system is not triangular, can you transform itinto a triangular system without disturbing the set of solutions? The method wereview next has ancient roots but was popularized by the German mathematician

. We can make it triangular, sort of:

Page 64: Matrix Theory

2.1 A Brief Review of Gauss Elimination with Back Substitution 43

Johann Carl Friedrich Gauss (30 April 1777 - 23 February 1855). It is knownto us as Gauss elimination.

We call two linear systems equivalent if they have the same set of solutions.The basic method for solving a system of linear equations is to replace thegiven system with an equivalent system that is easier to solve. Three elementaryoperations can help us achieve a triangular system:

Type 1: Interchange two equations.

Type 11: Multiply an equation by a nonzero constant.

Type 111: Multiply an equation by any constant and add the result toanother equation.

The strategy is to focus on the diagonal (pivot) position and use Type IIIoperations to zero out (i.e., eliminate) all the elements below the pivot. If wefind a zero in a pivot position, we use a Type I operation to swap a nonzeronumber below into the pivot position. Type II operations ensure that a pivot canalways be made to equal 1. For example,

I x, + 2x2 + 3x3 = 6

2x, + 5x2 + x3 = 9

X1 + 4x2 - 6x3 = 1

pivot

all = 1 {x, + 2x2 + 3x3 = 6

X2-5X3 = -32X2 - 9X3 = -5

pivot(I)

a22x, + 2x2 + 3x3 = 6

X2 - 5x3 = -3X3 = 1

Back substitution yields the unique solution (-1, 2, 1). As another example,consider

x, + 2x2 + x3 + x4 = 4 x, + 2x2 + x3 + x4 = 42x, + 4X2 - x3 + 2x4 = I I - -3x3 = 3 -

X1 + X2 + 2x3 + 3x4 = I -x2 + x3 + 2x4 = -3

x, + 2x2 + X3 + X4 = 4 x, + 2x2 + X3 = 4 - X4-x2+X3+2X4 = -3 -X2+X3 = -3-2X4.

-3X3 = 3 -3X3 = 3

We see x4 is free, so the solution set is

{(1 - 5t, 2 + 2t, -1, t) I t arbitrary}.

However, things do not always work out as we might hope. For example, ifa zero pivot occurs, maybe we can fix the situation and maybe we cannot.

Page 65: Matrix Theory

44 Generating Invertible Matrices

Consider

I x+y+z= 4 I x+y+z= 42x + 2Y + 5z = 3 -> 3z = -5 .

4x + 4y + 8z = 10 4z = -6

There is no way these last two equations can he solved simultaneously. Thissystem is evidently inconsistent - that is, no solution exists.

Exercise Set 3

1. Solve the following by Gauss elimination:

(a)

(b)

(c)

(d)

(e)

(0

2x, + x2 - x3 = 2

X1 + 3x2 + 2x3 = 1

6x, - 3x2 - 3x3 = 12

4x, - 4x2 + 8x3 = 20{

{

I

2x, + 2x2

4x, + 2x2

9x1 + 18x2

10x, + 10x2

- 6x3 - 8x4 = -2+ 10x3 + 2x4 = 10

- 6x3 + 3x4 = 24+ 10x3 - 15x4 = 10

,f2-x, + 2x2 + 3x3

2fx, + 3x2 + x3

3,/2-x, + x2 + 2x3

Six, + 3ix2 - 6ix3 + 3ix4 + 9ix5 = 3i

4x, - 2x, + 4x3 + 4x4 + 12x5 = 4

X1 + ;X2 - 3x3 X4 - 3x5 = 1

{

X1 + x2 + x3 + X4 = 3

3x, + 4x2 - 5x3 - 6x4 = 1

4x, + 5x2 - 4x3 - 6x4 = 5{

4x, + (2 - 2i)x2 = i

1 (2 - 2i)x, - 2x2 = i

(g) (x, + 2x2 + 3x3 = 4

Page 66: Matrix Theory

2.1 A Brief Review of'Gauss Elimination with Back Substitution 45

(h)

(i)

G)

(k)

XI + ZX2 - 2X3 - X4 = 0

X2 - jX3 - X4 = 0

X1 - 4X2 + 2X3 + 4X4 = 0

xI - IZx2 + 2x3 + 4x4 = 02

{xI + X2 + X3 = 3

2x1 + 3x2 + X3 = 6

5x1 - X2 + 3x3 = 7

{X1 - 12-X 2 + X3 = 'rte

2x1 + X2 - 1/

3x1 + 4x2 - y 2_x3 = 37r + 2,/2-

I+ 6x2 + (16 + 12i) x3 = 26

-4ix1 + (2 - 6i) X2 + (16 - 12i)x3 = 8 - 26i.

3ix2 + (-3 + 91)x3 = 3 + 12i

2. Consider the nonlinear problem

{x2 + 2xy +2x2 + xy +3x2 + 3xy +

y2 = 1

3y2 = 29.

y2 = 3

Is there a way of making this a linear problem and finding x and y?

3. Suppose you need to solve a system of linear equations with three differentright-hand sides but the same left-hand side -that is Ax = b1, Ax = b2,and Ax = b3. What would be an efficient way to do this?

4. Suppose you do Gauss elimination and back substitution on a systemof three equations in three unknowns. Count the number of multiplica-tions/divisions in the elimination part (11), in the back substitution part(6), and in the total. Count the number of additions/subtractions in theelimination part (8), in the back substitution part (3), and the total. (Hint:do not count creating zeros. The numerical people wisely put zeros wherethey belong and do not risk introducing roundoff error.)

5. If you are brave, repeat problem 4 for n equations in n unknowns. Atthe ith stage, you need (n - i) + (n - i)(n - i + 1) multiplications/divisions, so the total number of multiplications/divisions for theelimination is (2n3 + 3n2 - 5n)/6. At the ith stage, you need (n - i)(n - i + 1) additions/subtractions, so you need a total of (n3 - n)/3additions/subtractions. For the back substitution part, you need (n2+n)/2

Page 67: Matrix Theory

46 Generating Invertible Matrices

multiplications/divisions and (n2-n)/2 additions/subtractions, so the to-tal number of multiplications/divisions is it 3 /3 + n2 - n/3 and the totalnumber of additions/subtractions is n3/3 + n222/2 - 5n/6.

6. Create some examples of four equations in three unknowns (x, y, z) suchthat a) one solution exists, b) an infinite number of solutions exist, and c)no solutions exist. Recall our (zero, one)-advice from before.

7. Pietro finds 16 U.S. coins worth 89 cents in the Trevi Fountain in Rome.The coins are dimes, nickels, and pennies. How many of each coin didhe find? Is your answer unique?

8. Suppose (xi, YO, (x2, y2), and (x3, y3) are distinct points in R2 and lie onthe same parabola y = Ax2 + Bx + C. Solve for A, B, and C in termsof the given data. Make up a concrete example and find the parabola.

9. A linear system is called overdetermined iff there are more equations thanunknowns. Argue that the following overdetermined system is inconsi-

2x+2y=1stent: -4x + 8y = -8. Draw a picture in 1R2 to see why.

3x-3y=910. A linear system of m linear equations inn unknowns is call underdeter-

mined iff m < n (i.e., there are fewer equations than unknowns). Arguethat it is not possible for an underdetermined linear system to have onlyone solution.

11. Green plants use sunlight to convert carbon dioxide and water to glucoseand oxygen. We are the beneficiaries of the oxygen part. Chemists write

xi CO2 + x2H2O X302 + X4C6H12O6.

To balance, this equation must have the same number of each atom oneach side. Set up a system of linear equations and balance the system. Isyour solution unique?

Further Reading

[B&R, 1986(2)] T. S. Blyth and E. F. Robertson, Matrices and VectorSpaces, Vol. 2, Chapman & Hall, New York, (1986).

Page 68: Matrix Theory

2.1 A Brief Review of Gauss Elimination with Back Substitution 47

Gauss (Forward) Elimination with Back Substitution

Use back substitution tofind the determined variables

in terms of the freevariables: there will be

infinitely many solutions

Use back substitution tofind the values of thevariables: there will beone unique solutions

Figure 2.1: Gauss elimination flow chart.

2.1.1 MATLAB Moment

2.1.1.1 Solving Systems of Linear Equations

MATLAB uses the slash (/) and backslash (\) to return a solution to a systemof linear equations. X = A \ B returns a solution to the matrix equation AX = B,while X = B/A returns a solution to XA = B. If A is nonsingular, thenA\B returns the unique solution to AX = B. If A is m-by-n with m > n

Page 69: Matrix Theory

48 Generating hivertible Matrices

(overdetermined), A\B returns a least squares solution. If A has rank ii, thereis a unique least squares solution. If A has rank less than n, then A\B is a basicsolution. If the system is underdetermined (in < n), then A \B is a basic solution.If the system has no solution, the A\B is a least squares solution. Let's look atsome examples. Let's solve the first homework problem of Exercise Set 3, la.

>>A=[2 I -1;1 3 2;6 -3 -3;4 -4 8]A=2 1 -1

1 3 2

6 -3 -34 -4 8

>> b=[2;1;12;20]b=

2

1

12

20>> x=A\bx=

2.0000-1.00001.0000

>> A*xans =

2.00001.000012.00001.0000

Next let's solve I k since it has complex coefficients.

>> C=[4 6 16+12i;-4 2-6i 16-12i;0 3i -3+9i]C=

4.0000 6.0000 16.0000 + 12.0000i-4.0000 2.0000 - 6.0000i 16.0000 - 12.0000i

0 0 + 3.0000i -3.0000 + 9.0000i>> d=[26;8-26i;3+121]d=

26.00008.0000 - 26.0000i3.0000 + 12.0000i

>> y=C\dy=

-0.1 154 + 0.0769i1.1538 +.02308i0.7308 - 0.6538i

Page 70: Matrix Theory

2.2 Elementary Matrices 49

» c*yans =

26.00008.0000 - 26.0000i3.0000 + 12.0000i

Now let's look at an overdetermined system.

>> E=[2 2;-4 8;3 -31E=

2 2

-4 8

3 -3>> f=[1; -8; 91

1

-89

>> z=E\fz=

1.6250-0.3750

>> E*zans =

2.5000-9.50006.0000

matrix units, transvections, dilations, permutation matrix,elementary row operations, elementary column operations,the minimal polynomial

2.2 Elementary MatricesWe have seen that when the coefficient matrix of a system of linear equations

is invertible, we can immediately formulate the solution to the system in termsof the inverse of this matrix. Thus, in this section, we look at ways of generatinginvertible matrices. Several easy examples immediately come to mind. The n-by-n identity matrix I,, is always invertible since = I. So

1 U -I =[ 1 01 0 0 ]-' 1 0 0

10 1 ] 0 1 ] 'L

0 1 0 = 0 1 0,0 0 1 0 0 1

Page 71: Matrix Theory

50 Generating Invertible Matrices

and so on. Next, if we multiply / by a nonzero scalar X, then W, is invertibleby Theorem 1.2 of Chapter I page 18, with = a /,,. Thus, for example,

10 0 0 -' 1/10 0 0

0 10 0 = 0 1/ 10 0 More generally, when we0 0 10 0 0 1/10

take a diagonal matrix with nonzero entries on the diagonal, we have an in-vertible matrix with an easily computed inverse matrix. Symbolically, if D =diag(di,d2, with all d; 1-' 0fori = I, ,n, then D' _

I 0 0 -'diag(1/dj, 1/d2, , For example, 0 2 0

0 0 3

1 0 0

0 1/2 0

0 0 1/3

Now these examples are not going to impress our smart friends, so we needto be a bit more clever. Say we know an invertible matrix U. Then we cantake a diagonal invertible matrix D and form U-1 DU, which has the inverseU-' D-' U. Thus, one nontrivial invertible matrix U gives us a way to form

2 0 0

many nontrivial examples. For instance, take D = 0 2 00 0 1

and U =

1 0 2 -1 0 -21 1 1 . Then U-1 = 0 1 I , as you ma y ve rify, and

-1 0 -1 I 0 I

-1 0 -2 2 0 0 1 0 2 0 0 -20 1 1 0 2 0 1 1 1 = 1 2 1 is

1 0 1 0 0 1 -1 0 -1 1 0 3

an invertible matrix. Unfortunately, not all invertible matrices are obtainablethis way (i.e., from a diagonal invertible matrix). We now turn to an approachthat will generate all possible invertible matrices.

You probably recall that one of the first algorithms you learned in linearalgebra class was Gauss elimination. We reviewed this algorithm in Section 2.1.This involved three kinds of "elementary" operations that had the nice propertyof not changing the solution set to a system of linear equations. We will nowelaborate a matrix approach to these ideas.

There are three kinds of elementary matrices, all invertible. All are obtainedfrom the identity matrix by perturbing it in special ways. We begin with somevery simple matrices (not invertible at all) that can be viewed as the buildingblocks for all matrices. These are the so-called matrix units E,j. Define E11 tobe the n-by-n matrix (actually, this makes sense for m-by-n matrices as well),with all entries zero except that the (i, j) entry has a one. In the 2-by-2 case,

Page 72: Matrix Theory

2.2 Elementary Matrices 51

we can easily exhibit all of them:

Ell =[ 000 ]Ei2= [ 0 0 1,E21 =[ 0 0

]E22= 0 0 J

The essential facts about matrix units are collected in the next theorem, whoseproof is left as an exercise.

THEOREM 2.1

1. >r-i E,, = l t2. Eij=Epyiffi=pand j=q.

3. Ei j Erg0 ifr # j

E,5 ifr = j

4. E4 = Eii for all i.

5. Given any matrix A = [aij] in C' ' we have A i aij Eij.

6. T h e collection { Eij l i , j = I , - - - , n) is a basis for the vector space ofmatrices C"x" which, therefore, has dimension n2.

We will now use the matrix units to define the first two types of elementarymatrices. The first type of elementary matrix (Type III) goes under a namederived from geometry. It is called a transvection and is defined as follows.

DEFINITION2.1 (transvection)For i # j, define Tij (c) = 1" + cEij for any complex number c.

Do not let the formula intimidate you. The idea is very simple. Just take theidentity matrix and put the complex number c in the (i, j) position. In the 2-by-2case, we have

T12(0 _ [ 0 1] and T21 (c) _ [ 1 0 1

The essential facts about transvections are collected in the next theorem.

Page 73: Matrix Theory

52 Generating Invertible Matrices

THEOREM 2.2Suppose c and d are complex numbers. Then

I. Tij(c)T,,j(d) = Tij (d + c).

2. Tj(0)=1,.

3. Each is invertible and Tj(c)-' = Tij(-c).

4. Tij(c)Tr+(d)=Tr.,(d)Tj(c), ifj 0r 54 s 54 i 0j

5. T,j(cd) = Tir(c)_'Trj(d)-'Tr(c)Trj(d), ifr 0 i 54 j 0 r.

6. T,j(c)T = T ,(c)

7. T,j(c)* = Ti,(C).

8. det(T,j(c)) = 1.

The proofs are routine and left to the reader.

The second type of elementary matrix (Type II) also has a name derived fromgeometry. It is called a dilation and is defined as follows.

DEFINITION 2.2 (dilation)For any i = 1, - - - , n, and any nonzero complex number c, define D,(c) _

1)E,,.

Once again, what we are doing is quite straightforward. To form D(c), simplywrite down the identity matrix and replace the 1 in the diagonal (i, i) positionwith c. In the 2-by-2 case, we have

Di(c) _ [ 010 1

and D2(c) =L

c0

The salient facts about dilations are collected in the next theorem.

THEOREM 2.3Suppose c and d are nonzero complex numbers. Then

1.

j 96i

2. Di(c)D,(d) = Di(cd).

3. Di(l)=I,,.

Page 74: Matrix Theory

2.2 Elementary Matrices

4. Di (c) is invertible and Di (c)-1 = Di (c 1 ).

5. det(Di(c)) = c.

6. Di(c)Dj(d) = Dj(d)Di(c), if i # j.

7. Di (c)T = Di (c)

8. Di(c)* = Di(c).

53

Once again, the easy proofs are left to the reader.Finally, we have the third type of elementary matrix (Type I), which is called a

permutation matrix. Let S denote the set of all permutations of then element set[n] := (1, 2, , n}. Strictly speaking, a permutation is a one-to-one functionfrom [n] onto [ii]. If a is a permutation, we have or change the identity matrixas follows.

DEFINITION 2.3 (permutation matrix)Let a be a permutation. Define the permutation matrix P((T) = [61,a(j)] where

81 if i = Q(1) That is, ent Pi.a(j) - 0 if i 54 or(j) ( (v))

Once again, a very simple idea is being expressed here. All we are doing isswapping rows of the identity matrix according to the permutation U. Let'slook at an example. Suppose u, in cycle notation, is the permutation (123) of[3] = 11, 2, 3). In other words, under o,, l goes to 2, 2 goes to 3, and 3 goesback to 1. Then,

812 813 81 0 0 1

P(123) = 822 823 821 = 1 0 0

832 833 831 0 1 0

Notice that this matrix is obtained from the identity matrix /3 by sending thefirst row to the second, the second row to the third, and the third row to the first,just as v indicated. Since every permutation is a product (i.e., composition) oftranspositions (i.e., permutations that leave everything fixed except for swappingtwo elements), it suffices to deal with permutation matrices that are obtainedby swapping only two rows of the identity matrix. For example, in C3x3

0 1 0 1 0 0

P(12) = 1 0 0 and P(23) = 0 0 1

0 0 1 0 1 0

We collect the basic facts about permutation matrices next.

Page 75: Matrix Theory

54 Generating Invertible Matrices

THEOREM 2.4

1. P(Tr) P(a) = P(7ru), where ar and a are any permutations in S,,.

2. P(L) = I, where r. is the identity permutation that leaves every elementof [n] fixed.

3. P(Q-1) = P(a)-' = P(Q)'' = p(U)*.

4. P(ij) = I - Ei1 - Eli + E;j + Ei,, for any i, j with I < i, j < n.

5. P(ij)-' = P(ij).

6. Ejj = P(ij)-'EuP(ij)

In the 2-by-2 case, only two permutations exist: the identity L and the trans-position (12). Thus,

P(L) =01

J

and P(12) =L

U

1

1.

So there you are. We have developed the three kinds of elementary matrices.Note that each is expressible in terms of matrix units and each is invertible withan inverse of the same type. Moreover, the inverses are very easy to compute!So why have we carefully developed these matrices? The answer is they do realwork for us. The way they do work is given by the next two theorems.

THEOREM 2.5 (theorem on elementary row operations)Let A be a matrix in cC^' X" . Then the matrix product

1. T;j(c)A amounts to adding the left c multiple of the jth row of A to theith row of A.

2. Di(c)A amounts to multiplying the ith row of A on the left by c.

3. P(v)A amounts to moving the ith row of A into the position of the a(i)throw for each i.

4. P(ij)A amounts to swapping the ith and jth rows of A and leaving theother rows alone.

We have similar results for columns.

Page 76: Matrix Theory

2.2 Elementary Matrices 55

THEOREM 2.6 (theorem on elementary column operations)Let A be a matrix in C"'x". Then the matrix product

1. AT, (c) amounts to adding the right c multiple of the ith column of A tothe jth column of A.

2. AD;(c) amounts to multiplying the ith column of A on the right by c.

3. AP(a-1) amounts to moving the ith column of A into the position of thea(i )th column.

4. AP(ij) amounts to swapping the ith and jth columns of A and leavingthe other columns alone.

We illustrate in the 2-by-2 case. Let A = [ ad

] . Then T12(a)A =

I a a b l_ r a+ac b+ad 1 _ r as ab0 1 J

rc d c d D1(a)A c d

nd dbP(1 2)A = [

1 0 ] [ c d ] [ a ba

]I a _ a as+b

[ 0 1 c ca+dNow that we have elementary matrices to work for us, we can establish

some significant results. The first will be to determine all invertible matrices.However, we need a basic fact about multiplying an arbitrary matrix by aninvertible matrix on the left. Dependency relationships among columns do notchange.

THEOREM 2.7Suppose A is an m-by-n matrix and R is an m-by-m invertible matrix. Select anycolumns C1, c2, , ck from A. Then (CI, c2, - , ek) is independent if and onlyif the corresponding columns of RA are independent. Moreover, (c, , c2, - , ck }

is dependent if and only if the corresponding columns of RA are dependent withthe same scalars providing the dependency relations for both sets of vectors.

PROOF First note that the product RA is the same as [Ra, I Rae I . I Ra" ],

if A is partitioned into columns, A = [a, 1a21 ... Suppose (ci, c2, , ck}

is an independent set of columns. We claim the corresponding vectors Re,,Rc2, - , Rck are also independent. For this we take the usual approach andassume we have a linear combination of these vectors that produces the zerovector, say

a, Rc, + a2 Rc2 + ... + ak Rck = -6.

Page 77: Matrix Theory

56 Generating Invertible Matrices

But then, R(aici +a2C2+- -+a cA) = -6. Since R is invertible, we concludea1Ci + a2C2 + + akck = 7. But the c's are independent, so we mayconclude all the as are zero, which is what we needed to show the vectorsRci, Rc2, , Rck to be independent. Conversely, suppose Rci, Rc2, , RcA

are independent. We show Ci, c2, , ck are independent. Suppose Of ici +V. Then R(aici +a7c-) RV = 0.

But then, ai Rc1 + a2Rc2 + - - - + ak Rck = I and independence of the Rcsimplies all the a's are zero, which completes this part of the proof. The rest ofthe proof is transparent and left to the reader. 0

COROLLARY 2.1With A and R as above, the dimension of the column space of A is equal to thedimension of the column space of RA. (Note: We did not say the column spacesare the same.)

We now come to our main result.

THEOREM 2.8Let A be an n-by-n square matrix. Then A is invertible if and only if A can bewritten as a product of elementary matrices.

PROOF If A can be written as a product of elementary matrices, then A is aproduct of invertible matrices and hence itself must be invertible by Theorem1.2 of Chapter 1 page 18.

Conversely, suppose A is invertible. Then the first column of A cannot consistentirely of zeros. Use a permutation matrix P, if necessary, to put a nonzero entryin the (1,1) position. Then, use a dilation D to make the (1,1) entry equal to 1.Now use transvections to "clean out" (i.e., zero) all the entries in the first columnbelow the (1,1) position. Thus, a product of elementary matrices, some of which

I * * ... *

0 * * *

could be the identity, produces TDPA = 0 * * * . Now there

L0 * * *jmust be a nonzero entry at or below the (2,2) position of this matrix. Otherwise,the first two columns of this matrix would be dependent by Theorem 2.7. Butthis would contradict that A is invertible, so its columns must be independent.Use a permutation matrix, if necessary, to swap a nonzero entry into the (2,2)position and use a dilation, if necessary, to make it 1. Now use transvections to

Page 78: Matrix Theory

2.2 Elementary Matrices 57

"clean out" above and below the (2,2) entry. Therefore, we achieve

1 0 * * * ...0 1 * *

T, D, P, TDPA = 0 0 * *

0 0 * * * ... *

Again, if all the entries of the third column at and below the (3,3) entry werezero, the third column would be a linear combination of the first two, againcontradicting the invertibility of A. So, continuing this process, which mustterminate after a finite number of steps, we get E, E2E3 . EPA = 1", whereall the E;s are elementary matrices. Thus, A = EP' E2' El ', which againis a product of elementary matrices. This completes the proof. 0

This theorem is the basis of an algorithm you may recall for computing byhand the inverse of, at least, small matrices. Begin with a square matrix A andaugment it with the identity matrix 1, forming [All]. Then apply elementaryoperations on the left attempting to turn A into 1. If the process succeeds, youwill have produced A', where I was originally; in other words, [I IA-1 ] willbe the result. If you keep track of the elementary matrices used, you will alsobe able to express A as a product of elementary matrices.

2.2.1 The Minimal Polynomial

There is a natural and useful connection between matrices and polynomials.We know the dimension of C""' as a vector space is n2 so, given a matrixA, we must have its powers 1, A, A22, A3, eventually produce a dependentset. Thus there must exist scalars, not all zero, such that AP + at,-1 At"-' +

+ a, A + aol _ 0. This naturally associates to A the polynomial p(x) =xP +ap_,xP-1 + +alx + ao, and we can think of A as being a "root" of psince replacing x by A yields p(A) = 0. Recall that a polynomial with leadingcoefficient I is called monic.

THEOREM 2.9Every matrix in C""" has a unique monic polynomial of least degree that itsatisfies as a root. This unique polynomial is called the minimal (or minimum)polynomial of the matrix and is denoted p.,,.

Page 79: Matrix Theory

58 Generating Invertible Matrices

PROOF Existence of such a polynomial is clear by the argument given abovethe theorem, so we address uniqueness. Suppose f (x) = xt'+a,,_,xt'-1 +- - +

a,x + ao and g(x) = xN + R,,_,x"-1 + R,x + 13o are two polynomials ofleast degree satisfied by A. Then A would also satisfy f (x) - g(x) = (a,,_, -13 p_, )x"- ' +. +(ao -13o). If any coefficient of this polynomial were nonzero,we could produce a monic polynomial of degree less than p satisfied by A, acontradiction. Hence aj = 13, for all j and so p(x) = q(x). This completes theproof. 0

You may be wondering why we brought up the minimal polynomial at thispoint. There is a nice connection to matrix inverses. Our definition of ma-trix inverse requires two verifications. The inverse must work on both sides.The next theorem saves us much work. It says you only have to check oneequation.

THEOREM 2.10Suppose A is a square matrix in CV". If there exists a matrix B in C"'such that AB = 1, then BA = I and so B = A-'. Moreover, a squarematrix A in C""" has an inverse if and only if the constant term of its minimalpolynomial is nonzero. If A` exists, it is expressible as a polynomial in A ofdegree deg(µA) - 1. In particular, if A commutes with a matrix C, then A-'commutes with C also.

PROOF Suppose that AB = 1. We shall prove that BA = I as well.Consider the minimal polynomial p.A(x) = 13o + PI X + 132x2 +.. + x"'. Firstwe claim 13o # 0. Otherwise, LA(x) = Rix + 132x2 + +x'° = x(p(x)),where p(x) = R, + R2x + + x"'-1. But then, p(A) = 13, + 02A + +A'-' =RII+132A1+...+A"' I =13,AB+132AAB+...+A""AB=

(13,A + 02A2 + + A"')B = pA(A)B = 0. But the degree of p is lessthan the degree of µA, and this is a contradiction. Therefore, 13o # 0. Thisallows us to solve for the identity matrix in the minimal polynomial equation;

1

13o1=-RIA-132A2- -A so that!= -130

01130 13z 130

Multiplying through by B on the right we get B = --AB - -A2B - -Ro 13o

A"' B = - Ri 1 - 132 A - - -A"'- . This expresses B as a polynomial00 0in A of degreeoone less than that of the minimal polynomial and hence Bcommutes with A. Thus I = AB = BA. The remaining details are left to thereader. 0

There is an algorithm that allows us to compute the minimum polynomial ofmatrices that are not too large. Suppose we are given an n-by-n matrix A. Start

Page 80: Matrix Theory

2.2 Elementary Matrices 59

multiplying A by itself to form A, A2, ... , A". Form a huge matrix B wherethe rows of B are 1, A, A2, ... strung out as rows so that each row of B has n2elements and B is (n + 1)-by -n22. We must find a dependency among the first prows of B where p is minimal. Append the identity matrix /n+1 to B and row re-duce [B 1 /n+l ].Look for the first row of zeros in the transformed B matrix. Thecorresponding coefficients in the transformed 1,, matrix give the coefficients

1 0 3

of a dependency. Let's illustrate with an example. Let A = 2 1 1

-1 0 3

-2 0 12 -14 0 30Then A2 = 3 1 10 and A3 = -5 1 40 . Then we form

-4 0 6 -10 0 6[B 114]=

1 0 0 0 1 0 0 0 1 1 0 0 01 0 3 2 1 1 -1 0 3 0 1 0 0

-2 0 12 3 1 10 -4 0 6 0 0 1 0-14 0 30 -5 1 40 -10 0 6 0 0 0 1

Now do you see what we mean about stringing out the powers of A into rowsof B? Next row reduce [B 1 /4].

1 0 0 0 1 0 0 0 1 0 5

3_5

6I

6

0 0 1 0 2 17 1 0 2 0 26 _L 7

5 5 3 3 45 90 90

0 0 0 1 1_3 6 0 0 0 _§ 4 _ I

5 5 5 5 5

0 0 0 0 0 0 0 0 0 15 5 I

3 6 6 J

The first full row of zeros in the transformed B part is the last row, so we readthat an annihilating polynomial of A is 1- 6x2 - 6x3. This is not a monicpolynomial, however, but that is easy to fix by an appropriate scalar multiple. Inthis case, multiply through by -6. Then we conclude p A(x) =x3-5x2+ lOx-6is the minimal polynomial of A. The reader may verify this. A more efficientway to compute the minimal polynomial will be described later.

Exercise Set 4

1. Argue that D;(6)Tj,(`) = Tj1(-c)D;(6) for a # 0.

2. Fill in the details of the proofs of Theorems 2.1 through 2.3.

3. Fill in the details of Theorem 2.4.

4. Complete the proof of Theorem 2.5 and Theorem 2.6.

Page 81: Matrix Theory

60 Generating hmertible Matrices

5. Argue that zero cannot be a root of µA if A is invertible.

1 -l 2 5

6. Compute the inverse of A =2

32 -4

and express it as

2 1 -1 2a polynomial in A and as a product of elementary matrices.

7. If A= [a,j ], describe P(a)-1 AP(Q) for or, a permutation in S.

U11 U12 U13

8. Suppose U = 0 U22 U23 Find permutation matrices P, and0 0 1133

U33 0 0P2 such that P, U P2 = u23 1422 0

U13 1112 U11

9. Show that permutation matrices are, in a sense, redundant, since allyou really need are transvections and dilations. (Hint: Show P;j =D,(-1)T1(l) T,1j (- I )Tj; (1)). Explain in words how this sequence ofdilations and transvections accomplishes a row swap.

10. Consider the n-by-n matrix units Eij and let A be any n-by-n matrix.Describe AE;j, E;jA, and Eij AEL,,,. What is (E;j Ejk)T?

I I I

11. Find the minimal polynomial of A = 0 1 1

0 0 1 I.12. Make up a 4-by-4 matrix and compute its minimal polynomial by the

algorithm described on pages 58-59.

13. Let A= I a

d

bzc J

where a 54 0. Argue that A =10 0 w]

for suitable choices of x, u, v, and w.

14. What can you say about the determinant of a permutation matrix?

15. Argue that multiplying a matrix by an elementary matrix does not changethe order of its largest nonzero minor. Argue that nonzero minors go tononzero minors and zero minors go to zero minors. Why does this meanthat the rank of an m-by-n matrix is the order of its largest nonzerominor?

Page 82: Matrix Theory

2.2 Elementary Matrices 61

16. Let p(x) = ao + a1x + + anx" be a polynomial with complex co-efficients. We can create a matrix p(A) = aol + a, A + + ifA is n-by-n. Argue that for any two polynomials p(x) and g(x), p(A)commutes with g(A).

17. Let p(x) = ao + aix + + anx" be a polynomial with complex co-efficients. Let A be n-by-n and v be a nonzero column vector. Thereis a slick way to compute the vector p(A)v. Of course, you could justcompute all the necessary powers of A, combine them to form p(A),and multiply this matrix by v. But there is a better way. Form the Krylovmatrix JCn+, (A, v) = [v I Av I A2v I . . . I A"v]. Note you never have tocompute the powers of A since each column of ACn+1(A, v) is just A times

aoa,

the previous column. Argue that p(A)v = (A, v) Make up

an

a 3-by-3 example to illustrate this fact.

18. Suppose A^ is obtained from A by swapping two columns. Suppose thesame sequence of elementary row operations is performed on both A andA^, yielding B and B^. Argue that B^ is obtained from B by swappingthe same two columns.

19. Find nice formulas for the powers of the elementary matrices, that is, forany positive integer m, (T;j(k))"' =, (D1(c))'" _, and (P(Q))"' = ?

20. Later, we will be very interested in taking a matrix A and formingS-1 AS where S is, of course, invertible. Write a generic 3-by-3 matrix A

0 0 1 -,and form (T12(a))-'AT12(a), (D2(a))-1 AD2(a), and 1 0 0

0 1 0

0 0 1

A 1 0 0 J. Now make a general statement about what happens in0 1 0

the n-by-n case.

21. Investigate how applying an elementary matrix affects the determinantof a matrix. For example, det(T;j(a)A) = det(A).

22. What is the minimal polynomial of A =L 0 a ] ?

Page 83: Matrix Theory

62 Generating Im'ertible Matrices

23. Suppose A is a square matrix and p is a polynomial with p(A) = 0. Arguethat the minimal polynomial p-A(x) divides p(x). (Hint: Remember thedivision algorithm.)

24. Exhibit two 3-by-3 permutation matrices that do not commute.

Group ProjectElementary matrices can he generalized to work on blocks of a partitionedmatrix instead of individual elements. Define a Type I generalized elemen-

tary matrix to be of the form [®

I®a Type II generalized elementary

matrix multiplies a block from the left by a nonsingular matrix of appropri-ate size, and a Type III generalized elementary matrix multiplies a block bya matrix from the left and then adds the result to another row. So, for ex-A

B Dample,) ® 10C D]-[ A

CB [® ®][ C D

Aand

I/OX

®][ C D

I

[ X A + C X B + D ] ' The project is to

develop a theory of generalized elementary matrices analogous to the theorydeveloped in the text. For example, are the generalized elementary matrices all

invertible with inverses of the same type? Can you write [ ® A' ] as a

product of generalized Type III matrices?

Further Reading

[B&R, 1986(2)] T. S. Blyth and E. F. Robertson, Matrices and VectorSpaces, Vol. 2, Chapman & Hall, New York, (1986).

[Lord, 1987] N. J. Lord, Matrices as Sums of Invertible Matrices,Mathematics Magazine, Vol. 60, No. 1, February, (1987), 33-35.

Page 84: Matrix Theory

2.3 The LU and LDU Factorization 63

upper triangular lower triangular, row echelon form,zero row, leading entry, LU factorization, elimination matrix,LDUfactorization, full rank factorization

2.3 The LU and LDU Factorization

Our goal in this section is to show how to factor matrices into simpler ones.What this section boils down to is a fancy way of describing Gauss elimination.You might want to consult Appendix B for notation regarding entries, rows, andcolumns of a matrix.

The "simpler" matrices mentioned above are the triangular matrices. Theycome in two flavors: upper and lower. Recall that a matrix L is called lowertriangular if ent;j(L) = 0 for i < j, that is, all the entries of L above themain diagonal are zero. Similarly, a matrix U is called upper triangular ifentij(U) = 0 for i > j, that is, all the entries of U below the main diagonalare zero. Note that dilations, being diagonal matrices, are both upper and lowertriangular, whereas transvections T;i(a) are lower triangular if i > j and uppertriangular if i < j. We leave as exercises the basic facts that the product oflower (upper) triangular matrices is lower (upper) triangular, the diagonal oftheir products is the product of the diagonal elements, and the inverse of a lower(upper) triangular matrices, if it exists, is again lower (upper) triangular.

For example,

2 0 01 0 0 r 2 0 0

3 4 0 I5 2 0

[ -1 3 4= I 23

8 0L

is the product of two lower triangular matrices, while

1 0 0 -, 1 0 01

5 2 0 = -5/2 1/2 0-1 3 4 17/8 -3/8 1 /4

is the inverse of the second matrix.When we reviewed Gauss elimination in Section 2.1, we noted how easily

we could solve a "triangular" system of linear equations. Now that we havedeveloped elementary matrices, we can make that discussion more completeand precise. It is clear that the unknowns in a system of linear equations areconvenient placeholders and we can more efficiently work just with the matrix

Page 85: Matrix Theory

64 Generating Invertible Matrices

of coefficients or with this matrix augmented by the right-hand side. Thenelementary row operations (i.e., multiplying elementary matrices on the left)can be used to convert the augmented matrix to a very nice form called rowechelon form. We make the precise definition of this form next.

Consider an m-by-n matrix A. A row of A is called a zero row if all the entriesin that row are zero. Rows that are not zero rows will be termed (what else?)nonzero rows. A nonzero row of A has a first nonzero entry as you come fromthe left. This entry is called the leading entry of the row. An m-by-n matrix Ais in row echelon form if it has three things going for it. First, all zero rows arebelow all nonzero rows. In other words, all the zero rows are at the bottom ofthe matrix. Second, all entries below a leading entry must be zero. Third, theleading entries occur farther to the right as you go down the nonzero rows ofthe matrix. In other words, the leading entry in any nonzero row appears in acolumn to the right of the column containing the leading entry of the row aboveit. In particular, then, a matrix in row echelon form is upper triangular! Do yousee that the conditions force the nonzero entries to lie in a stair-step arrangementin the northeast corner of the matrix? Do you see how the word "echelon" cameto be used? For example, the following matrix is in row echelon form:

1 2 3 4 5 6 7 8 9

0 0 2 3 4 5 6 7 8

0 0 0 2 3 4 5 6 7

0 0 0 0 0 0 2 3 4

0 0 0 0 0 0 0 0 0

We can now formalize Gauss elimination.

THEOREM 2.11Every matrix A E C"" can be converted to a row echelon matrix by a finitenumber of elementary row operations.

PROOF First, use permutation matrices to swap all the zero rows below all thenonzero rows. Use more swaps to move row; (A) below rowj (A) if the leadingelement of row;(A) occurs to the right of the leading element of rowi(A). Sofar, all we have used are permutation matrices. If all these permutations havebeen performed and we have not achieved row echelon form, it has to be thatthere is a row,(A) above a row,(A) whose leading entry a81 is in the sameposition as the leading entry a,j. Use a transvection to zero out aq. That is,

/a,j) will put a zero where aq was. Continue in this manner until rowechelon form is achieved. 0

Notice that we did not use any dilations to achieve a row echelon matrix.That is because, at this point, we do not really care what the values of the

Page 86: Matrix Theory

2.3 The LU and LDU Factorization 65

leading entries are. This means we do not get a unique matrix in row echelonform starting with a matrix A and applying elementary row operations. We canmultiply the rows by a variety of nonzero scalars and we still have a row echelonmatrix associated with A. However, what is uniquely determined by A are thepositions of leading entries. This fact makes a nice nontrivial exercise. Beforewe move on, we note a corollary to our theorem above.

COROLLARY 2.2Given any matrix A in C' " there exists an invertible matrix R such that RA

Gis a row echelon matrix, that is, RA = . . . and G has no zero rows.

Let's motivate our next theorem with an example. Let2 1 1 2

A = 4 -6 0 3 . Then, doing Gauss elimination,-2 7 2 1

2 1 1 2

T32(1)T31(1)T21(-2)A = 0 -8 -2 -1 = U,0 0 1 2

an upper triangular matrix. Thus, A = T21(-2) -1 T31(1)-1 T32(1) 1 U =T21(2)TI, (- I)TI2(- DU

1 0 0 2 1 1 2

2 1 0 0 -8 -2 -1 = L U, the product of a-1 -1 1 0 0 1 2

lower and an upper triangular matrix. The entries of L are notable. They arethe opposites of the multipliers used in Gauss elimination and the diagonal ele-ments are all ones. What we have illustrated works great as long as you do notrun into the necessity of doing a row swap.

THEOREM 2.12Suppose A E c"" can be reduced to a row echelon matrix without needing anyrow exchanges. Then there is an in-by-in lower triangular matrix L with oneson the diagonal and an m-by-n upper triangular matrix U in row echelon formsuch that A = LU. Such a factorization is called an LU factorization of A.

PROOF We know there exist elementary matrices E1, ... , Ek such thatEk ... E1 A = U is in row echelon form. Since no row swaps and no dilationswere required, all the Eis are transvections Tin(a) where i > j. These arelower triangular matrices with ones on their diagonals. The same is true fortheir inverses, so L = El 1 . . . Ek 1 is as required for the theorem. U

Page 87: Matrix Theory

66 Generating Invertible Matrices

Instead of using elementary transvections one at a time, we can "speed up"the elimination process by clearing out a column in one blow. We now introducesome matrices we call elimination matrices. An elimination matrix E is a matrix

a,a2

of the form E = I - uvT = In - [U, u2 ... U] .

It,We know by Corollary 1.6 of Chapter I page 28 that E is invertible as long as

rvTu # 1, and, in this case, E-1 = I -

uv. We note that the elementaryv u- I

matrices already introduced are special cases of elimination matrices. Indeed,Tin(a) = I +ae;ej', DA(a) = / -(1 -a)ekeT and Pii = I -(e; -ei)(e, -ej )T.

all a12 ... a,,,

a21 a22 ... a2nLet A =

an,an,, an,2 ...0

a2i/aiiLet u, = and v,

ami/aj2, 3, ... , m and call µi,

0

a "multiplier." Then L, = I - uivi =1 0 ... .. . 0

-µ2i 1 0 .. . 0

-µs1 0 1 .. . 0 is lo we r tri angul ar a nd

0 ... .. . 1

all a12 ... a,,(2) (2)

LiA =0 a22 a2 = A(2),

(2) (2)0 a,n2 ... amn

where a;j) = a,1 - µi,a,j for i = 2, 3, ... , n, j = 2, ... , n.

1 I 2 4

Let's look at an example. Suppose A = 2 3 I -1 ].Then L, _3 -l 1 2

I 0 0 I I 2 4

and L, A = 0 I -3 -9 . The idea is to repeat-2 10 00

-3 0 1 0 -4 -5 -10

0 ail= e, Let µi, _ - for i =

all

Page 88: Matrix Theory

2.3 The LU and LDU Factorization 67

100

1

0

1

the process. Let L2 = I - u2e2 , where u2 = µ32 and e2 = 0 . Here

µn2 0(2)

ai2 f o r =3,... n,µi2 = (2)a22

again assuming a22) A 0. With (m - I) sweeps, we reduce our matrix A to rowechelon form. For each k, Lk = I - ukeT where enti(uk) =

0(k)

µik = a(k)akk

ifi = 1,2,... ,k

ifi =k+ 1,... ,m'

These multipliers are well defined as long as the elements akk) are nonzero, inwhich case a+1) = a()- (k)t t µ;kak , i = k+1, ... , m, j = i, ... , n. To continue

1 0 0 1 1 2 4

our example, L2 = 0 1 0 . Then L2L, A = 0 1 -3 -90 4 1 0 0 -17 -46

= U. More generally we would find A = LU whereI 0 0 0

µ21 1 0 ... ... 0

L = µ31 µ32 1 0 ... 0

µn11 µm2 ... ... ... I J(k)

where entik(L) = µik =a(k) ; i = k + 1,... , m and µki = akk), j =akkk,...,n.

Before you get too excited about the LU factorization, notice we have beenassuming that nothing goes wrong along the way. However, the simple nonsin-

gular matrixL

0 1

Jdoes not have an LU factorization! (Why not?)

We do know a theorem that tells when a nonsingular matrix has an LUfactorization. It involves the idea of leading principal submatrices of a matrix.They are the square submatrices you can form starting at the upper left-handcorner of A and working down. More precisely, if A = (aid] E Cnxn the

Page 89: Matrix Theory

68 Generating Invertible Matrices

leading principal submatrices of A are AI = [all ], A2 =

all a12 . . .alk

a2I a22 ... a2k

Ak= A,,=A.

akl ak2 ... akk

all a12

a21 a22

THEOREM 2.13Let A be a nonsingular inatrix in C" xa. Then A has an L U factorization if andonly if all the leading principal submatrices of A are nonsingular.

PROOF Suppose first that A has an LU factorization

A = LU - LII 0 U11 U12 LIIUII 'k

L21 L22 ] [ 0 U22 I _ [ # * ,

where LII and UII are k-by-k. Being triangular with nonzero diagonal en-tries, L11 and UII must be nonsingular, hence so is their product LIIUI1. Thisproduct is the leading principal submatrix Ak. This argument works for eachk=1,... ,n.

Conversely, suppose the condition holds. We use induction to argue thateach leading principal submatrix has an LU factorization and so A itself musthave one. If k = 1, AI = [all] = [I][aIl] is trivially an LU factorizationand all cannot he zero since AI is invertible. Now, proceeding inductively,suppose Ak = LkUk is an LU factorization. We must prove Ak+I has an

L U factorization as well. Now Ak = U' Lk 1 so Ak+I =v

L

AT U ] =ak+I

Lk ®l f

Uk Lk l u , where cTand b contain the first kVTUL I

I J L O ak+I - VTAL lu11components of row,+, (Ak+I) and colk+I(Ak+I ) But Lk+I =

L

VTU-II

]k

rand Uk+I -

L

Uk

L

Ik+u-vTAklu] gives an LU factorization for Ak+I.

The crucial fact is that ak+I -vT Ak -'u This is because Uk+I = Lk+1 Ak+1

is nonsingular. By induction, A has an LU factorization. 0

There is good news on the uniqueness front.

Page 90: Matrix Theory

2.3 The LU and LDU Factorization 69

THEOREM 2.14If A is nonsingular and can be reduced to row echelon form without row swaps,then there exists a unique lower triangular matrix L with one's on the diagonaland a unique upper triangular matrix U such that A = L U.

PROOF We have existence from before, so the issue is uniqueness. NoteU = L-I A is the product of invertible matrices, hence is itself invertible. Intypical fashion, suppose we have two such LU factorizations, A = L1 U, _L2U2. Then U, U2 1 = L

11 L2. However, U, Uz 1 is upper triangular and L

11 L2

is lower triangular with one's down the diagonal. The only way that can be isif they equal the identity matrix. That is, U, U2 1 = 1 = L 1 1 L2. ThereforeL, = L2 and U, = U2, as was to be proved. D

Suppose we have a matrix A that does not have an L U factorization. Allis not lost; you just have to use row swaps to reduce the matrix. In fact, arather remarkable thing happens. Suppose we need permutation matrices inreducing A. We would have Ln_, P"_, Li_2 Pn_2 L2P2L, P, A = U wherePk = I if no row swap is required at the kth step. Remarkably, all we re-ally need is one permutation matrix to take care of all the swapping at once.Let's see why. Suppose we have L2 P2L, P, A = U. We know P,T Pk = Iso L2P2L,P, P2P,A = L2L,(P2P,)A where L, is just L, reordered (i.e.,

1 0 0_, = P2L, P2 T). To illustrate, suppose L, = a 1 0 and P = P(23).

b 0 1

Then

1 0 01 0 0 1 0 0 1 0 0

L,=11 [

0 0 1 a 1 01[

0 0 1 =[

b 1 0 .

0 1 0 b 0 1 0 1 0 a 0 1

M_ ore generally, L,,_1 Pn-, ... L, P,A = Ln_,Ln_2... L,(Pn-, ... P,)A =L P A = U where Lk = P,,_1 . Pk+, Lk P+i for k = 1, 2, ... , n - 2 and

L = L11_1 and P = Pn_1...P1. Then PA = L-'U = LU. Wehave argued the following theorem.

THEOREM 2.15For any A E C""', there exists a permutation matrix P, an m-by-m lowertriangular matrix L with ones down its diagonal, and an tipper triangular m-by-n row echelon matrix U with PA = LU. If A is n-by-n invertible, L and Uare unique.

Page 91: Matrix Theory

70 Generating Invertible Matrices

0 2 16 2

For example let A = J 2 1 -1 3,

4 3 2 7

8 5 15 24

Clearly there is no way to get Gauss elimination started with a,, = 0. A rowswap is required, say

2 1 -I 3

P12A = 40 2 16 2

3 2 7Then T43(4)T42(-Z)TT2(-Z)T41(-4)

8 5 15 24

T3,(-2)P,2A =

The reader may v

2 1

0 2

0 00 0

erify

-116

-40

3

2

011

1 0 0 0 2 1 -1 3

0 1 0 0 0 2 16 2PI 2 A =

2 1 0 0 0 -4 0

4 1 0 0 0 11

How does this LU business help us to solve systems of linear equations?Suppose you need to solve Ax = b. Then, if PA = LU, we get PAx = Pb orLUx = Pb = b. Now we solve Ly = b by forward substitution and Ux = yby hack substitution since we have triangular systems. So, for example,

2x2 + 16x3 + 2x4 = 10

2x, + x2 - x3 + 3x4 = 17b i ttene wr4x, + 3x2 + 2x3 + 7x4 = 4 can

8x, + 5x2 + 15X3 + 24x4 = 5

X1 10 17

X2 17 10P12A

X3= Pie

4

_4

X4 5 5

There tore,

Yi l 1 0 0 0 y, 17

Y2 I 0 1 0 0 Y2 10

y3 0 y3 4

y4 4 4 1 y4 5

Page 92: Matrix Theory

2.3 The LU and LDU Factorization

which yields

and

y1 17

Y2 10

Y3 = -35Y4

521

4

2 1 - 1 3 xl 17

0 2 1 6 2 x2 _ 10

0 0 - 4 0 x3 - -350 0 0 1 1 x4 -121

4

It follows that

XI

X2

X3

X4

34337

48678

35448521

44

71

It may seem a bit unfair that L gets to have ones down its diagonal but Udoes not. We can fix that (sometimes). Suppose A is nonsingular and A = LU.The trick is to pull the diagonal entries of U out and divide the rows of U byeach diagonal entry. That is,

0I EU EU ...

0 0 1 Ell ...d2 d2

d 0 0 ... 0 1

Then we have LDU1 = A where D is a diagonal matrix and UI is uppertriangular with ones on the diagonal as well. For example,

A --2248

21

3

5

16 2-1 3

2 715 24

1 0 0 0 -2 2 16 2-1 1 0 0 0 3 15 5

-2-4

7/3 1 013/3 -14 1

00

00

-1 -2/30 1

1 0 0 0 -2 0 0 0 1 -1 -8 -1-I 1 0 0 0 3 0 0 0 1 5 5/3-2 7/3 1 0 0 0 -1 0 0 0 1 2/3

We summ

-4arize.

13/3 -14 1 0 0 0 I 0 0 0 1

Page 93: Matrix Theory

72 Generating Invertible Matrices

THEOREM 2.16 (LDU factorization theorem)If A is nonsingular, there exists a permutation matrix P such that P A = L D Uwhere L is lower triangular with ones down the diagonal, D is a diagonalmatrix, and U is upper triangular with ones down the diagonal. This factorizationof PA is unique.

One application of the LDU factorization is for real symmetric matrices (i.e.,matrices over R such that A = AT). First, A = LDU so AT = UT DT LT =UT DLT . But A = AT so LDU = UT DLT . By uniqueness U = LT . Thussymmetric matrices A can be written LDLT. Suppose D = diag(di, d2,... ,

has all real positive entries. Then D2 = diag( dl, d2, ... , makessense and (D2)2 = D. Let S = D1LT. Then A = LDLT = LD2D2LT =STS, where S is upper triangular with positive diagonal entries. Conversely, ifA = RRT where R is lower triangular with positive diagonal elements, thenR = LD where L is lower triangular with ones on the diagonal and D is adiagonal matrix with all positive elements on the diagonal. Then A = LD2LTis the LDU factorization of A.

Let's take a closer look at an LU factorization as a "preview of coming

attractions.' Suppose A= L U=

r I

IxyZ

0 0 0 1 r a b c d1 0 0 1 0 f g It

r 1 0 0 0 0 0s t

I

1

I

0 0 0 0a b c dxa xb + f xc + g xd + hya yb +rf yc+rg yd +rza zb + sf zc + sg zd + s

h

h

1 01

x rL

a0

f c

dJ. The block of zeros in U makes part of L

Z sL

irrelevant. We could reconstruct the matrix A without ever knowing the valueoft!More specifically A

I 1 0 1 0 1 0

= a x l b x + f 1 I c x +g I I d x +h IY y r y r y rZ z x z x z s

Page 94: Matrix Theory

2.3 The LU and LDU Factorization 73

so we have another factorization of A different from LU where only thecrucial columns of L are retained so that all columns of A can be reconstructedusing the nonzero rows of U as coefficients. Later, we shall refer to this as a fullrank factorization of A. Clearly, the first two columns of L are independent, soGauss elimination has led to a basis of the column space of A.

Exercise Set 5

1. Prove that the product of upper (lower) triangular matrices is upper(lower) triangular. What about the sums, differences, and scalar multiplesof triangular matrices?

2. What can you say about the transpose and conjugate transpose of an upper(lower) triangular matrix?

3. Argue that every square matrix can be uniquely written as the sum of astrictly lower triangular matrix, a diagonal matrix, and a strictly uppertriangular matrix.

4. Prove that the inverse of an upper (lower) triangular matrix, when it exists,is again upper (lower) triangular.

5. Prove that an upper (lower) triangular matrix is invertible if the diagonalentries are all nonzero.

6. Argue that a symmetric upper (lower) triangular matrix is a diagonalmatrix.

7. Prove that a matrix is diagonal iff it is both upper and lower triangular.

8. Prove the uniqueness of the LDU factorization.

9. Prove A is invertible if A can be reduced by elementary row operationsto the identity matrix.

2 1 -1 3

10. LetA= 4 2 2 7

I

-2 -1 16 2

8 4 15 24

Page 95: Matrix Theory

74 Generating Invertible Matrices

0 0 0 2 1 -1 3

21 0 0 0 0 4 1Multiply and

-1 3 1 0 0 0 3 2

4 1 5 1 0 0 0 1

1 0 0 U [ 2 1 -1 3

1 2 1 0 0 1 0 0 4 1

is What do you notice?-1 1 0 0 0 0 5/4

L 4 a. 5 1 J L 0 0 0 0Does this contradict any of our theorems? Of course not, but the questionis why not?

2 1 -1 3

11. LetA - 4 3 2 7

-2 2 16 2

8 5 15 24

(a) Find an L U factorization by multiplying A on the left by elementarytransvections.

(b) Find an LU factorization of A by "brute force" Set A1 0 0 0 2 1 -1 3

_ x 1 0 0 0 a b c- y r 1 0 0 0 d e

z s t 1 0 0 0 fMultiply out and solve. Did you get the same L and U ? Did you

have to?

12. The leading elements in the nonzero rows of a matrix in row echelon formare called pivots. A pivot, or basic column, is a column that contains apivot position. Argue that, while a matrix A can have many differentpivots, the positions in which they occur are uniquely determined by A.This gives us one way to define a notion of "rank" we call pivot rank.The pivot rank of a matrix A is the number of pivot positions in anyrow echelon matrix obtained from A. Evidently, this is the same as thenumber of basic columns. The variables in a system of linear equationscorresponding to the pivot or basic columns are called the basic variablesof the system. All the other variables are called free. Argue that an in-by-it system of linear equations, Ax = b with variables x1, ... , x,,, isconsistent for all b ill A has in pivots.

13. Argue that an LU factorization cannot be unique if U has a row of zeros.

14. If A is any m-by-n matrix, there exists an invertible P so that A =P-'LU.

Page 96: Matrix Theory

2.3 The LU and LDU Factorization 75

Further Reading

[B&R, 1986(2)] T. S. Blyth and E. F. Robertson, Matrices and VectorSpaces, Vol. 2, Chapman & Hall, New York, (1986).

[E&S, 2004] Alan Edelman and Gilbert Strang, Pascal Matrices,The American Mathematical Monthly, Vol. I1 1 , No. 3, March, (2004),189-197.

[Johnson, 2003] Warren P. Johnson, An LDU Factorization in ElementaryNumber Theory, Mathematics Magazine, Vol. 76, No. 5, December,(2003), 392-394.

[Szabo, 2000] Fred Szabo, Linear Algebra, An Introduction UsingMathematica, Harcourt, Academic Press, New York, (2000).

2.3.1 MATLAB Moment

2.3.1.1 The LU Factorization

MATLAB computes LU factorizations. The command is

[L, U, P] = lu(A)

which returns a unit lower triangular matrix L, an upper triangular matrix U,and a permutation matrix P such that PA = LU. For example,

>>A=[0 2 16 2;2 1 -13;4 3 2 7;8 5 15 241A=

0 2 16 2

2 1 -1 3

4 3 2 7

8 5 15 24

> [L,U,P]=Iu(A)L=

1 0 0 00 1 0 01/2 1/4 1 0

1/4 -1/8 11/38 1

Page 97: Matrix Theory

76 Generating Invertible Matrices

U=8 5 15 24

0 2 16 2

0 0 -19/2 -22/19

P=0 0 0 1

1 0 0 0

0 0 1 0

0 1 0 0

adjugate, submatrix, principal submatrices, principal minors,leading principal submatrices, leading principal minors, cofactors

2.4 The Adjugate of a Matrix

There is a matrix that can be associated to a square matrix and is closelyrelated to the invertibility of that matrix. This is called the adjugate matrix, oradjoint matrix. We prefer to use the word "adjoint" in another context, so we gowith the British and use "adjugate." Luckily, the first three letters are the samefor both terms so the abbreviations will look the same.

Suppose A is an m-by-n matrix. If we erase m - r rows and n - c columns,what remains is called an r-by-c submatrix of A. For example, let

all a12 a13 a14 a15 a16 alla21 a22 a23 a24 a25 a26 a27

A = a31 a32 a33 a34 a35 a36 a37 . Suppose we strike outa41 a42 a43 a44 a45 a46 a47

a51 a52 a53 a54 a55 a56 a57two rows, say the second and fifth, and three columns, say the second, fourth,and fifth, then the (5 - 2)-by-(7 - 3) submatrix of A we obtain is

all a13 a16 all

a31 a33 a36 a37E C3x4

a41 a43 a46 a47

Them -r rows to strike out can be chosen in ways and the n -c columnsm -rcan be chosen in ( " ) ways, so there are (ni (n"c) possible submatrices thatn-c -v -can be formed from an in-by-n matrix. For example, the number of possiblesubmatrices of size 3-by-4 from A above is (;) (4) = 350. You probably wouldnot want to write them all down.

We are most interested in submatrices of square matrices. In fact, weare very interested in determinants of square submatrices of a square matrix.

Page 98: Matrix Theory

2.4 The Adjugate of a Matrix 77

The determinant of an r-by-r submatrix of A E C" x" is called an r-by-r minorof A. There are ()2

such minors possible in A. We take the O-by-O minor of anymatrix A to be 1 for convenience. Note that there are just as many minors of orderr as of order n-r. For later, we note that the principal submatrices of A are ob-tained when the rows and columns deleted from A have the same indices. A sub-matrix so obtained is symmetrically located with respect to the main diagonal ofA. The determinants of these principal submatrices are called principal minors.Even more special are the leading principal submatrices and their determinants,

all a12 a13called the leading principal minors. If A = a21 a22 a23 , the leading

a3l a32 a33

all a12all a12 a13lprincipal submatrices are [all ],

azI azz] , and a21 a22 a23

a31 a32 a33Let M11(A) be defined to be the (n - l)-by-(n - 1) submatrix of A

obtained by striking out the ith row and jth column from A. For example,1 2 3

if A = 4 5 6 , then M12(A) =L 7

49 The (i, j)-cofactor of A is

7 8 9 L

defined by

cof,j(A) = (-1)`+idet(M;j(A)).

For example, for A above, cof12(A) _ (-1)1+2detL

4 9(-1)(-6)

6. L

We now make a matrix of cofactors of A and take its transpose to create anew matrix called the adjugate matrix of A.

adj(A) :_ [cof;j (A)]"

For example,

d etL

a22a32

all aI2 a13and if A = a21 a22 a23

a31 a32 a33

adj(A) =

adj\L c d J/- L

da J

-det a12a32L

det f a12 ]a13 -det [a21 a23 j det I a

21L a23a22 a a22

a23

a33]aI3

a33

-det [aa21 a23

31 a33 ] det[ a31 a32 ]

detL a31 a33

3 ] -det I alla32 ]

I

T

Page 99: Matrix Theory

78 Generating Invertible Matrices

You may be wondering why we took the transpose. That has to do with a formulawe want that connects the adjugate matrix with the inverse of a matrix. Let'sgo after that connection.

Let's look at the 2-by-2 situation, always a good place to start. Let A =

a dJ .

Let's compute A(adj(A)) =L

a db

I I - abad + b(-c) a(-b) + ba _ ad - be 0cd + d(-c) c(-b) + da 0 ad - be

- r det(A) 00 det(A)

= det(A) [ 0 ].That is a neat answer! Ah, but does it persist with larger

matrices? In the 3-by-3 case, A =

A(adj(A))

alla21

a31

alt a13

a22 a23

a32 a33

alla21

a31

a12 a13

a22 a23

a32 a33

and so

a22a33 - a23a32 -al2a33 + a13a32 aI2a23 - al3a22-a21a33 +a3la23 aIIa33 -aI3a3I -a1Ia23 +a21a13a21a32 - a22a3l -a11a32 + a12a31 al Ia22 - al2a21

det(A) 0 0 1 0 00 det(A) 0 = det(A) 0 1 00 0 det(A) 0 0 1

Indeed, we have a theorem.

THEOREM 2.17For any n-by-n matrix A, A(adj(A)) = (det(A))I,,. Thus, A is invertible iff

det(A):0 and, in this case, A-1 =det

1

(A)adj(A).

PROOF The proof requires the Laplace expansion theorem (see Appendix Q.

We compute the (i, j)-entry: ent, (A(adj(A)) = >ent;k(A)entkl(adj(A)) _k=1

Eaik(-1)i+kdet(Mik (A)) =j det(A) ifi = j

k=1 l 0 ifi0j

Page 100: Matrix Theory

2.4 The Adjugate of a Matrix 79

While this theorem does not give an efficient way to compute the inverse ofa matrix, it does have some nice theoretical consequences. If A were 10-by-10, just finding adj(A) would require computing 100 determinants of 9-by-9matrices! There must be a better way, even if you have a computer. We can, ofcourse, illustrate with small examples.

6 1 4 -4 6 2

Let A = 3 0 2 . Then adj(A) _ -8 16 0 and

-1 2 2 6 -13 -36 1 4 -4 6 2 -8 0 0

Aadj(A) = 3 0 2 -8 16 0 = 0 -8 0-1 2 2 6 -13 -3 0 0 -8

-4 6 2

so we see det(A) = -8 and A-' _ -8 -8 16 0 , as the reader6 -13 -3

may verify.

Exercise Set 6

1. Compute the adjugate and inverse of

3 5 1 2-1 0 1 06 4 2 7

5 3 1 1

3 6 J,

2 4 3

0 2 4 ,

0 0 0

, if they exist, using Theorem 2.17 above.

2. Write down the generic 4-by-4 matrix A. Compute M,3(A) andM24 (M,3(A)).

3. Establish the following properties of the adjugate where A and B are inCnxn:

(a) adj(A-') = (adj(A))-', provided A is invertible(b) adj(cA) = cn-'adj(A)(c) if A E C"xn with n > 2, adj(adj(A)) _ (det(A))i-2A(d) adj(AT) = (adj(A))T so A is symmetric itf adj(A) is(e) adj(A*) _ (adj(A))* so A is Hermitian iffadj(A) is(f) adj(AB) = adj(B)adj(A)(g) adj(adj(A)) = A provided det(A) = 1(h) adj(A) = adj(A)(i) the adjugate of a scalar matrix is a scalar matrix

Page 101: Matrix Theory

80 Generating lnvertible Matrices

(j) the adjugate of a diagonal matrix is a diagonal matrix(k) the adjugate of a triangular matrix is a triangular matrix(1) adj(T;j(-a)) = T,j(a)

(m) adj(1) = I and adj((D) = G.

4. Find an example of a 3-by-3 nonzero matrix A with adj(A) = 0.

5. Argue that det(adj(A))det(A) = (det(A))" where A is n-by-n. So, ifdet(A) 0, det(adj(A)) (det(A))"-1.

6. Argue that detL

®®

= (-1)""' whenever n > 1 and in > I.

7. Prove that det I u 1 = (3det (A) - v*(adj(A))u = det(A)((3 -V* 0

v*A-lu) where (3 is a scalar and u and v are n-by-1. (Hint: Do a Laplaceexpansion by the last row and then more Laplace expansions by the lastcolumn.)

8. Argue that ad j(1 - uv*) = uv* + (I - v*u)/ where u and v are n-by-1.

9. Prove that det(adj(adj(A))) = (det(A))("-') .

10. If A is nonsingular, adj(A) = det(A)A-1.

Further Reading

[Aitken, 1939] A. C. Aitken, Determinants and Matrices, 9th edition,Oliver and Boyd, Edinburgh and London, New York: Interscience Pub-lishers, Inc., (1939).

[B&R, 1986(2)] T. S. Blyth and E. F. Robertson, Matrices and VectorSpaces, Vol. 2, Chapman & Hall, New York, (1986).

[Bress, 1999] David M. Bressoud, Proofs and Confirmations: The Storyof the Alternating Sign Matrix Conjecture, Cambridge University Press,(1999).

[Bress&Propp, 1999] David Bressoud and James Propp, How the Alter-nating Sign Conjecture was Solved, Notices of the American Mathemat-ical Society, Vol. 46, No. 6, June/July (1999), 637-646.

Page 102: Matrix Theory

2.5 The Frame Algorithm and the Cayley-Hamilton Theorem 81

Group ProjectFind out everything you can about the alternating sign conjecture and write

a paper about it.

characteristic matrix, characteristic polynomial, Cayley-Hamiltontheorem, Newton identities, Frame algorithm

2.5 The Frame Algorithm and the Cayley-HamiltonTheorem

In 1949, J. Sutherland Frame (24 December 1907 - 27 February 1997)published an abstract in the Bulletin of theAmerican Mathematical Society indi-cating a recursive algorithm for computing the inverse of a matrix and, as a by-product, getting additional information, including the famous Cayley-Hamiltontheorem. (Hamilton is the Irish mathematician William Rowan Hamilton(4 August 1805 - 2 September 1865), and Cayley is Arthur Cayley (16 August1821 - 26 January 1895.) We have not been able to find an actual paper with adetailed account of these claims. Perhaps the author thought the abstract suffi-cient and went on with his work in group representations. Perhaps he was toldthis algorithm had been rediscovered many times (see [House, 1964, p. 72]).Whatever the case, in this section, we will expand on and expose the detailsof Frame's algorithm. Suppose A E C"'. The characteristic matrix of A isxl - A E C[x J""", the collection of n-by-n matrices with polynomial entries.We must open our minds to accepting matrices with polynomial entries. For

exam le, x + I x - 3p 4x + 2 x3 - 7 ] E C[X]2,2. Determinants work just fine for

these kinds of matrices. The determinant ofxl - A, det(xl - A) E C[x], thepolynomials in x, and is what we call the characteristic polynomial of A:

XA(x)=det(xl -A)=x"+clx"-1+--.+Cn-Ix+c".

1 1 2 2For example, if A 3 4 5 E C3"3, then X13 - A =

6 7 8x -I 1 2 -- -2-3 x - 4 -5 E Thus XA(x) _C[x]3i3-6

I

-7 x -8.

Page 103: Matrix Theory

82 Generating Invertible Matrices

x - 1 -2 -2det -3 x - 4 -5 J = x3- 13x2-9x-3. This is computed

-6 -7 x-8using the usual familiar rules for expanding a determinant.

You may recall that the roots of the characteristic polynomial are quite im-portant, being the eigenvalues of the matrix. We will return to this topic later.For now, we focus on the coefficients of the characteristic polynomial.

First, we consider the constant term c,,. You may already know the an-swer here, but let's make an argument. Now det(A) = (-1)"det(-A) _(-1)"det(01 - A) = (-1)"XA(0) = (-I)"c,,. Therefore,

det(A) = (-1)"c,,

As a consequence, we see immediately that A is invertible iff c, A 0, in whichcase

A-' = (-I)"adj(A),c'1

where adj(A) is the adjugate matrix of A introduced previously. Also recall theimportant relationship, Badj(B) = det(B)I. We conclude that

(xl - A)adj(xl - A) = XA(x)l.

To illustrate with the example above, (xl - A)adj(xl - A) =

x-1 -2 -2 x2- 12x-3 2x-2 2x+2-3 x-4 -5 3x +6 x2-9x-4 5x+1-6 -7 x- 8 6x -3 7x+5 x2-5x-2

x3-13x2-9x-3 0 00 x3- 13x2-9x-3 00 0 X3-13x2-9X-3

1 0 0=x3-13x2-9x-3 0 1 0

0 0 1

Next, let C(x) = adj(xl - A) E C[x]""". We note that the elements ofadj(xl - A) are computed as (n-1)-by-(n-l) subdeterminants of xl - A, sothe highest power that can occur in C(x) is x". Also, note that we can iden-tify C[x]""", the n-by-n matrices with polynomial entries with C"[x], thepolynomials with matrix coefficients, so we can view C(x) as a polynomial in

x'-+1 x-3 _x with scalar matrices as coefficients. For example, 4x+2 x3-7 ]

1-3

[0 01

]x3+[0 0]x2+[4 ]+[ 2 -71.

All you do is

Page 104: Matrix Theory

2.5 The Frame Algorithm and the Cayley-Hamilton Theorem 83

gather the coefficients of each power of x and make a matrix of scalars as thecoefficient of that power of x. Note that what we have thusly created is anelement of Cnxn[x], the polynomials in x whose coefficients come from then-by-n matrices over C. Also note, xB = Bx for all B E tC[x]"x", so it doesnot matter which side we put the x on. We now view C(x) as such an expressionin Cnx"[x]:

C(x) = Box"-I +BI

These coefficient matrices turn out to be of interest. For example, adj(A) _(-1)n-Iadj(-A) = (-l)"-IC(0) = (-1) 'B,,-,, so

(-1)n-Ign-Iadj(A) =

Thus, if cn 54 0, A is invertible and we have

A-I = Bn-Icn

But now we compute

(xl - A)adj(xl - A) = (xl - A)C(x) = (xI - A)(Box"-I + B1xi-2+ ... + B.-2X + Bn-1)

= x" Bo + x"_' (BI - ABo) + xn-2(B2 - ABI) ++x(Bi_1 - ABn_2) - AB.-1

= x"I + x"-I C11 + ... + xci_1I + cnl

and we compare coefficients using the following table:

Compare Coefficients Multiply by on the Left on the Right

Bp=I All An Bo A"BI - ABo = c1l An-' A"-I BI - AnBo CI An-1

B2 - A BI = C21 A"-2 An-2B2 - An-'BIC2An

-2

Bk - ABk_I = ckI

Bn_2 - ABn-3 = Cn_2I A2 A2Bn-2 - A3B,i-3 Cn-2A2

Bn-1 - ABn_2 = cn_1l A ABn_1 - A2Bn_2-ABn_I = -ABn-Icolumn sum = 0 = XA(A)

Page 105: Matrix Theory

84 Generating Invertible Matrices

So, the first consequence we get from these observations is that the Cayley-Hamilton theorem just falls out as an easy consequence. (Actually, Liebler[2003] reports that Cayley and Hamilton only established the result for matricesup to size 4-by-4. He says it was Frohenius (Ferdinand Georg Frobenius[26 October 1849 - 3 August 19171 who gave the first complete proof in1878.)

THEOREM 2.18 (Cayley-Hamilton theorem)For any n-by-n matrix A over C, XA(A) = 0.

What we are doing in the Cayley-Hamilton theorem is plugging a matrix intoa polynomial. Plugging numbers into a polynomial seems reasonable, almost in-evitable, but matrices? Given a polynomial p(x) = ao+a, x+ - +akxk E C[x],we can create a matrix p(A) = a0 l + a, A + + ak Ak E C1 '". For example,

1 0 0

ifp(x) = 4 + 3x - 9x 3, then p(A)=41+3A-9A33=4 00

I 00 1

+

1 2 2 1 2 2 ; -2324 -2964 -34323 3 4 5 - 9 3 4 5 = -5499 -7004 -8112

6 7 8 6 7 8 -9243 -11 778 -13634The Cayley-Hamilton theorem says that any square matrix is a "root" of itscharacteristic polynomial.

But there is much more information packed in those equations on the left ofthe table, so let's push a little harder. Notice we can rewrite these equations as

Bo = I

B, = ABo+c,lB2 = AB, +C21

B,_1 =A B" _, + c 1.

By setting B, := 0, we have the following recursive scheme clear fromabove: for k = 1, 2, ... , n,

Bo = IBk = A Bk _, + 1.

In other words, the matrix coefficients, the Bks are given recursively in termsof the B&_,s and cks. If we can get a formula for ck in terms of Bk_,, we willget a complete set of recurrence formulas for the Bk and ck. In particular, if weknow B"_, and c, we have A-1, provided, of course, A-1 exists (i.e., provided

Page 106: Matrix Theory

2.5 The Frame Algorithm and the Cayley-Hamilton Theorem 85

c A 0). For this, let's exploit the recursion given above:

Bo=IBI = ABo + c1I = AI + c11 = A + c11B2 = ABI +C21 = A(A+c11)+C21 = A2+c1A+C21B3 = = A3+CIA2+C2A+C31

Inductively, we see for k = 1, 2, ... , n,

Bk = Ak+CIAk_1 +...+Ck_IA+CkI.

Indeed, when k = n, this is just the Cayley-Hamilton theorem all over again.Now we have fork =2,3,... , n + 1,

Bk_I = Ak-I +CIAk-2+...+Ck-2A+ck_11.

If we multiply through by A, we get for k = 2, 3, ... , n + 1,

ABk_I = Ak+c1Ak_1 +...+Ck_2A2+Ck_IA.

Now we pull a trick out of the mathematician's hat. Take the trace of both sidesof the equation using the linearity of the trace functional.

tr(ABk_1) = tr(Ak) + c1tr(Ak-1) + + ck_2tr(A2) + ck_Itr(A)fork=2,3,...,n+1.

Why would anybody think to do such a thing? Well, the appearance of thecoefficients of the characteristic polynomial on the right is very suggestive.Those who know a little matrix theory realize that the trace of A' is the sumof the rth powers of the roots of the characteristic polynomial and so Newton'sidentities leap to mind. Let Sr denote the sum of the rth powers of the roots ofthe characteristic polynomial. Thus, for k = 2, 3, ... , n + 1,

tr(A Bk_I) = Sk + cISk_I + ' + Ck_2S2 + Ck_ISI.

2.5.1 Digression on Newton's Identities

Newton's identities go back aways. They relate the sums of powers of the rootsof a polynomial recursively to the coefficients of the polynomial. Many proofsare available. Some involve the algebra of symmetric functions, but we do notwant to take the time to go there. Instead, we will use a calculus-based argumentfollowing the ideas of [Eidswick 1968]. First, we need to recall some facts aboutpolynomials. Let p(x) = ao + a I x + + a"x". Then the coefficients of p

Page 107: Matrix Theory

86 Generating Invertible Matrices

can he expressed in terms of the derivatives of p evaluated at zero (rememberTaylor polynomials?):

0 cn t

2 P n i 0)xnP(x) = P(0) + p'(0)x + p )x2 + ... +

Now here is something really slick. Let's illustrate a general fact. Supposep(x) = (x - I)(x - 2)(x - 3) = -6 + 1 Ix - 6x2 + x-t. Do a wild andcrazy thing. Reverse the rolls of the coefficients and form the new reversedpolynomial q(x) _ -6x3 + I1x2 - 6x + I. Clearly q(l) = 0 but, moreamazingly, q(;) _ -

s+ a - ; + 1 = -6+2

8

24+8 = 0. You can also checkq(!) = 0. So the reversed polynomial has as roots the reciprocals of the roots ofthe original polynomial. Of course, the roots are not zero for this to work. Thisfact is generally true. Suppose p(x) = ao + aix + + anx" and the reversedpolynomial is q(x) = an + an_ix + + aox". Note

gc,u(0)

n

Then r # 0 is a root of p iff I is a root of q.r

Suppose p(x) = ao + aix + + anx" = an(x - ri)(x - r2) ... (x - rn).The ri s are, of course, the roots of p, which we assume to be nonzero but notnecessarily distinct. Then the reversed polynomial q(x) = an +an-tx + +aox" = ao(x - -)(x - -) . . . (x - -). For the sake of illustration, suppose

ri r2 rnit = 3. Then form f (x)

_ q'(x) _ (x - ri IH(x - rz ') + (x - r; ' )] + [(x - r2 + (x - r3 )]q(x) I I (x - ri)(x - r2)(x - r3)

_i + + _i . Generally then,x-r x-r2 x-r3

f(x) 4_1(x-rk

n,

Let's introduce more notation. Let s,,, = >r" for rn = 1, 2, 3, .... Thus,t=i

sn, is the sum of the mrh powers of the roots of p. The derivatives of f are

Page 108: Matrix Theory

2.5 The Frame Algorithm and the Cayley-Hamilton Theorem 87

intimately related to the ss. Basic differentiation yields

f (0) = -S I

f'(x )n

-II(rPn

f'(0) = -s2

(x-r2r f "(O) _ -2s3

nf(k)(x) (x- ') +' f (*)(0) = -k!Sk+l

The last piece of the puLzle is the rule of taking the derivative of a product;this is the so-called Leibnitz rule for differentiating a product:

D'1(F(x)G(x)) = j(n) F(i)(x)G(n-i)(x).

right, let's do the argument. We have f (x) = q'(x), so q'(x) = f (x)q(x).All

Therefore, using the Leibnitz rule

m-Iq('n)(x) = [f(x)q(x)](m-1) =

E(mk

11 f(k)(x)q(m-I-k)(x).

k=0

Plugging in zero, we get

III - I(m) /n - l \ (k) (m-I-k)

9 (0) _ k ff (0)9 (0)

k=0 /

rrl (m - I )!Ok!(m - I -k)!(-k!)sk+lq(m-I-k)(0).

Therefore,

q(In)(O) I "'-I q(In-I-k)(0)= an-m = -- Sk+lm! mk-o (m - I - k)!

One more substitution and we have the Newton identities

n1_1

0 = man_m + Ean-m+k+I Sk+I if I < m < Ilk=0

m-I

0 = an-m+k+ISk+I if m > n.k=m-n-l

Page 109: Matrix Theory

88 Generating Invertible Matrices

For example, suppose n = 3, p(x) = ao + a1x + a2x22 + a3x3.

in=l a2+s1a3=0in = 2 2a1 + s1a2 + s2a3 = 0

in = 3 Sao + s,a1 + a2s2 + a3s3 = 0

in = 4 slap + s,a1 + s3a2 + S4a3 = 0

in = 5 aoS2 + aIS3 + a2s4 + a3s5 = 0

in = 6 ansi + a1 S4 + a2 S5 + a3 S6 = 0

That ends our digression and now we go back to Frame's algorithm. We needto translate the notation a bit. Note ak in the notation above. So from

tr(A Bk-I) = Sk + CISk_I + ... + Ck_2S2 + Ck_ISI

fork =2,3,... n+lwe see

tr(A Bk_i) + kck = 0

fork=2,3,... n.

In fact, you can check that this formula works when k = I (see exercise 1), sowe have succeeded in getting a formula for the coefficients of the characteristicpolynomial: fork = 1, 2, ... , n,

-Ick =

ktr(ABk_i).

Now we are in great shape. The recursion we want, taking Bo = /, is given bythe following: fork = 1, 2, ... , n,

-1Ck =

ktr(ABk_i)

Bk = A Bk-I + Ckf.

Note that the diagonal elements of are all equal (why?).Let's illustrate this algorithm with a concrete example. Suppose

2 3 5 2

7A = 40

3

;04 . The algorithm goes as follows: first, find cl.

2 5 1 -27

c1 = -tr(A) = -(-17) = 17.

Next find BI :

1 19 3 5 2

B1=ABo+171= 4 24 10 3

0 3 18 4

2 5 1 -10

Page 110: Matrix Theory

2.5 The Frame Algorithm and the Cayley-Hamilton Theorem 89

Then compute A Bi :

54 103 132 13

1110 225 273 39

AB1 =20 95 52 -274 -6 51 293

Now start the cycle again finding c2:

c2=-Itr(ABi)=-2624=-312.

Next comes B2:

1 258 103 1 32 13-

B = A B + (-312)1 =110 -87 2 73 39

2 ,20 95 -2 60 -274 -6 5 1 -19

Now form A B2:

F -78 408 -115 -30

AB2 = -50 735 -8 - 2366 -190 763 14

-54 28 -8 70 7

Starting again, we find c3:

C3 tr(AB2) _ -(2127) = -709.

Then we form B3:

-787 408 -115 -30-50 26 -8 -2

B =AB2-7091=3366 -190 54 14

-54 28 -8 -2

Now for the magic; form AB3 :

2 0 0 0-0 -2 0 0

AB3 =0 0 -2 0

0 0 0 -2

Next form c4:

c4 = -4tr(AB3) = -2(-8) = 2.

Page 111: Matrix Theory

90 Generating Invertible Matrices

Now we can clean up. First

det(A) _ 1)4C4 = 2.

Indeed the characteristic polynomial of A is

XA(x) = xa + 17x; - 312X2 - 709x + 2.

Immediately we see A is invertible and

-787 408 -1 15 -30_ a( )

'1 -50 26 -8 -2A- = adj(A) _ ;(-1)3B =

c4 2

366 -190 54 14

-54 28 -8 -27Z7 -204 ii 15

25 -13 4 1

-183 95 -27 -727 -14 4 1

Moreover, we can express A-' as a polynomial in A with the help of thecharacteristic polynomial

709 312 17 IA-' = -1 +2

A - 2 A2-2A;.

2.5.2 The Characteristic Polynomial and the Minimal Polynomial

We end with some important results that connect the minimal and character-istic polynomials.

THEOREM 2.19The minimal polynomial IAA (x) divides any annihilating polynomial of the n-by-n matrix A. In particular, P-A(x) divides the characteristic polynomial XA(x).However, XA(x) divides (ILA(x))" .

PROOF The first part of the proof involves the division algorithm and is left asan exercise (see exercise 21). The last claim is a bit more challenging, so we offeraproof. Write p.A(x) = [3r+[3r_I x+ +p,x'-I+xr. Let B0 = 1,,, and let BiA'+01 Itor-I.Itiseasytoseethat B;-AB;_, =oi/for i = I to r - 1. Note that ABr-1 = Ar + [3, Ar-i + + 1r-I A =µA (A) - 0,,1,. = -[3rIn. Let C = Box' - I + B,xr-2 + ... + Br-2x + Br-1-Then C E C[x]""' and (xI - A)C = (xI, - A)(Boxr-i + B,xr-2 + ... +Br_2x+Br_,) = Boxr+(B, -ABO)xr-I + . +(Br-, -ABr_2)x-ABr_, =

Page 112: Matrix Theory

2.5 The Frame Algorithm and the Cayley-Harnilton Theorem 91

Or In + (3, -ix!" +- + 3ixr-11 +x` I,, = µA (X)1". Now take the determinantof both sides and get XA(x)det(C) = (µA (X))" . The theorem now follows. 0

A tremendous amount of information about a matrix is locked up in itsminimal and characteristic polynomials. We will develop this in due time. Forthe moment, we content ourselves with one very interesting fact: In view of thecorollary above, every root of the minimal polynomial must also be a root ofthe characteristic polynomial. What is remarkable is that the converse is alsotrue. We give a somewhat slick proof of this fact.

THEOREM 2.20The minimal polynomial and the characteristic polynomial have exactly thesame set of roots.

PROOF Suppose r is a root of the characteristic polynomial XA. Thendet(rl - A) = 0, so the matrix rl - A is not invertible. This means theremust be a dependency relation among the columns of rl - A. That meansthere is a nonzero column vector v with (rl - A)v = 6 or what is the same,Av = rv. Given any polynomial p(x) = ao + aix + + akxk, we havep(A)v = aov+a,Av+ +akAkv = aov+airv+ +akrkv = p(r)v. Thissays p(r)!-p(A) is not invertible, which in turn implies det(p(r)!-p(A)) = 0.Thus p(r) is a root of Xp(A). Now apply this when p(x) = liA(x). Then, for anyroot r of XA, iJA(r) is a root of Xµ,(A)(X) = XO(x) = det(xI - 0) = x". Theonly zeros of x" are 0 so we are forced to conclude p.A(r) = 0, which says r isa root of µA. 0

As a consequence of this theorem, if we know the characteristic polynomialof a square matrix A factors as

XA(X) _ (x - r, )d' (X - r2)d2 ... (X - rk )d ,

then the minimal polynomial must factor as

11A(X) = (x - ri )'(x - r2)e2 ... (x - rk)r"

wheree, <d; foralli = 1,2,... ,k.

Exercise Set 7

1. Explain why the formula tr(ABk-1) + kck = 0 for k = 2, 3, 4, .., n alsoworks for k = 1.

2. Explain why the diagonal elements of are all equal.

Page 113: Matrix Theory

92 Generating Inertible Matrices

3. Using the characteristic polynomial, explain why A ' is a polynomial inA when A is invertible.

4. Explain how to write a program on a handheld calculator to compute thecoefficients of the characteristic polynomial.

5. Use the Newton identities and the fact that sk = tr(Ak) to find formulasfor the coefficients of the characteristic polynomial in terms of the trace ofpowers and powers of traces of a matrix. (Hint: c2 =

z[tr(A)2 - tr(A2)],

tr(A3), .... )c3 = Ztr(A)tr(A2) - -3

6. Consider the polynomial p(x) = ao + aix + + a,-Ix"-1 + x". Finda matrix that has this polynomial as its characteristic polynomial. (Hint:

0 I 0 ... 0

0 0 1 0

Consider the matrix

0 0 0 ... 1

-ao -a1 -a2 ... -a,,_,

7. Show how the Newton identities for k > n follow from the Cayley-Hamilton theorem.

8. (D. W. Robinson) Suppose A E C"I and B E C'"I", where m < n.Argue that the characteristic polynomial of A is x "' times the character-istic polynomial of B if and only if tr(Ak) = tr(B') fork = 1, 2, ... , n.

9. (H. Flanders, TAMM, Vol. 63, 1956) Suppose A E C""". Prove that A isnilpotent iff tr(Ak) = 0 fork = 1, 2, ... , n. Recall that nilpotent meansAP = 0 for some power p.

10. Suppose A E C""1" and B E C"'"", where m < n. Prove that the char-acteristic polynomial of AB is x"-"' times the characteristic polynomialof BA.

11. What can you say about the characteristic polynomial of a sum of twomatrices? A product of two matrices?

12. Suppose tr(AL) = tr(Bk) for k = I, 2, 3, .... Argue that XA(x)Xe(x)

13. Suppose A = I B C0

. What, if any, is the connection between the

characteristic polynomial of A and those of B and C?

Page 114: Matrix Theory

2.5 The Frame Algorithm and the Cayley-Hamilton Theorem 93

1 1 I

14. Verify the Cayley-Hamilton theorem for A = 0 1 l

0 0 1

15. Let A E C0". Argue that the constant term of XA(x) is (-1)" (productof the roots of XA(x)) and the coefficient of x" is -Tr(A) = -(sumof the roots of XA(x)) If you are brave, try to show that the coefficientof x"-i is (-1)"-i times the sum of the j-by-j principal minors of A.

16. Is every monic polynomial of degree n in C[x] the characteristic polyno-mial of some n-by-n matrix in C"""?

17. Find a matrix whose characteristic polynomial is p(x) = x4 + 2x3 -3x2+4x-5.

18. Explain how the Cayley-Hamilton theorem can be used to provide amethod to compute powers of a matrix. Make up a 3-by-3 example andillustrate your approach.

19. Explain how the Cayley-Hamilton theorem can be used to simplify thecalculations of matrix polynomials so that the problem of evaluating apolynomial expression of an n-by-n matrix can be reduced to the prob-lem of evaluating a polynomial expression of degree less than n. Makeup an example to illustrate your claim. (Hint: Divide the characteristicpolynomial into the large degree polynomial.)

20. Explain how the Cayley-Hamilton theorem can be used to express theinverse of an invertible matrix as a polynomial in that matrix. Argue thata matrix is invertible if the constant term of its characteristic polynomialis not zero.

21. Prove the first part of Theorem 2.19. (Hint: Recall the division algorithmfor polynomials.)

22. Suppose U is invertible and B = U-'AU. Argue that A and B have thesame characteristic and minimal polynomial.

23. What is wrong with the following "easy" proof of the Cayley-Hamiltontheorem: XA(x) = det(xI - A), so replacing x with A one gets XA(A) _det(AI - A) = det(O) = 0?

24. How can two polynomials be different and still have exactly the same setof roots'? How can it be that one of these polynomials divides the other.

Page 115: Matrix Theory

94 Generating Invertible Matrices

25. What are the minimal and characteristic polynomials of the n-by-n iden-tity matrix How about the n-by-n zero matrix?

26. Is there any connection between the coefficients of the minimal polyno-mial for A and the coefficients of the minimal polynomial for A-' for aninvertible matrix A?

27. Suppose A = BC

What, if any, is the connection between the

minimal polynomial of A and those of B and C?

28. Give a direct computational proof of the Cayley-Hamilton theorem forany 2-by-2 matrix.

Further Reading

[B&R, 1986(2)] T. S. Blyth and E. F. Robertson, Matrices and VectorSpaces, Vol. 2, Chapman & Hall, New York, (1986).

[B&R, 1986(4)] T. S. Blyth and E. F. Robertson, Linear Algebra, Vol. 4,Chapman & Hall, New York, (1986).

[Eidswick, 1968] J. A. Eidswick, A Proof of Newton's Power Sum Formu-las, The American Mathematical Monthly, Vol. 75, No. 4, April, (1968),396-397.

[Frame, 1949] J. S. Frame, A Simple Recursion Formula for Inverting aMatrix, Bulletin of the American Mathematical Society, Vol. 55, (1949),Abstracts.

[Gant, 1959] F. R. Gantmacher, The Theory of Matrices, Vol. I, ChelseaPublishing Co., New York, (1959).

[H-W&V, 1993] Gilbert Helmherg, Peter Wagner, Gerhard Veltkamp,On Faddeev-Leverrier's Methods for the Computation of the Character-istic Polynomial of a Matrix and of Eigenvectors, Linear Algebra and ItsApplications, (1993), 219-233.

[House, 1964] Alston S. Householder, The Theory of Matrices in Numer-ical Analysis, Dover Publications Inc., New York, (1964).

Page 116: Matrix Theory

2.5 The Frame Algorithm and the Cayley-Hamilton Theorem 95

[Kalman, 2000] Dan Kalman, A Matrix Proof of Newton's Identities,Mathematics Magazine, Vol. 73, No. 4, October, (2000), 313-315.

[LeV, 18401 U. J. LeVerrier, Sur les Variations Seculaires des ElementsElliptiques des sept Planetes Principales, J. Math. Pures Appl. 5, (1840),220-254.

[Liebler, 2003] Robert A. Liebler, Basic Matrix Algebra with Algorithmsand Applications, Chapman & Hall/CRC Press, Boca Raton, FL, (2003).

[Mead, 1992] D. G. Mead, Newton's Identities, The American Mathe-matical Monthly, Vol. 99, (1992), 749-751.

[Pennisi, 1987] Louis L. Pennisi, Coefficients of the Characteristic Poly-nomial, Mathematics Magazine, Vol. 60, No. 1, February, (1987), 31-33.

[Robinson, 19611 D. W. Robinson, A Matrix Application of Newton'sIdentities, The American Mathematical Monthly, Vol. 68, (1961), 367-369.

2.5.3 Numerical Note

2.5.3.1 The Frame Algorithm

As beautiful as the Frame algorithm is, it does not provide a numericallystable means to find the coefficients of the characteristic polynomial or theinverse of a large matrix.

2.5.4 MATLAB Moment

2.5.4.1 Polynomials in MATLAB

Polynomials can be manipulated in MATLAB except that, without The Sym-bolic Toolbox, all you see are coefficients presented as a row vector. So iff (x) = a, x" + a2x"- I + + a"x + a,+,, the polynomial is represented as arow vector

[a, a2 a3 ... an+I]

For example, the polynomial f (x) = 4x3 + 2x2 + 3x + 10 is entered as

f =[42310].

If g(x) is another polynomial of the same degree given by g, the sum off (x) and g(x) is just f + g; the difference is f - g. The product of any two

Page 117: Matrix Theory

96 Generating !m'ertible Matrices

polynomials f and g is conv(f,g). The roots of a polynomial can be estimatedby the function roots( ). You can calculate the value of a polynomial f at a givennumber, say 5, by using polyval(f,5). MATLAB uses Horner's (nested) methodto do the evaluation. You can even divide polynomials and get the quotient andremainder. That is, when a polynomial f is divided by a polynomial h there isa unique quotient q and remainder r where the degree of r is strictly less thanthe degree of h. The command is

[q, r] = deconv(f, h).

For example, suppose f(x) = x^4 + 4x^2 - 3x + 2 and g(x) = x^2 + 2x - 5. Let's enter them, multiply them, and divide them.

>> f = [1 0 4 -3 2]
f =
     1     0     4    -3     2
>> g = [1 2 -5]
g =
     1     2    -5
>> conv(f,g)
ans =
     1     2    -1     5   -24    19   -10
>> [q,r] = deconv(f,g)
q =
     1    -2    13
r =
     0     0     0   -39    67
>> roots(g)
ans =
   -3.4495
    1.4495
>> polyval(f,0)
ans =
     2

Let's be sure we understand the output. We see

f(x)g(x) = x^6 + 2x^5 - x^4 + 5x^3 - 24x^2 + 19x - 10.

Next, we see

f(x) = g(x)(x^2 - 2x + 13) + (-39x + 67).

Finally, the roots of g are -3.4495 and 1.4495, while f(0) = 2.
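To double-check the division identity numerically, a quick sketch like the following can be used (it simply reuses the vectors f and g entered above; deconv pads the remainder r with leading zeros to the length of f):

% Check the division identity f(x) = g(x)q(x) + r(x) numerically.
f = [1 0 4 -3 2];
g = [1 2 -5];
[q, r] = deconv(f, g);            % quotient and remainder
disp(norm(conv(g, q) + r - f))    % 0: the pieces recombine to give f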


One of the polynomials of interest to us is the characteristic polynomial, χ_A(x) = det(xI - A). Of course, MATLAB has this built in as

poly(A).

All you see are the coefficients, so you have to remember the order in which they appear. For example, let's create a random 4-by-4 matrix and find its characteristic polynomial.

>> format rat
>> A = fix(11*rand(4)) + i*fix(11*rand(4))
A =
  10+10i      9         9+ 1i     8+ 6i
   2+10i      8+ 3i     4+ 2i    10+ 2i
   6+ 4i      5+ 8i     6+ 2i     8+ 2i
   5+ 9i      0         1         4+ 8i
>> p = poly(A)
p =
  Columns 1 through 3
     1            -28 - 23i        -4 + 196i
  Columns 4 through 5
   382 - 1210i   -176 + 766i

So we see the characteristic polynomial of this complex 4-by-4 matrix is

χ_A(x) = x^4 - (28 + 23i)x^3 + (-4 + 196i)x^2 + (382 - 1210i)x + (-176 + 766i).

We can form polynomials in matrices using the polyvalm command. You can even check the Cayley-Hamilton theorem using this command. For example, using the matrix A above,

>> fix(polyvalm(p,A))
ans =
     0     0     0     0
     0     0     0     0
     0     0     0     0
     0     0     0     0

Finally, note how we are using the fix command to get rid of some messy decimals. If we did not use the rat format above, the following can happen.


>> A = fix(11*rand(4)) + i*fix(11*rand(4))
A =
   2.0000 + 3.0000i   1.0000 + 7.0000i   9.0000 + 4.0000i   9.0000 + 7.0000i
   7.0000 + 3.0000i   7.0000 + 3.0000i   6.0000 + 7.0000i   7.0000 + 6.0000i
   3.0000 + 3.0000i   4.0000 + 9.0000i   5.0000 + 6.0000i   8.0000 + 8.0000i
   5.0000 + 5.0000i   9.0000 + 6.0000i   9.0000 + 4.0000i   7.0000 +10.0000i
>> p = poly(A)
p =
  1.0e+003 *
  Columns 1 through 4
   0.0010            -0.0210 - 0.0220i  -0.0470 - 0.1180i  -0.2730 - 1.0450i
  Column 5
  -2.0870 - 1.3060i
>> polyvalm(p,A)
ans =
  1.0e-009 *
   0.1050 + 0.0273i   0.1398 + 0.0555i   0.1525 + 0.0346i   0.1892 + 0.0650i
   0.1121 + 0.0132i   0.1560 + 0.0380i   0.1598 + 0.0102i   0.2055 + 0.0351i
   0.1073 + 0.0421i   0.1414 + 0.0789i   0.1583 + 0.0539i   0.1937 + 0.0916i
   0.1337 + 0.0255i   0.1837 + 0.0600i   0.1962 + 0.0277i   0.2447 + 0.0618i

Notice the scalar factor of 1000 on the characteristic polynomial. That leading coefficient is, after all, supposed to be 1. Also, that last matrix is supposed to be the zero matrix, but notice the scalar factor of 10^-9 in front of the matrix. That makes the entries of the matrix teeny tiny, so we effectively are looking at the zero matrix.
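One way to make "effectively zero" precise is to measure the residual against the size of A. A minimal sketch along these lines:

% Relative size of the Cayley-Hamilton residual p(A) for a random complex matrix.
A = fix(11*rand(4)) + 1i*fix(11*rand(4));
p = poly(A);              % characteristic polynomial of A
R = polyvalm(p, A);       % the zero matrix in exact arithmetic
disp(norm(R)/norm(A)^4)   % a tiny number, roughly machine precision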


Chapter 3

Subspaces Associated to Matrices

subspace, null space, nullity, column space, column rank, row space, row rank, rank-plus-nullity theorem, row rank = column rank

3.1 Fundamental Subspaces

In this section, we recall, in some detail, how to associate subspaces to a matrix A in C^{m×n}. First, we recall what a subspace is.

DEFINITION 3.1 (subspace of C^n)
A nonempty subset M ⊆ C^n is called a subspace of C^n if M is closed under the formation of sums and scalar multiples. That is, if u, v ∈ M, then u + v ∈ M, and if u ∈ M and α is any scalar, then αu ∈ M.

Trivial examples of subspaces are {0}, the set consisting only of the zero vector, and C^n itself. Three subspaces are naturally associated to a matrix. We define these next.

DEFINITION 3.2 (null space and nullity)
Let A ∈ C^{m×n}. We define the null space of A as Null(A) = {x ∈ C^n | Ax = 0}. The dimension of this subspace of C^n is called the nullity of A and is denoted nlty(A).

DEFINITION 3.3 (column space and column rank)
Let A ∈ C^{m×n}. We define the column space of A as Col(A) = {Ax | x ∈ C^n}. The dimension of this subspace of C^m is called the column rank of A and is denoted c-rank(A).


DEFINITION 3.4 (row space and row rank)
Let A ∈ C^{m×n}. We define the row space of A as the span of the rows of A in C^n. In symbols, we write Row(A) for the row space; the dimension of this subspace is the row rank of A and is denoted r-rank(A).

THEOREM 3.1
Let A ∈ C^{m×n}. Then

1. Null(A) is a subspace of C^n and nlty(A) ≤ n.

2. Col(A) is a subspace of C^m and c-rank(A) ≤ m.

3. Col(A) equals the span of the columns of A in C^m.

4. Row(A) is a subspace of C^n and r-rank(A) ≤ n.

5. Row(A) = Col(A^T) and Col(A) = Row(A^T).

6. Row(Ā) = Col(A*) and Col(Ā) = Row(A*).

7. Null(A*A) = Null(A), so nlty(A*A) = nlty(A).

PROOF We leave (1) through (6) as exercises. We choose to prove (7) to illustrate some points. To prove two sets are equal, we prove each is included in the other as a subset. First pick x ∈ Null(A). Then Ax = 0, so A*Ax = A*0 = 0, putting x ∈ Null(A*A). To get the other inclusion, we need to note that if x = [x_1, x_2, ..., x_n]^T, then x*x = [x̄_1 x̄_2 ... x̄_n][x_1, x_2, ..., x_n]^T = |x_1|^2 + |x_2|^2 + ... + |x_n|^2, using z̄z = |z|^2 for a complex number z. Thus, if x*x = 0, then Σ|x_i|^2 = 0, implying all the x_i are zero, making x = 0. Therefore, if x ∈ Null(A*A), then A*Ax = 0, so x*A*Ax = x*0 = 0. Then (Ax)*(Ax) = 0. But (Ax)*(Ax) = 0 implies Ax = 0 by the above discussion. This puts x ∈ Null(A). Thus, we have proved Null(A) ⊆ Null(A*A) above and now Null(A*A) ⊆ Null(A). These together prove Null(A) = Null(A*A). □

Note that we strongly used a special property of real and complex numbers to get the result above. There are some other useful connections that will be needed later. We collect these next.


THEOREM 3.2
Let A ∈ C^{m×n}, B ∈ C^{p×m}. Then

1. Null(A) ⊆ Null(BA).

2. If U ∈ C^{m×m} is invertible, then Null(UA) = Null(A), so nlty(UA) = nlty(A).

PROOF The proof is left as an exercise. □

We are often interested in factoring a matrix, that is, writing a matrix as a product of two other "nice" matrices. The next theorem gives us some insights on matrix factors.

THEOREM 3.3
Let A ∈ C^{m×n}, B ∈ C^{m×p}, D ∈ C^{q×n}. Then

1. Col(B) ⊆ Col(A) iff there exists a matrix C in C^{n×p} such that B = AC.

2. Col(AC) ⊆ Col(A) for any C in C^{n×p}, so c-rank(AC) ≤ c-rank(A).

3. Row(D) ⊆ Row(A) iff there exists a matrix K in C^{q×m} such that D = KA.

4. Row(KA) ⊆ Row(A) for any K in C^{q×m}, so r-rank(KA) ≤ r-rank(A).

5. Let S ∈ C^{n×k}, T ∈ C^{n×p}. If Col(S) ⊆ Col(T), then Col(AS) ⊆ Col(AT). Moreover, if Col(S) = Col(T), then Col(AS) = Col(AT).

6. Let S ∈ C^{q×m} and T ∈ C^{r×m}. If Row(S) ⊆ Row(T), then Row(SA) ⊆ Row(TA). Moreover, if Row(S) = Row(T), then Row(SA) = Row(TA).

7. If U ∈ C^{m×m} is invertible, then Row(UA) = Row(A).

8. If V ∈ C^{n×n} is invertible, then Col(AV) = Col(A).

PROOF The proofs are left as exercises. □

Next, we consider a very useful result whose proof is a bit abstract. It uses many good ideas from elementary linear algebra. It's a nice proof, so we are going to give it.


THEOREM 3.4 (rank plus nullity theorem)
Let A ∈ C^{m×n}. Then the column rank of A plus the nullity of A equals the number of columns of A. That is, c-rank(A) + nlty(A) = n.

PROOF Take a basis {v_1, v_2, ..., v_q} of Null(A). Extend this basis to a basis of all of C^n, say {v_1, v_2, ..., v_q, w_1, ..., w_r} is the full basis. Note n = q + r. Now take y in Col(A). Then y = Ax for some x in C^n. But then x can be expressed in the basis (uniquely) as x = a_1v_1 + a_2v_2 + ... + a_qv_q + b_1w_1 + ... + b_rw_r. Thus y = Ax = a_1Av_1 + ... + a_qAv_q + b_1Aw_1 + ... + b_rAw_r = b_1Aw_1 + ... + b_rAw_r, since each Av_i = 0. This says the vectors Aw_1, Aw_2, ..., Aw_r span Col(A). Now the question is, are these vectors independent? To check, we set c_1Aw_1 + c_2Aw_2 + ... + c_rAw_r = 0. Then A(c_1w_1 + ... + c_rw_r) = 0. This puts the vector c_1w_1 + c_2w_2 + ... + c_rw_r in Null(A). Therefore, this vector can be expressed in terms of the basis vectors v_1, v_2, ..., v_q. Then c_1w_1 + c_2w_2 + ... + c_rw_r = d_1v_1 + d_2v_2 + ... + d_qv_q, so d_1v_1 + ... + d_qv_q - c_1w_1 - c_2w_2 - ... - c_rw_r = 0. But now we are looking at the entire basis, which is, of course, independent, so d_1 = d_2 = ... = d_q = c_1 = c_2 = ... = c_r = 0. Thus, the vectors Aw_1, Aw_2, ..., Aw_r are independent and consequently form a basis for Col(A). Moreover, r = c-rank(A) and q = nlty(A). This completes the proof. Isn't this a nice argument? □

COROLLARY 3.1
If A ∈ C^{m×n} and U ∈ C^{m×m} is invertible, then c-rank(UA) = c-rank(A).

PROOF By Theorem 3.2(2), Null(UA) = Null(A), so nlty(UA) = nlty(A). By Theorem 3.4, n = c-rank(A) + nlty(A) = c-rank(UA) + nlty(UA). Cancel and conclude c-rank(A) = c-rank(UA). □

Notice that this corollary does not say Col(A) = Col(UA). That is because this is not true! Consider A = [1 2; 2 4]. Now U = [-2 0; 1 1] is invertible and UA = [-2 -4; 3 6]. But Col(A) = {a(1, 2)^T | a ∈ C} and Col(UA) = {a(-2, 3)^T | a ∈ C}, which are, evidently, different subspaces.

THEOREM 3.5
If U and V are invertible, then UAV has the same row rank and the same column rank as A.

PROOF We have c-rank(A) = c-rank(UA) by Corollary 3.1 and c-rank(UA) = c-rank(UAV) by Theorem 3.3(8). Thus c-rank(A) = c-rank(UAV). Also, Row(UAV) = Col((UAV)^T) = Col(V^T A^T U^T), so r-rank(UAV) = c-rank(V^T A^T U^T) = c-rank(A^T) = r-rank(A). □

We next look at a remarkable and fundamental result about matrices. You probably know it already. The row rank of a matrix is always equal to its column rank, even though A need not be square and Row(A) and Col(A) are contained in different vector spaces! Our next goal is to give an elementary proof of this fact.

THEOREM 3.6 (row rank equals column rank)
Let A ∈ C^{m×n}. Then r-rank(A) = c-rank(A).

PROOF Let A be an m-by-n matrix of row rank r over C. Let r_1 = row_1(A), ..., r_m = row_m(A). Thus r_i = row_i(A) = [a_{i1} a_{i2} ... a_{in}] for i = 1, 2, ..., m. Choose a basis for the row space of A, Row(A), say b_1, b_2, ..., b_r. Suppose b_i = [b_{i1} b_{i2} ... b_{in}] for i = 1, 2, ..., r. It follows that each row of A is uniquely expressible as a linear combination of the basis vectors:

r_1 = c_{11}b_1 + c_{12}b_2 + ... + c_{1r}b_r
r_2 = c_{21}b_1 + c_{22}b_2 + ... + c_{2r}b_r
...
r_m = c_{m1}b_1 + c_{m2}b_2 + ... + c_{mr}b_r.

Now [a_{11} a_{12} ... a_{1n}] = r_1 = c_{11}[b_{11} b_{12} ... b_{1n}] + c_{12}[b_{21} b_{22} ... b_{2n}] + ... + c_{1r}[b_{r1} b_{r2} ... b_{rn}]. A similar expression obtains for each row. By equating entries, we see for each j,

a_{1j} = c_{11}b_{1j} + c_{12}b_{2j} + ... + c_{1r}b_{rj}
a_{2j} = c_{21}b_{1j} + c_{22}b_{2j} + ... + c_{2r}b_{rj}
...
a_{mj} = c_{m1}b_{1j} + c_{m2}b_{2j} + ... + c_{mr}b_{rj}.

As a vector equation, we get

[a_{1j}; a_{2j}; ...; a_{mj}] = b_{1j}[c_{11}; c_{21}; ...; c_{m1}] + ... + b_{rj}[c_{1r}; c_{2r}; ...; c_{mr}],   j = 1, 2, ..., n.

This says that every column of A is a linear combination of the r columns of c's. Hence the column space of A is generated by r vectors, and so the dimension of the column space cannot exceed r. This says c-rank(A) ≤ r-rank(A). Applying the same argument to A^T, we conclude c-rank(A^T) ≤ r-rank(A^T). But then we have r-rank(A) ≤ c-rank(A). Therefore, equality must hold. □
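As a quick numerical illustration, MATLAB's rank command reports the same value for a matrix and for its transpose; the matrix below is the 3-by-4 example that reappears in the MATLAB Moment at the end of this section:

% Row rank equals column rank: rank(A) and rank(A') agree.
A = [1 -6 1 4; -2 14 1 -9; 2 -6 11 5];
disp([rank(A) rank(A.')])     % both are 2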

From now on, we will use the word rank to refer to either the row rank or the column rank of a matrix, whichever is more convenient, and we use the notation rank(A) or r(A). For some matrices, the rank is easy to ascertain. For example, r(I_n) = n. If A = diag(d_1, d_2, ..., d_n), then r(A) is the number of nonzero diagonal elements. For other matrices, especially for large matrices, the rank may not be so easily accessible.

We have seen that, given a matrix A, we can associate three subspaces and two dimensions to A. We have the null space Null(A), the column space Col(A), the row space Row(A), the dimension dim(Null(A)), which is the nullity of A, and dim(Col(A)) = dim(Row(A)) = r(A), the rank of A. But this is not the end of the story. That is because we can naturally associate other matrices to A. Namely, given A, we have the conjugate of A, Ā; the transpose of A, A^T; and the conjugate transpose, A* = (Ā)^T. This opens up a number of subspaces that can be associated with A:

Null(A)     Col(A)     Row(A)
Null(Ā)     Col(Ā)     Row(Ā)
Null(A^T)   Col(A^T)   Row(A^T)
Null(A*)    Col(A*)    Row(A*).

Fortunately, not all 12 of these subspaces are distinct. We have Col(A) = Row(A^T), Col(Ā) = Row(A*), Col(A^T) = Row(A), and Col(A*) = Row(Ā). Thus, there are actually eight subspaces to consider. If we wish, we can eliminate row spaces from consideration altogether and just deal with null spaces and column spaces. An important fact we use many times is that rank(A) = rank(A*) (see problem 15 of Exercise Set 8). This depends on the fact that if there is a dependency relation among vectors in C^n, there is an equivalent dependency relation among the vectors obtained from these vectors by taking complex conjugates of their entries. In fact, all you have to do is take the complex conjugates of the scalars that effected the original dependency relationship.

We begin developing a heuristic picture of what is going on with the following diagram.


Figure 3.1: Fundamental subspaces.

Exercise Set 8

1. Give the rank and nullity of the following matrices:

1 0 0 2 0 0

0 1 0 , 0 3

],0

1 ], [ 1 a b 0

0 0 1 0 0 01 i O c d l

2. Let A = [2  2+2i  2-2i  4; 3+3i  6i  6  6+6i]. Find the rank of A. Also, compute AA* and A*A and find their ranks.

3. If A ∈ C^{m×n}, argue that r(A) ≤ m and r(A) ≤ n.

4. Fill in the proofs of Theorems 3.1, 3.2, and 3.3.

5. Argue that if A is square, A is invertible iff A*A is invertible.

6. Prove that Col(AA*) = Col(A), so c-rank(AA*) = c-rank(A).

7. Prove that c-rank(A) = c-rank(A) and r-rank(A) = r-rank(A).

8. Argue that Null(A) ∩ Col(A*) = {0} and Null(A*) ∩ Col(A) = {0}.

9. Let B be an m-by-n matrix. Argue that the rows of B are dependent in C^n iff there is a nonzero vector w with wB = 0. Also, the columns are dependent in C^m iff there is a nonzero vector w with Bw = 0.

10. Consider a system of linear equations Ax = b where A ∈ C^{m×n}, x ∈ C^n, and b ∈ C^m. Argue that this system is consistent (i.e., has a solution) iff rank(A) = rank([A | b]), where [A | b] is the augmented matrix (i.e., the matrix obtained from A by adding one more column, namely b).

11. Define a map J : C^n → C^n by J(x_1, x_2, ..., x_n) = (x̄_1, x̄_2, ..., x̄_n). That is, J just takes the complex conjugate of each entry of a vector. Argue that J^2 = I. Is J a linear map? How close to a linear map is it? Is J one-to-one? Is J onto? If M is a subspace of C^n, argue that J(M) is a subspace and dim(J(M)) = dim(M). If A is a matrix in C^{m×n}, what does J do to the fundamental subspaces of A?

12. Suppose A is an n-by-n matrix and v is a vector in C^n. We defined the Krylov matrix K_s(A, v) = [v | Av | A^2v | ... | A^{s-1}v]. Then the Krylov subspace K_s(A, v) = span{v, Av, ..., A^{s-1}v} = Col(K_s(A, v)). Suppose Ax = b is a linear system with A invertible. Suppose deg(μ_A) = m. Argue that the solution to Ax = b lies in K_m(A, b). Note, every x in K_m(A, b) is of the form p(A)b, where p is a polynomial of degree m - 1 or less.

13. Make up a concrete example to illustrate problem 12 above.

14. Argue that Null(A) = {0} iff the columns of A are independent.

15. Prove that r(A) = r(A^T) = r(A*) = r(Ā).

16. Argue that AB = 0 iff Col(B) ⊆ Null(A), where A and B are conformable matrices. Recall "conformable" means the matrices are of a size that can be multiplied.

17. Prove that A^2 = 0 iff Col(A) ⊆ Null(A). Conclude that if A^2 = 0, then rank(A) ≤ n/2 if A is n-by-n.

18. Let Fix(A) = {x | Ax = x}. Argue that Fix(A) is a subspace of C^n when A is n-by-n. Find Fix([1 1 1; 0 2 3; 0 3 2]).

19. Is W = {(z, z, 0) | z ∈ C} a subspace of C^3?

20. What is the rank of [1 2 3; 4 5 6; 7 8 9]? What is the rank of [1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16]? Do you see a pattern? Can you generalize?


21. Let S = {[a b; -b c] | a, b, c ∈ C}. Is S a subspace of C^{2×2}?

22. Argue that rank(AB) ≤ min(rank(A), rank(B)) and rank(B) = rank(-B).

23. Prove that if one factor of AB is nonsingular, then the rank of AB is the rank of the other factor.

24. Argue that if S and T are invertible, then rank(SAT) = rank(A).

25. Prove that rank(A + B) ≤ rank(A) + rank(B) and rank(A + B) ≥ |rank(A) - rank(B)|.

26. Argue that if A ∈ C^{m×n} and B ∈ C^{n×m} where n < m, then AB cannot be invertible.

27. For a block diagonal matrix A = diag[A_11, A_22, ..., A_kk], argue that rank(A) = Σ_{i=1}^{k} rank(A_ii).

28. Suppose A is an n-by-n matrix and k is a positive integer. Argue that Null(A^{k-1}) ⊆ Null(A^k) ⊆ Null(A^{k+1}). Let {b_1, ..., b_r} be a basis of Null(A^{k-1}); extend this basis to Null(A^k) and get {b_1, ..., b_r, c_1, c_2, ..., c_s}. Now extend this basis to get a basis {b_1, ..., b_r, c_1, ..., c_s, d_1, ..., d_t} of Null(A^{k+1}). Argue that {b_1, ..., b_r, Ad_1, ..., Ad_t} is a linearly independent subset of Null(A^k).

29. Argue that Col(AB) = Col(A) iff rank(AB) = rank(A).

30. Prove that Null(AB) = Null(B) iff rank(AB) = rank(B).

31. (Peter Hoffman) Suppose A_1, A_2, ..., A_n are k-by-k matrices over C with A_1 + A_2 + ... + A_n invertible. Argue that the n-by-(2n - 1) block matrix whose ith block row is

(0, ..., 0, A_1, A_2, ..., A_n, 0, ..., 0),   with A_1 in block position i,

has full rank. What is this rank?

32. (Yongge Tian) Suppose A is an m-by-n matrix with real entries. What is the minimum rank of A + iB, where B can be any m-by-n matrix with real entries?

33. Prove that if AB = 0, then rank(A) + rank(B) ≤ n, where A and B ∈ C^{n×n}.


34. Suppose A is nonsingular. Argue that the inverse of A is the unique matrix X such that rank([A I; I X]) = rank(A).

35. If A is m-by-n of rank m, then an LU factorization of A is unique. In particular, if A is invertible, then LU is unique, if it exists.

36. Suppose A is n-by-n.

(a) If rank(A) < n - 1, then prove adj(A) = 0.
(b) If rank(A) = n - 1, then prove rank(adj(A)) = 1.
(c) If rank(A) = n, then rank(adj(A)) = n.

37. Argue that Col(A + B) ⊆ Col(A) + Col(B), and that Col(A + B) = Col(A) + Col(B) iff Col(A) ⊆ Col(A + B). Also, Col(A + B) = Col(A) + Col(B) iff Col(B) ⊆ Col(A + B).

38. Prove that Col([A | B]) = Col(A) + Col(B), so that if A is m-by-n and B is m-by-p, rank([A | B]) ≤ rank(A) + rank(B).

39. Suppose A ∈ C^{m×n} has rank r, X ∈ C^{n×p}, and AX = 0. Argue that the rank of X is less than or equal to n - r.

40. Suppose A ∈ C^{m×n} and m < n. Prove that there exists X ≠ 0 such that AX = 0.

41. Prove the Guttman rank additivity formula: suppose M = [A B; C D] with det(A) ≠ 0. Then rank(M) = rank(A) + rank(M/A), where M/A = D - CA^{-1}B is the Schur complement of A in M.

Further Reading

[B&R, 1986(2)] T. S. Blyth and E. F. Robertson, Matrices and Vector Spaces, Vol. 2, Chapman & Hall, New York, (1986).

[Liebeck, 1966] H. Liebeck, A Proof of the Equality of Column and Row Rank of a Matrix, The American Mathematical Monthly, Vol. 73, (1966), 1114.

[Mackiw, 1995] G. Mackiw, A Note on the Equality of Column and Row Rank of a Matrix, Mathematics Magazine, Vol. 68, (1995), 285-286.


3.1.1 MATLAB Moment

3.1.1.1 The Fundamental Subspaces

MATLAB has a built-in function to compute the rank of a matrix. The command is

rank(A)

Unfortunately, MATLAB does not have a built-in command for the nullity of A. This gives us a good opportunity to define our own function by creating an M-file. To do this, we take advantage of the rank plus nullity theorem. Here is how it works.

Assuming you are on a Windows platform, go to the "File" menu, choose "New" and "M-file." A window comes up in which you can create your function as follows:

function nlty = nullity(A)
% nullity = number of columns minus the rank (rank plus nullity theorem)
[m,n] = size(A);
nlty = n - rank(A);

Note that the "size" function returns the number of rows and the number ofcolumns of A. Then do a "Save as" nullity.m, which is the suggested name.Now check your program on

1 -6 1 4

A = -2 14 1 -92 -6 11 5

>> A=[ 1 -6 14;-2 14 1 -9;2 -6 11 5]

1 -6 1 4A= -2 14 1 -9

2 -6 11 5

>> rank(A)

ans =

2

>> nullity(A)

ans =

2

Now try finding the rank and nullity of

B = [1+i  2+2i  3+i; 2+2i  4+4i  9i; 3+3i  6+6i  8i]


It is possible to get MATLAB to find a basis for the column space and a basis for the null space of a matrix. Again we write our own functions to do this. We use the rref command (row reduced echelon form), which we will review later. You probably remember it from linear algebra class. First, we find the column space. Create the M-file

function c = colspace(A)
% basis for Col(A), read off from the rref of [A' eye(n)]
[m,n] = size(A);
C = [A' eye(n)];
B = rref(C)';
c = B([1:m],[1:rank(A)]);

which we save as colspace.m. Next we create a similar function to produce a basis for the nullspace of A, which we call nullspace.m.

function N = nullspace(A)
% basis for Null(A), read off from the same rref computation
[m,n] = size(A);
C = [A' eye(n)];
B = rref(C)';
N = B([m+1:m+n],[rank(A)+1:n]);

Now try these out on matrix A above.

>> A = [1 -6 1 4; -2 14 1 -9; 2 -6 11 5]
A =
     1    -6     1     4
    -2    14     1    -9
     2    -6    11     5
>> colspace(A)
ans =
     1     0
     0     1
     8     3
>> format rat
>> nullspace(A)
ans =
      1          0
      0          1
   -1/13      -2/13
   -3/13      20/13

Now, determine the column space and nullspace of B above.

Finally, we note that MATLAB does have a built-in command null(A), which returns an orthonormal basis for the nullspace, and orth(A), which returns an orthonormal basis for the column space. A discussion on this is for a later chapter.

By the way, the World Wide Web is a wonderful source of M-files. Just go out there and search.

Further Reading

[L&H&F, 1996] Steven Leon, Eugene Herman, Richard Faulkenberry, ATLAST Computer Exercises for Linear Algebra, Prentice Hall, Upper Saddle River, NJ, (1996).

Sylvester's rank formula, Sylvester's law of nullity, the Frobenius inequality

3.2 A Deeper Look at Rank

We have proved the fundamental result that row rank equals column rank. Thus, we can unambiguously use the word "rank" to signify either one of these numbers. Let r(A) denote the rank of the matrix A. Then we know r(A) = r(A*) = r(A^T) = r(Ā). Also, the rank plus nullity theorem says r(A) + nlty(A) = the number of columns of A. To get more results about rank, we develop a really neat formula which goes back to James Joseph Sylvester (3 September 1814 - 15 March 1897).

THEOREM 3.7 (Sylvester's rank formula)
Let A ∈ C^{m×n} and B ∈ C^{n×p}. Then

r(AB) = r(B) - dim(Null(A) ∩ Col(B)).

PROOF Choose a basis of Null(A) ∩ Col(B), say {b_1, b_2, ..., b_s}, and extend this basis to a basis of Col(B). Say B = {b_1, b_2, ..., b_s, c_1, c_2, ..., c_t} is this basis for Col(B). We claim that {Ac_1, Ac_2, ..., Ac_t} is a basis for Col(AB). As usual, there are two things to check. First, we check the linear indepen-


dence of this set of vectors. We do this in the usual way. Suppose a linear combination a_1Ac_1 + a_2Ac_2 + ... + a_tAc_t = 0. Then A(a_1c_1 + a_2c_2 + ... + a_tc_t) = 0. This puts a_1c_1 + a_2c_2 + ... + a_tc_t in the null space of A. But the c's are in Col(B), so this linear combination is in there also. Thus, a_1c_1 + a_2c_2 + ... + a_tc_t ∈ Null(A) ∩ Col(B). But we have a basis for this intersection, so there must exist scalars β_1, β_2, ..., β_s so that a_1c_1 + a_2c_2 + ... + a_tc_t = β_1b_1 + β_2b_2 + ... + β_sb_s. But then, a_1c_1 + a_2c_2 + ... + a_tc_t - β_1b_1 - β_2b_2 - ... - β_sb_s = 0. Now the c's and the b's together make up an independent set, so all the scalars, all the a's and all the β's, must be zero. In particular, all the a's are zero, so this establishes the independence of {Ac_1, Ac_2, ..., Ac_t}.

Is it clear that all these vectors are in Col(AB)? Yes, because each c_i is in Col(B), so c_i = Bx_i for some x_i; hence Ac_i = ABx_i ∈ Col(AB). Finally, we prove that our claimed basis actually does span Col(AB). Let y be in Col(AB). Then y = ABx for some x. But Bx lies in Col(B), so Bx = a_1b_1 + a_2b_2 + ... + a_sb_s + a_{s+1}c_1 + ... + a_{s+t}c_t. But then, y = ABx = a_1Ab_1 + ... + a_sAb_s + a_{s+1}Ac_1 + ... + a_{s+t}Ac_t = a_{s+1}Ac_1 + ... + a_{s+t}Ac_t, since the Ab_i = 0. Thus, we have established our claim. Now notice t = dim(Col(AB)) = r(AB) and r(B) = dim(Col(B)) = s + t = dim(Null(A) ∩ Col(B)) + r(AB). The formula now follows. □
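Sylvester's formula is easy to spot-check numerically. A minimal sketch, using the built-in orth command for an orthonormal basis of Col(B) and computing dim(Null(A) ∩ Col(B)) along the lines of Exercise 4 in the next exercise set:

% Spot-check r(AB) = r(B) - dim(Null(A) ∩ Col(B)) on random matrices.
A = randn(3,6);  B = randn(6,4);
X = orth(B);                      % columns form a basis of Col(B)
d = size(X,2) - rank(A*X);        % dim(Null(A) ∩ Col(B)); v -> X*v is one-to-one
disp([rank(A*B), rank(B) - d])    % the two numbers agree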

The test of a good theorem is all the consequences you can squeeze out of it. Let's now reap the harvest of this wonderful formula.

COROLLARY 3.2
For A ∈ C^{m×n} and B ∈ C^{n×p},

nlty(AB) = nlty(B) + dim(Null(A) ∩ Col(B)).

PROOF This follows from the rank plus nullity theorem. □

COROLLARY 3.3
For A ∈ C^{m×n} and B ∈ C^{n×p}, r(AB) ≤ min(r(A), r(B)).

PROOF First, r(AB) = r(B) - dim(Null(A) ∩ Col(B)) ≤ r(B). Also, r(AB) = r((AB)^T) = r(B^T A^T) ≤ r(A^T) = r(A). □

COROLLARY 3.4
For A ∈ C^{m×n} and B ∈ C^{n×p}, r(A) + r(B) - n ≤ r(AB) ≤ min(r(A), r(B)).


PROOF First, Null(A) ∩ Col(B) ⊆ Null(A), so dim(Null(A) ∩ Col(B)) ≤ dim(Null(A)) = nlty(A) = n - r(A). Therefore r(AB) = r(B) - dim(Null(A) ∩ Col(B)) ≥ r(B) - (n - r(A)), so r(AB) ≥ r(B) + r(A) - n. □

COROLLARY 3.5
Let A ∈ C^{m×n}; then r(A*A) = r(A).

PROOF It takes a special property of complex numbers to get this one. Let x ∈ Null(A*) ∩ Col(A). Then A*x = 0 and x = Ay for some y. But then x*x = y*A*x = 0, so Σ|x_i|^2 = 0. This implies all the components of x are zero, so x must be the zero vector. Therefore, Null(A*) ∩ Col(A) = {0}, and so r(A*A) = r(A) - dim(Null(A*) ∩ Col(A)) = r(A). □

COROLLARY 3.6
Let A ∈ C^{m×n}; then r(AA*) = r(A*).

PROOF Replace A by A* above. □

COROLLARY 3.7
Let A ∈ C^{m×n}; then Col(A*A) = Col(A*) and Null(A*A) = Null(A).

PROOF Clearly Col(A*A) ⊆ Col(A*) and Null(A) ⊆ Null(A*A). But dim(Col(A*A)) = r(A*A) = r(A) = r(A*) = dim(Col(A*)). Also, dim(Null(A)) = n - r(A) = n - r(A*A) = dim(Null(A*A)), so Null(A*A) = Null(A). □

COROLLARY 3.8
Let A ∈ C^{m×n}; then Col(AA*) = Col(A) and Null(AA*) = Null(A*).

PROOF Replace A by A* above. □

Next, we get another classical result of Sylvester.

COROLLARY 3.9 (Sylvester's law of nullity [1884])For square matrices A and B in C"',

max(nlty(A), nlty(B)) < nlty(AB) < nlty(A) + nlty(B).


PROOF First, Null(B) ⊆ Null(AB), so dim(Null(B)) ≤ dim(Null(AB)) and so nlty(B) ≤ nlty(AB). Also, nlty(A) = n - r(A) ≤ n - r(AB) = nlty(AB). Therefore, max(nlty(A), nlty(B)) ≤ nlty(AB). Also, by Corollary 3.4, r(A) + r(B) - n ≤ r(AB) = n - nlty(AB), so n - nlty(A) + n - nlty(B) - n ≤ n - nlty(AB). Canceling the n's gives the "minus" of the inequality we want. Thus, nlty(AB) ≤ nlty(A) + nlty(B). □

Another classical result goes back to F. G. Frobenius, whom we have previously mentioned.

COROLLARY 3.10 (the Frobenius inequality [1911])
Assume the product ABC exists. Then r(AB) + r(BC) ≤ r(B) + r(ABC).

PROOF Now Col(BC) ∩ Null(A) ⊆ Col(B) ∩ Null(A), so dim(Col(BC) ∩ Null(A)) ≤ dim(Col(B) ∩ Null(A)). But dim(Col(BC) ∩ Null(A)) = r(BC) - r(ABC) by Sylvester's formula. Also, dim(Col(B) ∩ Null(A)) = r(B) - r(AB), so r(BC) - r(ABC) ≤ r(B) - r(AB). Therefore, r(BC) + r(AB) ≤ r(B) + r(ABC). □

There is one more theorem we wish to present. For this we need some notation. The idea of augmented matrix is familiar. For A, B in C^{m×n}, define [A : B] as the m-by-2n matrix formed by adjoining B to A on the right. Similarly, define [A; B] to be the 2m-by-n matrix formed by putting B under A.

THEOREM 3.8
Let A and B be in C^{m×n}. Then r(A + B) = r(A) + r(B) - dim(Col([A; B]) ∩ Null([I_m : I_m])) - dim(Col(A*) ∩ Col(B*)). In particular, r(A + B) ≤ r(A) + r(B).

PROOF We note that A + B = [I_m : I_m][A; B], so by Sylvester's formula r(A + B) = r([A; B]) - dim(Col([A; B]) ∩ Null([I_m : I_m])). But r([A; B]) = r([A; B]*) = r([A* : B*]) = dim(Col([A* : B*])) = dim(Col(A*) + Col(B*)) = dim(Col(A*)) + dim(Col(B*)) - dim(Col(A*) ∩ Col(B*)) = r(A*) + r(B*) - dim(Col(A*) ∩ Col(B*)) = r(A) + r(B) - dim(Col(A*) ∩ Col(B*)), using the familiar dimension formula. □

Exercise Set 9

1. Can you discover any other consequences of Sylvester's wonderful formula?

2. Consider the set of linear equations Ax = b. Then the set of equations A*Ax = A*b are called the normal equations.

(a) Argue that the normal equations are always consistent.
(b) If Ax = b is consistent, prove that Ax = b and the associated normal equations A*Ax = A*b have the same set of solutions.
(c) If Null(A) = {0}, the unique solution to both systems is (A*A)^{-1}A*b.

3. In this exercise, we develop another approach to rank. Suppose A ∈ C^{m×n}.

(a) Suppose A is m-by-p and B is m-by-q. Argue that if the rows of A are linearly independent, then the rows of [A : B] are also linearly independent.
(b) If the rows of [A : B] are dependent, argue that the rows of A are necessarily dependent.
(c) Suppose rank(A) = r. Argue that all submatrices of order r + 1 are singular.
(d) Suppose rank(A) = r. Argue that at least one nonsingular r-by-r submatrix of A exists.
(e) If the order of the largest nonsingular submatrix of A is r-by-r, then argue A has rank r.


4. [Meyer, 2000] There are times when it is very handy to have a basis for Null(A) ∩ Col(B). Argue that the following steps will produce one (a MATLAB sketch of this recipe follows this exercise).

(a) Find a basis for Col(B), say {x_1, x_2, ..., x_r}.
(b) Construct the matrix X in C^{n×r}; X = [x_1 | x_2 | ... | x_r].
(c) Find a basis for Null(AX), say {v_1, v_2, ..., v_s}.
(d) Argue that B = {Xv_1, Xv_2, ..., Xv_s} is a basis for Null(A) ∩ Col(B). (Hint: Argue that Col(X) = Col(B) and Null(X) = {0}; then use Sylvester's formula.)
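A minimal MATLAB sketch of this recipe (the function name intersect_basis is an ad hoc choice; orth and null are the built-in orthonormal-basis commands):

function W = intersect_basis(A, B)
% Columns of W form a basis of Null(A) ∩ Col(B), following steps (a)-(d) above.
X = orth(B);      % (a)-(b): a basis of Col(B), assembled into a matrix
V = null(A*X);    % (c): a basis of Null(AX)
W = X*V;          % (d): the vectors X*v_i
end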

5. Prove that rank([A_11 A_12 ... A_1k; 0 A_22 ... A_2k; ... ; 0 0 ... A_kk]) ≥ Σ_{i=1}^{k} rank(A_ii). Compute the rank of

A = [0 1 0 0; 0 0 1 0; 0 0 0 0; 0 0 0 1]

and compare it to the ranks of the 2-by-2 diagonal blocks. What does this tell you about the previous inequality?

6. Suppose A is n-by-n and invertible and D is square in M = [A B; C D]. Argue that rank(M) = n iff the Schur complement of A in M is zero.

7. If A is m-by-n and B is n-by-m and m > n, argue that det(AB) = 0.

8. Argue that the linear system Ax = b is consistent iff rank([A | b]) = rank(A).

9. Suppose that rank(CA) = rank(A). Argue that Ax = b and CAx = Cb have the same solution set.

10. Suppose A is m-by-n. Argue that Ax = 0 implies x = 0 iff m ≥ n and A has full rank.

11. Prove that Ax = b has a unique solution iff m ≥ n, the equation is consistent, and A has full rank.

12. Argue that the rank of a symmetric matrix (or skew-symmetric matrix) is equal to the order of its largest nonzero principal minor. In particular, deduce that the rank of a skew-symmetric matrix cannot be odd.

13. Let T be a linear map from C^n to C^m, and let M be a subspace of C^n. Argue that dim(T(M)) = dim(M) - dim(M ∩ Ker(T)), so, in particular, dim(T(M)) ≤ dim(M).


14. Suppose T_1 and T_2 are linear maps from C^n to C^m. Argue that

(a) Ker(T_1) ∩ Ker(T_2) ⊆ Ker(T_1 + T_2).
(b) Im(T_1 + T_2) ⊆ Im(T_1) + Im(T_2).
(c) |rank(T_1) - rank(T_2)| ≤ rank(T_1 + T_2) ≤ rank(T_1) + rank(T_2).

15. Argue that Ax = b has no solution if rank(A) ≠ rank([A | b]). Otherwise, the general solution has n - rank(A) free variables.

16. For A, B in C^{n×n}, argue that r(AB - I) ≤ r(A - I) + r(B - I).

17. For A, B in C^{m×n}, suppose B*A = 0. Prove that r(A + B) = r(A) + r(B) ≤ n.

18. For matrices of appropriate size, argue that

r([A B; C D]) ≤ r(A) + r(B) + r(C) + r(D).

19. If A and rB E C""",, argue that

rank([ A BJ) > rank(A) + rank(B).

Further Reading

[Meyer, 2000] Carl Meyer, Matrix Analysis and Applied Linear Algebra,SIAM, Philadelphia, (2000).

[Zhang, 1999] Fuzhen Zhang, Matrix Theory: Basic Results and Techniques, Springer, New York, (1999).

complementary subspaces, direct sum decomposition, idempotent matrix, projector, parallel projection

3.3 Direct Sums and Idempotents

There is an intimate connection between direct sum decompositions of C^n and certain kinds of matrices. This correspondence plays an important role in our discussion of generalized inverses. First, let's recall what it means to have a


direct sum decomposition. Let M and N be subspaces of C^n. We say M and N are disjoint when their intersection is as small as it can be, namely, M ∩ N = {0}. We can always form a new subspace M + N = {x | x = m + n where m ∈ M and n ∈ N}. Indeed, this is the smallest subspace of C^n containing both M and N. When M and N are disjoint and M + N = C^n, we say M and N are complementary subspaces and that they give a direct sum decomposition of C^n. The notation is C^n = M ⊕ N for a direct sum decomposition. What is nice about a direct sum decomposition is that each vector in C^n has a unique representation as a vector from M plus a vector from N. For example, consider the subspaces M_1 = {(z, 0) | z ∈ C} and M_2 = {(0, w) | w ∈ C}. Clearly, any vector v = (z, w) in C^2 can be written as v = (z, 0) + (0, w), so C^2 = M_1 + M_2. Moreover, if v ∈ M_1 ∩ M_2, the second coordinate of v is 0 since v ∈ M_1, and the first coordinate of v is 0 since v ∈ M_2, so v = (0, 0) = 0. Thus C^2 = M_1 ⊕ M_2.

This example is almost too easy. Let's try to be more imaginative. This time, let M_1 = {(x, y, z) | x + 2y + 3z = 0} and M_2 = {(r, s, s) | r, s ∈ C}. These are indeed subspaces of C^3 and (-5, 1, 1) ∈ M_1 ∩ M_2. Now any vector v = (x, y, z) ∈ C^3 can be written

(x, y, z) = (x, -(1/5)x + (3/5)y - (3/5)z, -(1/5)x - (2/5)y + (2/5)z) + (0, (1/5)(x + 2y + 3z), (1/5)(x + 2y + 3z)),

so C^3 = M_1 + M_2, but the sum is not direct.

Can we extend this idea of direct sum to more than two summands? What would we want to mean by C^n = M_1 ⊕ M_2 ⊕ M_3? First, we would surely want any vector v ∈ C^n to be expressible as v = v_1 + v_2 + v_3, where v_1 ∈ M_1, v_2 ∈ M_2, and v_3 ∈ M_3. Then we would want this representation to be unique. What would it take to make it unique? Suppose v = v_1 + v_2 + v_3 = w_1 + w_2 + w_3, where v_i, w_i ∈ M_i for i = 1, 2, 3. Then v_1 - w_1 = (w_2 - v_2) + (w_3 - v_3) ∈ M_2 + M_3. Thus v_1 - w_1 ∈ M_1 ∩ (M_2 + M_3). To get v_1 = w_1, we need M_1 ∩ (M_2 + M_3) = {0}. Similarly, we need M_2 ∩ (M_1 + M_3) = {0} to get v_2 = w_2 and M_3 ∩ (M_1 + M_2) = {0} to get v_3 = w_3. Now that we have the idea, let's go for the most general case.

Suppose M_1, M_2, ..., M_k are subspaces of C^n. Then the sum of these subspaces, written M_1 + M_2 + ... + M_k = Σ_{i=1}^{k} M_i, is defined to be the collection of all vectors of the form v = v_1 + v_2 + ... + v_k, where v_i ∈ M_i for i = 1, 2, ..., k.

THEOREM 3.9
Suppose M_1, M_2, ..., M_k are subspaces of C^n. Then Σ_{i=1}^{k} M_i is a subspace of C^n. Moreover, it is the smallest subspace of C^n containing all the M_i, i = 1, 2, ..., k. Indeed, Σ_{i=1}^{k} M_i = span(∪_{i=1}^{k} M_i).


PROOF The proof is left as an exercise.

Now for the idea of a direct sum.

THEOREM 3.10
Suppose M_1, M_2, ..., M_k are subspaces of C^n. Then the following are equivalent:

1. Every vector v in Σ_{i=1}^{k} M_i can be written uniquely as v = v_1 + v_2 + ... + v_k, where v_i ∈ M_i for i = 1, 2, ..., k.

2. If 0 = Σ_{i=1}^{k} v_i with v_i ∈ M_i for i = 1, 2, ..., k, then each v_i = 0 for i = 1, 2, ..., k.

3. For every i = 1, 2, ..., k, M_i ∩ (Σ_{j≠i} M_j) = {0}.

PROOF The proof is left as an exercise. □

We write ⊕_{i=1}^{k} M_i for Σ_{i=1}^{k} M_i, calling the sum a direct sum when any one (and hence all) of the conditions of the above theorem are satisfied. Note that condition (3) above is often expressed by saying that the subspaces M_1, M_2, ..., M_k are independent.

COROLLARY 3.11
Suppose M_1, M_2, ..., M_k are subspaces of C^n. Then C^n = ⊕_{i=1}^{k} M_i iff C^n = Σ_{i=1}^{k} M_i and, for each i = 1, 2, ..., k, M_i ∩ (Σ_{j≠i} M_j) = {0}.

Now suppose E is an n-by-n matrix over C (or a linear map on C^n) with the property that E^2 = E. Such a matrix is called idempotent. Clearly, I_n and 0 are idempotent, but there are always many more. We claim that each idempotent matrix induces a direct sum decomposition of C^n. First we note that, if E is idempotent, so is I - E. Next, we recall that Fix(E) = {v ∈ C^n | v = Ev}. For an idempotent E, we have Fix(E) = Col(E). Also Null(E) = Fix(I - E) and Null(I - E) = Fix(E). Now the big claim here is that C^n is the direct sum of Null(E) and Col(E). Indeed, it is trivial that any x in C^n can be written x = Ex + x - Ex = Ex + (I - E)x. Evidently, Ex is in Col(E) and E((I - E)x) = Ex - EEx = 0, so (I - E)x is in Null(E).


We claim these two subspaces are disjoint, for if z ∈ Col(E) ∩ Null(E), then z = Ex for some x and Ez = 0. But then, 0 = Ez = EEx = Ex = z. This establishes the direct sum decomposition. Moreover, if B = {b_1, b_2, ..., b_k} is a basis for Col(E) = Fix(E) and C = {c_1, c_2, ..., c_{n-k}} is a basis of Null(E), then B ∪ C is a basis for C^n, so the matrix S = [b_1 | ... | b_k | c_1 | ... | c_{n-k}] is invertible. Thus, ES = [Eb_1 | ... | Eb_k | Ec_1 | ... | Ec_{n-k}] = [b_1 | ... | b_k | 0 | ... | 0] = [B | 0], and so E = [B | 0]S^{-1}, where B = [b_1 | ... | b_k]. Moreover, if F is another idempotent and Null(E) = Null(F) and Col(E) = Col(F), then FS = [B | 0], so FS = ES, whence F = E. Thus, there is only one idempotent that can give this particular direct sum decomposition.

Now the question is, suppose someone gives you a direct sum decomposition C^n = M ⊕ N. Is there a (necessarily unique) idempotent out there with Col(E) = M and Null(E) = N? Yes, there is! As above, select a basis {b_1, b_2, ..., b_k} of M and a basis {c_1, c_2, ..., c_{n-k}} of N, and form the matrix S = [B | C]. Note {b_1, b_2, ..., b_k, c_1, c_2, ..., c_{n-k}} is a basis of C^n, B ∈ C^{n×k}, and C ∈ C^{n×(n-k)}. Then define

E = [B | 0_{n×(n-k)}]S^{-1} = [B | C] [I_k  0_{k×(n-k)}; 0_{(n-k)×k}  0_{(n-k)×(n-k)}] S^{-1} = SJS^{-1},

where J denotes the middle block matrix. Then E^2 = SJS^{-1}SJS^{-1} = SJJS^{-1} = SJS^{-1} = E, so E is idempotent. Also we claim Ev = v iff v ∈ M, and Ev = 0 iff v ∈ N. First note m ∈ M iff m = Bx for some x in C^k. Then Em = [B | C]J[B | C]^{-1}Bx = [B | C]J[B | C]^{-1}[B | C][x; 0] = [B | C]J[x; 0] = [B | C][x; 0] = Bx = m. Thus M ⊆ Fix(E). Next, if n ∈ N, then n = Cy for some y in C^{n-k}. Then En = [B | C]J[B | C]^{-1}[B | C][0; y] = [B | C]J[0; y] = [B | C][0; 0] = 0. Since E is idempotent, C^n = Fix(E) ⊕ Null(E) = Col(E) ⊕ Null(E). Now M ⊆ Fix(E) and k = dim(M) = rank(E) = dim(Fix(E)), so M = Fix(E). A dimension argument shows N = Null(E) and we are done.
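The construction E = [B | 0]S^{-1} is easy to carry out in MATLAB. A small sketch, using one-column bases of two complementary subspaces of C^2; it reproduces the projector that appears in the example below:

% Projector onto M = Col(B) along N = Col(C), built as E = [B 0]*inv(S).
B = [1; 1];  C = [1; -1];           % bases of the complementary subspaces
S = [B C];                           % invertible since C^2 = M ⊕ N
E = [B zeros(size(C))] / S;          % same as [B 0]*inv(S)
disp(E)                              % [1/2 1/2; 1/2 1/2]
disp(norm(E*E - E))                  % 0: E is idempotent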

There is some good geometry here. The idempotent E is called the projector of C^n onto M along N. What is going on is parallel projection. To see this, let's look at an example. Let E = [1/2 1/2; 1/2 1/2]. Then E = E^2, so E must induce a direct sum decomposition of C^2. We check that v = [x; y] is fixed by E iff x = y, and Ev = 0 iff y = -x. Thus, C^2 = M ⊕ N, where M = {[x; y] | x = y} and N = {[x; y] | y = -x}. The unique representation of an arbitrary vector is

[x; y] = [(1/2)x + (1/2)y; (1/2)x + (1/2)y] + [(1/2)x - (1/2)y; -(1/2)x + (1/2)y].


We now draw a picture in R^2 to show the geometry lurking in the background here.

Figure 3.2: Parallel projection.

Of course, these ideas extend. Suppose C^n = M_1 ⊕ M_2 ⊕ M_3. Select bases B_i = {b_{i1}, b_{i2}, ..., b_{ik_i}} of M_i for i = 1, 2, 3. Then B_1 ∪ B_2 ∪ B_3 is a basis for C^n. Form the invertible matrix S = [b_{11} ... b_{1k_1} | b_{21} b_{22} ... b_{2k_2} | b_{31} b_{32} ... b_{3k_3}]. Form

E_1 = [b_{11} | b_{12} | ... | b_{1k_1} | 0] S^{-1} = S [I_{k_1} 0 0; 0 0 0; 0 0 0] S^{-1},

E_2 = [0 | b_{21} | b_{22} | ... | b_{2k_2} | 0] S^{-1} = S [0 0 0; 0 I_{k_2} 0; 0 0 0] S^{-1}, and

E_3 = [0 | b_{31} | b_{32} | ... | b_{3k_3}] S^{-1} = S [0 0 0; 0 0 0; 0 0 I_{k_3}] S^{-1}.

Then E_1 + E_2 + E_3 = S [I_{k_1} 0 0; 0 I_{k_2} 0; 0 0 I_{k_3}] S^{-1} = S I_n S^{-1} = I_n. Also, E_1E_2 = S [I_{k_1} 0 0; 0 0 0; 0 0 0][0 0 0; 0 I_{k_2} 0; 0 0 0] S^{-1} = S 0 S^{-1} = 0 = E_2E_1. Similarly, E_1E_3 = 0 = E_3E_1 and E_3E_2 = 0 = E_2E_3. Next, we see E_1 = E_1 I = E_1(E_1 + E_2 + E_3) = E_1E_1 + 0 + 0 = E_1^2.

Thus, E_1 is idempotent. A similar argument shows E_2 and E_3 are also idempotent. The language here is that idempotents E and F are called orthogonal iff EF = 0 = FE. So the bottom line is that this direct sum decomposition has led to a set of pairwise orthogonal idempotents whose sum is the identity. Moreover, note that Col(E_1) = Col([b_{11} | b_{12} | ... | b_{1k_1} | 0]S^{-1}) = Col([b_{11} | b_{12} | ... | b_{1k_1} | 0]) = span({b_{11}, b_{12}, ..., b_{1k_1}}) = M_1. Similarly, one shows Col(E_2) = M_2 and Col(E_3) = M_3. We finish by summarizing and extending this discussion with a theorem.

THEOREM 3.11
Suppose M_1, M_2, ..., M_k are subspaces of C^n with C^n = M_1 ⊕ M_2 ⊕ ... ⊕ M_k. Then there exists a set of idempotents {E_1, E_2, ..., E_k} such that

1. E_iE_j = 0 if i ≠ j.

2. E_1 + E_2 + ... + E_k = I_n.

3. Col(E_i) = M_i for i = 1, 2, ..., k.

Conversely, given idempotents {E_1, E_2, ..., E_k} satisfying (1), (2), and (3) above, then C^n = M_1 ⊕ M_2 ⊕ ... ⊕ M_k.

PROOF In view of the discussion above, the reader should be able to fill in the details. □

To understand the action of a matrix A ∈ C^{n×n}, we often associate a direct sum decomposition of C^n consisting of subspaces of C^n which are invariant under A. A subspace M of C^n is invariant under A (i.e., A-invariant) iff A(M) ⊆ M. In other words, the vectors in M stay in M even after they are multiplied by A. What makes this useful is that if we take a basis of M, say m_1, m_2, ..., m_k, and extend it to a basis of C^n, B = {m_1, m_2, ..., m_k, b_{k+1}, ..., b_n}, we can form the invertible matrix S whose columns are these basis vectors. Then the matrix S^{-1}AS has a particularly nice form; it has a block of zeros in it. Let us illustrate. Suppose M = span{m_1, m_2, m_3}, where m_1, m_2, and m_3 are independent in C^5. Now Am_1 is a vector in M since M is A-invariant. Thus, Am_1 is uniquely expressible as a linear combination of m_1, m_2, and m_3. Say

Am_1 = a_{11}m_1 + a_{21}m_2 + a_{31}m_3.


Similarly,

Am_2 = a_{12}m_1 + a_{22}m_2 + a_{32}m_3
Am_3 = a_{13}m_1 + a_{23}m_2 + a_{33}m_3
Ab_4 = a_{14}m_1 + a_{24}m_2 + a_{34}m_3 + β_{44}b_4 + β_{54}b_5
Ab_5 = a_{15}m_1 + a_{25}m_2 + a_{35}m_3 + β_{45}b_4 + β_{55}b_5.

Since Ab_4 and Ab_5 could be anywhere in C^5, we may need the entire basis to express them. Now form AS = [Am_1 | Am_2 | Am_3 | Ab_4 | Ab_5] = [m_1 | m_2 | m_3 | b_4 | b_5] times the coefficient matrix, that is,

AS = S [a_{11} a_{12} a_{13} a_{14} a_{15}]
       [a_{21} a_{22} a_{23} a_{24} a_{25}]
       [a_{31} a_{32} a_{33} a_{34} a_{35}]
       [  0      0      0    β_{44} β_{45}]
       [  0      0      0    β_{54} β_{55}]
   = S [A_1 A_2; 0 A_3].

It gets even better if C^n is the direct sum of two subspaces, both of which are A-invariant. Say C^n = M ⊕ N, where M and N are A-invariant. Take a basis of M, say B = {b_1, ..., b_k}, and a basis of N, say C = {c_{k+1}, ..., c_n}. Then B ∪ C is a basis of C^n, and S = [b_1 | ... | b_k | c_{k+1} | ... | c_n] is an invertible matrix with S^{-1}AS = [A_1 0; 0 A_2]. Let's illustrate this important idea in C^5. Suppose C^5 = M_1 ⊕ M_2, where A(M_1) ⊆ M_1 and A(M_2) ⊆ M_2. In order to prepare for the proof of the next theorem, let's establish some notation. Let B_1 = {b_1^(1), b_2^(1), b_3^(1)} be a basis for M_1 and B_2 = {b_1^(2), b_2^(2)} be a basis for M_2. Form the matrices B_1 = [b_1^(1) | b_2^(1) | b_3^(1)] ∈ C^{5×3} and B_2 = [b_1^(2) | b_2^(2)] ∈ C^{5×2}. Then A M_i = A Col(B_i) ⊆ Col(B_i) for i = 1, 2. Now A acts on a basis vector from B_1 and produces another vector in M_1.


Therefore,

Ab_1^(1) = α_1 b_1^(1) + β_1 b_2^(1) + γ_1 b_3^(1) = [b_1^(1) | b_2^(1) | b_3^(1)] [α_1; β_1; γ_1] = B_1 [α_1; β_1; γ_1],
Ab_2^(1) = α_2 b_1^(1) + β_2 b_2^(1) + γ_2 b_3^(1) = B_1 [α_2; β_2; γ_2],
Ab_3^(1) = α_3 b_1^(1) + β_3 b_2^(1) + γ_3 b_3^(1) = B_1 [α_3; β_3; γ_3].

Form the matrix A_1 = [α_1 α_2 α_3; β_1 β_2 β_3; γ_1 γ_2 γ_3]. We compute that

AB_1 = [Ab_1^(1) | Ab_2^(1) | Ab_3^(1)] = [B_1[α_1; β_1; γ_1] | B_1[α_2; β_2; γ_2] | B_1[α_3; β_3; γ_3]] = B_1 A_1.

A similar analysis applies to A acting on the basis vectors in B_2:

Ab_1^(2) = λ_1 b_1^(2) + μ_1 b_2^(2) = [b_1^(2) | b_2^(2)] [λ_1; μ_1] = B_2 [λ_1; μ_1],
Ab_2^(2) = λ_2 b_1^(2) + μ_2 b_2^(2) = B_2 [λ_2; μ_2].

Form the matrix A_2 = [λ_1 λ_2; μ_1 μ_2]. Again compute that

AB_2 = A[b_1^(2) | b_2^(2)] = [Ab_1^(2) | Ab_2^(2)] = [B_2[λ_1; μ_1] | B_2[λ_2; μ_2]] = B_2 A_2.

Finally, form the invertible matrix S = [B_1 | B_2] and compute

AS = A[B_1 | B_2] = [AB_1 | AB_2] = [B_1A_1 | B_2A_2] = [B_1 | B_2] [A_1 0; 0 A_2] = S [A_1 0; 0 A_2].

The more invariant direct summands you can find, the more blocks of zeros you can generate. We summarize with a theorem.


THEOREM 3.12
Suppose A ∈ C^{n×n} and C^n = M_1 ⊕ M_2 ⊕ ... ⊕ M_k, where each M_i is A-invariant for i = 1, 2, ..., k. Then there exists an invertible matrix S ∈ C^{n×n} such that

S^{-1}AS = Block Diagonal[A_1, A_2, ..., A_k] = [A_1 0 ... 0; 0 A_2 ... 0; ... ; 0 0 ... A_k].

PROOF In view of the discussion above, the details of the proof are safely left to the reader. We simply sketch it out. Suppose that the M_i are invariant subspaces for A for i = 1, 2, ..., k. Form the matrices B_i ∈ C^{n×d_i}, whose columns come from a basis of M_i. Then M_i = Col(B_i). Note Ab_j^(i) = B_i y_j^(i), where y_j^(i) ∈ C^{d_i}. Form the matrices A_i = [y_1^(i) | y_2^(i) | ... | y_{d_i}^(i)] ∈ C^{d_i×d_i}. Compute that AB_i = B_iA_i, noting n = d_1 + d_2 + ... + d_k. Form the invertible matrix S = [B_1 | B_2 | ... | B_k]. Then S^{-1}AS = S^{-1}[AB_1 | AB_2 | ... | AB_k] = S^{-1}[B_1A_1 | B_2A_2 | ... | B_kA_k] = Block Diagonal[A_1, A_2, ..., A_k]. □

Exercise Set 10

1. If M ⊕ N = C^n, argue that dim(M) + dim(N) = n.

2. If M ⊕ N = C^n, B is a basis for M, and C is a basis for N, then prove B ∪ C is a basis for C^n.

3. Argue the uniqueness claim when C^n is the direct sum of M and N.

4. Prove that M + N is the smallest subspace of C^n that contains M and N.

5. Argue that if E is idempotent, so is I - E. Moreover, Col(I - E) = Null(E), and Null(I - E) = Col(E).

6. Prove that if E is idempotent, then Fix(E) = Col(E).

7. Verify that the constructed E in the text has the properties claimed for it.


8. Suppose E is idempotent and invertible. What can you say about E?

9. Prove that E = E^2 iff Col(E) ∩ Col(I - E) = {0}.

10. Consider [0 1; 0 1], [1 0; 1 0], and [1/2 1/2; 1/2 1/2]. Show that these matrices are all idempotent. What direct sum decompositions do they induce on C^2?

11. Show that if E is idempotent, then so are Q = E + AE - EAE and Q = E - AE + EAE for any matrix A of appropriate size.

12. Let E = [1/2 1/2; 1/2 1/2]. Argue that E is idempotent and symmetric. What is the rank of E? What direct sum decomposition does it induce on a typical vector? Let E be the 3-by-3 matrix each of whose entries is 1/3. Discuss the same issues for E. Do you see how to generalize to E n-by-n? Do it.

13. Recall from calculus that a function f : R → R is called even iff f(-x) = f(x). The graph of such an f is symmetric with respect to the y-axis. Recall f(x) = x^2, f(x) = cos(x), and so on are even functions. Also, f : R → R is called odd iff f(-x) = -f(x). The graph of an odd function is symmetric with respect to the origin. Recall f(x) = x^3, f(x) = sin(x), and so on are odd functions. Let V = F(R) be the vector space of all functions on R. Argue that the even (resp., odd) functions form a subspace of V. Argue that V is the direct sum of the subspace of even and the subspace of odd functions. (Hint: If f : R → R, f^e(x) = (1/2)[f(x) + f(-x)] is even and f^o(x) = (1/2)[f(x) - f(-x)] is odd.)

14. Let V = C^{n×n}, the n-by-n matrices considered as a vector space. Recall a matrix A is symmetric iff A = A^T and skew-symmetric iff A^T = -A. Argue that the symmetric (resp., skew-symmetric) matrices form a subspace of V and V is the direct sum of these two subspaces.

15. Suppose M_1, M_2, ..., M_k are nontrivial subspaces of C^n that are independent. Suppose B_i is a basis of M_i for each i = 1, 2, ..., k. Argue that ∪_{i=1}^{k} B_i is a basis of ⊕_{i=1}^{k} M_i.

16. Suppose M_1, M_2, ..., M_k are subspaces of C^n that are independent. Argue that dim(M_1 ⊕ M_2 ⊕ ... ⊕ M_k) = dim(M_1) + dim(M_2) + ... + dim(M_k).


17. Suppose E^2 = E and F = S^{-1}ES for some invertible matrix S. Argue that F^2 = F.

18. Suppose M_1, M_2, ..., M_k are subspaces of C^n. Argue that these subspaces are independent iff for each j, 2 ≤ j ≤ k, M_j ∩ (M_1 + ... + M_{j-1}) = {0}. Is independence equivalent to the condition "pairwise disjoint," that is, M_i ∩ M_j = {0} whenever i ≠ j?

19. Suppose E is idempotent. Is I + E invertible? How about I + αE? Are there conditions on α? Suppose E and F are both idempotent. Find invertible matrices S and T such that S(E + F)T = αE + βF. Are there conditions on α and β?

20. Suppose A is a matrix and P is the projector on M along N. Argue that PA = A iff Col(A) ⊆ M and AP = A iff Null(A) ⊇ N.

21. Suppose P is an idempotent and A is a matrix. Argue that A(Col(P)) ⊆ Col(P) iff PAP = AP. Also prove that Null(P) is A-invariant iff PAP = PA. Now argue that A(Col(P)) ⊆ Col(P) and A(Null(P)) ⊆ Null(P) iff PA = AP.

22. Prove the dimension formula: for M_1, M_2 subspaces of C^n, dim(M_1 + M_2) = dim(M_1) + dim(M_2) - dim(M_1 ∩ M_2). How would this formula read if you had three subspaces?

23. Suppose M_1 and M_2 are subspaces of C^n and dim(M_1 + M_2) = dim(M_1 ∩ M_2) + 1. Argue that either M_1 ⊆ M_2 or M_2 ⊆ M_1.

24. Suppose M_1, M_2, and M_3 are subspaces of C^n. Argue that dim(M_1 ∩ M_2 ∩ M_3) + 2n ≥ dim(M_1) + dim(M_2) + dim(M_3).

25. Suppose T : C^n → C^n is a linear transformation. Argue that dim(T(M)) + dim(Ker(T) ∩ M) = dim(M), where M is any subspace of C^n.

26. Prove Theorem 3.10 and Theorem 3.11.

27. Fill in the details of the proofs of Theorem 3.13 and Theorem 3.14.

28. (Yongge Tian) Suppose P and Q are idempotents. Argue that (1) rank(P + Q) ≥ rank(P - Q) and (2) rank(PQ + QP) ≥ rank(PQ - QP).

29. Argue that every subspace of C^n has a complementary subspace.

30. Prove that if M_i ∩ Σ_{j≠i} M_j = {0}, then M_i ∩ M_j = {0} for i ≠ j, where the M_i are subspaces of C^n.


31. Suppose p is a polynomial in C[x]. Prove that Col(p(A)) and Null(p(A)) are both A-invariant for any square matrix A.

32. Argue that P is an idempotent iff P^T is an idempotent.

33. Suppose P is an idempotent. Prove that (I + P)^n = I + (2^n - 1)P. What can you say about (I - P)?

34. Determine all n-by-n diagonal matrices that are idempotent. How many are there?

Further Reading

[A&M, 2005] J. Araújo and J. D. Mitchell, An Elementary Proof that Every Singular Matrix is a Product of Idempotent Matrices, The American Mathematical Monthly, Vol. 112, No. 7, August-September, (2005), 641-645.

[Tian, 2005] Yongge Tian, Rank Equalities Related to Generalized Inverses of Matrices and Their Applications, arXiv:math.RA/0003224.

index, core-nilpotent factorization, Weyr sequence, Segre sequence, conjugate partition, Ferrer's diagram, standard nilpotent matrix

3.4 The Index of a Square Matrix

Suppose we have a matrix A in C^{n×n}. The rank-plus-nullity theorem tells us that

dim(Col(A)) + dim(Null(A)) = n.

It would be reasonable to suspect that

C^n = Null(A) ⊕ Col(A).


Unfortunately, this is not always the case. Of course, if A is invertible, then this is trivially true (why?). However, if A = [0 2; 0 0], then [2; 0] ∈ Null(A) ∩ Col(A). We can say something important if we look at the powers of A. First, it is clear that Null(A) ⊆ Null(A^2), for if Ax = 0, surely A^2x = A(Ax) = A0 = 0. More generally, Null(A^p) ⊆ Null(A^{p+1}). Also, it is clear that Col(A^2) ⊆ Col(A) since, if y ∈ Col(A^2), then y = A^2x = A(Ax) ∈ Col(A). These observations set up two chains of subspaces in C^n for a given matrix A:

{0} = Null(A^0) ⊆ Null(A) ⊆ Null(A^2) ⊆ ... ⊆ Null(A^p) ⊆ Null(A^{p+1}) ⊆ ...

and

C^n = Col(A^0) ⊇ Col(A) ⊇ Col(A^2) ⊇ ... ⊇ Col(A^p) ⊇ Col(A^{p+1}) ⊇ ...

Since we are in finite dimensions, neither of these chains can continue to strictly ascend or descend forever. That is, equality must eventually occur in these chains of subspaces. Actually, a bit more is true: once equality occurs, it persists.

THEOREM 3.13
Let A ∈ C^{n×n}. Then there exists an integer q with 0 ≤ q ≤ n such that

C^n = Null(A^q) ⊕ Col(A^q).

PROOF To ease the burden of writing, let N^k = Null(A^k). Then we have {0} = N^0 ⊆ N^1 ⊆ ... ⊆ C^n. At some point, by the dimension argument above, equality must occur; otherwise dim(N^0) < dim(N^1) < ..., so n < dim(N^{n+1}), which is contradictory. Let q be the least positive integer where equality first happens. That is, N^{q-1} ⊊ N^q = N^{q+1}. We claim there is nothing but equal signs in the chain from this point forward. In other words, we claim N^q = N^{q+j} for all j = 1, 2, .... We prove this by induction. Evidently the claim is true for j = 1. So, suppose the claim is true for j = k - 1. We argue that the claim is still true for j = k. That is, we show N^{q+(k-1)} = N^{q+k}. Now A^{q+k}x = 0 implies A^{q+1}(A^{k-1}x) = 0. Thus, A^{k-1}x ∈ N^{q+1} = N^q, so A^qA^{k-1}x = 0, which says A^{q+(k-1)}x = 0, putting x in N^{q+(k-1)}. Thus, N^{q+k} ⊆ N^{q+(k-1)} ⊆ N^{q+k}. Therefore, N^{q+(k-1)} = N^{q+k} and our induction proof is complete. By the rank-plus-nullity theorem, n = dim(N^k) + dim(Col(A^k)), so


dim(Col(A^q)) = dim(Col(A^{q+1})) = .... That is, the null space chain stops growing at exactly the same place the column space chain stops shrinking.

Our next claim is that N^q ∩ Col(A^q) = {0}. Let v ∈ N^q ∩ Col(A^q). Then v = A^qw for some w. Then A^{2q}w = A^qA^qw = A^qv = 0. Thus w ∈ N^{2q} = N^{q+(q-1)} = ... = N^q. Thus, 0 = A^qw = v. This gives us the direct sum decomposition of C^n since dim(N^q + Col(A^q)) = dim(N^q) + dim(Col(A^q)) - dim(N^q ∩ Col(A^q)) = dim(N^q) + dim(Col(A^q)) = n, applying the rank plus nullity theorem to A^q.

Finally, it is evident that q ≤ n since, if q ≥ n + 1, we would have a properly increasing chain of subspaces N^1 ⊊ N^2 ⊊ ... ⊊ N^{n+1} ⊆ ... ⊆ N^q, which gives at least n + 1 dimensions, an impossibility in C^n. Thus, C^n = Null(A^q) ⊕ Col(A^q). □

DEFINITION 3.5 (index)
For an n-by-n matrix A, the integer q in the above theorem is called the index of A, which we will denote index(A) or Ind(A).

We now have many useful ways to describe the index of a matrix.

COROLLARY 3.12
Let A ∈ C^{n×n}. The index q of A is the least nonnegative integer k such that

1. Null(A^k) = Null(A^{k+1})

2. Col(A^k) = Col(A^{k+1})

3. rank(A^k) = rank(A^{k+1})

4. Col(A^k) ∩ Null(A^k) = {0}

5. C^n = Null(A^k) ⊕ Col(A^k).

Note that even if you do not know the index of a matrix A, you can always take C^n = Null(A^n) ⊕ Col(A^n). Recall that a nilpotent matrix N is one where N^k = 0 for some positive integer k.

COROLLARY 3.13
The index q of a nilpotent matrix N is the least positive integer k with N^k = 0.


PROOF Suppose p is the least positive integer with N^p = 0. Then N^{p-1} ≠ 0, so

Null(N^0) ⊊ Null(N) ⊊ Null(N^2) ⊊ ... ⊊ Null(N^{p-1}) ⊊ Null(N^p) = Null(N^{p+1}) = ... = C^n.

Thus, p = q. □

Actually, the direct sum decomposition of C^n determined by the index of a matrix is quite nice since the subspaces are invariant under A; recall W is invariant under A if v ∈ W implies Av ∈ W.

COROLLARY 3.14
Let A ∈ C^{n×n} have index q. (Actually, q could be any nonnegative integer.) Then

1. A(Null(A^q)) ⊆ Null(A^q)

2. A(Col(A^q)) ⊆ Col(A^q).

The index of a matrix leads to a particularly nice factorization of the matrix. It is called the core-nilpotent factorization in Meyer [2000]. You might call the next theorem a "poor man's Jordan canonical form." You will see why later.

THEOREM 3.14 (core-nilpotent factorization)
Suppose A is in C^{n×n} with index q. Suppose that rank(A^q) = r. Then there exists an invertible matrix S such that

A = S [C 0; 0 N] S^{-1},

where C is invertible and N is nilpotent of index q. Moreover, μ_A(x) = x^q μ_C(x).

PROOF Recall from the definition of index that C^n = Col(A^q) ⊕ Null(A^q). Thus, we can select a basis of C^n consisting of r independent vectors from Col(A^q) and n - r independent vectors from Null(A^q). Say x_1, x_2, ..., x_r is a basis for Col(A^q) and y_1, y_2, ..., y_{n-r} is a basis of Null(A^q). Form a matrix S by using these vectors as columns, S = [x_1 | x_2 | ... | x_r | y_1 | ... | y_{n-r}] = [S_1 | S_2]. Evidently, S is invertible. Now recall that Col(A^q) and Null(A^q) are invariant subspaces for A, and therefore S^{-1}AS = [C 0; 0 N]. Raising both sides to the q power yields, with appropriate partitioning,


(S^{-1}AS)^q = S^{-1}A^qS = [C^q 0; 0 N^q]. On the other hand, writing S^{-1} = [T_1; T_2] with T_1 ∈ C^{r×n} and T_2 ∈ C^{(n-r)×n}, we have S^{-1}A^qS = [T_1; T_2]A^q[S_1 | S_2] = [T_1A^qS_1  T_1A^qS_2; T_2A^qS_1  T_2A^qS_2]. Since the columns of S_2 lie in Null(A^q), A^qS_2 = 0. Comparing the lower right corners, we see N^q = T_2A^qS_2 = 0, so we have established that N is nilpotent.

Next we note that C^q is r-by-r and rank(A^q) = r = rank(S^{-1}A^qS) = rank([C^q 0; 0 0]) = rank(C^q). Thus C^q has full rank and hence C is invertible. Finally, we need to determine the index of nilpotency of N. But if index(N) < q, then N^{q-1} would already be 0. This would imply rank(A^{q-1}) = rank(S^{-1}A^{q-1}S) = rank([C^{q-1} 0; 0 N^{q-1}]) = rank([C^{q-1} 0; 0 0]) = rank(C^{q-1}) = r = rank(A^q), contradicting the definition of q. The rest is left to the reader. □

Now we wish to take a deeper look at this concept of index. We are about to deal with the most challenging ideas yet presented. What we prove is the crucial part of the Jordan canonical form theorem, which we will meet later. We determine the structure of nilpotent matrices. Suppose A in C^{n×n} has index q. Then

{0} = Null(A^0) ⊊ Null(A) ⊊ Null(A^2) ⊊ ... ⊊ Null(A^q) = Null(A^{q+1}) = ....

We associate a sequence of numbers with these null spaces.

DEFINITION 3.6 (Weyr sequence)
Suppose A in C^{n×n} has index q. The Weyr sequence of A is Weyr(A) = (w_1, w_2, ..., w_q), where w_1 = nlty(A), w_2 = nlty(A^2) - nlty(A), .... Generally, w_i = nlty(A^i) - nlty(A^{i-1}) for i = 1, 2, ..., q. Note w_k = 0 for k > q.

In view of the rank-plus-nullity theorem, we can also express w_i as

w_i = rank(A^{i-1}) - rank(A^i).

Note that w_1 + w_2 + ... + w_i = dim(Null(A^i)) = n - dim(Col(A^i)), n ≥ w_1 + w_2 + ... + w_q, and rank(A) ≥ w_2 + w_3 + ... + w_q. We may visualize what is going on here pictorially in terms of chains of subspaces.
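Since w_i = rank(A^{i-1}) - rank(A^i), the Weyr sequence can be computed by comparing ranks of successive powers. A minimal M-file sketch (the name weyr is an ad hoc choice):

function w = weyr(A)
% Weyr sequence: w(i) = rank(A^(i-1)) - rank(A^i), for i up to the index of A.
w = [];  Ak = eye(size(A,1));        % Ak holds A^(i-1)
while rank(Ak) ~= rank(Ak*A)         % stop once the ranks stabilize
    w(end+1) = rank(Ak) - rank(Ak*A);
    Ak = Ak*A;
end
end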


Figure 3.3: The Weyr sequence.

We may also view the situation graphically.


Figure 3.4: Another view of the Weyr sequence.

Let's look at some examples. We will deal with nilpotent matrices since these are the most important for our applications. First, let's define the "standard" k-by-k nilpotent matrix by

Nilp[k] = [ 0 1 0 0 ··· 0 0
            0 0 1 0 ··· 0 0
                  ⋮
            0 0 0 0 ··· 0 1
            0 0 0 0 ··· 0 0 ]  ∈ C^{k×k}.


Thus we have zeros everywhere except the superdiagonal, which is all ones. It is straightforward to see that rank(Nilp[k]) = k − 1, index(Nilp[k]) = k, and the minimal polynomial μ_{Nilp[k]}(x) = x^k. Note Nilp[1] = [0].

Let N = Nilp[4] = [0 1 0 0; 0 0 1 0; 0 0 0 1; 0 0 0 0]. Clearly rank(N) = 3, nullity(N) = 1; N^2 = [0 0 1 0; 0 0 0 1; 0 0 0 0; 0 0 0 0], rank(N^2) = 2, nullity(N^2) = 2; N^3 = [0 0 0 1; 0 0 0 0; 0 0 0 0; 0 0 0 0], rank(N^3) = 1, nullity(N^3) = 3; and N^4 = 0, rank(N^4) = 0, nullity(N^4) = 4. Thus, we see Weyr(N) = (1, 1, 1, 1) and index(N) = 4. Did you notice how the ones migrate up the superdiagonals as you raise N to higher powers? In general, we see that

Weyr(Nilp[k]) = (1, 1, 1, ..., 1)   (k times).
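Turning the rank-difference formula for the Weyr numbers into a computation is immediate. Here is a minimal MATLAB sketch of our own (the function name weyr is made up):

    function w = weyr(A)
    % Weyr sequence of a square matrix A: w(i) = rank(A^(i-1)) - rank(A^i),
    % computed until the ranks stop dropping (i.e., past the index of A)
    n = size(A,1);
    w = [];
    r_prev = n;                 % rank(A^0) = n
    P = eye(n);
    for i = 1:n
        P = P*A;                % P = A^i
        r = rank(P);
        if r == r_prev, break; end
        w(end+1) = r_prev - r;
        r_prev = r;
    end
    end

For example, weyr(diag(ones(3,1),1)) returns 1 1 1 1, in agreement with the Nilp[4] computation above.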

The next question is, can we come up with a matrix, again nilpotent, where the Weyr sequence is not all ones? Watch this. Let

N = [ 0 1 0 0
      0 0 1 0
      0 0 0 1
      0 0 0 0
              0 1
              0 0 ],

where the blank spaces are understood to be all zeros. Now N has rank 4 and nullity 2. Now look at N^2:

N^2 = [ 0 0 1 0
        0 0 0 1
        0 0 0 0
        0 0 0 0
                0 0
                0 0 ].

We see rank(N^2) = 2, nlty(N^2) = 4. Next,

N^3 = [ 0 0 0 1
        0 0 0 0
        0 0 0 0
        0 0 0 0
                0 0
                0 0 ].

Clearly, rank(N^3) = 1, nullity(N^3) = 5, and finally N^4 = 0. Thus, Weyr(N) = (2, 2, 1, 1).


Now the question is, can we exhibit a matrix with prescribed Weyr sequence? The secret was revealed above in putting standard nilpotent matrices in a block diagonal matrix. Note that we used Nilp[4] and Nilp[2] to create the Weyr sequence of (2, 2, 1, 1). We call the sequence (4, 2) the Segre sequence of N above and write Segre(N) = (4, 2) under these conditions. It may not be obvious to you that there is a connection between these two sequences, but there is!

DEFINITION 3.7 (Segre sequence)
Let N = Block Diagonal[Nilp[p_1], Nilp[p_2], ..., Nilp[p_k]]. We write Segre(N) = (p_1, p_2, ..., p_k). Let's agree to arrange the blocks so that p_1 ≥ p_2 ≥ ··· ≥ p_k. Note N ∈ C^{t×t} where t = Σ_{i=1}^{k} p_i.

First, we need a lesson in counting. Suppose n is a positive integer. How many ways can we write n as a sum of positive integers? A partition of n is an m-tuple of nonincreasing positive integers whose sum is n. So, for example, (4, 2, 2, 1, 1) is a partition of 10 = 4 + 2 + 2 + 1 + 1. There are many interesting counting problems associated with the partitions of an integer but, at the moment, we shall concern ourselves with counting how many there are for a given n. It turns out, there is no easy formula for this, but there is a recursive way to get as many as you could want.

Let p(n) count the number of partitions of n. Then

p(1) = 1:  1
p(2) = 2:  2, 1+1
p(3) = 3:  3, 2+1, 1+1+1
p(4) = 5:  4, 3+1, 2+2, 2+1+1, 1+1+1+1
p(5) = 7:  5, 4+1, 3+2, 3+1+1, 2+2+1, 2+1+1+1, 1+1+1+1+1
p(6) = 11
p(7) = 15
p(8) = 22

There is a formula you can find in a book on combinatorics or discrete mathematics that says

p(n) = p(n−1) + p(n−2) − p(n−5) − p(n−7) + ···.

For example, p(8) = p(7) + p(6) − p(3) − p(1) = 15 + 11 − 3 − 1 = 22.
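The recurrence is easy to run by machine. Here is a minimal MATLAB sketch of our own (partcount is a made-up name); the subtracted arguments n − 1, n − 2, n − 5, n − 7, n − 12, ... in the full recurrence are the generalized pentagonal numbers k(3k − 1)/2 and k(3k + 1)/2, with alternating pairs of signs:

    function P = partcount(nmax)
    % P(n+1) holds p(n); Euler's recurrence with generalized pentagonal numbers
    P = zeros(1, nmax+1);
    P(1) = 1;                          % p(0) = 1
    for n = 1:nmax
        k = 1; s = 0;
        while true
            g1 = k*(3*k-1)/2;          % pentagonal numbers 1, 5, 12, ...
            g2 = k*(3*k+1)/2;          % and 2, 7, 15, ...
            if g1 > n, break; end
            sgn = (-1)^(k+1);
            s = s + sgn*P(n - g1 + 1);
            if g2 <= n
                s = s + sgn*P(n - g2 + 1);
            end
            k = k + 1;
        end
        P(n+1) = s;
    end
    end

Running partcount(8) reproduces the table above; its last entry is p(8) = 22.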

Now, in the theory of partitions, there is the notion of a conjugate partition of a given partition and a Ferrer's diagram to help find it. Suppose a = (m_1, m_2, ..., m_k) is a partition of n. Define the conjugate partition of a to be a* = (r_1*, r_2*, ..., r_t*), where r_i* counts the number of m_j's larger than or equal to i for i = 1, 2, 3, .... So, for example, (5, 3, 3, 1)* = (4, 3, 3, 1, 1). Do not panic, there is an easy


way to construct conjugate partitions from a Ferrer's diagram. Given a partition a = (m_1, m_2, ..., m_k) of n, its Ferrer's diagram is made up of dots (some people use squares), where each summand is represented by a horizontal row of dots as follows. Consider the partition of 12, (5, 3, 3, 1). Its Ferrer's diagram is

* * * * *
* * *
* * *
*

Figure 3.5: Ferrer's diagram.

To read off the conjugate partition, just read down instead of across. Do you see where (4, 3, 3, 1, 1) came from? Do you see that the number of dots in row i of a* is the number of rows of a with i or more dots? Note that a** = a, so the mapping on partitions of n given by a ↦ a* is one-to-one and onto.
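Reading the diagram down the columns is also a one-line computation. A minimal MATLAB sketch of our own (conjpart is a made-up name) that conjugates a nonincreasing partition:

    function c = conjpart(p)
    % conjugate of a partition p (a nonincreasing vector of positive integers):
    % c(i) counts how many parts of p are >= i
    c = zeros(1, p(1));
    for i = 1:p(1)
        c(i) = sum(p >= i);
    end
    end

For example, conjpart([5 3 3 1]) returns 4 3 3 1 1, and conjpart(conjpart([5 3 3 1])) returns the original partition.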

THEOREM 3.15
Suppose N = Block Diagonal[Nilp[p_1], Nilp[p_2], ..., Nilp[p_k]]. Then Weyr(N) is the conjugate partition of Segre(N) = (p_1, p_2, ..., p_k).

PROOF To make the notation a little easier, let N = Block Diagonal[N_1, N_2, ..., N_k], where N_i = Nilp[p_i], which is size p_i-by-p_i for i = 1, 2, 3, ..., k. Now N^2 = Block Diagonal[N_1^2, N_2^2, ..., N_k^2], and generally N^j = Block Diagonal[N_1^j, N_2^j, ..., N_k^j] for j = 1, 2, 3, .... Next note that, for each i, rank(N_i) = p_i − 1, rank(N_i^2) = p_i − 2, ..., rank(N_i^j) = p_i − j as long as j ≤ p_i; however, rank(N_i^j) = 0 if j > p_i. Thus, we conclude rank(N_i^{j−1}) − rank(N_i^j) = 1 if j ≤ p_i and 0 if j > p_i. The one acts like a counter signifying the presence of a block of size at least j. Next, we have rank(N^j) = Σ_i rank(N_i^j), so w_j = rank(N^{j−1}) − rank(N^j) = Σ_i rank(N_i^{j−1}) − Σ_i rank(N_i^j) = Σ_i (rank(N_i^{j−1}) − rank(N_i^j)). First, w_1 = rank(N^0) − rank(N) = n − rank(N) = nullity(N) = the number of blocks in N, since each N_i has nullity 1 (note the single row of all zeros in each N_i). Next, w_2 = rank(N) − rank(N^2) = Σ_{i=1}^{k}(rank(N_i) − rank(N_i^2)) = (rank(N_1) − rank(N_1^2)) + (rank(N_2) − rank(N_2^2)) + ··· + (rank(N_k) − rank(N_k^2)). Each of these summands is either 0 or 1. A 1 occurs exactly when the block N_i is large enough that squaring has not yet killed it, that is, when N_i has size at least 2-by-2. Thus, w_2 counts the number of N_i of size at least 2-by-2. Next, w_3 = rank(N^2) − rank(N^3) = Σ_{i=1}^{k}(rank(N_i^2) − rank(N_i^3)) = (rank(N_1^2) − rank(N_1^3)) + (rank(N_2^2) − rank(N_2^3)) + ··· + (rank(N_k^2) − rank(N_k^3)). Again, a 1 occurs in a summand that has not yet reached its index of nilpotency, so a 1 records the fact that N_i is of size at least 3-by-3. Thus w_3 counts the number of blocks N_i of size at least 3-by-3. Generally then, w_j counts the number of blocks N_i of size at least j-by-j. And so, the theorem follows. □

COROLLARY 3.15
With N as above, the number of size j-by-j blocks in N is exactly

w_j − w_{j+1} = rank(N^{j−1}) − 2 rank(N^j) + rank(N^{j+1}) = [nlty(N^j) − nlty(N^{j−1})] − [nlty(N^{j+1}) − nlty(N^j)].

PROOF With the notation of the theorem above, (rank(N_i^{j−1}) − rank(N_i^j)) − (rank(N_i^j) − rank(N_i^{j+1})) = 1 if j = p_i and 0 otherwise. Thus, a 1 occurs exactly when a block of size j-by-j occurs. Thus the sum of these terms counts the exact number of blocks of size j-by-j. □

Let's look at an example to be sure the ideas above are crystal clear. Consider the 9-by-9 matrix

N = Block Diagonal[Nilp[4], Nilp[2], Nilp[2], Nilp[1]]
  = [ 0 1 0 0
      0 0 1 0
      0 0 0 1
      0 0 0 0
              0 1
              0 0
                  0 1
                  0 0
                      0 ].

Clearly, w_1 = 9 − rank(N) = 9 − 5 = 4, so the theorem predicts 4 blocks in N of size at least 1-by-1. That is, it predicts (quite accurately) exactly 4 blocks in N. Next, w_2 = rank(N) − rank(N^2) = (rank(N_1) − rank(N_1^2)) + (rank(N_2) − rank(N_2^2)) + (rank(N_3) − rank(N_3^2)) + (rank(N_4) − rank(N_4^2)) = 1 + 1 + 1 + 0 = 3 blocks of size at least 2-by-2, w_3 = rank(N^2) − rank(N^3) = 1 + 0 + 0 + 0 = 1 block of size at least 3-by-3, w_4 = rank(N^3) − rank(N^4) = 1 + 0 + 0 + 0 = 1 block of size at least 4-by-4, and w_5 = rank(N^4) − rank(N^5) = 0, which says there are no blocks of size 5 or more. Moreover, our formulas give

w_1 − w_2 = 4 − 3 = 1 block of size 1-by-1
w_2 − w_3 = 3 − 1 = 2 blocks of size 2-by-2
w_3 − w_4 = 1 − 1 = 0 blocks of size 3-by-3
w_4 − w_5 = 1 − 0 = 1 block of size 4-by-4.
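The same bookkeeping is easy to automate. A minimal MATLAB sketch of our own for the 9-by-9 example just treated:

    N = blkdiag(diag(ones(3,1),1), diag(ones(1,1),1), diag(ones(1,1),1), 0);  % Nilp[4], Nilp[2], Nilp[2], Nilp[1]
    r = arrayfun(@(j) rank(N^j), 0:5);   % ranks of N^0, ..., N^5: 9 5 2 1 0 0
    w = -diff(r)                         % Weyr sequence: 4 3 1 1 0
    blocks = -diff(w)                    % 1 2 0 1: one 1-by-1, two 2-by-2, no 3-by-3, one 4-by-4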

Next let's illustrate our theorem. Suppose we wish to exhibit a nilpotent matrix N with Weyr(N) = (4, 3, 2, 1, 1). We form the Ferrer's diagram

* * * *
* * *
* *
*
*

Figure 3.6: Ferrer's diagram for (4, 3, 2, 1, 1).

and read off the conjugate partition (5, 3, 2, 1). Next we form the matrix

N = Block Diagonal[Nilp[5], Nilp[3], Nilp[2], Nilp[1]]
  = [ 0 1 0 0 0
      0 0 1 0 0
      0 0 0 1 0
      0 0 0 0 1
      0 0 0 0 0
                0 1 0
                0 0 1
                0 0 0
                      0 1
                      0 0
                          0 ].

Then Weyr(N) = (4, 3, 2, 1, 1).

The block diagonal matrices we have been manipulating above look rather special, but in a sense they are not. If we choose the right basis, we can make any nilpotent matrix look like these block diagonal nilpotent matrices. This is the essence of a rather deep theorem about the structure of nilpotent matrices. Let's look at an example to motivate what we need to achieve.

special but in a sense they are not. If we choose the right basis, we can makeany nilpotent matrix look like these block diagonal nilpotent matrices. This isthe essence of a rather deep theorem about the structure of nilpotent matrices.Let's look at an example to motivate what we need to achieve.

Suppose we have a nilpotent matrix N and we want N to "look like"0 1 0 0 0 0 0 0

0 0 1 0 0 0 0 0

0 0 0 1 0 0 0 0

0 0 0 0 0 0 0 00 0 0 0 0

1

1 0 0That is, we seek an invertible matrix S

0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 1

0 0 0 0 0 0 0 0


such that S^{-1}NS will be this 8-by-8 matrix. This is the same as seeking a special basis of C^8. If this is to work, we must have NS = S

0 1 0 0 0 0 0 0

0 0 1 0 0 0 0 0

0 0 0 1 0 0 0 0

0 0 0 0 0 0 0 0

0 0 0 0 0 1 0 0

0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 1

0 0 0 0 0 0 0 0

[0 | s_1 | s_2 | s_3 | 0 | s_5 | 0 | s_7],

where s_i is the ith column of the matrix S. Equating columns we see

Ns_1 = 0     Ns_5 = 0     Ns_7 = 0
Ns_2 = s_1   Ns_6 = s_5   Ns_8 = s_7
Ns_3 = s_2
Ns_4 = s_3

In particular, we see that s_1, s_5, and s_7 must form a basis for the null space of N. Note N has rank 5, so has nullity 3. Also note

Ns_1 = 0      Ns_5 = 0      Ns_7 = 0
N^2 s_2 = 0   N^2 s_6 = 0   N^2 s_8 = 0
N^3 s_3 = 0
N^4 s_4 = 0

and so

Null(N)   = sp{s_1, s_5, s_7}
Null(N^2) = sp{s_1, s_5, s_7, s_2, s_6, s_8}
Null(N^3) = sp{s_1, s_5, s_7, s_2, s_6, s_8, s_3}
Null(N^4) = sp{s_1, s_5, s_7, s_2, s_6, s_8, s_3, s_4} = C^8.

Finally, note that

s_4             s_6           s_8
Ns_4 = s_3      Ns_6 = s_5    Ns_8 = s_7
N^2 s_4 = s_2
N^3 s_4 = s_1

The situation is a bit complicated, but some clues are beginning to emerge. There seem to be certain "Krylov strings" of vectors that are playing an important role. We begin with a helper lemma.


LEMMA 3.1
For any square matrix A and positive integer k, Null(A^{k−1}) ⊆ Null(A^k) ⊆ Null(A^{k+1}). Let B_{k−1} = {b_1, b_2, ..., b_r} be a basis for Null(A^{k−1}). Extend B_{k−1} to a basis of Null(A^k); say B_k = {b_1, b_2, ..., b_r, c_1, c_2, ..., c_s}. Finally, extend B_k to a basis of Null(A^{k+1}); say B_{k+1} = {b_1, b_2, ..., b_r, c_1, c_2, ..., c_s, d_1, d_2, ..., d_t}. Then T = {b_1, b_2, ..., b_r, Ad_1, Ad_2, ..., Ad_t} is an independent subset of Null(A^k), though not necessarily a basis thereof.

PROOF First we argue that T ⊆ Null(A^k). Evidently all the b_i's are in Null(A^{k−1}), so A^k b_i = A A^{k−1} b_i = A·0 = 0, putting the b_i's in Null(A^k). Next, the d_i's are in Null(A^{k+1}), so 0 = A^{k+1} d_i = A^k(A d_i), which puts the A d_i's in Null(A^k) as well. Next, we argue the independence in the usual way. Suppose β_1 b_1 + ··· + β_r b_r + α_1 A d_1 + ··· + α_t A d_t = 0. Then A(α_1 d_1 + ··· + α_t d_t) = −(β_1 b_1 + ··· + β_r b_r) ∈ Null(A^{k−1}). Thus, A^{k−1} A(α_1 d_1 + ··· + α_t d_t) = 0, putting α_1 d_1 + ··· + α_t d_t ∈ Null(A^k). Therefore, α_1 d_1 + ··· + α_t d_t = γ_1 b_1 + ··· + γ_r b_r + δ_1 c_1 + ··· + δ_s c_s. But then α_1 d_1 + ··· + α_t d_t − γ_1 b_1 − ··· − γ_r b_r − δ_1 c_1 − ··· − δ_s c_s = 0. By independence, all the α's, γ's, and δ's must be zero. Thus β_1 b_1 + ··· + β_r b_r = 0, so all the β's must be zero as well by independence. We have proved T is an independent subset of Null(A^k). □

Now let's go back to our example and see if we could construct a basis of that special form we need for S.

You could start from the bottom of the chain and work your way up, or you could start at the top and work your way down. We choose to build from the bottom (i.e., from the null space of N). First take a basis B_1 for Null(N); say B_1 = {b_1, b_2, b_3}. Then extend to a basis of Null(N^2); say B_2 = {b_1, b_2, b_3, b_4, b_5, b_6}. Extend again to a basis of Null(N^3); say B_3 = {b_1, b_2, b_3, b_4, b_5, b_6, b_7}. Finally, extend to a basis B_4 of Null(N^4) = C^8, say B_4 = {b_1, b_2, b_3, b_4, b_5, b_6, b_7, b_8}. The chances this basis has the special properties we seek are pretty slim. We must make it happen. Here is where things get a bit tricky. As we said, we will start from the full basis and work our way down. Let T_4 = {b_8}; form B_2 ∪ {Nb_8} = {b_1, b_2, b_3, b_4, b_5, b_6, Nb_8}. This is an independent subset of Null(N^3) by our lemma. In this case, it must be a basis of Null(N^3). If it were not, we would have had to extend it to a basis at this point. Next, let T_3 = {Nb_8}; form B_1 ∪ {N^2 b_8} = {b_1, b_2, b_3, N^2 b_8} ⊆ Null(N^2). Again this is an independent set, but here we must extend to get a basis of Null(N^2); say {b_1, b_2, b_3, N^2 b_8, z_1, z_2}. Next, let T_2 = {N^3 b_8} ∪ {Nz_1, Nz_2} = {N^3 b_8, Nz_1, Nz_2} ⊆ Null(N). This is a basis for Null(N). So here is the modified basis we have constructed from our original one: {N^3 b_8, Nz_1, Nz_2, N^2 b_8, z_1, z_2, Nb_8, b_8}. Next, we stack it


carefully:

b_8                             in Null(N^4)
Nb_8                            in Null(N^3)
N^2 b_8    z_1    z_2           in Null(N^2)
N^3 b_8    Nz_1   Nz_2          in Null(N)

Now simply label from the bottom up starting at the left.

b_8 = s_4
Nb_8 = s_3
N^2 b_8 = s_2    z_1 = s_6     z_2 = s_8
N^3 b_8 = s_1    Nz_1 = s_5    Nz_2 = s_7

The matrix S = [N^3 b_8 | N^2 b_8 | Nb_8 | b_8 | Nz_1 | z_1 | Nz_2 | z_2] has the desired property; namely, NS = [0 | N^3 b_8 | N^2 b_8 | Nb_8 | 0 | Nz_1 | 0 | Nz_2] = [0 | s_1 | s_2 | s_3 | 0 | s_5 | 0 | s_7], which is exactly what we wanted. Now let's see if we can make the argument above general.

THEOREM 3.16
Suppose N is an n-by-n nilpotent matrix of index q. Then there exists an invertible matrix S with S^{-1}NS = Block Diagonal[Nilp[p_1], Nilp[p_2], ..., Nilp[p_k]]. The largest block is q-by-q and we may assume p_1 ≥ p_2 ≥ ··· ≥ p_k. Of course, p_1 + p_2 + ··· + p_k = n. The total number of blocks is the nullity of N, and the number of j-by-j blocks is rank(N^{j−1}) − 2 rank(N^j) + rank(N^{j+1}).

PROOF Suppose N is nilpotent of index q, so we have the proper chain of subspaces

(0) ⊊ Null(N) ⊊ Null(N^2) ⊊ ··· ⊊ Null(N^{q−1}) ⊊ Null(N^q) = C^n.

As above, we construct a basis of C^n starting at the bottom of the chain. Then we adjust it working from the top down. Choose a basis B_1 of Null(N); say B_1 = {b_1, ..., b_{k_1}}. Extend to a basis of Null(N^2); say B_2 = {b_1, ..., b_{k_1}, b_{k_1+1}, ..., b_{k_2}}. Continue this process until we successively produce a basis of C^n, B_q = {b_1, ..., b_{k_1}, b_{k_1+1}, ..., b_{k_2}, ..., b_{k_{q−2}+1}, ..., b_{k_{q−1}}, b_{k_{q−1}+1}, ..., b_{k_q}}. As we did in our motivating example, we begin at the top and work our way down the chain. Let T_q = {b_{k_{q−1}+1}, ..., b_{k_q}}; form B_{q−2} ∪ {Nb_{k_{q−1}+1}, ..., Nb_{k_q}}. This is an independent subset of Null(N^{q−1}) by our helper lemma. Extend this independent subset to a basis of Null(N^{q−1}); say B_{q−2} ∪ {Nb_{k_{q−1}+1}, ..., Nb_{k_q}} ∪ {c_{11}, ..., c_{1t_1}}. Of course, we may not need


any c's at all, as our example showed. Next take T_{q−1} = {Nb_{k_{q−1}+1}, ..., Nb_{k_q}} ∪ {c_{11}, ..., c_{1t_1}} and form B_{q−3} ∪ {N^2 b_{k_{q−1}+1}, ..., N^2 b_{k_q}} ∪ {Nc_{11}, ..., Nc_{1t_1}}. This is an independent subset of Null(N^{q−2}). Extend this independent subset to a basis of Null(N^{q−2}), say B_{q−3} ∪ {N^2 b_{k_{q−1}+1}, ..., N^2 b_{k_q}} ∪ {Nc_{11}, ..., Nc_{1t_1}} ∪ {c_{21}, ..., c_{2t_2}}. Continue this process until we produce a basis for Null(N). Now stack the basis vectors we have thus constructed.

b_{k_{q−1}+1}              ···  b_{k_q}
N(b_{k_{q−1}+1})           ···  N(b_{k_q})         c_{11}   ···  c_{1t_1}
N^2(b_{k_{q−1}+1})         ···  N^2(b_{k_q})       Nc_{11}  ···  Nc_{1t_1}        c_{21}  ···  c_{2t_2}
      ⋮                            ⋮                  ⋮                              ⋮
N^{q−1}(b_{k_{q−1}+1})     ···  N^{q−1}(b_{k_q})   N^{q−2}(c_{11}) ··· N^{q−2}(c_{1t_1})   N^{q−3}(c_{21}) ··· N^{q−3}(c_{2t_2})   ···   c_{q−1,1} ··· c_{q−1,t_{q−1}}

Note that each column is obtained by repeated application of N to the vector at its top. Also note that each row of vectors belongs to the same null space. As in our example, label this array of vectors starting at the bottom of the first column and continuing up each column in turn. Then

S = [N^{q−1}(b_{k_{q−1}+1}) | ··· | b_{k_{q−1}+1} | ··· | N^{q−1}(b_{k_q}) | ··· | b_{k_q} | N^{q−2}(c_{11}) | ··· | c_{11} | ··· | c_{q−1,1} | ··· | c_{q−1,t_{q−1}}].

This matrix is invertible and brings N into the block diagonal form as advertised in the theorem. The remaining details are left to the reader. □

This theorem is a challenge, so we present another way of looking at the proof due to Manfred Dugas (11 February, 1952). We illustrate with a nilpotent matrix of index 3. Let N ∈ C^{n×n} with N^3 = 0 ≠ N^2. Now (0) ⊊ Null(N) ⊊ Null(N^2) ⊊ Null(N^3) = C^n. Since Null(N^2) is a proper subspace of C^n, it has a complementary subspace M_3. Thus, C^n = Null(N^2) ⊕ M_3. Clearly, N(M_3) ⊆ Null(N^2) and N^2(M_3) ⊆ Null(N). Moreover, we claim that N(M_3) ∩ Null(N) = (0), for if v ∈ N(M_3) ∩ Null(N), then v = Nm_3 for some m_3 in M_3 and 0 = Nv = N^2 m_3. But this puts m_3 in Null(N^2) ∩ M_3 = (0). Therefore, m_3 = 0, so v = Nm_3 = 0. Now Null(N) ⊕ N(M_3) ⊆ Null(N^2) and could actually equal Null(N^2). In any case, we can find a supplement M_2 so that Null(N^2) = Null(N) ⊕ N(M_3) ⊕ M_2. Note N(M_2) ⊆ Null(N). Even more, we claim N(M_2) ∩ N^2(M_3) = (0), for if v ∈ N(M_2) ∩ N^2(M_3), then v = Nm_2 = N^2 m_3 for some m_2 in


M_2 and m_3 in M_3. But then N(m_2 − Nm_3) = 0, putting m_2 − Nm_3 in Null(N) ∩ (N(M_3) ⊕ M_2) = (0). Thus, m_2 = Nm_3 ∈ M_2 ∩ N(M_3) = (0), making v = Nm_2 = 0. Now N(M_2) ⊕ N^2(M_3) ⊆ Null(N), so there is a supplementary subspace M_1 with Null(N) = N(M_2) ⊕ N^2(M_3) ⊕ M_1. Thus, collecting all the pieces, we have C^n = M_3 ⊕ NM_3 ⊕ M_2 ⊕ NM_2 ⊕ N^2M_3 ⊕ M_1. We can reorganize this sum as

C" =[M3eNM3ED N2M3]®[M2®NM2]®MI = L3ED L2®LI,

where the three blocks are N-invariant. To create the similarity transformation S, we begin with a basis B_3 = {b_1^{(3)}, b_2^{(3)}, ..., b_{d_3}^{(3)}} of M_3. We claim {N^2 b_1^{(3)}, N^2 b_2^{(3)}, ..., N^2 b_{d_3}^{(3)}} is an independent set. As usual, set 0 = Σ_{j=1}^{d_3} a_j N^2 b_j^{(3)}, which implies Σ_{j=1}^{d_3} a_j b_j^{(3)} ∈ Null(N^2) ∩ M_3 = (0). This means all the a_j's are 0, hence the independence. Now if 0 = Σ_{j=1}^{d_3} a_j N b_j^{(3)}, then Σ_{j=1}^{d_3} a_j b_j^{(3)} ∈ Null(N) ∩ M_3 = (0), so, as above, all the a_j's are 0. Thus we have

L_3 = M_3 ⊕ NM_3 ⊕ N^2M_3 = span{b_1^{(3)}, ..., b_{d_3}^{(3)}, Nb_1^{(3)}, ..., Nb_{d_3}^{(3)}, N^2 b_1^{(3)}, ..., N^2 b_{d_3}^{(3)}}
L_2 = M_2 ⊕ NM_2 = span{b_1^{(2)}, ..., b_{d_2}^{(2)}, Nb_1^{(2)}, ..., Nb_{d_2}^{(2)}}
L_1 = M_1 = span{b_1^{(1)}, b_2^{(1)}, ..., b_{d_1}^{(1)}},

again noting these spans are all N-invariant. Next we begin to construct S. Let S_j^{(3)} = [N^2 b_j^{(3)} | Nb_j^{(3)} | b_j^{(3)}] ∈ C^{n×3} for j = 1, ..., d_3. We compute that

N S_j^{(3)} = [0 | N^2 b_j^{(3)} | Nb_j^{(3)}] = [N^2 b_j^{(3)} | Nb_j^{(3)} | b_j^{(3)}] [0 1 0; 0 0 1; 0 0 0] = S_j^{(3)} Nilp[3].

Similarly, set S_j^{(2)} = [Nb_j^{(2)} | b_j^{(2)}] and find N S_j^{(2)} = [0 | Nb_j^{(2)}] = [Nb_j^{(2)} | b_j^{(2)}] [0 1; 0 0] = S_j^{(2)} Nilp[2]. Finally, set S_j^{(1)} = [b_j^{(1)}] and compute that N S_j^{(1)} = [b_j^{(1)}][0] = 0 = S_j^{(1)} Nilp[1]. We are ready to take

S = [S_1^{(3)} | ··· | S_{d_3}^{(3)} | S_1^{(2)} | ··· | S_{d_2}^{(2)} | S_1^{(1)} | ··· | S_{d_1}^{(1)}].

Then


NS = S · Block Diagonal[Nilp[3], ..., Nilp[3], Nilp[2], ..., Nilp[2], 0, ..., 0],

where there are d_3 copies of Nilp[3], d_2 copies of Nilp[2], and d_1 copies of Nilp[1]. So we have another view of the proof of this rather deep result about nilpotent matrices.

This theorem can be considered as giving us a "canonical form" for any nilpotent matrix under the equivalence relation of "similarity." Now it makes sense to assign a Segre sequence and a Weyr sequence to any nilpotent matrix. We shall return to this characterization of nilpotent matrices when we discuss the Jordan canonical form.
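Since any nilpotent matrix now has a well-defined Segre sequence, we can compute it directly from the block-counting formula of Corollary 3.15. A minimal MATLAB sketch of our own (segre is a made-up name; for large or badly scaled matrices the rank calls would need a tolerance):

    function s = segre(N)
    % Segre sequence (canonical block sizes) of a nilpotent matrix N
    n = size(N,1);
    r = arrayfun(@(j) rank(N^j), 0:n+1);           % r(j+1) = rank(N^j)
    counts = r(1:end-2) - 2*r(2:end-1) + r(3:end); % counts(j) = number of j-by-j blocks
    s = [];
    for j = n:-1:1
        s = [s, repmat(j, 1, counts(j))];          % list the largest blocks first
    end
    end

For example, segre(blkdiag(diag(ones(3,1),1), diag(ones(1,1),1))) returns 4 2, the Segre sequence of the Nilp[4], Nilp[2] example.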

Exercise Set 11

1. Compute the index of [1 3 1 0; 0 1 0 1; 1 4 1 1; 0 0 2 1].

2. Argue that N = [s−t, −2s, s+t; −t, 0, t; −s−t, 2s, −s+t] is nilpotent. Can you say anything about its index?

3. Fill in the details of the proof of Corollary 3.12.

4. Fill in the details of the proof of Corollary 3.13.

5. What is the index of 0? What is the index of I? What is the index of an invertible matrix?

6. Suppose P = P^2 but P ≠ I or 0. What is the index of P?


7. Let M = R [1 2 0 0 0; 3 4 0 0 0; 0 0 0 1 0; 0 0 0 0 1; 0 0 0 0 0] R^{-1}. What is the index of M?

8. Let L = [M 0; 0 0], where M is from above. What is the index of L?

9. Find the conjugate partition of (7, 6, 5, 3, 2).

10. Construct a matrix N with Weyr(N) = (5, 3, 2, 2, 1).

11. Argue that the number of partitions of n into k summands equals thenumber of partitions of n into summands the largest of which is k.

12. Argue that if N is nilpotent of index q, then so is S- 1 N S for any invertiblesquare matrix S.

13. Argue that if B = S- 'AS, then Weyr(B) = Weyr(A).

14. Consider all possible chains of subspaces between (0) and C^4. How many types are there and what is the sequence of dimensions of each type? Produce a concrete 4-by-4 matrix whose powers generate each type of chain via their null spaces.

15. Consider the differentiation operator D on the vector space V = C[x]^{≤n} of polynomials of degree less than or equal to n. Argue that D is nilpotent. What is the index of nilpotency? What is the matrix of D relative to the standard basis {1, x, x^2, ..., x^n}?

16. An upper (lower) triangular matrix is called strictly upper (lower) triangular iff the diagonal elements consist entirely of zeros. Prove that strictly upper (lower) triangular matrices are all nilpotent. Let M be the matrix

M = [ λ  m_{12}  ···  m_{1n}
      0  λ       ···  m_{2n}
              ⋮
      0  0       ···  λ ].

Argue that M − λI is nilpotent.

17. Can you have a nonsingular nilpotent matrix? Suppose N is n-by-nnilpotent of index q. Prove that I + N is invertible. Exhibit the inverse.

18. Compute p(9), p(10), p(11), and p(12).

19. Is the product of nilpotent matrices always nilpotent?


20. Suppose N is nilpotent of index q and α is a nonzero scalar. Argue that αN is nilpotent of index q. Indeed, if p(x) is any polynomial with constant term zero, argue that p(N) is nilpotent. Why is it important that p(x) have a zero constant term?

21. Suppose Ni and N2 are nilpotent matrices of the same size. Is the sumnilpotent? What can you say about the index of the sum'?

22. Suppose N is nilpotent of index q. Let v ∈ C^n with N^{q−1}v ≠ 0. Then q ≤ n and the vectors v, Nv, N^2 v, ..., N^{q−1} v are independent.

23. Suppose N is nilpotent of index q. Prove that Col(N^{q−1}) ⊆ Null(N).

24. (M. A. Khan, CMJ, May 2003). Exponential functions such as f(x) = 2^x have the functional property that f(x + y) = f(x)f(y) for all x and y. Are there matrix functions M(x), with entries from C[x] (so M(x) ∈ C[x]^{n×n}), that satisfy the same functional equation (i.e., M(x + y) = M(x)M(y))? If there is such an M, argue that I = M(0) = M(x)M(−x), so M(x) must be invertible with M(−x) = (M(x))^{-1}. Also argue that (M(x))^r = M(rx), so that M(x/r)^r = M(x). Thus, the rth root of M(x) is easily found by replacing x by x/r. Suppose N is a nilpotent matrix. Then argue that

M(x) = I + Nx + N^2 x^2/2! + N^3 x^3/3! + ···

is a matrix with polynomial entries that satisfies the functional equation M(x + y) = M(x)M(y). Verify Khan's example. Let

N = [ 7  −10   7  −4
      4   ·    ·   ·
      ·   ·    ·   ·
      4  −13  16  −7 ].

Argue that N is nilpotent and find M(x) explicitly and verify the functional equation.

25. Suppose N is nilpotent of index q. What is the minimal polynomial ofN? What is the characteristic polynomial of Nilp[k]'?

26. Find a matrix of index 3 and rank 4. Can you generalize this to any rankand any index?

27. Suppose A is a matrix of index q. What can you say about the minimalpolynomial of A? (Hint: Look at the core-nilpotent factorization of A.)

28. Draw a graphical representation of the Weyr sequence of a matrix usingcolumn spaces instead of null spaces, as we did in the text in Figure 3.4.

29. What is the trace of any nilpotent matrix?


30. Show that every 2-by-2 nilpotent matrix looks like [αβ, β^2; −α^2, −αβ].

31. Suppose N = Nilp[n] and A is n-by-n. Describe NA, AN, N^T A, AN^T, N^T AN, NAN^T, and N^T AN^T.

32. Suppose A is a matrix whose only invariant subspaces are (0) and C^n. Prove that either A is nilpotent or A is invertible.

Further Reading

[Andr, 1976] George E. Andrews, The Theory of Partitions, Addison-Wesley, Reading, MA, (1976). Reprinted by Cambridge University Press,Cambridge, (1998).

[B&R, 1986(4)] T. S. Blyth and E. F. Robertson, Linear Algebra, Vol. 4,Chapman & Hall, New York, (1986).

[G&N, 2004] Kenneth Glass and Chi-Keung Ng, A Simple Proof of theHook Length Formula, The American Mathematical Monthly, Vol. 111,No. 8, October, (2004), 700-704.

[Hohn, 1964] E. Hohn, Elementary Matrix Theory, 2nd Edition, The Macmillan Company, New York, (1958, 1964).

[Shapiro, 1999] Helene Shapiro, The Weyr Characteristic, The American Mathematical Monthly, Vol. 106, No. 10, December, (1999), 919-929.

3.4.1 MATLAB Moment

3.4.1.1 The Standard Nilpotent Matrix

We can easily create a function in MATLAB to construct the standard nilpo-tent matrix nilp[n]. Create the following M-file:

function N = nilp(n)
if n == 0, N = []; else
    N = diag(ones(n-1,1), 1);
end

This is an easy use of the logical format "if ... then ... else". Note, if n = 0,the empty matrix is returned. Try out your new function with a few examples.


There is a function to test if a matrix is empty. It is

isempty(A).

How could you disguise the standard nilpotent matrix to still be nilpotent but not standard? (Hint: If N is nilpotent, so is SNS^{-1} for any invertible S.)
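One way to act on that hint (a tiny sketch of our own, assuming the nilp function above is on your path):

    N = nilp(4);
    S = randn(4);        % almost surely invertible
    M = S*N/S;           % M = S*N*inv(S) is similar to N, hence nilpotent, but no longer standard
    M^4                  % numerically the zero matrix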

3.5 Left and Right Inverses

As we said at the very beginning, the central problem of linear algebra is the problem of solving a system of linear equations. If Ax = b and A is square and invertible, we have a complete answer: x = A^{-1}b is the unique solution. However, if A is not square or does not have full rank, inverses make no sense. This is a motivation for the need for "generalized inverses." Now we face up to the fact that it is very unlikely that in real-life problems our systems of linear equations will have square coefficient matrices. So consider Ax = b, where A is m-by-n, x is n-by-1, and b is m-by-1. If we could find an n-by-m matrix C with CA = I_n, then we would have a solution to our system, namely x = Cb. This leads us to consider one-sided inverses of a rectangular matrix, which is a first step in understanding generalized inverses.

DEFINITION 3.8 (left, right inverses)
Suppose A is an m-by-n matrix. We say B in C^{n×m} is a left inverse for A iff BA = I_n. Similarly we call C in C^{n×m} a right inverse for A iff AC = I_m.

The first thing we notice is a loss of uniqueness. For example, let A = [1 0; 0 1; 0 0]. Then any matrix B = [1 0 x; 0 1 y] is a left inverse for any choice of x and y. Next we consider existence. Having a one-sided inverse makes a matrix rather special.

THEOREM 3.17
Suppose A is in C^{m×n}. Then

1. A has a right inverse iff A has full row rank m.

2. A has a left inverse iff A has full column rank n.

PROOF (1) Suppose A has a right inverse C. Then AC = I_m. But partitioning C into columns, AC = [Ac_1 | Ac_2 | ··· | Ac_m] = I_m = [e_1 | e_2 | ··· | e_m], so


each Ac_i is the standard basis vector e_i. Thus, {Ac_1, Ac_2, ..., Ac_m} is a basis of C^m. In particular, the column space of A must equal C^m. Hence the column rank of A is m. Therefore the row rank of A is m, as was to be proved.

Conversely, suppose A has full row rank m. Then its column rank is also m, so among the columns of A, there must be a basis of the column space, which is C^m. Call these columns d_1, d_2, ..., d_m. Now the standard basis vectors e_1, e_2, ..., e_m belong to C^m, so they are uniquely expressible in terms of the d's: say

e_1 = α_{11}d_1 + α_{12}d_2 + ··· + α_{1m}d_m
e_2 = α_{21}d_1 + α_{22}d_2 + ··· + α_{2m}d_m
   ⋮
e_m = α_{m1}d_1 + α_{m2}d_2 + ··· + α_{mm}d_m

Now we will describe how to construct a right inverse C with the help of these α's. Put α_{11}, α_{21}, ..., α_{m1} in the row corresponding to the column of d_1 in A. Put α_{12}, α_{22}, ..., α_{m2} in the row corresponding to the column d_2 in A. Keep going in this manner and then fill in all the other rows of C with zeros. Then AC = [e_1 | e_2 | ··· | e_m] = I_m. Let's illustrate in a concrete example what

just happened. Suppose A = [a c e g; b d f h] has rank 2. Then there must be two columns that form a basis of the column space C^2, say d_1 = [c; d] and d_2 = [g; h]. Then e_1 = [1; 0] = α_{11}[c; d] + α_{12}[g; h] and e_2 = [0; 1] = α_{21}[c; d] + α_{22}[g; h]. Thus

AC = [a c e g; b d f h] [0 0; α_{11} α_{21}; 0 0; α_{12} α_{22}] = [1 0; 0 1] = I_2.

(2) A similar argument can be used as above, or we can be more clever. A has full column rank n iff A^T has full row rank n iff A^T has a right inverse iff A has a left inverse. □

Now we have necessary and sufficient conditions that show how special you have to be to have a left inverse. The nonuniqueness is not totally out of control in view of the next theorem.

THEOREM 3.18
If A in C^{m×n} has a left inverse B, then all the left inverses of A can be written as B + K, where KA = 0. A similar statement applies to right inverses.


PROOF Suppose BA = I and B_1A = I. Set K_1 = B_1 − B. Then K_1A = (B_1 − B)A = B_1A − BA = I − I = 0. Moreover, B_1 = B + K_1. □

Okay, that was a pretty trivial argument, so the theorem may not be thathelpful. But we can do better.

THEOREM 3.19
Let A be in C^{m×n}.

1. Suppose A has full column rank n. Then A*A is invertible and (A*A)^{-1}A* is a left inverse of A. Thus all left inverses of A are of the form (A*A)^{-1}A* + K, where KA = 0. Indeed, we can write K = W[I_m − A(A*A)^{-1}A*], where W is arbitrary of appropriate size. Hence all left inverses of A look like

(A*A)^{-1}A* + W[I_m − A(A*A)^{-1}A*].

2. Suppose A has full row rank m. Then AA* is invertible and A*(AA*)^{-1} is a right inverse of A. Thus all right inverses of A are of the form A*(AA*)^{-1} + K, where AK = 0. Indeed K = [I_n − A*(AA*)^{-1}A]V, where V is arbitrary of appropriate size. Hence all right inverses of A look like

A*(AA*)^{-1} + [I_n − A*(AA*)^{-1}A]V.

PROOF Suppose A has full column rank n. Then A*A is n-by-n and r(A*A) = r(A) = n. Thus A*A has full rank and is thus invertible. Then [(A*A)^{-1}A*]A = (A*A)^{-1}(A*A) = I_n. The remaining details are left as exercises.

(2) A similar proof applies. □

As an example, suppose we wish to find all the left inverses of the matrix A = [1 2; 2 1; 3 1]. We find A*A = [14 7; 7 6] and (A*A)^{-1} = (1/35)[6 −7; −7 14]. Now we can construct one left inverse of A, namely

(A*A)^{-1}A* = (1/35)[−8 5 11; 21 0 −7].

The reader may verify by direct computation that this matrix is indeed a left inverse for A. To get all left inverses we use

C = (A*A)^{-1}A* + W[I_m − A(A*A)^{-1}A*].

Let W = [a b c; d e f] be a parameter matrix. Now A(A*A)^{-1}A* = (1/35)[34 5 −3; 5 10 15; −3 15 26]. Then

C = (1/35)[−8 5 11; 21 0 −7] + [a b c; d e f] (1/35)[1 −5 3; −5 25 −15; 3 −15 9]
  = (1/35)[−8 + a − 5b + 3c, 5 − 5a + 25b − 15c, 11 + 3a − 15b + 9c; 21 + d − 5e + 3f, −5d + 25e − 15f, −7 + 3d − 15e + 9f]
  = (1/35)[−8 + t, 5 − 5t, 11 + 3t; 21 + s, −5s, −7 + 3s], where t = a − 5b + 3c and s = d − 5e + 3f.

The reader may again verify we have produced a left inverse of A.So where do we stand now in terms of solving systems of linear equations?

We have the following theorem.

THEOREM 3.20
Suppose we have Ax = b, where A is m-by-n of rank n, x is n-by-1, and b, necessarily, is m-by-1. This system has a solution if and only if A(A*A)^{-1}A*b = b (consistency condition). If the system has a solution, it is (uniquely) x = (A*A)^{-1}A*b.

PROOF Note n = rank(A) = rank(A*A), which guarantees the existence of (A*A)^{-1}, which is n-by-n and hence of full rank. First suppose the condition A(A*A)^{-1}A*b = b. Then evidently x_0 = (A*A)^{-1}A*b is a solution for Ax = b. Conversely, suppose Ax = b has a solution x_1. Then A*Ax_1 = A*b, so (A*A)^{-1}(A*A)x_1 = (A*A)^{-1}A*b, so x_1 = (A*A)^{-1}A*b, whence b = Ax_1 = A(A*A)^{-1}A*b.

Now suppose Ax = b has a solution. A has a left inverse C, so x = Cb is a solution. But C can be written as C = (A*A)^{-1}A* + W[I_m − A(A*A)^{-1}A*]. So, Cb = (A*A)^{-1}A*b + W[I_m − A(A*A)^{-1}A*]b = (A*A)^{-1}A*b + 0, using the consistency condition. Thus, x = (A*A)^{-1}A*b. □

For example, consider the system

x + 2y = 1
2x + y = 1
3x + y = 1.

We check consistency with

A(A*A)^{-1}A* [1; 1; 1] = (1/35)[34 5 −3; 5 10 15; −3 15 26][1; 1; 1] = (1/35)[36; 30; 38] ≠ [1; 1; 1],

so we conclude this system is inconsistent (i.e., has no solution). However, for

x + 2y = 3
2x + y = 3
3x + y = 4,

(1/35)[34 5 −3; 5 10 15; −3 15 26][3; 3; 4] = [3; 3; 4], so the system is consistent and has the unique solution

(1/35)[−8 5 11; 21 0 −7][3; 3; 4] = [1; 1].
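Checking the consistency condition of Theorem 3.20 is equally mechanical (again a sketch of our own):

    A = [1 2; 2 1; 3 1];
    P = A*((A'*A)\A');          % projector onto Col(A)
    b1 = [1;1;1]; b2 = [3;3;4];
    norm(P*b1 - b1)             % nonzero (about 0.17): the first system is inconsistent
    norm(P*b2 - b2)             % essentially zero: the second system is consistent
    x = (A'*A)\(A'*b2)          % its unique solution, x = [1; 1]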

Exercise Set 12

1. (a) Suppose A has a left inverse and AB = AC. Prove that B = C.(b) Suppose A has a right inverse and suppose BA = CA. Prove that

B = C.

2. Prove that rank(SAT) = rank(A) if S has full column rank and T hasfull row rank.

3. Argue that KA = 0 iffthere exists W such that K = W[1-A(A*A)-lA*].What can you say about the situation when AK = 0?

4. Argue that A has a left inverse iff Null(A) = (0).

5. Suppose A = LK where L has a left inverse and K has a right inverse.Argue that r(A) = r(L) = r(K).

6. Construct all left inverses of A = [1 0; 0 1; 1 1].


7. Argue that A has a left inverse if AT has a right inverse.

8. Argue that a square singular matrix has neither a left nor a right inverse.


9. Let A be an m-by-n matrix of rank m. Let B = A*(AA*)^{-1}. Show B is a right inverse for A and A = (B*B)^{-1}B*. Moreover, B is the only right inverse for A such that B* has the same row space as A.

10. If A is m-by-n and the columns of A span C"', then A has a right inverseand conversely.

11. Find all right inverses of A = [1 0 0 2 1; 0 1 1 0 1; 1 0 1 2 1].

12. Give an example of a matrix that has neither a left inverse nor a rightinverse.

13. Suppose A is rn-by-n, B is n-by-r, and C = AB. Argue that if A and Bboth have linearly independent columns, then C has linearly independentcolumns. Next argue that if the columns of B are linearly dependent, thenthe columns of C must be linearly dependent.

14. Let T : C" - C"' be a linear map. Argue that the following statementsare all equivalent:

(a) T is left invertible.(b) Ker(T) = (6).(c) T :C1 - Im(T) is one-to-one and onto.(d) n < m and rank(A) = n.(e) The matrix of T, Mat(T;C3,C) in Cmxn has n < m and has full rank.

15. Let T : C" - Cbe a linear map. Argue that the following statementsare all equivalent:

(a) T is right invertible.(b) rank(T) = m.(c) T : M C"' is one-to-one and onto where M ® Ker(T) = C".(d) n > m and nlt y(T) = n - m.(e) The matrix of T, Mat(T;t3,C) in Cm"" has n > m and has full

rank.

16. Suppose A is m-by-n and A = FG. Suppose F is invertible. Argue thatA has a right inverse iff G has a right inverse.

17. Suppose A has a left inverse C and the linear system Ax = b has asolution. Argue that this solution is unique and must equal Cb.

18. Suppose A has a right inverse B. Argue that the linear system Ax = bhas at least one solution.


19. Suppose A is an m-by-n matrix of rank r. Discuss the existence and uniqueness of left and right inverses of A in the following cases: r = m < n, r < m < n, r = m = n, r < m = n, r = n < m, and r < n < m.

20. Argue that A has a left inverse iff A* has a right inverse.

Further Reading

[Noble, 1969] Ben Noble, Applied Linear Algebra, Prentice Hall, Inc.,Englewood Cliffs, NJ, (1969).

[Perlis, 1952] Sam Perlis, Theory of Matrices, Dover Publications Inc.,New York, (1952).


Chapter 4

The Moore-Penrose Inverse

RREF, leading coefficient, pivot column, matrix equivalence, modified RREF, rank normal form, row equivalence, column equivalence

4.1 Row Reduced Echelon Form and Matrix Equivalence

Though we have avoided it so far, one of the most useful reductions of a matrix is to bring it into row reduced echelon form (RREF). This is the fundamental result used in elementary linear algebra to accomplish so many tasks. Even so, it often goes unproved. We have seen that a matrix A can be reduced to many matrices in row echelon form. To get uniqueness, we need to add some requirements. First, a little language. Given a matrix A, the leading coefficient of a row of A is the first nonzero entry in that row (if there is one). Evidently, every row not consisting entirely of zeros has a unique leading coefficient. A column of A that contains the leading coefficient of at least one row is called a pivot column.

DEFINITION 4.1 (row reduced echelon form)
A matrix A in C^{m×n} is said to be in row reduced echelon form iff

1. For some integer r ≥ 0, the first r rows are nontrivial (not totally filled with zeros) and all the remaining rows (if there are any) are totally filled with zeros.

2. Row 1, row 2, up to and including row r has its first nonzero entry a 1 (called a leading one).

3. Suppose the leading ones occur in columns c_1, c_2, ..., c_r. Then c_1 < c_2 < ··· < c_r.


4. In any column with a leading one, all the other entries in that column are zero.

For example,

[ 0 0 1 4 6 0 0 7 5
  0 0 0 0 0 1 0 6 3
  0 0 0 0 0 0 1 4 8
  0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 ]

is in row reduced echelon form.

In other words, we have RREF if each leading coefficient is one, any zero row occurs at the bottom, in each pair of successive rows that are not totally zero, the leading coefficient of the first row occurs in an earlier column than the leading coefficient of the later row, and each pivot column has only one nonzero entry, namely a leading one. In particular, all entries below and to the left of a leading one are zero. Do you see better now why the word "echelon" is used? Notice, if we do not demand condition (4), we simply say the matrix is in row echelon form (REF), so that it may not be "reduced." In the next theorem, we shall prove that each matrix A in C^{m×n} can be reduced by elementary row operations to a unique matrix in RREF. This is not so if condition (4) is not required.

THEOREM 4.1
Let A be in C^{m×n}. Then there exists a finite sequence of elementary matrices E_1, E_2, ..., E_k such that E_k E_{k−1} ··· E_2 E_1 A is in RREF. Moreover this matrix is unique and we denote it RREF(A), though the sequence of elementary matrices that produce it is not. Moreover, if r is the rank of A, then

RA = [G; 0_{(m−r)×n}],

where R = E_k E_{k−1} ··· E_1 is in C^{m×m} and is invertible, and G is r-by-n of rank r. In particular, A = R^{-1}[G; 0]. Moreover, Row(A) = Row(G).

PROOF If A = 0, then A is in RREF and RREF(A) = 0. So suppose A ≠ 0. Then A must have a column with nonzero entries. Let c_1 be the number of the first such column. If the (1, c_1) entry is zero, use a permutation matrix P to swap a nonzero entry into the (1, c_1) position. Use a dilation, if necessary, to make this element 1. If any element below this 1 is nonzero, say α in the (j, c_1) position, use the transvection T_{j1}(−α) to make it zero. In this way, all the entries of column c_1 except the (1, c_1) entry can be made zero. Thus far we have


A → T ··· D_1 P_1 A = [ 0 ··· 0 1 * ··· *
                         0 ··· 0 0 * ··· *
                              ⋮
                         0 ··· 0 0 * ··· * ].

Now, if all the rows below the first row consist entirely of zeros, we have achieved RREF and we are done. If not, there will be a first column in the matrix above that has a nonzero entry, β say, below row 1. Suppose the number of this column is c_2. Evidently c_1 < c_2. By a permutation matrix (if necessary), we can swap rows and move β into row 2, keeping β in column c_2. By a dilation, we can make β be 1, and we can use transvections to "zero out" the entries above and below the (2, c_2) position. Notice column c_1 is unaffected due to all the zeros produced earlier in that column. Continuing in this manner, we must eventually terminate when we get to a column c_r which will be made to have a 1 in the (r, c_r) position with zeros above and below it. Either there are no more rows below row r or, if there are, they consist entirely of zeros.

Now we argue the uniqueness of the row reduced echelon form of a matrix. As above, let A be an m-by-n matrix. We follow the ideas of Yuster [1984] and make an inductive argument. For other arguments see Hoffman and Kunze [1971, p. 56] or Meyer [2000, p. 134]. Fix m. We proceed by induction on the number of columns of A. For n = 1, the result is clear (isn't it always?), so assume n > 1. Now suppose uniqueness of RREF holds for matrices of size m-by-(n−1). We shall show uniqueness also holds for matrices of size m-by-n and our result will follow by induction.

Suppose we produce from A a matrix in row reduced echelon form in two different ways by using elementary row operations. Then there exist R_1 and R_2 invertible with R_1A = Ech_1 and R_2A = Ech_2, where Ech_1 and Ech_2 are matrices in RREF. We would like to conclude Ech_1 = Ech_2. Note that if we partition A by isolating its last column, A = [A′ | a_n], then R_1A = [R_1A′ | R_1a_n] = Ech_1 and R_2A = [R_2A′ | R_2a_n] = Ech_2. Here is a key point of the argument: any sequence of elementary row operations that yields RREF for A also puts A′ into row reduced echelon form. Hence, by the induction hypothesis, we have R_1A′ = R_2A′ = A″ since A′ is m-by-(n−1). We distinguish two cases.

CASE 4.1
Every row of A″ has a leading 1. This means there are no totally zero rows in this matrix. Then, by Theorem 2.7 of Chapter 2, the columns of A corresponding to the columns with the leading ones in A″ form an independent set, and the last column of A is a linear combination of corresponding columns in A with the coefficients coming from the last column of Ech_1. But the same is true about the last column of Ech_2. By independence, these scalars are uniquely determined,


so the last column of Ech_1 must equal the last column of Ech_2. But the last column is the only place where Ech_1 could differ from Ech_2, so we conclude in this case, Ech_1 = Ech_2.

CASE 4.2
A″ has at least one row of zeros. Let's assume Ech_1 ≠ Ech_2 and seek a contradiction. Now again, the only place Ech_1 can differ from Ech_2 is in the last column, so there must exist a j with b_{jn} ≠ c_{jn}, where b_{jn} = ent_{jn}(Ech_1) and c_{jn} = ent_{jn}(Ech_2). Recall from Theorem 3.2 of Chapter 3 that Null(Ech_1) = Null(A) = Null(Ech_2). Let's compare some null

spaces. Suppose x = E Nllll(A). Then Ech,x = Ech2x =

xn

so (Ech, - Ech2)x = 0 . But the first n - 1 columns of Ech, -Ech2 are zerobin - cin xl 0

so [0 I ]

=

, which implies (bin - cin)xn = 0

bmn - Cn"i x 0for all i- in particular, when i = j, (bj -cjn )x = 0. By assumption, bjn -Cj

# 0, so this forces xn = 0, a rather specific value. Now if u

Null (Ech, ), then ' = Ech, u =

U,

Un

b(k+i ),,Un

b(k+2)n U n

L bmnun j

where k + I is the first full row of zeros in A". If b(k+,),,, ... , bn,,, all equalzero, then u can be any number and we can construct vectors u in Nu ll (Ech, )without a zero last entry. This contradicts what we deduced above. Thus, some bin that list must be nonzero. If b(k+, ) were zero, this would contradict that Ech,is in RREF. So, b(k+,) must he nonzero. Again quoting row reduced echelonform, b(k+,) must be a leading one hence all the other bs other than b(k+,)nmust be zero. But exactly the same argument applies to Ech2 so C(k+I)n mustbe one and all the other cs zero. But then the last column of Ech, is identicalto the last column of Ech2, so once again we conclude, Ech, = Ech2. This isour ultimate contradiction that establishes the theorem. 0


For example, let

A = [ i       2 − 4i     3    4i     0
      3       2 − 7i     1    2i     0
      3 + 2i  6 − 15i    7    10i    0
      0       0          0    0      0 ].

Then

RREF(A) = T_{12}(4 + 2i) D_2(1/(14 − i)) T_{32}(−1) D_1(−i) T_{31}(−2 + 3i) T_{21}(3i) A

        = [ 1   0   (−234 − 73i)/197   (76 − 276i)/197     0
            0   1   (5 + 127i)/197     (−170 + 16i)/197    0
            0   0   0                  0                   0
            0   0   0                  0                   0 ].

Now, do you think we used a computer on this example or what?

The reader will no doubt recall that the process of Gauss elimination gives an algorithm for producing the row reduced echelon form of a matrix, and this gave a way of easily solving a system of linear equations. However, we have a different use of RREF in mind. But let's quickly sketch that algorithm for using RREF to solve a system of linear equations:

1. Write the augmented matrix of the system Ax = b, namely [A | b].

2. Calculate RREF([A I b]).

3. Write the general solution, introducing free variables for each nonpivot column.

Note that if every column of the RREF coefficient matrix has a leading one, the system can have at most one solution. If the final column also has a leading one, the system is inconsistent (i.e., has no solution). Recall that the number of leading ones in RREF(A) is called the pivot rank (p-rank(A)) of A. Finally, recall that a system Ax = b with n unknowns has no solution if p-rank(A) ≠ p-rank([A | b]). Otherwise the general solution has n − p-rank(A) free variables.
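In MATLAB the whole recipe is one rref call on the augmented matrix (a small sketch of our own, reusing the consistent system from Section 3.5):

    A = [1 2; 2 1; 3 1];  b = [3; 3; 4];
    R = rref([A b])    % R = [1 0 1; 0 1 1; 0 0 0]; every column of A is a pivot column,
                       % so the last column of R is the unique solution x = 1, y = 1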


4.1.1 Matrix Equivalence

Next, we consider matrix equivalence. Recall that we say that matrix A ∈ C^{m×n} is equivalent to matrix B ∈ C^{m×n} if B can be obtained from A by applying both elementary row and elementary column operations to A. In other words, A is equivalent to B if there exist invertible matrices S ∈ C^{m×m} and T in C^{n×n} such that SAT = B. Recall that invertible matrices are the same as products of elementary matrices, so we have not said anything different.

DEFINITION 4.2 (matrix equivalence)
Let A and B be m-by-n matrices. We say A is equivalent to B and write in symbols A ~ B iff there exist invertible matrices S in C^{m×m} and T in C^{n×n} such that B = SAT.

This is our first example of what mathematicians call an equivalence relation on matrices. There are in fact many such relations on matrices. But they all share the following crucial properties:

Reflexive Law: Every matrix is related to itself.

Symmetric Law: If matrix A is related to matrix B, then matrix B mustalso be related to matrix A.

Transitive Law: If matrix A is related to matrix B and matrix B is relatedto matrix C, then matrix A is related to matrix C.

Clearly, equality is such a relation (i.e., A = B). Do the names of the Lawsabove make any sense to you? Let's make a theorem.

THEOREM 4.2
Matrix equivalence is an equivalence relation; that is,

1. A ~ A for every A in C^{m×n}.

2. If A ~ B, then B ~ A, where A and B are in C^{m×n}.

3. If A ~ B and B ~ C, then A ~ C, where A, B, and C are in C^{m×n}.

PROOF The proof is left as an exercise. 0

The nice thing about equivalence relations is that they partition the set of matrices C^{m×n} into disjoint classes such that any two matrices that share a class are equivalent and any two matrices in different classes are not equivalent. Mathematicians like to ask, is there an easy way to check if two matrices


are in the same class? They also want to know if each class is represented by a particularly nice matrix. In the meantime, we want to extend the notion of RREF.

You may have noticed that the leading ones of RREF do not always line up nicely to form a block identity matrix in the RREF. However, all it takes is some column swaps to get an identity matrix to show up. The problem is that this introduces column operations, whereas RREF was accomplished solely with row operations. However, the new matrix is still equivalent to the one we started with. Thus, we will speak of a modified RREF of a matrix when we use a permutation matrix on the right to get the leading ones of RREF to form an identity matrix block.

THEOREM 4.3
Suppose A ∈ C^{m×n} has rank r. Then there exist an invertible matrix R and a permutation matrix P such that

RAP = [I_r C; 0 0].

Moreover, the first r columns of AP form a basis for the column space of A, the columns of P[−C; I_{n−r}] form a basis for the null space of A, and the columns of P[I_r; C*] form a basis of the column space of A*.

PROOF We know there is an invertible matrix R such that RREF(A) = RA = [G; 0]. Suppose the pivot columns occur at c_1, c_2, ..., c_r in RA. Consider the permutation σ that takes c_1 to 1, c_2 to 2, and so on, and leaves everything else fixed. Form the permutation matrix P(σ). In other words, column j of P(σ) is e_{c_j} for j = 1, ..., r, and the remaining columns are the remaining standard basis vectors. Then RAP = [I_r C; 0 0] and the first r columns of AP form a basis for the column space of A. The remaining details are left to the reader. □

For example, consider

A = [ 0 0 1 4 6 0 0 7 5
      0 0 0 0 0 1 0 6 3
      0 0 0 0 0 0 1 4 8
      0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 ],

which is already in RREF. The swaps we need to make are clear: 1 ↔ 3, 2 ↔ 6, and 3 ↔ 7. Thus, the permutation σ = (37)(26)(13) = (173)(26) and the permutation matrix is

P(σ) = [ 0 0 0 0 0 0 1 0 0
         0 0 0 0 0 1 0 0 0
         1 0 0 0 0 0 0 0 0
         0 0 0 1 0 0 0 0 0
         0 0 0 0 1 0 0 0 0
         0 1 0 0 0 0 0 0 0
         0 0 1 0 0 0 0 0 0
         0 0 0 0 0 0 0 1 0
         0 0 0 0 0 0 0 0 1 ].

Finally,

AP(σ) = [ 1 0 0 4 6 0 0 7 5
          0 1 0 0 0 0 0 6 3
          0 0 1 0 0 0 0 4 8
          0 0 0 0 0 0 0 0 0
          0 0 0 0 0 0 0 0 0 ].

You may be wondering, why stop at a permutation matrix at the right of A? Once you have that identity matrix block, you can continue with transvections (i.e., column operations pivoting off the ones) and "zero out" the matrix C. This is, in fact, the case and leads to a very important result and normal form.

THEOREM 4.4 (rank normal form)
Any matrix A ∈ C^{m×n} of rank r is equivalent to a unique matrix of the form

I_r if m = r = n,   [I_r 0; 0 0] if m > r and n > r,   [I_r 0] if m = r < n,   or   [I_r; 0] if m > r = n,

called the rank normal form of A and denoted RNF(A). The zero matrix is in a class by itself.

PROOF Let A ∈ C^{m×n}. If we understand that there can be empty blocks of zeros, we argue that A ~ [I_r 0; 0 0]. First apply row operations (i.e., elementary matrices on the left) to produce RREF(A). Say the column numbers of the leading ones are c_1, c_2, ..., c_r. Then use column swaps (i.e., permutation matrices on the right) to produce [I_r B; 0 0]. Finally, use transvections with the help of the ones in I_r to zero out B row by row. You have now achieved SAT = [I_r 0; 0 0]. □

COROLLARY 4.1
Let A, B belong to C^{m×n}. Then A is equivalent to B iff A and B have the same rank.


PROOF The proof is left as an exercise. □

These results give a very nice way to determine the classes under matrix equivalence. For example, C^{2×4} has three classes: {0_{2×4}}, {all matrices of rank 1}, and {all matrices of rank 2}. Actually, we can do a bit better. We can pick a canonical representative of each class that is very nice. So the three classes are {0_{2×4}}, {all matrices equivalent to [1 0 0 0; 0 0 0 0]}, and {all matrices equivalent to [1 0 0 0; 0 1 0 0]}.

There is an algorithm for producing matrices S and T that put A into its rank normal form. First, adjoin the identity matrix to A, [A | I]. Next, row reduce this augmented matrix: [A | I] → [RREF(A) | S]. Then, S is an invertible matrix with SA = RREF(A). Now, form the augmented matrix [RREF(A); I] and column reduce it: [RREF(A); I] → [RNF(A); T]. Then T is invertible and SAT = RNF(A).
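The algorithm is easy to script (a sketch of our own; rnf is a made-up name, and the column reduction is carried out by row reducing the transpose):

    function [F,S,T] = rnf(A)
    % rank normal form: F = S*A*T = [I_r 0; 0 0] with S, T invertible
    [m,n] = size(A);
    RS = rref([A eye(m)]);          % row reduce [A | I]
    R  = RS(:,1:n);                 % R = RREF(A)
    S  = RS(:,n+1:end);             % S*A = RREF(A)
    RT = rref([R.' eye(n)]);        % column reduce R by row reducing its transpose
    T  = RT(:,m+1:end).';           % R*T = RNF(A)
    F  = S*A*T;
    end

For instance, [F,S,T] = rnf([1 2 3; 2 4 6]) returns F = [1 0 0; 0 0 0], the rank normal form of a rank 1 matrix in C^{2×3}.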

We can refine the notion of matrix equivalence to row equivalence and columnequivalence by acting on just one side of the matrix with elementary operations.

DEFINITION 4.3 (row equivalence, column equivalence)

1. We say the matrices A and B in C^{m×n} are row equivalent and write A ~_R B iff there exists an invertible matrix S with SA = B.

2. We say A and B are column equivalent and write A ~_C B iff there is a nonsingular matrix T with AT = B.

In other words, A is row equivalent to B iff we can obtain B from A by performing a finite sequence of elementary row operations, and A is column equivalent to B iff B can be obtained from A by performing a finite sequence of elementary column operations on A.

THEOREM 4.5
Let A and B be in C^{m×n}. Then the following statements are all equivalent:

1. A ~_R B.

2. RREF(A) = RREF(B).

3. Col(A^T) = Col(B^T).

4. Null(A) = Null(B).

Page 185: Matrix Theory

164 The Moore-Penrose Inverse

PROOF The proof is left as an exercise.

A similar theorem holds for column equivalence.

THEOREM 4.6
Let A and B be in C^{m×n}. Then the following statements are all equivalent:

1. A ~_C B.

2. RREF(A^T) = RREF(B^T).

3. Col(A) = Col(B).

4. Null(A^T) = Null(B^T).

PROOF As usual, the proof is left to the reader.

Exercise Set 13


1. Prove that matrix equivalence is indeed an equivalence relation. (This isTheorem 4.2.)

2. Prove that matrices A and B are equivalent if and only if they have thesame rank. (This is Corollary 4.1.)

3. Describe the equivalence classes in C^{3×3} under ~.

4. Reduce A = [1240 862 1593 2278; 2300 2130 1245 2620; 2404 2200 1386 2818; 488 438 309 598] to its canonical form under ~.

5. You may have noticed that your calculator has two operations that do row reductions, ref and rref. Note that rref is what we talked about above, RREF. This is unique. There is a weaker version of row reduction that is not unique, ref (row echelon form). Here you demand (1) all totally zero rows are at the bottom and (2) if the first nonzero entry in row i is at position k, then all the entries below the ith position in all previous columns are zero. Argue that the positions of the pivots are uniquely determined even though the row echelon form need not be unique. Argue that the number of pivots is the rank of A, which is the same as the number of nonzero rows in any row echelon form. If you call a column of A basic


if it contains a pivot position, argue the rank of A is the same as the number of basic columns.

6. Is there a notion of column reduced echelon form, CREF(A)? If so,formulate it.

7. Suppose A is square and nonsingular. Is A ~ A^{-1}? Is A ~_R A^{-1}? Is A ~_C A^{-1}?

8. Argue that A ~ B iff A^T ~ B^T.

9. Prove that A ~_R B iff A^T ~_C B^T.

10. Argue that A ~_C B or A ~_R B implies A ~ B.

11. We say A and B are simultaneously diagonable with respect to equivalence if there exist nonsingular matrices S and T such that SAT = D_1 and SBT = D_2, where D_1 and D_2 are diagonal. Create a pair of 2-by-3 matrices that are simultaneously diagonable with respect to equivalence.

12. Prove Theorem 4.5.

13. Prove Theorem 4.6.

14. Argue that the linear relationships that exist among the columns ofRREF(A), which are easy to see, are exactly the same as the linear rela-tionships that exist among the columns of A. (Hint: Recall Theorem 2.7of Chapter 2.)

15. Many people call the columns of A corresponding to the leading onecolumns of RREF(A) the basic columns of A. Of course, the othercolumns of A are called nonbasic columns. Argue that the basic columnsof A form a basis of the column space of A. Indeed, only the basic columnsoccurring to the left of a given nonbasic column are needed to expressthis nonbasic column as a linear combination of basic ones.

16. Argue that if A is row equivalent to B, then any linear relationship amongthe columns of A must also exist among the same columns of B with thesame coefficients.

17. In view of exercise 16, what can you say if A is column equivalent to B?

18. Explain why the algorithm for producing the RNF of a matrix given inthe text above works.

19. Here is another proof that row rank equals column rank: We use RREF in this argument. First let R = RREF(A) = SA, where S is m-by-m invertible and A is m-by-n. Conclude that Row(R) = Row(A). Write A = [a_1 | a_2 | ··· | a_n] so that R = SA = [Sa_1 | Sa_2 | ··· | Sa_n]. Let B = {Sa_{j_1}, Sa_{j_2}, ..., Sa_{j_r}} be the columns of R with the leading ones in them. Argue that B is a basis for Col(R). Since S is invertible, argue that {a_{j_1}, a_{j_2}, ..., a_{j_r}} is an independent set. If c_j is any column of A, Sc_j is a linear combination of the columns of B. Therefore, conclude c_j is a linear combination of a_{j_1}, a_{j_2}, ..., a_{j_r}. Conclude that dim(Row(A)) = r = dim(Col(A)).

20. If A is nonsingular n-by-n and B is n-by-r, argue that RREF([A | B]) = [I | A^{-1}B].

21. If two square matrices are equivalent, argue that they are either bothinvertible or both singular.

22. Suppose T is a linear map from C^n to C^m. Suppose B and B_1 are bases of C^n and C and C_1 are bases of C^m. Argue that Mat(T;B_1,C_1) = P Mat(T;B,C) Q^{-1}, where P is the transition matrix from C to C_1 and Q is the transition matrix from B to B_1. Deduce that any two matrix representations of T are matrix equivalent.

23. Tell whether the following matrices are in REF, RREF, or neither:

[2 4 6 8; 0 1 3 5; 0 0 7 9],   [5 5 0 0 0; 0 0 7 3 0; 0 0 0 0 2],   [1 0 0; 0 1 2],

[1 2 3 4; 1 0 1 0],   [1 0 0 0 7; 0 0 1 0 5; 0 1 0 2 3].

24. Make up an example to find two different REFs for one matrix.

25. If you are brave, do you think you can find the RREF of a matrix with polynomial entries? Try [x−4, 3, 3; 2, x−1, 1; −3, −3, x−9].

26. Suppose two people try to solve Ax = b but choose different ordersfor listing the unknowns. Will they still necessarily get the same freevariables?

27. Use the rank normal form of a matrix A of rank r to prove that the largest number of columns (or rows) of A that are linearly independent is r. Argue that this is equivalent to saying A contains an r-by-r nonsingular submatrix and every (r+1)-by-(r+1) submatrix is singular.

28. Fill in the details of the proof of Theorem 4.3.


Further Reading

I H&K, 19711 Kenneth Hoffman and Ray Kunze, Linear Algebra, 2ndEdition, Prentice Hall Inc., Englewood Cliffs, NJ, (1971).

[L&S, 2000] Steven L. Lee and Gilbert Strang, Row Reduction of aMatrix and A = CaB, The American Mathematical Monthly, Vol. 107,No. 8, October, (2000), 681-688.

[Yuster, 1984] Thomas Yuster, The Reduced Row Echelon Form of aMatrix Is Unique: A Simple Proof, The American Mathematical Monthly,Vol. 57, No. 2, March, (1984), 93-94.

4.1.2 MATLAB Moment

4.1.2.1 Row Reduced Echelon Form

MATLAB has a built in command to produce the RREF of a matrix A. Thecommand is

rref(A)

Let's look at some examples.

>> B=round(] 0*rand(3,4))+round(10(3,4))*i

B=Columns I through 3

1.0000 + 8.0000i 6.0000 + 7.0000i 0 + 7.000012.0000 + 5.0001 3.0000 + 8.0000i 7.0000 + 4.0000i

2.0000 + 2.00001 2.0000 4.0000 + 8.0000i

Column 4

9.0000 + 5.0000i5.0000 + 7.0000i4.0000 + 4.00001

> >rref(B)

Columns I through 3

1.0000 0 00 1.0000 00 0 1.0000

Page 189: Matrix Theory

168 The Moore-Penrose Inverse

Column 4

-2.2615 - I.2923i1.5538 + 1.0308i1.0462 + 0.1692i

There is actually more you can do here. The command

[R, jb] = rref(A)

returns the RREF R and a vector jb so that jb lists the basic variables in thelinear system Ax=b, r=length(jb) estimates the rank of A, A(:,jb) gives a basisfor the column space of A. Continuing our example above,

> > [R, jb]=rref(B)

R=

Columns I through 3

1.0000 0 00 1.0000 00 0 1.0000

Column 4

-2.2615 - 1.2923i1.5538 + I.0308i1.0462 + 0.16921

jb =

1 2 3

Let's get a basis for the column space of B.

>>B(:,jb)

ans =

1.0000 + 8.0000i 6.0000 + 7.0000i 0 + 7.0000i2.0000 + 5.000i 3.0000 + 8.0000i 7.0000 + 4.0000i2.0000 + 2.0000i 2.0000 4.0000 + 8.0000i

Of course, this answer is not surprising. You might try experimenting withthe matrix C=[ 1 2 3;2 4 5;3 6 9].

There is a really cool command called

rrefniovie(A )

This steps you through the process element by element as RREF is achievedfor the given matrix.

Page 190: Matrix Theory

4.1 Row Reduced Echelon Form and Matrix Equivalence 169

4.1.3 Numerical Note

4.1.3.1 Pivoting Strategies

In theory, Gauss elimination proceeds just fine as long as you do not run intoa zero diagonal entry at any step. However, pivots that are very small (i.e., nearzero) can cause trouble in finite-precision arithmetic. If the pivot is small, themultipliers derived from it will be large. A smaller multiplier means that earliererrors are multiplied by a smaller number and so have less effect being carriedforward. Equations (rows) can be scaled (multiplied by a nonzero constant), sowe should choose as pivot an element that is relatively larger in absolute valuethan the other elements in its row. This is called partial pivoting. This will makethe multipliers less than I in absolute value. One approach is to standardize eachrow by dividing row i by >j Jail 1. Or we can just choose the largest magnitudecoefficient aki to eliminate the other xk coefficients.

An easy example will illustrate what is going on. Consider the system

1-x+y=anx+y=b

where n is very large compared to a and b. Using elementary operations, weget

1-x+y=an(I-n)y=b-na.

Thus

b - nay- 1-nx = (a - y)n.

When n is very large, the computer will see I - n as -n and b - na as -naso the answer for y will be a, and hence x will be zero. In effect, b and I areoverwhelmed by the size of n so as to disappear. On the other hand, if we simplyswap the two equations,

x+y=b-x+y=an

and eliminate as usual, we get

x+y=bC I) b1-- y=a --

n n

Page 191: Matrix Theory

170 The Moore-Penrose Inverse

so

a - !'Y- I-!x =b-y.

In summary then, the idea on partial pivoting is to look below the currentpivot and locate the element in that column with the largest absolute value. Thendo a row swap to get that element into the diagonal position. This ensures themultipliers will be less than or equal to I in absolute value. There is anotherstrategy called complete pivoting. Here one searches not just below the currentpivot, but in all remaining rows and columns. Then row and column swaps arenecessary to get the element with largest absolute value into the pivot position.The problem is you have to keep track of row and column swaps. The columnswaps do disturb the solution space so you have to keep up with changing thevariables. Also, all this searching can use up lots of computer time.

4.1.3.2 Operation Counts

Operation counts give us a rough idea as to the efficiency of an algorithm.We count the number of additions/subtractions and multiplications/divisions.Suppose A is an n-by-n matrix and we wish to solve Ax = b (n equations in Ifunknowns).

Algorithm

2.

Gauss Elimination with back Additions/Subtractionssubstitution n; + 1 n2 - 11,

Multiplications/Divisions113 + n2 - Ii

Gauss-Jordan elimination (RREF) Additions/Subtractionsn3

+ ;n2 - 6nMultiplications/Divisions1113+n2- 1fin

Cramer's rule Additions/Subtractionsn4 - bn. - n2 + bn

M ulti plications/Divisions311 3 T in2 + ;n -

x = A-1b if A is invertible Additions/Subtractionsn;-I!2Multiplications/DivisionsIf 3+If2

Page 192: Matrix Theory

4.2 The Hermite Echelon Form 171

Interestingly, I and 2 have the same number of counts. To understand why,note that both methods reduce the augmented matrix to a REF. We leave it asan exercise to see that the number of operations to do back substitutions is thesame as continuing to RREF.

Further Reading

[Anion, 19941 Howard Anton, Elementary Linear Algebra, 7th Edition,John Wiley & Sons, New York, (1994).

[C&deB 19801 Samuel D. Conte and Carl de Boor, ElementaryNumerical Analysis, 3rd Edition, McGraw-Hill Book Company, NewYork, (1980).

[Foster, 1994] L. V. Foster, Gaussian Elimination with Partial PivotingCan Fail in Practice, SIAM J. Matrix Anal. Appl., 15, (1994), 1354-1362.

4.2 The Hermite Echelon FormThere is another useful way to reduce a matrix, named in honor of the

French mathematician Charles Hermite (24 December 1822 - 14 January1901), that is very close to the RREF. However, it is only defined for squarematrices. Statisticians have known about this for some time.

DEFINITION 4.4 (Hermite echelon form)A matrix H in C" " is in (upper) Hermite echelon form if

1. H is upper triangular (hid = entij (H) = 0 if i > j).

2. The diagonal of H consists only of zeros and ones.

3. If a row has a zero on the diagonal, then every element of that row iszero; if hii = 0, then hik = O for all k = 1, 2, ... , n.

4. If a row has a I on the diagonal, then every other element in the columncontaining that I is zero; i f hii = 1 , then hji = O for all j = 1 , 2, ... , nexcept j = i.

Page 193: Matrix Theory

172 The Moore-Penrose Inverse

The first interesting fact to note is that a matrix in Hermite echelon form mustbe idempotent.

THEOREM 4.7Let H E C""" be in Hermite echelon form. Then H2 = H.

PROOF Let bik he the (i,k) entry of Hz. Then the definition of matrixH i-I H

multiplication gives bik = Ehiihjk = >hijhjk + hiihik + j hiihjk. Ifj=1 j=1 j=i+l

i > k, then bik = 0 since this is just a sum of zeros. Thus H2 is upper triangular.

Ifi < k, then bik = >hijhjk. Weconsidercases. Ifhii = 0, then by (3), hij = 0j=i

f o r all j = 1,2, ... , n so bik = 0 = hik. If hii 0 0, then hii must equal 1, so11

bik = hik + r_ hijhjk . Now whenever hij 0 0 for i + I < j < tt, we havej=i+I

by (4) that hjj = 0 so from (3), h j,,, = 0 for all m. This is so, in particular, form = k. Thus, in any case, bik = hik so H2 = H. 0

THEOREM 4.8Every matrix A in C"can be brought into Hermite echelon form by usingelementary row operations.

PROOF First we use elementary row operations to produce RREF(A). Thenpermute the rows of RREF(A) until each first nonzero element of each nonzerorow is a diagonal element. The resulting matrix is in Hermite echelon form. 0

3 6 9 1 2 0For example, RREF( I 2 5 )= 0 0 1. Indeed

2 4 10 0 0 0I J [6s 0 3 6 9

1 11 2 0

6 z 0 1 2 5 = 0 0 1 J. To get the Hermite0

21 4 2 4 10 0 0 0echelon form, which we shall denote HEF(A), simply permute the second and

1s -;

?0 3 6 9 1 2 0

third rows. Then 0 I 1 2 5 = 0 0 0 = H.6 z 0 2 4 10 0 0 1

The reader may verify Hz = H. Thus, our algorithm f'or finding HEF(A) isdescribed as follows: use elementary row operations to produce [A

1 I] -[HEF(A) I S]. Then SA = HEF(A).

Page 194: Matrix Theory

4.2 The Hermite Echelon Form 173

COROLLARY 4.2For any A E C", there exists a nonsingular matrix S such that SA is inHermite echelon form. Moreover, ASA = A.

1 2 1

For example, for A = 2 3 1

1 1 0

1 2 1

D2(-I)T32(-1)T31(-1)T21(-2) 2 3 1

1 1 0 1

1 0 -10 1 1

0 0 0H

or

1-3 2 0 1 [ 1 2 1 1 0 -112 -1 0 2 3 1 = 0 1 1

11 -1 1 1 1 0 0 0 0

THEOREM 4.9The Hermite echelon form of a matrix is unique, justifying the notation HEF(A)for A E C"xn

PROOF (Hint: Use the uniqueness of RREF(A).)The proof is left as exercise. U

Note that the sequence of elementary operations used to produce HEF(A) isfar from unique. For the example above:

0 -1 3 I 2 1 1 0 -10 1 -2 2 3 1 = 0 1 1

1 -1 1 1 1 0 0 0 0

The fact that H = HEF(A) is idempotent means that there is a direct sumdecomposition lurking in the background. The next result helps to indicate whatthat is.

COROLLARY 4.3For any A E C"', Mull(A) = Null(HEF(A)) =Col(/ - HEF(A)). More-over, the nonzero columns of 1 - HE F(A) yield a basis for the null space ofA. Also rank(A) = rank(HEF(A)) = trace(HEF(A)).

Page 195: Matrix Theory

174 The Moore-Penrose Inverse

THEOREM 4.10Let A and B E C""". Then HEF(A) = HEF(B) iffCol(A*) =Col(B*)

PROOF First, suppose Col(A*) = Col(B*). Then there exists an invertiblematrix S with A*S* = B*. This says that SA = B. Now there exists Tnonsingular with TB = HEF(B). Then HEF(B) = TB = TSA = (TS)Ais a matrix in Hermite echelon form. By uniqueness, HEF(B) = HEF(A).Conversely, suppose H = HEF(A) = HEF(B). Then, there are nonsingularmatrices Sand T with SA = H = TB. Then A = S-I T B so A* = B*(S-J T)*.But (S-1 T) is nonsingular so Col(A*) = Col(B*). 0

COROLLARY 4.4

1. For any A E C""", HEF(A*A) = HEF(A).

2. For A E C""" and S E C""" nonsingular, HEF(SA) = HEF(A).

In other words, row equivalent matrices have the same Hermite echelon form.

THEOREM 4.11Let H = HEF(A) for some A in C""". Suppose that the diagonal ones of Hoccurs in columns numbered c1, c2, ... , ck. Then the corresponding columnsof A are linearly independent.

PROOF The proof is left as an exercise. 0

COROLLARY 4.5Consider the i"' column of A, a,. This column is a linear combination of theset of linearly independent columns of A as described in the theorem above.The coefficients of the linear combinations are the nonzero elements of the i`hcolumn of HEF(A).

Exercise Set 14

1. Fill in the arguments for those theorems and corollaries given above.

2. Prove that H = HEF(A) is nonsingular iff H = 1.

Page 196: Matrix Theory

4.2 The Hermite Echelon Form 175

3. Argue that if A is nonsingular, then HEF(A) = I.

4. Let S be a nonsingular matrix with SA = HEF(A) = H. Argue thatAH=A.

5. Let A E C""", S E C"" nonsingular with SA = HEF(A) = H. Thenprove that AH = A.

6. Let H = HEF(A). Argue that A is idempotent iff HA = H.

7. Define A -- B iff Col(A*) = Col(B*). Is an equivalence relation? IsHEF considered as a function on matrices constant on the equivalenceclasses?

8. Create an example of a matrix A with nonzero entries such that H =1 0 1 1 [ 1 1 0 1 [ I I I

HEF(A) = 0 1 1 , 0 0 0, 0 0 00 0 0 0 0 1 0 0 0

9. Check to see that AH = H in the examples you created above.

10. Spell out the direct sum decomposition induced by H = HEF(A).

11. Fill in a proof for Corollary 4.2.

12. Make a proof for Theorem 4.9.

13. Fill in a proof for Corollary 4.10.

14. Fill in a proof for Corollary 4.4.

15. Make a proof for Theorem 4.11.

16. Fill in a proof for Corollary 4.5.

Further Reading

[Graybill, 19691 Franklin A. Graybill, Introduction to Matrices withApplications in Statistics, Wadsworth Publishing Co., Inc., Belmont, CA,(1969).

Page 197: Matrix Theory

176 The Moore-Penrose hiverse

4.3 Full Rank FactorizationThere are many ways to write a matrix as the product of others. You will

recall the LU factorization we discussed in Chapter 3. There are others. In thissection, we consider a factorization based on rank. It will be a major theme ofour approach to matrix theory.

DEFINITION 4.5 (full rank factorization)Let A be a matrix in Cr` with r > 0. If there exists F in C""', and G

in Cr x" such that A = FG, then we say we have a full rank factorizationof A.

There are the usual questions of existence and uniqueness. Existence canbe argued in several ways. One approach is to take F to be any matrix whosecolumns form a basis for the column space of A. These could be chosen fromamong the columns of A or not. Then, since each column of A is uniquelyexpressible as a linear combination of the columns of F, the coefficients in thelinear combinations determine a unique G in Cr-,,,, with A = FG. Moreover,r=r(A)=r(FG)<r(G)<r.Thus GisinCrr'< .

Another approach is to apply elementary matrices on the left of A to producethe unique RREFof A. That is, we produce an invertible matrix R in Cmx"' with

GrxnRA = , where r = r(A) = r(G) and O(n,-r)x is the (m-r)-

0(,n-r)xn

by-n zero matrix. Then, A = R-) I

®Vn-r)xn I.With a suitable partitioning

of R-), say R-' = [R) R2J, where R, is m-by-r and R2 is m-by-(m - r),

GA = (R, : RZJ = RIG + R20 = RIG. Take F to be RI. Since

R-' is invertible, its columns are linearly independent so F has r independentcolumns and hence has full column rank. We summarize our discussion with atheorem.

THEOREM 4.12Every matrix A in C;' x" with r > 0 has a full rank factorization.

Page 198: Matrix Theory

4.3 Full Rank Factorization 177

Even better, we will now describe a procedure, that is, an algorithm, forcomputing a full rank factorization of a given matrix A that works reasonablywell for hand calculations on small matrices. It appears in [C&M, 1979]. Let Abe in C; "

Step 1. Use elementary row operations to reduce A to RREF(A).

Step 2. Construct a matrix F by choosing the columns of A that corre-spond to the columns with the leading ones in RREF(A) placing them inF in the same order they appear in A.

Step 3. Construct a matrix G by taking the nonzero rows of RREF(A)and placing them as the rows of G in the same order they appear inRREF(A).

Then, A = FG is a full rank factorization of A.Now for the bad news. As you may have guessed by now, not only do full rank

factorizations exist, they abound. After all, in our first construction describedabove, there are many choices for bases of the column space of A, hence manychoices for F. Indeed, if A = FG is one full rank factorization of A in (Cr""with r > 0, choose any invertible matrix R in C; `. Let FR = FR and GR =R-' G. Then clearly A = FRGR is also a full rank factorization of A. Actually,this will turn out to be good news later since we will be able to select anR to produce very nice full rank factorizations. Again, we summarize with atheorem.

THEOREM 4.13Every matrix A in C;"`" with r > 0 has infinitely many full rank factorizations.

Example 4.13 6 13 1 2 0

Let A = 2 4 9 . Then RREF(A) = 0 0 1 so we take G =1 2 3 0 0 0

1 2 0 1 3 13

0 0 1J and F = 2 9 . The reader may verify that A = FG is

1 3indeed a full rank factorization of A.

Page 199: Matrix Theory

178 The Moore-Penrose hiver.se

Exercise Set 15

1. Compute full rank factorizations for the following matrices:1 1 0 1 1 2 1 2 3 1 3 0

1 0 1 1 0 1 1 2 3 1 2 1

1 1 0 ' 1 1 2 ' 1 2 3 ' 1 3 0 '

1 0 1 1 0 1 1 2 3 1 2 1

I l 1 I 1 1 1 1

1 0 1 0 ,

1 [1 0 1 0 .

0 1 0 1 2 1 2 1

2. Argue that any A in C"' "' can be written A = L K, where L has a leftinverse and K has a right inverse.

3. Suppose A = FG is a full rank factorization of A and C" = Null(A)Co!(A). Argue that (G F)-' exists and E = F(GF)-'G is the projectorof C" onto Co!(A) along Mull(A).

4. Use full rank factorizations to compute the index of the followingmatrices:

I

11 0 1 1 2

1 0 1 1 0 1

1 1 0' 1 1 2

1 0 1 1 0 1

11 I I I I I

1 0 1 0, 1 0 1

0 1 0 1 2 1 2

2 3

2 3

2 3

2 3}

3 0

2 1

3 0

2 1

}

5. Suppose A has index q. Then C" = Col(Ay) ®.AIull(A`"). We can use afull rank factorization of A" to compute the projector of C" onto Col(A")alongArull(A"). Indeed, let Ay = FG be a full rank factorization. Arguethat F(GF)-I G is the projector of C" onto Col(Ay) along Afull(A").

6. Suppose A = FG is a full rank factorization. Argue that A = AZ iffGF=l.

7. Argue that a full rank factorization of a matrix A can he obtained by firstselecting a matrix G whose rows form a basis for 'Row(A). Then F mustbe uniquely determined.

8. Explain how to produce a full rank factorization from the modified RREFof a matrix.

Page 200: Matrix Theory

4.4 The Moore-Penrose Inverse 179

9. Suppose A = A2. Prove the the rank of A is the trace of A.

10. (G. Trenkler) Suppose A and B are n-by-n idempotent matrices withA + B + AB + BA = 0. What can you conclude about A and B?

Further Reading

[C&M, 1979] S. L. Campbell and C. D. Meyer, Jr., Generalized Inversesof Linear Transformations, Dover Publications, Inc., New York, (1979).

4.3.1 MATLAB Moment

4.3.1.1 Full Rank Factorization

We can create an M-file to compute a full rank factorization of a matrix. Bynow, you should have this down.

I function FRF=frf(A)2 [R,Jp] = rref(A)3 r = rank(A)4 fori= 1:r5 G(I,:) = R(i,:)6 end7 F = A(:,jp)8 G

Experiment with this routine on some matrices of your own creation.

4.4 The Moore-Penrose Inverse

In this section, we develop a key concept, the Moore-Penrose inverse (MP-inverse), also known as the pseudoinverse. What is so great about this inverseis that every matrix has one, square or not, full rank or not. Our approach to thepseudoinverse is to use the idea of full rank factorization; we build up from thefactors of a full rank factorization. The idea of a generalized inverse of a singularmatrix goes back to E. H. Moore [26 January, 1862 to 30 December, 19321 ina paper published in 1920. He investigated the idea of a "general reciprocal" of

Page 201: Matrix Theory

180 The Moore-Penrose Inverse

a matrix again in a paper in 1935. Independently, R. Penrose 18 August, 1931 ]rediscovered Moore's idea in 1955. We present the Penrose approach.

DEFINITION 4.6 (Moore-Penrose inverse)Let A be any matrix in C011". We say A has a Moore-Penrose inverse (or

just pseudoinverse for short) iff there is a matrix X in C" ""' such that

(MPI) AXA = A(MP2) X AX = X(MP3) (AX)' = AX(MP4) (X A)* = X A.

These four equations are called the Moore-Penrose equations and the order inwhich they are written is crucial for our subsequent development. Indeed, laterwe will distinguish matrices that solve only a subset of the four Moore-Penroseequations. For example, a 1-inverse of A would be a matrix X that is requiredto solve only MPI. A {I,2}-inverse of A would be required to solve only MPIand MP2. Now, we settle the issue of uniqueness.

THEOREM 4.14 (the uniqueness theorem)If A in C", x" has a pseudoinverse at all, it must be unique. That is, there can

be only one simultaneous solution to the four MP-equations.

PROOF Suppose X and Y in C"""' both satisfy the four Moore-Penroseequations. Then X = X(AX) = X(AX)* = XX*A* = XX*(AYA)* _XX'A*(AY)* = X(AX)*(AY)* = XAXAY = XAY = XAYAY =

=(XA)*(YA)"Y = A'X'A'Y'Y = (AXA)'Y'Y = A*Y*Y = (YA)'YYAY = Y.

The reader should be sure to justify each of the equalities above and note thatall four Moore-Penrose equations were actually used. 0

In view of the uniqueness theorem for pseudoinverses, we use the notationA+ for the unique solution of the four MP-equations (when the solution exists,of course, which is yet to be established). Since this idea of a pseudoinverse maybe quite new to you, the idea that all matrices have inverses may be surprising.We now spend some time on a few concrete examples.

Example 4.2

1. Clearly /,+ = I,, for any n and ® x = ® (Does this give us a chanceto divide by zero?)

Page 202: Matrix Theory

4.4 The Moore-Penrose Inverse 181

2. Suppose A is square and invertible. Then A+ = A-'. This is, of course,how it should be if the pseudoinverse is to generalize the idea of ordinaryinverse. Let's just quickly verify the four MP-equations. AA-1 A = Al =A, A-'AA-' = A-'I = A-', (AA-')* = /* = / = AA-1, and(A-' A)* = 1* = I = A-' A. Yes, they all check.

Suppose P is a matrix such that P = P* = P2. Later we shall call such amatrix a projection (also known as a Herrnitian idempotent). We claim forsuch a matrix, P = P+. Again a quick check reveals, PP+P = PPP =PP = P, P+PP+ = PPP = P = P+, (PP+)* _ (PP)* = P* =P=PP=PP+,and(P+P)*=(PP)*=P*=P=PP=P+P.

3.

Once again, we are golden. So, for example,1 0 0 +

0 1 0 =0 0 0

00sand s0

4 -2 005 5

0 = Z0s s

0 0 0 0

i5

0

4. Let's agree that for a scalar k, k+ _ if k # 0 and k+ = 0 if )\ = 0.Let D be a diagonal matrix, say D = diag(di, d2, ... , We claimD+ = diag(di , dz . ... , d,+,). We leave the details as an exercise. In

1 0 0 + 1 0 0particular then, 0 2 0 = 0 z 0

0 0 0 0 0 0

5. What can we say about an n-by- I matrix? In other words we are justIb, 1

looking at one column that could be viewed as a vector. Say b =b2

L b,. JIf b = ', we know what b+ is so suppose b # -6. A little trial anderror leads us to b+ = - b*. Remember, b*b is a scalar. In fact, itb*bis the Euclidean length squared of b considered as a vector. We preferto illustrate with an example and leave the formal proof as an exercise.

We claim 2_ I 2 3n

] ,- [where b = 2 . First note that

3i 14 14 14 3i

14 Thb* b 21 2

[ ,2 2 1 2. en=

4= [ ]=

3i1414 1 4 3i

3i 3i

Page 203: Matrix Theory

182 The Moore-Pen rose Inverse

14 14i] 2][---

114 14 4

1 -3'

=I 2 -3i

[I) [14 14 14

I I I

Ib+b= 141414 J

21

I 2 -3i_ 124 1 1g1

It4 T4 194

14 14 14

_ [ I) and bb+ = I 2

L 3i

Next, we come to crucial cases in which we can identify the pseudoinverseof a matrix.

THEOREM 4.15

1. Suppose F E C" "' - that is, F has frill column rank. Then F+ _(F*F)-I

F*.

2. Suppose G E Cr,,,;

-that is, G hasfull row rank. Then G+ = G*(GG*)-I.

PROOF

(1) We verify the four MP-equations. First, FF+F = F((F*F)-I F*)F =F(F*F)-'(F*F) = F1 = F. Next, F+FF+ = ((F*F)-I F*)F((F*F)-I F*) _ (F*F)-I (F*F)(F*F)-1 F* = I((F*F)-I F*) = F+.Now F+F = ((F*F)-I F*)F = (F*F)-'(F*F) = I, so surely (F+F)*= F+F. Finally, (FF+)* = (F(F*F)-I F*)* = F**(F*F)-I F*F(F*F)*-'F* = F(F*F)-I F* = FF+.

(2) This proof is similar to the one above and is left as an exercise. 0

So we see that for matrices of full row or column rank, the pseudoinversepicks out a specific left(right) inverse of the matrix. From above, F+F = I,and GG+ = 1,. Now, for an arbitrary matrix A in C"I" with r > 0, we shallshow how to construct the pseudoinverse.

DEFINITION 4.7 (pseudoinverse)Let A be a matrix in C"' "". Take any full rank factorization of A = FG. Then

F+ and G+ exist by the theorem above. Define A+ in C"' by A+ := G+F+.In other words, A+ = G*(GG*)-I (F*F)-I F*.

THEOREM 4.16For an arbitrary matrix A in C" with r > 0, A+ defined above satisfiesthe four MP-equations and, hence, must be the unique pseudoinverse of A.

Page 204: Matrix Theory

4.4 The Moore-Penrose Inverse 183

Moreover, AA+ = FF+ and A+A = G+G where A = FG is any full rankfactorization of A.

PROOF Suppose the notation of (4.7). Then AA+A = AG+F+FG =AG+G = FGG+G = FG = A. Next, A+AA+ = G+F+AA+ = G+F+FGA = G+GA+ = G+GG+F+ = G+F+ = At Also, AA+ = FGG+F+= FF+ and we know (FF+)* = FF+. Finally, A+A = G+F+FG = G+Gand we know (G+G)* = G+G. 0

We now have established the uniqueness and existence of A+ for any matrixA in C111". The approach we used here goes back to Greville [19601, whocredits A. S. Householder with suggesting the idea. There are some propertiesof pseudoinverses that are easy to establish. We collect some of these in thenext theorem.

THEOREM 4.17Let A E Cmxn Then

1. (AA+)2 = AA+ = (AA+)*.

2. (1", - AA+)Z = Um - AA+) AA+)*.

3. (A+A)2 = A+A = (A+A)*.

4. (1n-A+A)2=(1n-A+A)=(1"-A+A)*.

5. (/," - AA+)A = ®,nxn

6. (1 - A+A)A+ = Onxm

7. A++=A.

8. (A*)+ = (A+)*.

9.

10.

(A*A)+ = A+A*+

A* = A*AA+ = A+AA*.

11. A+ _ (A*A)+A* = A*(AA*)+.

12. (XA)+ = k+A+.

ROOF The proofs are left as exercises. El

Let's look at an example.

Page 205: Matrix Theory

184 The Moore-Penrose Inverse

Example 4.33 6 13 3 13

We continue with the example from above: A = 2 4 9 = 2 9

1 2 3 1 3

11

3 13

J, where F 2 9 and G=L

0 0 01

, gives a fullI 3 L

rank factorization of A. Then direct computation from the formulas in (4.15.1)

i -22 79and (4.15.2) yields G+

0_ 0 and F+ = 2P 2 ? and so A+

0 1 26 26 26 J

3 -II 79 -3 -22 79

66

= If0 ITo There is something of interest

13 13 23 I36 130 130to note here. I t you recall the formula for the inverse of a matrix in terms of the ad-jugate, we see G+(GG*)-I = and F+ = (F*F)-I F*

so A+ = In

particular, if the entries of A consist only of integers, the entries of A+ will berational numbers with the common denominator d et (F* F)det (GG* ). In our ex-ample, det(GG*) = 26 while det(F*F) = 5, hence the common denominatorof 130.

Before we finish this section, we need to tie up a loose end. We have alreadynoted that a matrix A in C; ' with r > 0 has infinitely many full rank factor-izations. We even showed how to produce an infinite collection using invertiblematrices. We show next that this is the only way to get full rank factorizations.

THEOREM 4.18Every matrix A E C°" with r > 0 has infinitely many full rank factorizations.However, if A = FG = FIG 1 are two full rank factorizations of A, then thereexists an invertible matrix R in C"`r such that F, = FR and GI = R-' G.Moreover, (R-'G)+ = G+R and (FR)+ = R-I F+.

PROOF The first claim has already been established so suppose A =FG = F, G I are two full rank factorizations of A. Then FI G I = FG soFI FIG, = F,+FG so GI = (FI+F)G since FI+F, = Jr. Note that Fi Fis r-by-r and r = r(GI) = r((F, F)G) < r(FI F) < r, so r(FI F) = rand so Fi F is invertible. Call FI+F = S. Similarly, FIG, = FG impliesFIGIGi = FGGi = FI since GIGS _ fr. Again note GGi is r-by-r ofrank r so GG+ is invertible. Name R = GGi. Then SR = F,+FGG+ =FI+AGi = FI+FIGIGt = /,. Thus S = R-I. Now we can see GI = SG =

Page 206: Matrix Theory

4.4 The Moore-Penrose Inverse 185

R-'G and F, = FGG+ = FR. To complete the proof, compute (FR)+ =((FR)*(FR))-'(FR)* = (R*F*FR)-IR*F* = R-l(F*F)-'R*-IR*F* =R-I(F*F)-I F* = R-1 F+ and (R-'G)+ = (R-iG)*((R-IG)(R-iG)*)-I =

G*(R-1)*(R-lGG*R-1*)-1 =G*(R-l)*(R-I)*-I(GG*)-lR=G*(GG*)-lR

G+R. 0

We end this section with a table summarizing our work on pseudoinversesso far.

TABLE 4.4.1: Summary Table

Dimension Rank Pseudoinverse

n =m n A+=A-'m-by-n n A+=(A*A)-IA*m-by-n m A+ = A*(AA*)-Im-by-n r A+ = G+ F+ where

A = FG is any fullrank factorization of A

Exercise Set 16

1. Let A = FG be a full rank factorization of A. Argue that F+A = G,FF+A = A, AG+ = F, and AG+G = A.

2. Suppose AL is a left inverse of A - that is, ALA = I. Is AL = A+necessarily true? Suppose A*A = I. What can you say about A+?

3. Suppose AZ = A in C""". Use a full rank factorization of A to provethat rank(A) = trace(A) (i.e., the rank of A is just the trace of A whenA is an idempotent matrix).

4. Justify each of the steps in the proof of the uniqueness theorem.

5. Determine the pseudoinverse of any diagonal matrix.

6. Go through the computations in detail of the numerical example in thetext.

7. Prove (2) of Theorem 4.15.

Page 207: Matrix Theory

186 The Moore-Penrose Inverse

8. Prove the 12 claims of Theorem 4.17.

9. Prove the following: AA+A+* = A+*, A+*A+A = A+*, A*+A*A = AAA*A*+ = A, A*A+*A+ = A+, A+A+*A* = At

10. Argue that the row space of A+ is equal to the row space of A*.

11. Argue that the column space of A+ is equal to the column space of A*and the column space of A+A.

12. Argue that A, A*, A+, and A+* all have the same rank.

13. Prove (AA*)+ = A+*A+, (A*A)+ = A+A+* = A+A*+ and (AA*)+(AA*) = AA+.

14. Prove A = AA*(A+)* = (A+)*A*A = AA*(AA*)+A and A*A*AA+ = A+AA*.

15. Prove A+ = A+(A+)*A* = A*(A+)*A+.

16. Prove A+ = (A*A)+A* = A*(AA*)+ so that AA+ = A(A*A)+A*.

17. Show that if A = > A, where A; Aj = 0 whenever i # j, then A+ =r_ A;.

18. Argue that all the following matrices have the same rank: A, A+, AA+,A+A, AA+A, and A+AA+. The rank is Tr(AA+).

19. Argue that (-A)+ = -A+.

20. Suppose A is n-by-m and S is m-by-ni invertible. Argue that (AS)(AS)+= AA+.

21. Suppose A*A = AA*. Prove that A+A = AA+ and for any naturalnumber n, (A")+ = (A+)". What can you say if A = A*?

22. Prove that A+ = A* if and only if A*A is idempotent.

23. If A = ® ® , find a formula for At

24. Suppose A is a matrix and X is a matrix such that AX A = A, X AX = Xand AX = X A. Argue that if X exists, it must be unique.

25. Why does (F*F)-1 exist in Theorem 4.15?

Page 208: Matrix Theory

4.4 The Moore-Penrose Inverse 187

26. (MacDuffee) Let A = FG be a full rank factorization of A in Qr"Argue that F*AG* is invertible and A+ = G*(F*AG*)-I F*. (Hint: Firstargue that F*AG* is in fact invertible.) Note F*AG* = (F*F)(GG*)and these two matrices are r-by-r or rank r hence invertible. Then(F*AG*)-I = (GG*)-I(F*F)-I.

X, y,

27. Let x = and y = . Show (xy*)+ = (x*x)+(y*y)+yx*.

xn yn

28. Find the MP inverse of a 2-by-2 matrix a bc d

29. Find examples of matrices A and B with (AB)+ = B+A+ and A and Bwith (A B)+ # B+A+. Then argue Greville's [1966] result that (AB)+ _B+A+ iff A+A and BB+ commute.

0 1 0 +

30. Find 0 0 1

0 0 0

31. Remember the matrix units E, ? What is Et?

32. Find necessary and sufficient conditions for A+ = A.

33. (Y. Tian) Prove that the following statements are equivalent for m-by-nmatrices A and B:

r 1

(i) Col I AAA1

= Col I BBB

(ii) Col f AAA 1 = Col I BBB

(iii) A =B. L

34. In this exercise, we introduce the idea of a circulant matrix. An n-hy-n matrix A is called a circulant matrix if its first row is arbitrary butits subsequent rows are cyclical permutations of the previous row. So,if the first row is the second row is (anaia2...an-1 ),and the last row is (a2a3a4 anal). There are entire books written onthese kinds of matrices (see page 169). Evidently, if you know the firstrow, you know the matrix. Write a typical 3-by-3 circulant matrix. Is theidentity matrix a circulant matrix? Let C be the circulant matrix whosefirst row is (0100 . . . 0). Argue that all powers of C are also circulantmatrices. Moreover, argue that if A is any circulant matrix with first row

then A =a,I+a2C+a3C2+...+aC"-'.

Page 209: Matrix Theory

188 The Moore-Penrose inverse

35. Continuing the problem above, prove that A is a circulant matrix iffAC = CA.

36. Suppose A is a circulant matrix. Argue that A+ is also circulant and A+commutes with A.

37. (Cline 1964) If AB is defined, argue that (A B)+ = Bi A+ where ABA,B1, B, = A + A B and AI = ABiBi .

38. If rank(A) = 1, prove that A+ = (tr(AA*)-')A*.

39. Prove that AB = 0 implies B+A+ = 0.

40. Prove that A*B = 0 iff A+B = 0.

41. Suppose A*AB = A*C. Prove that AB = AA+C.

42. Suppose BB* is invertible. Prove that (AB)(AB)+ = AA+.

43. Suppose that AB* = 0. Prove that (A+B)+ = A+ +(I,, - A+B)[C++(I - C+C)MB*(A+)*A+(I - BC+) where C = (1,,, - AA+)B and

M = [1,1 + (1,, - C+C)B*(A+)A+B(I,1 - C+C)]-'

A +

44. Prove that = [A+ - T BA+ I T] where T = E+ + (1 -B

E+B)A+(A+)*B*K(Ip - EE+) with E = B(I - A+A) and K =[1,, + (I,, - EE+)BA+(A+)*B*(1 - EE+)]-'.

A+-A+B(C++D) l45. Prove that [A:B]+ C,+ + D

J

where C = (I,,, -

AA+)B and D =(I,, - C+C)[I,, + (1,, - C+C)B*(A+)*A+B(1,, - C+C)]-' B*(A+)*

A+(I,,, - BC+).

46. Argue Greville's [1966] results: (AB)+ = B+A+ iff any one of thefollowing hold true:

(a) A+ABB*A* = BB*A* and BB+A*AB = A*AB.(b) A+ABB* and A*ABB+ are self adjoint.(c) A+ABB*A*ABB+ = BB*A*A.(d) A+AB = B(AB)+AB and BB+A* = A*AB(AB)+.

47. Suppose A is m-by-n and B is n-by-p and rank(A) = rank(B) = n.Prove that (A BY =B+ A+.

Page 210: Matrix Theory

4.4 The Moore-Penrose Inverse 189

Further Reading

[Cline, 1964] R. E. Cline, Note on the Generalized Inverse of a Productof Matrices, SIAM Review, Vol. 6, January, (1964), 57-58.

[Davis, 1979] Philip J. Davis, Circulant Matrices, John Wiley & Sons,New York, (1979).

[Greville, 1966] T. N. E. Greville, Note on the Generalized Inverse of aMatrix Product, SIAM Review, Vol. 8, (1966), 518-524.

[Greville, 1960] T. N. E. Greville, Some Applications of the PseudoInverse of a Matrix, SIAM Review, Vol. 2, (1960), 15-22.

[H&M, 1977] Ching-Hsiand Hung and Thomas L. Markham, TheMoore-Penrose Inverse of a Sum of Matrices, J. Australian MathematicalSociety, Vol. 24, (Series A), (1977), 385-392.

[L&O, 19711 T. O. Lewis and P. L. Odell, Estimation in Linear Models,Prentice Hall, Englewood Cliffs, NJ, (1971).

[Liitkepohl, 1996] H. Liitkepohl, Handbook of Matrices, John Wiley &Sons, New York, (1996).

[Mitra, 1968] S. K. Mitra, On a Generalized Inverse of a Matrix andApplications, Sankhya, Series A, XXX:1, (1968), 107-114.

[M&O, 1968] G. L. Morris and P. L. Odell, A Characterization for Gen-eralized Inverses of Matrices, SIAM Review, Vol. 10, (1968), 208-211.

[Penrose, 1955] R. Penrose, A Generalized Inverse for Matrices, Proc.Camb. Phil. Soc., Vol. 51, (1955), 406-413.

[R&M, 19711 C. R. Rao and S. K. Mitra, Generalized Inverses of Matricesand its Applications, John Wiley & Sons, New York, (1971).

[Rohde 1966] C. A. Rohde, Some Results on Generalized Inverses,SIAM Review, VIII:2, (1966), 201-205.

[Wong, 19811 Edward T. Wong, Polygons, Circulant Matrices, andMoore-Penrose Inverses, The American Mathematical Monthly, Vol. 88,No. 7, August/September, (1981), 509-515.

Page 211: Matrix Theory

190 The Moore-Pen rose Inverse

4.4.1 MATLAB Moment

4.4.1.1 The Moore-Penrose Inverse

MATLAB has a built-in command to compute the pseudoinverse of an m-by-n matrix A. The command is

pinv(A).

For example,> > A=[123;456;789ans =

-0.6389 -0.1667 0.3056-0.0556 0.0000 0.05560.5278 0.1667 -0.1944

> > format rat>> pinv(A)ans =-23/36 -1/6 1 1 /36

-1/18 * 1/18

19/36 1/6 -7/36.

For fun, find pinv(B) where B = ones(3), B = ones(4). Do you see a pattern?

4.5 Solving Systems of Linear Equations

Now, with the MP-inverse in hand, we consider an arbitrary system of linearequations Ax = b where A is m-by-n, x is n-by-1, and b is in-by-l.

THEOREM 4.19Ax = b has a solution if and only if AA+b = b. If a solution exists at all,every solution is of the form x = A+b + (I - A+A)w, where w is an arbitraryparameter matrix. Indeed, a consistent system always has A+b as a particularsolution.

PROOF First, we verify the consistency condition. If AA+b = b, thenevidently x = A+b is a solution to the system. Conversely suppose a solutiony exists. Then Ay = b, so A+Ay = A+b whence AA+Ay = AA+b. ButAA+A = A, so b = Ay = AA+b, and we have the condition. Now, supposethe system has a solution x and let x = A+b as above. Then, if w = x - xo,Aw = A(x - Ax - Ax,, = b - AA+b = b - b = V. Now clearly,

Page 212: Matrix Theory

4.5 Solving Systems of'Linear Equations 191

ButAw=-6 implies A+Aw = -6, sow=(I-A+A)w.Thuswe see x = x +w = A+b + (I - A+A)w, and the proof is complete. D

So, while our previous concepts of inverse were highly restrictive, the MP-inverse handles arbitrary systems of linear equations, giving us a way to judgeHhether a solution exists and giving us a way to write down all solutions whenthey exist. We illustrate with an example.

Example 4.41 1 0

Let A = 10

1 in C4x3 and consider the system of linear equations

1 0 1

2

Ax = b, where b = 2 . First we compute the pseudoinverse of A. We

2

1

write A=FG= ] 0 r 1 0 1 1

1 0 1a full rank factorization of A.

1 0

Then A+ = G+F+ = G*(GG*)-I (F*F)-I F* =

2

and AA+ =0

1

20

1 I I

6 6

61 61

6 6 6 6

Since AA+ is idempotent, its rank is its trace,

which is 2. Also I - A+A -

2

see if we have any solutions at all: AA+b =

0

2

2 Yes, the system is consistent. To give all possible solutions we form2

2

1 -1 -1

3I We compute A A+b to73,

x = A+b + [I - A+AIw, where w is a column of free parameters: x =

Page 213: Matrix Theory

192 The Moore-Penrose Inverse

i i i i 2 i -i -i2 61 61 2

3w

+ iW2 =

61 bi 2 31 w36 6 6 6 2 3 3 3

(wi - w2 - w3) 43$ + (-WI +w2+w;) +t 31 where t

3(-3(+ W2 + w3) 3i

W1 - w2 - W3. This also tells us that the nullity of A is I and the null space isi 1

3

Null(A) = span 3 Next consider Ax = b = I Again we-iT 1 ) - 0

1 2 0 2 0 I

check for consistency by computing AA+2 = 0 2 2 2

1 2 0 2 0 1

0 0; 0 Z 0

Evidently, there is no solution to this system.

The astute reader has no doubt noticed that the full force of the pseudoinversewas not needed to settle the problem of solving systems of linear equations.Indeed, only equation (MPI) was used. This observation leads us into the nextchapter.

Exercise Set 17

1. Use the pseudoinverse to determine whether the following systems oflinear equations have a solution. If they do, determine the most generalsolution.

2x - 10y + 16z = 10

3x + y - 5z = 1

(a) 2x - y + z _ 42x - 2y + z = 3

2x - 4y + 3z - 5w = -3(b) 6x - 2y + 4z - 5w = 1

4x - 2y + z = 4

Page 214: Matrix Theory

4.5 Solving Systems of Linear Equations 193

2. Create examples of three equations in two unknowns that

(a) have unique solutions(b) have an infinite number of solutions(c) have no solutions.

3.

Are your examples compatible with the theory worked out above?

This exercise refers back to the Hermite echelon form. Suppose we desirethe solutions of Ax = b where A is square but not necessarily invertible.We have showed how to use the pseudoinverse to describe all solutionsto Ax = b if any exist. In this exercise, we consider a different approach.First form the augmented matrix [A I b]. There is an invertible ma-trix S such that SA = H = HEF(A) so form [A I b] -+ [SA I Sb]= [H I Sb].

(a) Argue that rank(A) = rank(H) = the number of ones on thediagonal of H.

(b) Prove that Ax = b is consistent iff Sb has nonzero componentsonly in the rows where H has ones.

(c) If Ax = b is consistent, argue Sb is a particular solution to thesystem.

(d) Argue that if H has r ones down its diagonal, then I - SA hasexactly n - r nonzero columns and these nonzero columns spanA(ull(A) and hence form a basis forMull(A).

(e) Argue that all solutions of Ax = b are described by x = Sb +(I - SA)D, where D is a diagonal matrix containing n - r freeparameters.

Further Reading

[Greville, 1959] T. N. E. Greville, The Pseudoinverse of a Rectangular orSingular Matrix and its Application to the Solution of Systems of LinearEquations, SIAM Review, Vol. 1, (1959), 38-43.

[K&X, 1995] Robert Kalaba and Rong Xu, On the Generalized InverseForm of the Equations of Constrained Motion, The American Mathemat-ical Monthly, Vol. 102, No. 9, November, (1995), 821-825.

Page 215: Matrix Theory

194 The Moore-Penrose Inverse

MP-Schur complement, the rank theorem, Sylvester's determinantformula, the quotient formula, Reidel's formula, parallel sum

4.6 Schur Complements Again (optional)

Now, we generalize the notion of a Schur complement to matrices that may

not be invertible or even square. Let M = [ A"'xn B,,,xt 1 We define theCrxn Dsxt J

MP-Schur complement of A in M to be

M/A = D - CA+B.

Obviously, if A is square and nonsingular, A-' = A+ and we recapture theprevious notion of the Schur complement. Similarly, we define

M//D = A - BD+C.

Next, we investigate how far we can generalize the results of Chapter I.Things get a bit more complicated when dealing with generalized inverses. Wefollow the treatment in Carlson, Haynsworth, and Markham [C&H&M, 1974].

THEOREM 4.20 (the rank theorem)

Let M = I

C BD ].Then rank(M) > rank(A) + rank(M/A).

Moreover, equality holds iff

1. Null(MIA) c Null((I - A+A)B)

2. Null((M/A)*) F= Null((/ - A+A)C*)

3. (I - AA+)B(M/A)+C(I - A+A) = G.

PROOF Let P = [ -CA+®]

and Q = I ® 1+B J. Note P

and Q are invertible. Then P M Q_ 1 ® A B I -A+B

A B-CA+ I

1

[+C D) [® I

= [ -CA A+C -CA+B+D ][

®I B 1

A -AA+B+B-CA+A + C CA+AA+B - CA+B + -CA+B + D ]

Page 216: Matrix Theory

4.6 Schur Complements Again (optional) 195

I A -AA+B+B 1

-CA+A + C CA+B - CA+B + -CA+B + D J_ A -AA+B + B Let P == [ -CA+A + C D - CA+B[ ® -(I - AA;)B(M/A)+ 1 M = [ Cn

x,n

DBj?jxj

sxt].

Then (M/A)5 , = Dsxt - C,nJApxmBmxt E C(nt+')x(n+t) 0

The following theorem is Sylvester's determinant formula for noninvertiblematrices.

THEOREM 4.21 (Sylvester's determinant formula)

Let M = [ AC kBB

1

Let either Null(A) C Null(C) or Null(A*) e

Null(B*). Let P = [pii] and

ptj det IRow;A

(C)Col B) 1 Then P = det(A)(M/A), and if M is

diin-by-n, det(P) = (det(A))n-k-'det (M).

PROOF Since either Null(A) c_ Null(C) or NUll(A*) c_ Arull(B*)holds, the determinant formula det (M) = det (A)det(M/A) can be ap-plied to the elements pig : p;j = det(A)(d,1 - Row;(C)A+Coli(B)) =(det(A))(M/A);;. If M is n-by-n, then by equation above det (P) =(det(A))n-kdet(M/A) = (det(A))n-I -'det(M). 0

The quotient formula for the noninvertible case is stated and proved below.

THEOREM 4.22 (the quotient formula)1

Let A = [ H K l , where B = [ ED

]. Let Arull(A) e Null(C),

Null(A*) C 1(ull(BJ*). Then (A/B) _ (A/C)/(B/C).

PROOF Since Null(A) e Null(C) and Null(A*) e_ Null(B*),B Cwe may write A =

IQP K

PB [ SC FR ]. So, we

have A/B = K - QPBB+B P = K - QBP, B/C = F - SCR.Partition P, Q as P = [p], Q = [ Q) Q2 ]. Then A/C =

Page 217: Matrix Theory

196 The Moore-Penrose Inverse

(B/C) (B/C)P2Q2(B/C) K - (Q, + Q2)C(P, + RP"-) Hence A/C satisfies

Null(A) e Mull(C) and NUll(A*) c Null(B*). Since B/C = F - SCR,(A/C)/(B/C) = K - QiCPi - Q2SCP1 - Q2FP2 = A/B.

The following theorem is an example using the MP-inverse to get a Sherman-Morrison-Woodbury type theorem. The theorem stated below is due to Reidel[19921.

THEOREM 4.23 (Reidel's formula)Let A,,,, have rank l i , where 1, < 1, V1, V2, W1, W2 be l -by-k and G be a k-by-knonsingular matrix. Let the columns of V, belong to Col(A) and the columnsof W, be orthogonal to Col(A). Let the columns of V2 belong to Col(A*) andthe columns of W2 be orthogonal to Col(A*). Let B = W;*W; have rank k.Suppose Col(W1) = Col(W2). Then the matrix ar = A + (V1 + W,)G(V2 +W2)* has the MP-inverse

,n+ = A+ - W2(W, W2)-'Vz A+ - A+ Vj(Wj(W1 Wi)-l )* +W2(W2

W2)_,(G++VA +V1)(W1(W, Wi)-I)*

The proof is computational but lengthy, so we leave it as a challenge tothe reader. The corollary given below addresses some consequences of thistheorem.

COROLLARY 4.6The following are true under the assumptions of the previous theorem:

1. If G = /, then (A + (V, + W i)(V2 + W2 )* )+= A+ - W2(W2

W2)-,VA+ - A+V1(W1(W,*W1)-i)*

+W2(W2W2)-'V2 A+VI(Wi(Wr W,)-')*

2. If G = 1, and A = I, then (I + (Vi + W,)(V2 + W2)*)+_ / - W2(W? W2)-1 V2 - VJ(W1(W, Wi) T+ W2(W2 *W2)-' V2 V1(Wj (W1 Wi)-I)*

3. If V, =V2=O,then ((A + W, W2)*)+ = A+ + W2(WZ W2)-J G+(Wi (W, Wj )-I )*

Fill and Fishkind [F&F, 19991 noticed that the assumption Col(W1) =Col(W2) is not used in the proof of Reidel's theorem, and they used this the-orem to get a somewhat clean formula for the MP-inverse of a sum. Theyalso noted that a rank additivity assumption cannot be avoided in Reidel'stheorem.

Page 218: Matrix Theory

4.6 Schur Complements Again (optional) 197

THEOREM 4.24Suppose A, B E Cnxn with rank(A + B) = rank(A) + rank(B). Then

(A + B)+ = (I - S)A+(1 - T) + SB+T

where S = (B+B(1 - A+A))+ and T = ((I - AA+)BB+)+

PROOF The proof uses Reidel's theorem and is again rather lengthy so weomit it. 0

The following simple example is offered on page 630 of Fill and Fishkind[ 1999].

Example 4.SLet A = B = [ 1]. Then rank(A+B) = rank(l+1) = rank(2) = 10 2 = rank[ 11+rank[1] = rank(A) + rank(B). Now S = [[1][0]]+ = [0] and T = [[0][1]]+= [0]. Therefore, (I-S)A+(I-T)+SB+T = [ 1 ] [ l ]+[ 1 ] + [0][11+[0] _ [11 and(A+B)+ = [2] and the theorem fails.

There is a nice application of the previous theorem to the parallel sum of twomatrices. Given A, B E Cnxn we define the parallel sum and A and B to be

A II B = (A+ + B+)+.

COROLLARY 4.7Suppose A, B E Cnxn with rank(A II B) = rank(A) + rank(B). Then A IIB = (I - R)A(I - W) + RBW, where R = (BB+(I - AA+)+ and W =((1 - A+A)B+B)+.

Exercise Set 18

1. Let M =L

C D ]. Argue that det(M) = det(A)det(M/A) if A is

invertible. Moreover, if AC = CA, then det(M) = AD - BC.

Page 219: Matrix Theory

198 The Moore-Pet, rose Inverse

Further Reading

[C&H&M, 1974] David Carlson, Emilie Haynsworth, and ThomasMarkham, A Generalization of the Schur Complement by Means of theMoore Penrose Inverse, SIAM J. Appl. Math., Vol. 26, No. 1, (1974),169-175.

[C&H, 1969] D. E. Crabtree and E. V. Haynsworth, An Identity for theSchur Complement of a Matrix, Proceedings of the American Mathemat-ical Society, Vol. 22, (1969), 364-366.

[F&F, 1999] James Allen Fill and Donnielle E. Fishkind, The Moore-Penrose Generalized Inverse For Sums of Matrices, SIAM J. Matrix Anal.,Vol. 21, No. 2, (1999), 629-635.

[Gant, 1959] F. R. Gantmacher, The Theory of Matrices, Vol. 1, ChelseaPublishing Company, New York, (1959).

IRiedel, 1992] K. S. Riedel, A Shcrman-Morrison-Woodbury Identityf'or Rank Augmenting Matrices with Application to Centering, SIAM J.Matrix Anal., Vol. 13, No. 2, (1992), 659-662.

Page 220: Matrix Theory

Chapter 5

Generalized Inverses

5.1 The (I)-InverseThe Moore-Penrose inverse (MP-inverse) of a matrix A in C'"" can be con-

sidered as the unique solution X of a system of simultaneous matrix equations;namely,

(MPI) AXA = A

(MP2) X AX = X

(MP3) (AX)` = AX

(MP4) (XA)` = XA.

We have already noted that only MPI was needed when seeking solutionsof a system of linear equations. This leads us to the idea of looking for "in-verses" of A that satisfy only some of the MP-equations. We introduce somenotation. Let A{1} = {G I AGA = A}, A{2} = (H I HAH = H}, andso forth. For example, All, 2} = A{ I) fl A{2}. That is, a 11,21-inverse of Ais a matrix that satisfies MPI and MP2. We have established previously thatA(1, 2, 3, 4} has just one element in it, namely A+. Evidently we have the in-clusions A{1,2,3,4} e_ A{1,2,3) E- A{1,2} E_ A{l}. Of course, many otherchains are also possible. You might try to discover them all.

In this section, we devote our attention to A( 1}, the set of all 1-inverses of A.The idea of a 1-inverse can be found in the book by Baer [1952]. Baer's ideawas later developed by Sheffield [ 1958] in a paper. Let's make it official.

DEFINITIONS.] (1-inverse)Let A E C'' ". A 1-inverse of A is a matrix G in C"' such that AGA = A;

that is, a matrix that satisfies MPI is called a 1-inverse of A. Let A(1) denotethe collection of all possible 1-inverses of A.

Our goal in this section is to describe A { 11. Of course, A+ E A ( I ) so thiscollection is never empty and there is always a ready example of an element

199

Page 221: Matrix Theory

200 Generalized Inverses

in A(l). Another goal is to prove a fundamental result about I-inverses. Webegin by looking at a general matrix equation AX B = C where A E C"'""X E C""1', B E CP'"q and, necessarily, C E C"y. Here A, B, C are givenand we are to solve for X.

THEOREM 5.1Let AX B = C be as above. Then this equation has a solution if and only if'thereexists an AA in A(1) and a Bs in B{1 } such that AAACBAB = C. If solutionsexist, they are all of the form X = AACBR + W- AAAWBBA, where W isarbitrary in C"P.

PROOF Suppose the consistency condition AAACBAB = C holds for someAA in A{ 1) and some BA in B{ 1). Then clearly, X = AACBA solves the matrixequation. Conversely, suppose AX B = C has a solution X1. Then AX I B =C so A+AXIBB+ = A+CB+. Thus, AA+AXIBB+B = AA+CB+B andalso equals AX I B, which is C. Therefore, AA+CB+B = C and note thatA+EA(1) and B+ E B(I).

Now suppose AX B = C has a solution, say Y. Let K = Y - X,,, wherek = AACBA as above. Then AKB = A(Y - AYB -C -AAACBAB =G. Now, AKB = O implies AAAKBBR = 0, so K = K-AAAK BBR, so Y = X + K = AACBA + K - AA A K B BR. On the other hand,if X = AACBA + W -AAAWBBA for some W, then AX B = A(ARCBR)B +A(W -ARAWBBA)B= AAACBAB+AWB-AARAWBBAB = C usingtheconsistency condition and the fact that All and BA are 1-inverses. This completesthe proof. 0

The test of a good theorem is all the results that follow from it. We now reapthe harvest of this theorem.

COROLLARY 5.1Consider the special case AX A = A. This equation always has solutions sincethe consistency condition A All AAR A = A holds true for any AA in A { 11.including A+. Moreover, A111 = {X I X = ARAAR + W- AAA WAAA},where W is arbitrary and All is any I -inverse of A. In particular, All) = {XX = A+ + W - A+A WAA+), where W is arbitrary.

COROLLARY 5.2Consider the matrix equation AX = C. A in C"'"", X in C"P, and C inCm 'P. This is solvable iff AAAC = C for some I-inverse AA of A and thegeneral solution is X = ARC + (I - AAA)W, where W is arbitrary.

Page 222: Matrix Theory

5.1 The (I }-Inverse 201

COROLLARY 5.3Consider the matrix equation XB = C, where X is n-by-p, B E Cex9, andC E C"xy. This equation is solvable iff CBSB = C for some I-inverse BSof B and then the general solution is X = CBS + W(1 - BBS), where W isarbitrary.

COROLLARY 5.4Consider the matrix equation AX = 0, where A E C'"x" and X is n-by-p.Then this equation always has solutions since the consistency conditionAAS®1S1 = 0 evidently holds for any 1-inverse of A. The solutions are ofthe form X = (I - ASA)W, where W is arbitrary.

COROLLARY 5.5Consider the matrix equation X B = 0. This equation always has solutionssince the consistency condition evidently holds. The solutions are of the formX = W (l - B BS ), where W is arbitrary and BS is some 1-inverse of B.

COROLLARY 5.6Consider a system of linear equations Ax = c, where A E C"'x" x is n-by-Iand c is m-by-1. Then this system is solvable iff AASc = c for any AS in At I )and the solutions are all of the form x = ASe + (1 - ASA)w, where w isarbitrary of size n-by-1.

We recall that for a system of linear equations Ax = b, A', if it exists, hasthe property that A-' b is a solution for every choice of b. It turns out that the1-inverses of A generalize this property for arbitrary A.

THEOREM 5.2Consider the system of linear equations Ax = b, where A E C'"x". Then Gbis a solution of this system for every b E Col(A) iff G E A(1).

PROOF Suppose first Gb is a solution for every b in the column space ofA. Then AGb = bin Col(A). Now b = Ay for some y so AGAy = Ay. But ycould be anything, so A G A = A. Conversely, suppose G E A( 11. Take any bin Col(A). Then b = Ay for some y. Then AGb = AGAy = Ay = b. 0

As Campbell and Meyer [C&M, 1979] put it, the "equation solving" gener-alized inverses of A are exactly the ]-inverses of A. Next, we give a way ofgenerating 1-inverses of a given matrix. First, we need the following.

Page 223: Matrix Theory

202 Generalized Inverses

THEOREM 5.3If S and T are invertible matrices and G is a I -inverse of A, then T GS-' isa 1-inverse of B = SAT. Moreover, every 1-inverse of B is of this form.

PROOF First B(T-'GS-')B = SAT(T-'GS-')(SAT) = SAIGIAT =SAG AT =SAT= B, proving T -' G S-' is a I -inverse of B. Next, let K be any1-inverse of B, that is B K B = B. Note also that S-' BT -' (T K S)S-' BT -'S-'BKBT-' = S-'BT-'. But S-'BT-' = A so if we take G = SKT, thenAGA = A(SKT)A = A. That is, G is a I-inverse of A and K = T-'GS-'asclaimed. 0

THEOREM 5.4 (Bose)

If A E C,' "' there exist invertible matrices S and T with SAT = I r

A matrix G is a I -inverse of A iff G = T N'S, where N"XI# Y

W ] where

X, Y, and W are arbitrary matrices of appropriate size.

PROOF Suppose SAT = [ r ®] , where S and T are invertible. Let

G = TX

Y ] S, where X, Y, and W are arbitrary matrices of appropriate

size. Then, AGA = AT [ X W ] SA =

S-'1 0 0jT'T1 Jr W jSS-I1 ® ®1 T''r YS [® ®] [X W ] [® ®]T- S[0 ®] [® 0T-'

= S-' [ r ®]T-' = A. Conversely, ifGisal-inverseofA,then T-'GS-'0

is a I -inverse of S-' AT -'' by the previous theorem. That is, N = T' GS-' is

a I -inverse of [ ® ®]. Partition N = [ M

W I, where M is r-by-r.

But [ ® ®] [ X W ] [ ® ®] = [ 00

] and also equals11

[ ® ®] . Therefore M = Ir. Thus G = T NS = T [ X W ] S. 0

Next, we consider an example.

Page 224: Matrix Theory

5.1 The (11-Inverse 203

Example 5.l1 0 0

Let A = 0 1 0 . Then A = A+. To find all the 1-inverses of A, we0 0 0

compute A+ + W - A+A W A A+ for an arbitrary W :1 0 0 x y z 1 0 0 x y z 1 0 00 1 0 + u v w - 0 1 0 u u w 0 1 0=0 0 0 r s t 0 0 0 r s t 0 0 0

1 0 z 1 0 z0 I W . Thus A{ 1) 0 1 w z, w, t, r, s are arbitrary}.r s t r s t

l O z 1 0 0

We find that AA8 = 0 1 w , ABA = 0 1 0 , 1 - ABA =0 0 0 r s 0

0 0 0 0 0 -z0 0 0 ,and!-AAB= 0 1 -w-r -s I 0 0 1

We introduced you to the Hermite echelon form for a reason. It helps togenerate 1-inverses, indeed, nonsingular ones.

THEOREM S.SSuppose A is n-by-n and S is nonsingular with SA = HEF(A) := H. Then Sis a 1-inverse of A.

PROOF The proof is left as an exercise. U

THEOREM S.6 [Bose, 1959]Let H E (A*A){ I). Then AHA* = A(A*A)+ = AA+. In other words, AHA*is the unique Hermitian idempotent matrix with the same rank as A.

PROOF WeknowA*AHA*A = A*AsoAHA*A =A and A*AHA* = A*.Let G1 and GZ be in A*A{l }. Then AGZA*A = AGZA*A so that AGiA* =AG2A*. This proves the uniqueness. The rest is left to the reader. 0

In 1975, Robert E. Hartwig published a paper where he proposed a formulafor the I-inverse of a product of two matrices. We present this next.

Page 225: Matrix Theory

204 Generalized Inverses

THEOREM 5.7 [Hartwig, 1975]For conformable matrices A and B, (AB)A = B,'AA - B4(1 - AAA)j(1 -

BB9)(1 - AAA)]9(1 - BBA)AA.

It would he reasonable to guess the same formula works for the MP-inverse;that is,

(AB)+ = B+A+ - B+(1 - A+A)[(1 - BB+)(1 - A+A)]+(I - BB+)A+.

If we check it with our previous example A = I1

0 1J

and B =

I 0 [

0 1 , we find (AB)+ # B+A+. However, the right-hand side computes1 1

to

I I I

I3-I I

3 3

3 3 3

-I 1 -l3 3 3

I -I-I I

-1 1 I[I I -I

3 3 3

I I -I

3 3 3

-I -I I

3 3 3

3 3 =

[1

1

-12 - 2 = (AB)+. So the formula works! Before

3 3

2 -I

3 31 1 0 1 l

1you get too excited, let A = 0 1 and B = [J .

Now A has full

°I

columns, so A+A = 1, A*A = [ 21

J _

2 I*4_1 -;

21 2111and

A+ _ (A*A)-IA* =2 1 1 0 1 _ 2 -1 1

1 2 [ 0 1 1] =3 [ -1 2 1

].AlsoB*B=E1JB+=

(B*B)-1 = [l0] and AB = 0 .

(AB)*(AB) = [21, and (AB)+ _ [(AB)*(AB)]-'(AB)* = Z[101]. How-

ever, B+A+ = [1011 [21

21 1

= '[2 -111 # (A B)+. It would he nice to have necessary and sufficientconditions for the formula to hold, or better yet, to find a formula that alwaysworks.

9 99 -1I -

-1 2

9 9

Page 226: Matrix Theory

S. / The (I }-Inverse 205

Exercise Set 194 -100 1 1 0 1

1. Verify that G0 0 0 0 is a 1-inverse of A =

35

4 2

4

00 0 00 1-1-4 7

Can you find some others? Can you find them all?

2. Suppose A E C"" and A=Az[

A 11A

i J, where A 1 1 isl AZ1A11 A12 J

rr-by-r invertible. Show that G = I

®1

®J is a 1-inverse of A.

3. Suppose Gland G2 are in A( 1) for some matrix A. Show that2

(G 1 +G2)is also in A (1).

4. Suppose G1 and G2 are in A (I } for some matrix A. Show that XG 1 +(1 - X)G2 is also in A (1). In other words, prove that A (1) is an affine set.

5. Suppose G 1, G2, ... , Gk are in A 11) and X 1, X2, ... , Xk are scalars thatsum to 1. Argue that X1 G 1 + X2G2 + + I\k Gk is in All).

6. Is it true that any linear combination of 1-inverses of a matrix A is againa 1-inverse of A?

7. Let S and T be invertible matrices. Then show T-1 MS-1 is a 1-inverseof SAT for any Ax in A{ 1).

8. Argue that B is contained in the column space of A if and only if AA8 B =B for any A8 in All).

9. Argue that A{l} = (A++ [1 - A+A}W21 + W12(1 - AA+) I W21, W12arbitrary).

10. If G E A{1), then prove that GA and AG are both idempotent matrices.

11. If G E A{1), then prove rank(A) = rank(AG) = rank(GA) =trace(AG) < rank(G). Show rank(I -AG) = m - rank(A) and rank(I -GA) = n - r(A).

12. Suppose A is n-by-n and H = HEF(A). Prove A is idempotent if andonly if H is a 1-inverse of A.

Page 227: Matrix Theory

206 Generalized Inver.e.s

13. If G is a I-inverse of A, argue that GAG is also a 1-inverse of A and hasthe same rank as A.

14. Show that it is always possible to construct a 1-inverse of A E C,'"" thathas rank = min{m, n}. In particular, argue that every square matrix hasan invertible 1-inverse.

15. Make clear how Corollary 5.1 to Corollary 5.6 follow from the maintheorem.

16. Prove Theorem 5.5.

17. Prove Theorem 5.6.

18. Suppose AGA = A. Argue that AG and GA are idempotents. Whatdirect sum decompositions do they generate?

19. What is (1)? What is / (I }? Remember the matrix units E11 E(C"'"? What is Eij 11)?

20. Is A{2, 3, 41 ever empty for some weird matrix A?

1 0 2 3

0 1 4 521. Find a 1-inverse for 0 0 0 0

0 0 0 0

22. Show that A E C, "" can have a {I }-inverse of any rank between r and

min fm, n}. (Hint: rank ® Dr + rank(D).)

23. Argue that any square nonsingular matrix has a unique 1-inverse.

24. Suppose A E C""" and G E A { l ). Argue that G* E A* (1).

25. Suppose A E Cand G E A{ I} and h E C. Argue that X+G E (XA)(1)

where, recall, k+ =0 if X= 0

x-' if x# 0

26. Suppose G E A (1). Argue that GA and AG are idempotent matricesthat have the same rank as A.

27. Suppose G E A{ 11. Argue that rank(G) > rank(A).

28. Suppose G E All). Argue that if A is invertible, G = A'.

29. Suppose G E A{1} with S and T invertible. Argue that, T-'GS-' E(SAT) { 1).


30. Suppose G E A( I). Argue that Col(AG) = Col(A), Null(GA) _J/ull(A), and Col((GA)*) = Col(A*).

31. Suppose G ∈ A{1}, where A ∈ C_r^{m×n}. Argue that GA = I iff r = n iff G is a left inverse of A.

32. Suppose G ∈ A{1}, where A ∈ C_r^{m×n}. Argue that AG = I iff r = m iff G is a right inverse of A.

33. Suppose G E A(l) and v E Null(A). Argue that G, _ [g,I I

rr 11

34. Let A E C"I x" . Argue that G E A (1) iff G = S L® 0J

T for some

invertible S and T.L

35. LetGEA(l). Argue that H = G+(W -GAWAG)isalsoa I-inverseand all 1-inverses of A look like H.

36. (Penrose) Prove that AX = B and X C = D have a common solution Xiff each separately has a solution and AD = BC.

37. Suppose G E Al I). Prove that G + B - GABAG E A{1} and G +AG) = (1" - GA)C E Al I).

38. Suppose G E A(1). Prove that rank(A) = rank(GA) = rank(AG) andrank(G) = rank(A) iff GAG = G.

39. Suppose G E A*A{1). Prove that AGA*A = A and A*AGA* _A*.

40. Suppose BAA* = CAA*. Prove that BA = CA.

41. Suppose G E AA*{1} and H E A*A{1}. Prove that A = AA*GA,A = AHA*A, A = AA*G*A, and A = AH*A*A.

42. Suppose A is n-by-m, B is p-by-m, and C is n-by-q with Col(B*) cCol(A*) and Col(C) e Col(A). Argue that for all G in A( 1), BGC =BA+C.

43. Suppose A is n-by-n, c is n-by-I, and c E Col(A) fl Col(A*). Arguethat for all G in A{1}, c*Gc = c*A+c. How does this result read ifA = A*?

44. Find matrices G and A such that G E A 11) but A V G { I).

Further Reading

[Baer, 1952] Reinhold Baer, Linear Algebra and Projective Geometry,Academic Press, Inc., New York, (1952).

[B-I&G, 20031 Adi Ben-Israel and Thomas N. E. Greville, GeneralizedInverses: Theory and Applications, 2nd Edition, Springer, New York,(2003).

[B&O, 1971] Thomas L. Boullion and Patrick L. Odell, GeneralizedInverse Matrices, Wiley-Interscience, New York, (1971).

[C&M, 19791 S. L. Campbell and C. D. Meyer, Jr., Generalized Inversesof Linear Transformations, Dover Publications, Inc., New York, (1979).

[Sheffield, 1958] R. D. Sheffield, A General Theory For Linear Systems,The American Mathematical Monthly, February, (1958), 109-111.

[Wong, 1979] Edward T. Wong, Generalized Inverses as Linear Trans-formations, Mathematics Gazette, Vol. 63, No. 425, October, (1979),176-181.

5.2 {1,2}-Inverses

C. R. Rao in 1955 made use of a generalized inverse that satisfies MP1 and MP2. This type of inverse is sometimes called a reflexive generalized inverse. We can describe the general form of {1,2}-inverses as we did with 1-inverses. It is interesting to see the extra ingredient that is needed. We take the constructive approach as usual.

THEOREM 5.8
Let A ∈ C_r^{m×n}. There are invertible matrices S and T with SAT = [ I_r 0; 0 0 ]. A matrix G is a {1,2}-inverse of A if and only if G = T N'' S, where

N'' = [ I_r Y; X XY ],

where X and Y are arbitrary of appropriate size.


PROOF First suppose G = TN''S as given in the theorem. Then G is a 1-inverse by Theorem 5.3. To show it is also a 2-inverse we compute

GAG = T[ I Y; X XY ]S S^{-1}[ I 0; 0 0 ]T^{-1} T[ I Y; X XY ]S = T[ I Y; X XY ][ I 0; 0 0 ][ I Y; X XY ]S = T[ I 0; X 0 ][ I Y; X XY ]S = T[ I Y; X XY ]S = G.

Conversely, suppose G is a {1,2}-inverse of A. Then, being a 1-inverse, we know G = TN''S, where N'' = [ I Y; X W ]. But to be a 2-inverse,

G = GAG = T[ I Y; X W ]S S^{-1}[ I 0; 0 0 ]T^{-1} T[ I Y; X W ]S = T[ I 0; X 0 ][ I Y; X W ]S = T[ I Y; X XY ]S.

Comparing matrices, we see W = XY. □
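A small computational sketch may help make Theorem 5.8 concrete. It is not from the text: it assumes NumPy, and it manufactures invertible S and T with SAT = [ I_r 0; 0 0 ] from a singular value decomposition rather than by the elementary operations used elsewhere in the book; the helper names are made up for this illustration.

    import numpy as np

    def rank_normal_form(A, tol=1e-10):
        # Invertible S, T with S A T = [I_r 0; 0 0], built from a full SVD.
        U, s, Vt = np.linalg.svd(A)
        r = int(np.sum(s > tol))
        d = np.ones(A.shape[0]); d[:r] = 1.0 / s[:r]
        S = np.diag(d) @ U.conj().T
        T = Vt.conj().T
        return S, T, r

    def one_two_inverse(A, X, Y):
        # Theorem 5.8: G = T [ I Y; X XY ] S is a {1,2}-inverse of A.
        S, T, r = rank_normal_form(A)
        N = np.block([[np.eye(r), Y], [X, X @ Y]])
        return T @ N @ S

    A = np.array([[1., 0., 1.], [0., 1., 1.], [1., 1., 2.]])     # rank 2
    G = one_two_inverse(A, X=np.array([[1., 2.]]), Y=np.array([[3.], [4.]]))
    print(np.allclose(A @ G @ A, A), np.allclose(G @ A @ G, G))  # True True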

Exercise Set 20

1. Suppose S and T are invertible matrices and G is a {1,2}-inverse ofA. Then T-'GS-' is a {1,2}-inverse of B = SAT. Moreover, every{ 1, 2}-inverse of B is of this form.

2. (Bjerhammar) Suppose G is a 1-inverse of A. Argue that G is a11, 2}-inverse of A iff rank(G) = rank(A).

3. Argue that G E Ail, 2} iff G = G 1 A G2, where G1, G2 E Ail).

4. Argue that G = E(HAE)-' H belongs to A {2}, where H and E areselected judiciously so that HAE is nonsingular.

5. Argue that G = E(HAE)+H belongs to A {2}, where H and E arechosen of appropriate size.

6. Suppose A and B are {1,2}-inverses of each other. Argue that AB is theprojector onto Col(A) along Null(B) and BA is the projector of Col(B)along Null (A).

7. Argue that G E A{1, 2} iff there exist S and T invertible with G =BSr 0 0 JTandSAT= L 0 0

].


5.3 Constructing Other Generalized Inverses

In this section, we take a constructive approach to building a variety of examples of generalized inverses of a given matrix. The approach we adopt goes back to the fundamental idea of reducing a matrix to row echelon form. Suppose A ∈ C_r^{m×n}. Then there exists an invertible matrix R with RA = [ G; 0 ], where G has r = rank(A) = rank(G). Now G has full row rank and, therefore, we know G+ = G*(GG*)^{-1}. Indeed, it is clear that GG+ = I_r.

Now let's define A^{g1} = [ G+ + (I − G+G)X : V ]R, where X and V are arbitrary matrices of appropriate size. We compute

A A^{g1} A = R^{-1}[ G; 0 ][ G+ + (I − G+G)X : V ] R R^{-1}[ G; 0 ].

Now [ G+ + (I − G+G)X : V ][ G; 0 ] = G+G + (I − G+G)XG, and G(G+G + (I − G+G)XG) = G + (G − GG+G)XG = G, so A A^{g1} A = R^{-1}[ G; 0 ] = A. We see A^{g1} is a {1}-inverse of A.

For example, let A = [ 1 2; 3 6 ]. Then [ 1 0; -3 1 ][ 1 2; 3 6 ] = [ 1 2; 0 0 ], so G = [ 1 2 ], GG* = [5], G+ = [ 1/5; 2/5 ], G+G = [ 1/5 2/5; 2/5 4/5 ], and I − G+G = [ 4/5 -2/5; -2/5 1/5 ]. Then

A^{g1} = [ [ 1/5; 2/5 ] + [ 4/5 -2/5; -2/5 1/5 ][ r; u ] : [ x; y ] ][ 1 0; -3 1 ]
       = [ 1/5 + 4r/5 − 2u/5 − 3x , x ; 2/5 − 2r/5 + u/5 − 3y , y ],

where r, u, x, and y can be chosen arbitrarily. The skeptical reader can compute A A^{g1} A directly and watch the magic of just the right cancellations in just the right place at just the right time. Suppose in our example u = r = 1 and x = y = 1/5. Then A^{g1} = [ 0 1/5; -2/5 1/5 ]. However,

A A^{g1} = [ 1 2; 3 6 ][ 0 1/5; -2/5 1/5 ] = [ -4/5 3/5; -12/5 9/5 ]   and
A^{g1} A = [ 0 1/5; -2/5 1/5 ][ 1 2; 3 6 ] = [ 3/5 6/5; 1/5 2/5 ],

neither of which is symmetric. Also

A^{g1} A A^{g1} = [ 0 1/5; -2/5 1/5 ][ -4/5 3/5; -12/5 9/5 ] = (1/25)[ -12 9; -4 3 ] ≠ A^{g1}.

So this choice of u, r, x, and y produces a {1}-inverse of A that satisfies none of the other MP-equations. In order to gain additional equations, we make some special choices.

As a next step, we choose X = 0 in our formula for A^{g1} to produce A^{g14} = [ G+ : V ]R. As you may have already guessed from the notation, we claim now to have added some symmetry, namely equation MP-4. Surely MP-1 is still satisfied, and now

A^{g14} A = [ G+ : V ] R R^{-1}[ G; 0 ] = G+G + V·0 = G+G,

which is evidently a projection. From our example above, A^{g14} = [ 1/5 − 3x , x ; 2/5 − 3y , y ]. Continuing with our choice of x = y = 1/5 we have A^{g14} = [ -2/5 1/5; -1/5 1/5 ], which is in A{1, 4}. However,

A A^{g14} = [ 1 2; 3 6 ][ -2/5 1/5; -1/5 1/5 ] = [ -4/5 3/5; -12/5 9/5 ]

is not symmetric, and

A^{g14} A A^{g14} = [ -2/5 1/5; -1/5 1/5 ][ -4/5 3/5; -12/5 9/5 ] = [ -4/25 3/25; -8/25 6/25 ] ≠ A^{g14}.

So A^{g14} satisfies MP-1 and MP-4 but not MP-2 or MP-3.

All right, suppose we want to get MP-2 to be satisfied. What more can we demand? Let's let mathematics be our guide. We want

A^{g14} = A^{g14} A A^{g14} = [ G+ : V ] R R^{-1}[ G; 0 ][ G+ : V ] R = [ G+GG+ : G+GV ] R = [ G+ : G+GV ] R.

We are close, but we badly need G+GV = V. So we select V = G+W, where W is arbitrary of appropriate size. Then G+GV = G+GG+W = G+W = V, as we desired.


Now A^{g124} = [ G+ : G+W ]R will be in A{1, 2, 4}, as the reader may verify. Continuing with our example,

G+W = [ 1/5; 2/5 ][ a ] = [ a/5; 2a/5 ],   so   A^{g124} = [ 1/5 a/5; 2/5 2a/5 ][ 1 0; -3 1 ] = [ 1/5 − 3a/5 , a/5 ; 2/5 − 6a/5 , 2a/5 ].

Choosing a = 1, we get A^{g124} = [ -2/5 1/5; -4/5 2/5 ] ∈ A{1, 2, 4}. However,

A A^{g124} = [ -10/5 5/5; -30/5 15/5 ] = [ -2 1; -6 3 ]

is not symmetric, so A^{g124} ∉ A{3}.

We are now very close to A+ = A^{g1234}. We simply need A A^{g124} to be symmetric. But

A A^{g124} = [ 1 2; 3 6 ][ 1/5 − 3a/5 , a/5 ; 2/5 − 6a/5 , 2a/5 ] = [ 1 − 3a , a ; 3 − 9a , 3a ].

We need 3 − 9a = a, or 10a = 3, so a = 3/10. The reader may verify that A+ = [ 1/50 3/50; 2/50 6/50 ]. It is not at all clear how to get A+ by making another special choice for W, say. We have A A^{g124} = R^{-1}[ G; 0 ][ G+ : G+W ]R = R^{-1}[ I_r W; 0 0 ]R, and the simple-minded choice W = 0 does not work, as the reader may verify in the example we have been running above. However, we sort of know what to do. Our development of the pseudoinverse through full rank factorization says look at F = AG+. Then

F = R^{-1}[ G; 0 ]G+ = R^{-1}[ GG+; 0 ] = R^{-1}[ I_r; 0 ] = [ R_1 : R_2 ][ I_r; 0 ] = R_1,

which is just the first r columns of the invertible matrix R^{-1}. Hence, F has full column rank and so F+ = (F*F)^{-1}F*. Get A+ from G+F+. In our example,

F = AG+ = [ 1 2; 3 6 ][ 1/5; 2/5 ] = [ 1; 3 ],   so   F*F = [ 1 3 ][ 1; 3 ] = [ 10 ],   so   F+ = (1/10)[ 1 3 ].

Now G+F+ = [ 1/5; 2/5 ](1/10)[ 1 3 ] = [ 1/50 3/50; 2/50 6/50 ].

Let us summarize with a table.

TABLE 5.1

1. Given A ∈ C_r^{m×n}, form RA = [ G; 0 ]. Then A^{g1} = [ G+ + (I − G+G)X : V ]R ∈ A{1}, where X and V are arbitrary and G+ = G*(GG*)^{-1}.

2. A^{g14} = [ G+ : V ]R ∈ A{1, 4}, where X was chosen to be 0 and V is still arbitrary.

3. A^{g124} = [ G+ : G+W ]R ∈ A{1, 2, 4}, where V is chosen as G+W, where W is arbitrary.

4. A^{g1234} = A+ = G+F+, where F = AG+ and F+ = (F*F)^{-1}F*.
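Table 5.1 can be exercised numerically. The sketch below is not from the text: it assumes NumPy, it builds an invertible R with RA = [ G; 0 ] from an SVD (any such R will do), and the function name and parameter choices are illustrative assumptions.

    import numpy as np

    def g1_family(A, X, V, W, tol=1e-10):
        # Build the generalized inverses of Table 5.1 from RA = [G; 0].
        m, n = A.shape
        U, s, Vt = np.linalg.svd(A)
        r = int(np.sum(s > tol))
        R = U.conj().T                       # unitary, hence invertible; RA has zero rows below row r
        G = (R @ A)[:r, :]                   # full row rank part
        Gp = G.conj().T @ np.linalg.inv(G @ G.conj().T)     # G+ = G*(GG*)^{-1}
        P = np.eye(n) - Gp @ G               # I - G+G
        Ag1   = np.hstack([Gp + P @ X, V]) @ R               # {1}-inverse
        Ag14  = np.hstack([Gp, V]) @ R                       # {1,4}-inverse
        Ag124 = np.hstack([Gp, Gp @ W]) @ R                  # {1,2,4}-inverse
        F = A @ Gp                                           # full column rank
        Aplus = Gp @ np.linalg.inv(F.conj().T @ F) @ F.conj().T   # A+ = G+F+
        return Ag1, Ag14, Ag124, Aplus

    A = np.array([[1., 2.], [3., 6.]])
    X = np.array([[1.], [1.]]); V = np.array([[0.2], [0.2]]); W = np.array([[0.3]])
    Ag1, Ag14, Ag124, Aplus = g1_family(A, X, V, W)
    print(np.allclose(A @ Ag1 @ A, A))       # MP-1 holds
    print(np.round(Aplus, 4))                # [[0.02 0.06], [0.04 0.12]], i.e., [1/50 3/50; 2/50 6/50]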

Now let's play the same game with A* instead of A. Let's now reduce A* and be a little clever with notation:

S*A* = [ F*; 0 ],   so   A* = (S*)^{-1}[ F*; 0 ]   and   A = [ F : 0 ]S^{-1}.

Now F has full column rank, so F+ = (F*F)^{-1}F* and clearly F+F = I. Now we take

A^{g1} = S[ F+ + X(I − FF+) ; V ],

where X and V are arbitrary of appropriate size. Then

A A^{g1} A = [ F : 0 ]S^{-1} S[ F+ + X(I − FF+) ; V ][ F : 0 ]S^{-1} = [ FF+ + FX(I − FF+) ][ F : 0 ]S^{-1} = [ FF+F + 0 : 0 ]S^{-1} = [ F : 0 ]S^{-1} = A,

so A^{g1} ∈ A{1}. Continuing with our example,

A = [ 1 2; 3 6 ],   A* = [ 1 3; 2 6 ],   and   [ 1 0; -2 1 ][ 1 3; 2 6 ] = [ 1 3; 0 0 ].

Taking S* = [ 1 0; -2 1 ], we have S = [ 1 -2; 0 1 ] and S^{-1} = [ 1 2; 0 1 ], so A = [ 1 0; 3 0 ][ 1 2; 0 1 ]. Now F = [ 1; 3 ], F* = [ 1 3 ], F*F = [ 10 ], F+ = [ 1/10 3/10 ], FF+ = [ 1/10 3/10; 3/10 9/10 ], and I − FF+ = [ 9/10 -3/10; -3/10 1/10 ]. Thus

A^{g1} = [ 1 -2; 0 1 ][ [ 1/10 3/10 ] + [ r u ][ 9/10 -3/10; -3/10 1/10 ] ; [ x y ] ]
       = [ 1/10 + 9r/10 − 3u/10 − 2x , 3/10 − 3r/10 + u/10 − 2y ; x , y ].

Choosing r = u = 1 and x = 1/10, y = -1/10, we get A^{g1} = [ 5/10 3/10; 1/10 -1/10 ], a {1}-inverse of A that satisfies none of the other MP-equations. Making the special choice X = 0, we find

A A^{g13} = [ F : 0 ]S^{-1} S[ F+ ; V ] = FF+,

which is a projection. So, in our example, A^{g13} = [ 1/10 − 2x , 3/10 − 2y ; x , y ], a {1, 3}-inverse of A. With x and y as above, A^{g13} = [ -1/10 5/10; 1/10 -1/10 ] is in A{1, 3} but not A{2, 4}. Reasoning as before, we see the special choice V = WF+ yields A^{g123} = S[ F+ ; WF+ ] as a {1, 2, 3}-inverse of A. In our example,

A^{g123} = [ (1 − 2a)/10 , (3 − 6a)/10 ; a/10 , 3a/10 ],

so, with a = 1, we have [ -1/10 -3/10; 1/10 3/10 ] as a specific {1, 2, 3}-inverse of A. To get A+ we look at G = F+A = F+[ F : 0 ]S^{-1} = [ F+F : 0 ]S^{-1} = [ I_r : 0 ]S^{-1} = S_1, the first r rows of S^{-1}, which has full row rank, so G+ = G*(GG*)^{-1}. The reader may verify our example reproduces the same A+ as above. We summarize with a table.

TABLE 5.2

1. Given A ∈ C_r^{m×n}, form S*A* = [ F*; 0 ] to get A = [ F : 0 ]S^{-1}. Then A^{g1} = S[ F+ + X(I − FF+) ; V ] ∈ A{1}, where X and V are arbitrary of appropriate size and F+ = (F*F)^{-1}F*.

2. A^{g13} = S[ F+ ; V ] ∈ A{1, 3}, where X was chosen to be 0 and V is still arbitrary.

3. A^{g123} = S[ F+ ; WF+ ] ∈ A{1, 2, 3}, where V is chosen as WF+, where W is arbitrary.

4. A^{g1234} = A+ = G+F+, where G = F+A and G+ = G*(GG*)^{-1}.

We indicate next how to get generalized inverses of a specified rank. We use the notation A{i, j, k, l}_s for the set of all {i, j, k, l}-inverses of rank s. We begin with {2}-inverses.

THEOREM 5.9 (G. W. Stewart, R. E. Funderlic)
Let A ∈ C_r^{m×n} and 0 < s ≤ r. Then A{2}_s = { X | X = YZ, where Y ∈ C^{n×s}, Z ∈ C^{s×m}, and ZAY = I_s }.

PROOF Let X be in the right-hand set. We must show X is a {2}-inverse of A. But XAX = YZAYZ = YIZ = YZ = X. Conversely, let X ∈ A{2}_s. Write X = FG in full rank factorization. Then F ∈ C_s^{n×s} and G ∈ C_s^{s×m}. Then X = XAX, so FG = FGAFG. But then F+FGG+ = F+FGAFGG+, so I_s = GAF. □
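Theorem 5.9 suggests a simple recipe for producing rank-s {2}-inverses numerically: pick Y and Z0 at random and rescale Z0 so that ZAY = I_s. The sketch below is not from the text; it assumes NumPy, and it assumes the randomly generated Z0 A Y turns out to be invertible.

    import numpy as np

    def rank_s_two_inverse(A, s, rng=np.random.default_rng(0)):
        # Theorem 5.9: X = YZ with ZAY = I_s is a {2}-inverse of A of rank s.
        m, n = A.shape
        Y = rng.standard_normal((n, s))
        Z0 = rng.standard_normal((s, m))
        Z = np.linalg.solve(Z0 @ A @ Y, Z0)      # forces Z A Y = I_s (assumes Z0 A Y invertible)
        return Y @ Z

    A = np.array([[1., 0., 2., 3.], [0., 1., 4., 5.], [0., 0., 0., 0.]])
    X = rank_s_two_inverse(A, s=1)
    print(np.allclose(X @ A @ X, X), np.linalg.matrix_rank(X))   # True 1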

COROLLARY 5.7
Let A ∈ C_r^{m×n}. Then A{1, 2} = { FG | F ∈ C^{n×r}, G ∈ C^{r×m}, GAF = I_r }.

PROOF A{1, 2} = A{2}_r. □

COROLLARY 5.8
If GAF = I_r, then G ∈ (AF){1, 2, 4}.

THEOREM 5.10
Let A ∈ C_r^{m×n} and 0 < s ≤ r. Then A{2, 3}_s = { Y(AY)+ | AY ∈ C_s^{m×s} }.

PROOF Let X = Y(AY)+, where AY ∈ C_s^{m×s}. Then AX = AY(AY)+, so (AX)* = AX. Also, XAX = Y(AY)+AY(AY)+ = Y(AY)+ = X. Moreover, s = rank(AY) = rank(AX) = rank(X). Conversely, suppose X ∈ A{2, 3}_s. Then AX is a projection of rank s. Thus (AX)+ = AX and so X(AX)+ = XAX = X, and X plays the role of Y. □


THEOREM 5.11
Let A ∈ C_r^{m×n} and 0 < s ≤ r. Then A{2, 4}_s = { (YA)+Y | YA ∈ C_s^{s×n} }.

PROOF The proof goes along the lines of the previous theorem and is left as an exercise. □

Researchers have found uses for generalized inverses of type {1, 2, 3} and {1, 2, 4} (see Goldman and Zelen [G&Z, 1964]).

Exercise Set 21

1. Argue that A{1, 2, 3, 4} ⊆ A{1, 2, 3} ⊆ A{1, 2} ⊆ A{1}, with equality holding throughout if and only if A is invertible.

2. Suppose G is a {1, 2, 3}-inverse of A. Argue rank(G) = rank(A+) = rank(A).

3. Argue that the following statements are all equivalent:

(i) A*B = 0.
(ii) GB = 0, where G ∈ A{1, 2, 3}.
(iii) HA = 0, where H ∈ B{1, 2, 3}.

4. Argue that a matrix G is in A{1, 2, 3} if and only if G = HA*, where H is in A*A{1}.

5. Argue that a matrix G is in A{1, 2, 4} if and only if G = A*H, where H is in AA*{1}.

6. Construct various generalized inverses of AL

1

L -12

01

1

i2i I.

7. Let B = A*(AA*)^{g12}. Argue that B is a {1, 2, 4}-inverse of A.

8. Let C = (A*A)^{g12}A*. Argue that C is a {1, 2, 3}-inverse of A.

9. Let B ∈ A{1, 2, 4} and C ∈ A{1, 2, 3}. Argue that BAC = A+. Is it good enough to assume B ∈ A{1, 4} and C ∈ A{1, 3}?

10. Suppose A ∈ C^{m×n} and SAT = [ B 0; 0 0 ], where B ∈ C^{r×r} is invertible. Then let G = TNS, where N = [ Z X; Y W ].


Then argue

(i) G ∈ A{1} iff Z = B^{-1}.
(ii) G ∈ A{1, 2} iff Z = B^{-1} and W = YBX.
(iii) G ∈ A{1, 2, 3} iff Z = B^{-1}, X = -B^{-1}S_1S_2^+, and W = -YS_1S_2^+, where S = [ S_1; S_2 ].
(iv) G ∈ A{1, 2, 4} iff Z = B^{-1}, Y = -T_2^+T_1B^{-1}, and W = -T_2^+T_1X, where T = [ T_1 : T_2 ].
(v) G = A+ iff Z = B^{-1}, X = -B^{-1}S_1S_2^+, Y = -T_2^+T_1B^{-1}, and W = T_2^+T_1B^{-1}S_1S_2^+.
(vi) Let G ∈ A*A{1}. Argue that GA* ∈ A{1, 2, 3}. Let H ∈ AA*{1}. Argue that A*H ∈ A{1, 2, 4}.

11. (Urquhart) Let G ∈ A{1, 4} and H ∈ A{1, 3}. Prove that GAH = A+.

5.4 {2}-Inverses

In this section, we discuss the 2-inverses of a matrix. The problem of finding 2-inverses is a bit more challenging than that of describing 1-inverses because it is a "quadratic" problem in the unknowns. To understand what this means, let's look at the 2-by-2 case. Given A = [ a b; c d ], find X = [ x y; u v ] such that XAX = X. That is, solve

[ x y; u v ][ a b; c d ][ x y; u v ] = [ x y; u v ]

for x, y, u, and v. The reader may check that this amounts to solving the following equations for x, y, u, and v:

x = x²a + ycx + xbu + ydu
y = xay + y²c + xbv + ydv
u = uax + vcx + u²b + vdu
v = uay + vcy + ubv + v²d,

which are quadratic in the unknowns. The reader may also verify (just interchange letters) that solving for 1-inverses is linear in the unknowns.

which are quadratic in the unknowns. The reader may also verify (just inter-change letters) that solving for 1-inverses is linear in the unknowns.

We will approach 2-inverses using the rank normal form. First, we need atheorem.

THEOREM 5.12Let A E C" and suppose S and T are invertible matrices of appropriatesize. Also suppose X is a 2-inverse of A. Then T -I X S-' is a 2-inverse of SAT.Moreover, every 2-inverse of A is of this form.


PROOF Let S and T be invertible and let X he a 2-inverse of A. We claimT-'XS-1 isa2-inverse of SAT. To prove this, we compute T-'XS-I(SAT)T-1XS-' = T -' X AX S-' = T -' X S-' , since XAX = X by assumption. Con-versely, let K be any 2-inverse of SAT. Then K(SAT)K = K. Note thatTKSS-'SATT-'TKS-t = TKSATKS = TKS, so if L = TKS, thenLAL = (TKS)(S-'SATT-')(TKS) = TKSATKS = TKS = L. In otherwords, L is a 2-inverse of A and K = T-' LS-1, as claimed. 0

The next theorem shows the structure that every 2-inverse takes relative tothe rank normal form of its matrix.

THEOREM 5.13 [Bailey, 2002]
Let A ∈ C_r^{m×n}. Suppose RNF(A) = SAT = [ I_r 0; 0 0 ] for appropriate invertible matrices S and T. A matrix X is a 2-inverse of A if and only if X = TRS, where

R = [ M Y; W WY ],

where M is idempotent of size r-by-r, MY = Y, and WM = W, and all these matrices are of appropriate size.

PROOF Suppose SAT = [ I_r 0; 0 0 ], where S and T are invertible, and let X = TRS = T[ M Y; W WY ]S, where M = M², MY = Y, WM = W, and M, W, Y are all of appropriate size. Then

XAX = T[ M Y; W WY ]S S^{-1}[ I_r 0; 0 0 ]T^{-1} T[ M Y; W WY ]S = T[ M 0; W 0 ][ M Y; W WY ]S = T[ M² MY; WM WY ]S = T[ M Y; W WY ]S = X.

Conversely, suppose X is a 2-inverse of A. Then R = T^{-1}XS^{-1} is a 2-inverse of SAT = [ I_r 0; 0 0 ] by the previous theorem. Partition R = [ M Y; W Z ], where M is r-by-r. Then

[ M Y; W Z ] = R = R[ I_r 0; 0 0 ]R = [ M 0; W 0 ][ M Y; W Z ] = [ M² MY; WM WY ].

The conclusions follow by comparing blocks (i.e., M = M², MY = Y, WM = W, and Z = WY). □

Let's look at an example. Let A = [ 1 2 3 4; 2 4 6 7; 1 2 3 6 ]. By performing elementary row and column operations on A, we find

RNF(A) = [ 1 0 0 0; 0 1 0 0; 0 0 0 0 ] = S A T,   where   S = [ -7 4 0; 2 -1 0; -5 2 1 ]   and   T = [ 1 0 -2 -3; 0 0 1 0; 0 0 0 1; 0 1 0 0 ].

According to the previous theorem, we need a 2-by-2 idempotent matrix M. So choose M = [ 1/2 1/2; 1/2 1/2 ]. Next we need Y such that MY = Y. That is, [ 1/2 1/2; 1/2 1/2 ][ y1; y2 ] = [ y1; y2 ]. The reader may verify that this equation implies y1 = y2, so we might just as well take them to equal 1. We also need WM = W; that is, we need [ w1 w2 ][ 1/2 1/2; 1/2 1/2 ] = [ w1 w2 ]. Once again the reader may verify this entails that w1 = w2; again, why not use 1? Now we can put R together:

R = [ 1/2 1/2 1; 1/2 1/2 1; 1 1 2; 1 1 2 ].

Finally we compute TRS, which will be a 2-inverse of A:

TRS = [ 135/2 -63/2 -9; -15 7 2; -15 7 2; -15/2 7/2 1 ].

The final verification left to the reader is that this matrix is indeed a 2-inverse of A as promised.
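The example just worked is easy to check by machine. The sketch below assumes NumPy; S and T are the matrices found above, and M, Y, W are the choices made in the text.

    import numpy as np

    A = np.array([[1., 2., 3., 4.], [2., 4., 6., 7.], [1., 2., 3., 6.]])
    S = np.array([[-7., 4., 0.], [2., -1., 0.], [-5., 2., 1.]])
    T = np.array([[1., 0., -2., -3.], [0., 0., 1., 0.], [0., 0., 0., 1.], [0., 1., 0., 0.]])
    RNF = np.zeros((3, 4)); RNF[0, 0] = RNF[1, 1] = 1.0
    print(np.allclose(S @ A @ T, RNF))       # True: SAT is the rank normal form

    M = np.full((2, 2), 0.5)                 # idempotent
    Y = np.ones((2, 1))                      # MY = Y
    W = np.ones((2, 2))                      # WM = W
    R = np.block([[M, Y], [W, W @ Y]])       # 4-by-3
    X = T @ R @ S                            # the 2-inverse built in the text
    print(np.allclose(X @ A @ X, X))         # True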

Next, we look at the connection of the 2-inverse to other generalized inverses of a matrix. Let's fix an m-by-n matrix A of rank r over C. Note that if we have a 2-inverse X for A, then rank(X) = rank(XAX) ≤ rank(A) = r. Now choose arbitrary matrices E in C^{n×k} and H in C^{k×m} and form the k-by-k matrix HAE. Suppose we can find a 2-inverse of HAE, say (HAE)^{g2}. For example, (HAE)+ would certainly be one such. Then, if we form X = E(HAE)^{g2}H, we find XAX = E((HAE)^{g2})(HAE)((HAE)^{g2})H = E((HAE)^{g2})H = X. In other words, X is a 2-inverse of A. But do all 2-inverses look like this? Once again we appeal to full rank factorizations of A. Suppose A = FG is a full rank factorization of A. Then our X above looks like X = E((HFGE)^{g2})H. Note HF is k-by-r and GE is r-by-k.


THEOREM 5.14
Suppose k = r = rank(A). Then HAE is invertible iff HF and GE are invertible.

PROOF Note that HF, GE, and HAE are all r-by-r, and HAE = (HF)(GE). Thus det(HAE) = det(HF)det(GE). The theorem follows from this formula and the familiar fact that a matrix is invertible iff it has a nonzero determinant. □

THEOREM 5.15
Let k = r = rank(A) and choose H and E so that HAE is invertible. Then X = E(HAE)^{-1}H is a {1, 2}-inverse of A.

PROOF We already know X is a 2-inverse of A. We compute AXA = AE(HAE)^{-1}HA = AE(HFGE)^{-1}HA = AE(GE)^{-1}(HF)^{-1}HA = FGE(GE)^{-1}(HF)^{-1}HFG = FG = A, making X a 1-inverse as well. □

THEOREM 5.16
With the hypotheses of the previous theorem, add that we choose H = F*. Then X is a {1, 2, 3}-inverse of A.

PROOF We need only that (AX)* = AX. But AX = AE(HAE)^{-1}H = AE(F*FGE)^{-1}F* = (FGE)(GE)^{-1}(F*F)^{-1}F* = FF+, which we know to be self-adjoint. □

THEOREM 5.17
In the previous theorem, choose E = G* instead of H = F*. Then X is a {1, 2, 4}-inverse of A.

PROOF We only need that (XA)* = XA. But XA = E(HAE)^{-1}HA = E(GE)^{-1}(HF)^{-1}HFG = E(GE)^{-1}G = G*(GG*)^{-1}G = G+G, which we know to be self-adjoint. □

THEOREM 5.18
In the previous theorem, choose both H = F* and E = G*. Then X = A+.

PROOF In this case, X = G*(GG*)^{-1}(F*F)^{-1}F* = G+F+ = A+. □

Can we find a way to control the rank of a 2-inverse of a given matrix? Thenext theorem gives some insight into this question.


THEOREM 5.19
Let A ∈ C_r^{m×n}. Suppose 0 < s ≤ r. Then

1. The rank r 2-inverses of A are exactly the {1, 2}-inverses of A.

2. If 0 < s ≤ r, the rank s 2-inverses of A form the set { YZ | Y ∈ C_s^{n×s}, Z ∈ C_s^{s×m}, and ZAY = I_s }.

PROOF The proof of Theorem 5.19(1) is left as an exercise. For (2), let X be a rank s 2-inverse of A. Let X = YZ be a full rank factorization of X, so that Y ∈ C_s^{n×s} and Z ∈ C_s^{s×m}. Now YZ = X = XAX = YZAYZ. But Y+Y = I_s = ZZ+, so I_s = ZAY. Conversely, let X = YZ, where ZAY = I_s. Then XAX = YZAYZ = YIZ = X. □

We have avoided a serious issue until now. Above we wrote 2-inverses asX = E(HAE)-' H. But how did we know we could ever find any matrices Eand H so that HAE is invertible? We have the following theorem.

THEOREM 5.20Let A E C'"". Then X is a 2-inverse of A if and only if there exists E andH where E has full column rank, H has full row rank, Col(X) = Col(E),Col(X*) = Col(H*), HAE is invertible, and X = E(HAE)-' H.

Before we leave this section, we can actually determine all 1-inverses of a given rank. Suppose G ∈ A{1}, where A ∈ C_r^{m×n}. We know r ≤ rank(G) ≤ min{m, n}. We claim all the rank s 1-inverses of A are in the set

{ YZ | Y ∈ C_s^{n×s}, Z ∈ C_s^{s×m}, where ZAY = [ I_r 0; 0 0 ] }.

Indeed, suppose G belongs to this set. Then G = YZ and rank(G) = s. Partition Y with a block of its first r columns and Z with its first r rows; say Y = [ Y1 | Y2 ] and Z = [ Z1; Z2 ]. It follows that Z1AY1 = I_r and Z1AY2 = 0. Let G1 = Y1Z1. Then G1AG1 = Y1Z1AY1Z1 = Y1 I Z1 = Y1Z1 = G1, so G1 ∈ A{2}. But rank(G1) = r = rank(A), so by (5.8), G1 is also a 1-inverse of A. Thus,

AG1AGA = AY1Z1AYZA = AY1[ I_r 0 ][ Z1; Z2 ]A = AY1Z1A = AG1A = A,

and since AG1A = A this says AGA = A, so G is indeed a 1-inverse of A of rank s. Conversely, let G be a 1-inverse of A of rank s. Let G = FH be a full rank factorization of G, so F ∈ C_s^{n×s} and H ∈ C_s^{s×m}. Then HAFHAF = HAGAF = HAF, so HAF is an idempotent of rank r. Thus there exists an invertible matrix S with S(HAF)S^{-1} = [ I_r 0; 0 0 ]. Let Y = FS^{-1} and Z = SH. Then Y ∈ C_s^{n×s}, Z ∈ C_s^{s×m}, ZAY = SHAFS^{-1} = [ I_r 0; 0 0 ], and YZ = FH = G.

Exercise Set 22

1. Find a 2-inverse for D = [ 1 0 0; 0 -1 0; 0 0 0 ], following the example worked above.

2. Find a 2-inverse for N = [ 0 1 0; 0 0 1; 0 0 0 ], following the example worked above.

3. Verify the equations for the 2-by-2 case listed at the beginning of thissection.

4. Let A = FG he a full rank factorization of A E Cr"". Argue thatGs' Ft' E A{1}, Gx=FR' E A{2), GX4Fb' E A{4}, G91 F92 E A{2}, andGA' F1' E All). Finally, argue that A+ = G+Fx'+ = G914 F+.

5. Suppose A = FG is a full rank factorization of A. Argue that F(G F)-I Gand F(GF)+G are 2-inverses of A.

Further Reading

[Bailey, 20031 Chelsey Elaine Bailey, The (21-Inverses of a Matrix,Masters Thesis, Baylor University, May, (2003).

[Greville, 19741 T. N. E. Greville, Solutions of the Matrix EquationX AX = X and Relations Between Oblique and Orthogonal Projectors,SIAM J. Appl. Math., Vol. 26, No. 4, June, (1974), 828-831.

[Schott, 19971 James R. Schott, Matrix Analysis for Statistics, JohnWiley & Sons, New York, 1997.


5.5 The Drazin Inverse

We have looked at various kinds of generalized inverses, most dealing withthe problem of solving systems of linear equations. However, other inverseshave also been found to be useful. The one we consider next was introduced byM. P. Drazin [Drazin, 19581 in a more abstract setting. This inverse is intimatelyconnected with the index of a matrix. It is defined only for square matrices andlike the MP-inverse, is unique.

DEFINITION 5.2 (Drazin inverse)Let A E Cnxn with index(A) = q. Then X E C"x" is called a Drazin

inverse (D-inverse for short) of A ifX satisfies the following equations:

(D1)XAX = X(D2) AX = XA(D3) Aq+' X = Aq

We see that a D-inverse of A is, in particular, a 2-inverse of A that commuteswith A. Well, the zero matrix does that! So it must be (D3) that gives theD-inverse its punch. Let's settle the issues of existence and uniqueness rightaway.

THEOREM 5.21Let A E Cnx", with index(A) = q. Then there exists one and only one matrixX E C"11 that satisfies (DI), (D2), and (D3). We shall denote this unique matrixby A° and call it the Drazin inverse of A.

PROOF For uniqueness, suppose X and Y both satisfy (Dl) through (D3).Then Aq+'X = Aq = Aq+'Y so MAX = MAY. Thus AgXA = AgYA.Then AgXAX = AgYAX so AqX = AgYAX. Now Aq-'AX = Aq-'AYAXso Aq-'XA = Aq-'AMAX so Aq-'XAX = Aq-'AYAX2 so Aq-'X =Aq-' AYX AX = Aq-'AYX = Aq-' YAX. Thus we have peeled off one factorof A on the left to get Aq'' X = Aq-' YAX . Continue this process until allAs are peeled away and conclude X = YAX. A symmetric argument givesY = XAY.Next, (D3)implies Aq(AY-1) = 0, soXAq(AY- 1) =OwhenceAq(XAY - X) = 0. Now using (D1), Aq(XAY - XAX) = 0 so Aq(XA)(Y-X) = O, hence X Aq X A(Y - X) =0. Using (D2), Aq-' (X AX)A(Y-X) =® so, by (DI), Aq-'(XA)(Y - X) = 0, so Aq-'(XAY - XAX) = Aq-'(X AY - X) = 0. Again, this shows we can peel off factors of A on the left andconclude X AY - X = O so X = X A Y = Y by the above.


There are various approaches for existence, but since we used the full rank factorization to get the MP-inverse, we do the same for the D-inverse. If A ∈ C^{n×n} has full rank n, then take A^D = A^{-1} and easily check the three equations for the D-inverse. It is a fact that the index q of a matrix is characterized by a sequence of full rank factorizations: A = F1G1, G1F1 = F2G2, G2F2 = F3G3, ... . Then index(A) = q if GqFq is invertible or zero for the first time. Define

A^D = F1F2···Fq (GqFq)^{-(q+1)} GqGq−1···G1   when (GqFq)^{-1} exists,
A^D = 0                                        if GqFq = 0.

Note, if GqFq = 0, then A is nilpotent of index q + 1. It is straightforward to verify that A^D so defined actually satisfies (D1) through (D3). □

A number of facts are easily deduced about the D-inverse.

THEOREM 5.22
Let A ∈ C^{n×n}, with index(A) = q. Then

1. Col(A^D) = Col(A^q).

2. Null(A^D) = Null(A^q).

3. (AA^D)² = AA^D is the projector of C^n onto Col(A^q) along Null(A^q).

4. (I − AA^D)² = (I − AA^D) is the projector onto Null(A^q) along Col(A^q).

5. A^{q+p}(A^D)^p = A^q.

6. A^{p+1}A^D = A^p iff p ≥ q, where p and q are positive integers.

PROOF The proofs are easy and left as exercises. 0

We could have used the core-nilpotent factorization to get the Drazin inverse.

THEOREM 5.23
Let A ∈ C^{n×n}, with index(A) = q > 0. If A = S[ C 0; 0 N ]S^{-1} is a core-nilpotent factorization of A, then A^D = S[ C^{-1} 0; 0 0 ]S^{-1}.

PROOF We leave it as an exercise to verify (D1) through (D3). □

Campbell and Meyer [C&M, 1979 talk about the core of a matrix A usingthe D-inverse. It goes like this.


DEFINITION 5.3 Let A E C""". Then AA°A is defined to be the core ofthe matrix A and is denoted CA; we also define NA = A - CA.

THEOREM 5.24Let A E C""" . Then NA = A - CA is a nilpotent matrix of index q = index(A).

PROOF If index(A) = 0, then A = C_A, so suppose index(A) ≥ 1. Then (N_A)^q = (A − AA^D A)^q = [A(I − AA^D)]^q = A^q(I − AA^D)^q = A^q(I − AA^D) = A^q − A^q = 0. If p < q, (N_A)^p = A^p − A^{p+1}A^D ≠ 0, so index(N_A) = q. □

DEFINITION 5.4 (core-nilpotent decomposition)Let A E C"I". The matrix NA =A - CA = (I -A AD)A is called the nilpo-

tent part of A, and A = CA + NA is called the core-nilpotent decompositionof A.

THEOREM 5.25
Let A ∈ C^{n×n} and let A = S[ C 0; 0 N ]S^{-1} be a core-nilpotent factorization of A. Then C_A = S[ C 0; 0 0 ]S^{-1} and N_A = S[ 0 0; 0 N ]S^{-1}.

PROOF The proof is left as an exercise.

PROOF The proof is left as an exercise.

There is some uniqueness here.

THEOREM 5.26
Let A ∈ C^{n×n}. Then A has a unique decomposition A = X + Y, where XY = YX = 0, index(X) ≤ 1, and Y is nilpotent of index q = index(A). Moreover, the unique decomposition is A = C_A + N_A.

PROOF If index(A) = 0, then Y = 0 and A = X is invertible. If index(X) = 1, write X = S[ C 0; 0 0 ]S^{-1} with C invertible. Since XY = YX = 0 and C is invertible, Y = S[ 0 0; 0 Y2 ]S^{-1}, and Y2 is nilpotent with index(Y2) = index(A). Now A = X + Y = S[ C 0; 0 Y2 ]S^{-1}, so X = C_A and Y = N_A. □

COROLLARY 5.9
Let A ∈ C^{n×n}. Then C_{A^p} = (C_A)^p, N_{A^p} = (N_A)^p, and A^p = (C_A)^p + (N_A)^p. If p ≥ index(A), then A^p = (C_A)^p.


Next we list additional properties of the D-inverse, leaving the proofs asexercises.

THEOREM 5.27
Let A ∈ C^{n×n}. Then

1. index(C_A) = 1 if index(A) ≥ 1, and index(C_A) = 0 if index(A) = 0

2. N_A C_A = C_A N_A = 0

3. N_A A^D = A^D N_A = 0

4. C_A A A^D = A A^D C_A = C_A

5. (A^D)^D = C_A

6. A = C_A iff index(A) ≤ 1

7. ((A^D)^D)^D = A^D

8. A^D = (C_A)^D

9. (A^D)* = (A*)^D

10. A^D = A+ iff AA+ = A+A.

There is a way to do hand computations of the D-inverse. Begin with A ∈ C^{n×n}. Chances are slim that you will know the index q of A, so compute A^n. If A^n = 0, then A^D = 0, so suppose A^n ≠ 0. Find the Hermite echelon form HEF(A^n). The basic columns v1, v2, ..., vr form a basis for Col(A^q). Form I − HEF(A^n) and call its nonzero columns v_{r+1}, ..., vn. They form a basis for Null(A^q). Form S = [ v1 | ··· | vr | v_{r+1} | ··· | vn ]. Then S^{-1}AS = [ C 0; 0 N ] is a core-nilpotent factorization of A. Compute C^{-1} and form A^D = S[ C^{-1} 0; 0 0 ]S^{-1}.

1 5 4 3

For example, suppose A =52 15 5

6The index of A is not

L6 10 -6 -6

5974 19570 8446 5562

apparent, so we compute that A 414214 31930 -1854 -4326= 3708 26780 27192 21012-16480 -24720 20600 19776 J


1 0 -2 -30 1

3

and note HEF(A) = 20 U U

0 0 0

5974 19570

14214 319303708 26780

-16480 -24720


3 ]. Next, we form I - HEF(A)

07 15

-3 -6 ,

2 0where we have eliminated

0 5_ 69 _ L55- 0 0-

2 2127 69

some fractions in the last two columns. Now P- I A P = i 2 0 00 0 0 00 0 0 0

69 _ 155

We see C = 2 2127 6910 2

69 _ 55412 412127 69

2060 412

0 00 0

0 0

0 0

0 00 0

P-1 =

1 5 2 3206 206 103 206

I 0 7 3

103 206 103

5 15 5 3

206 206 206 2063 5 3 3

103 103 103 103

We have shown earlier that A-' can always be expressed as a polynomial inA. The MP-inverse of a square matrix may not be expressible as a polynomialin A. The D-inverse, on the other hand, is always expressible as a polynomialin the matrix.

THEOREM 5.28
Let A ∈ C^{n×n}. Then there is a polynomial p(x) in C[x] such that p(A) = A^D.

PROOF Write A = S[ C 0; 0 N ]S^{-1} in a core-nilpotent factorization. Now, C is invertible, so there exists a polynomial q(x) with q(C) = C^{-1}. Let p(x) = x^q[q(x)]^{q+1}, where q = index(A). Then

p(A) = A^q[q(A)]^{q+1} = S[ C^q 0; 0 0 ][ q(C) 0; 0 q(N) ]^{q+1}S^{-1} = S[ C^q[q(C)]^{q+1} 0; 0 0 ]S^{-1} = S[ C^{-1} 0; 0 0 ]S^{-1} = A^D. □

COROLLARY 5.10
Let A, B be in C^{n×n} with AB = BA. Then

1. (AB)^D = B^D A^D = A^D B^D

2. A^D B = B A^D

3. A B^D = B^D A

4. index(AB) ≤ max{index(A), index(B)}.

As long as we are talking about polynomials, let's take another look at the minimum polynomial. Recall that for each A in C^{n×n} there is a monic polynomial of least degree μ_A(x) ∈ C[x] such that μ_A(A) = 0. Say μ_A(x) = x^d + a_{d−1}x^{d−1} + ··· + a_1x + a_0. We have seen that A is invertible iff the constant term a_0 ≠ 0 and, in this case, A^{-1} = −(1/a_0)[A^{d−1} + a_{d−1}A^{d−2} + ··· + a_2A + a_1I]. Now suppose that i is the least index with 0 = a_0 = a_1 = ··· = a_{i−1} but a_i ≠ 0. Then i is the index of A.

THEOREM 5.29
Let A ∈ C^{n×n} and μ_A(x) = x^d + a_{d−1}x^{d−1} + ··· + a_i x^i, with a_i ≠ 0. Then i = index(A).

PROOF Write A = S[ C 0; 0 N ]S^{-1} in a core-nilpotent factorization, where q = index(A) = index(N). Since μ_A(A) = 0, we see 0 = μ_A(N) = N^i(N^{d−i} + a_{d−1}N^{d−i−1} + ··· + a_iI). Since (N^{d−i} + a_{d−1}N^{d−i−1} + ··· + a_iI) is invertible, we get N^i = 0. Thus i ≥ q. Suppose q < i. Then A^i A^D = A^{i−1}. Write μ_A(x) = x^i q(x), so 0 = μ_A(A) = A^i q(A). Multiply by A^D. Then 0 = A^{i−1}q(A). Thus r(x) = x^{i−1}q(x) is a polynomial that annihilates A and deg(r(x)) < deg(μ_A(x)), a contradiction. Therefore i = q. □

There is a nice connection between the D-inverse and Krylov subspaces. Given a square matrix A and a column vector b, we defined in a previous homework exercise the Krylov subspace K_j(A, b) = span{b, Ab, ..., A^{j−1}b}. We have the following results that relate a system Ax = b to solutions in a Krylov subspace, where A is n-by-n. The proofs are left as exercises.

THEOREM 5.30
The following statements are equivalent:

1. A^D b is a solution of Ax = b.

2. b ∈ Col(A^q), where q is the index of A.

3. Ax = b has a solution in the Krylov subspace K_n(A, b).

THEOREM 5.31
Suppose m is the degree of the minimal polynomial of A and q is the index of A. If b ∈ Col(A^q), then the linear system Ax = b has a unique solution x = A^D b ∈ K_{m−q}(A, b). If b ∉ Col(A^q), then Ax = b does not have a solution in K_n(A, b).

Exercise Set 23

1. Find the D-inverse of [ 1 0 1; 0 0 1; 0 0 0 ] and [ 1 2 1; 0 5 4; 2 -1 -2 ].

2. Fill in the proof of Theorem 5.22.

3. Fill in the proof of Theorem 5.23.

4. Fill in the proof of Theorem 5.25.

5. Fill in the proof of Theorem 5.27.

Further Reading

1

[B-I&G, 2003] Adi Ben-Israel and Thomas N. E. Greville, Gener-alized Inverses: Theory and Applications, Springer-Verlag, New York,(2003).

[C&G, 1980] R. E. Cline and Thomas N. E. Greville, A Drazin Inversefor Rectangular Matrices, Linear Algebra and Appl., Vol. 29, (1980),53-62.

[C&M, 19911 S. L. Campbell and C. D. Meyer, Jr., Generalized Inversesof Linear Transformations, Dover Publications, New York, (1991).

[C&M&R, 1976] S. L. Campbell, C. D. Meyer, Jr., and N.J. Rose, Appli-cations of the Drazin Inverse to Linear Systems of Differential Equationswith Singular Constant Coefficients, SIAM J. Appl. Math., Vol. 31, No.3, (1976), 411-425.


[Drazin, 1958] M. P. Drazin, Pseudo-Inverses in Associative Rings andSemi-Groups, The American Mathematical Monthly, Vol. 65, (1958),506-514.

IH&M&S, 20041 Olga Holtz, Volker Mehrmann, and Hans Schneider,Potter, Wielandt, and Drazin on the Matrix Equation AB = wBA: NewAnswers to Old Questions, The American Mathematical Monthly, Vol.111, No. 8, October, (2004), 655-667.

[I&M, 1998] Ilse C. F. Ipsen and Carl D. Meyer, The Idea BehindKrylov Methods, The American Mathematical Monthly, Vol. 105, No.10, December, (1998), 889-899.

[M&P, 1974] C. D. Meyer, Jr. and R. J. Plemmons, Limits and the Indexof a Square Matrix, SIAM J. Appl. Math., Vol. 26, (1974), 469-478.

[M&P, 1977] C. D. Meyer, Jr. and R. J. Plemmons, Convergent Powersof a Matrix with Applications to Iterative Methods for Singular LinearSystems, SIAM J. Numer. Anal., Vol. 36, (1977).

[M&R, 1977] C. D. Meyer, Jr. and Nicholas J. Rose, The Index and theDrazin Inverse of Block Triangular Matrices, SIAM J. Applied Math.,Vol. 33, (1977).

[Zhang, 20011 Liping Zhang, A Characterization of the Drazin Inverse,Linear Algebra and its Applications, Vol. 335, (2001), 183-188.

5.6 The Group Inverse

There is yet another kind of inverse that has been found to be useful.

DEFINITION 5.5 (group inverse)
We say a matrix X is a group inverse of A iff it satisfies

(MP1) AXA = A
(MP2) XAX = X
(3) AX = XA.


Thus, a group inverse of A is a 11, 2}-inverse of A that commutes with A.First, we establish the uniqueness.

THEOREM 5.32 (uniqueness of the group inverse)If a matrix A has a group inverse, it is unique.

PROOF Suppose X and Y both satisfy the three equations given above. Then X = XAX = AXX = AYAXX = Y(AAXX) = Y(AX) = Y(XA) = YXA. On the other hand, Y = YAY = YYA = YY(AXA) = (YYA)XA = YXA. Hence X = YXA = Y. □

It turns out that not every matrix has a group inverse. We use our favorite, the full rank factorization, to see when we do. When a group inverse does exist, we denote it by A^#.

THEOREM 5.33 [Cline, 1964]
Suppose the square matrix A = FG is in full rank factorization. Then A has a group inverse if and only if GF is nonsingular. In this case, A^# = F(GF)^{-2}G.

PROOF Suppose r = rank(A) and A = FG is a full rank factorization of A. Then GF is in C^{r×r} and A² = FGFG = F(GF)G, so rank(A²) = rank(GF), since F has full column rank and G has full row rank. Thus, rank(A²) = rank(A) if and only if GF is nonsingular. Let X = F(GF)^{-2}G. We compute AXA = FGF(GF)^{-2}GFG = FG = A, XAX = F(GF)^{-2}GFGF(GF)^{-2}G = F(GF)^{-2}G = X, and AX = FGF(GF)^{-2}G = F(GF)^{-1}G = F(GF)^{-2}GFG = XA. □
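Theorem 5.33 translates directly into a computation. The sketch below is not from the text; it assumes NumPy and obtains a full rank factorization A = FG from an SVD (one choice among many).

    import numpy as np

    def group_inverse(A, tol=1e-10):
        # A# = F (GF)^{-2} G from a full rank factorization A = FG (Theorem 5.33).
        U, s, Vt = np.linalg.svd(A)
        r = int(np.sum(s > tol))
        F = U[:, :r] * s[:r]          # full column rank
        G = Vt[:r, :]                 # full row rank; A = FG
        GF = G @ F                    # A# exists iff GF is nonsingular (index(A) <= 1)
        return F @ np.linalg.inv(GF) @ np.linalg.inv(GF) @ G

    A = np.array([[1., 1.], [2., 2.]])            # rank(A) = rank(A^2) = 1
    X = group_inverse(A)
    print(np.round(X, 4))                          # (1/9) * A
    print(np.allclose(A @ X @ A, A), np.allclose(X @ A @ X, X), np.allclose(A @ X, X @ A))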

COROLLARY 5.11A square matrix A has a group inverse if and only if index(A) = I and if andonly if rank(A) = rank(A2).

Exercise Set 24

1. If A is nonsingular, argue that A^# = A^{-1}.

2. Prove that (A^#)^# = A, if A^# exists.

3. Prove that (A*)^# = (A^#)*, if A^# exists.

4. Prove that (A^n)^# = (A^#)^n for any positive integer n.

5. Show that A^# = AGA for any G in A³{1}.

6. Create a group inverse of rank 2 for a 4-by-4 matrix and a D-inverse of rank 2 for the same matrix. What are the differences?

Further Reading

[C&M, 1991 ] S. L. Campbell and C. D. Meyer, Jr., Generalized Inversesof Linear Transformations, Dover Publications, New York, (1991).

[Hartwig, 1987] Robert E. Hartwig, The Group Inverse of a BlockTriangular Matrix, in Current Trends in Matrix Theory, Elsevier SciencePublishing Co., Inc., F. Uhlig and R. Grone, Editors, (1987).

[Meyer, 1975] C. D. Meyer, Jr., The Role of the Group GeneralizedInverse in the Theory of Finite Markov Chains, SIAM Review, Vol. 17,(1975), 443-464.


Chapter 6

Norms

length, norm, unit ball, unit sphere, norm equivalence theorem, distance function, convergent sequence of vectors, Hölder's inequality, Cauchy-Schwarz inequality, Minkowski inequality

6.1 The Normed Linear Space C^n

We have seen how to view C^n as a space of vectors where you can study the algebra of addition and scalar multiplication. However, there is more you can do with vectors than just add them and multiply them by scalars. Vectors also have properties of a geometric nature. For example, vectors have length (some people say magnitude). This is the concept we study in this chapter. We are motivated by our experience in R^n, specifically R².

By the famous theorem about right triangles (remember which one?), the length squared of x = (x1, x2) is ||x||² = x1² + x2². It is natural to define the length of x as ||x|| = (x1² + x2²)^{1/2}. However, when dealing with complex vectors, we have a problem. Consider x = (1, i) in C². Then, following the previous formula, ||x||² = 1² + i² = 1 − 1 = 0. Unless you are doing relativity in physics, this is troublesome. It says a nonzero vector in C² can have zero length! That does not seem right, so we need to remedy this problem. We can use the idea of the magnitude of a complex number to make things work out. Recall that if the complex number z = a + bi, then |z|² = a² + b². Let's define ||x||² = |1|² + |i|² = 1 + 1 = 2; then ||x|| = √2, which seems much more reasonable. Therefore, we define ||x||² = |x1|² + |x2|² on C².

With this in mind, we define what we mean by the "length" or "norm" of a vector and give a number of examples. Common sense tells us that lengths of nonzero vectors should be positive real numbers. The other properties we adopt also make good common sense. We can also use our knowledge of the "absolute value" concept as a guide.

With this in mind, we define what we mean by the "length" or "norm" ofa vector and give a number of examples. Common sense tells us that lengthsof nonzero vectors should be positive real numbers. The other properties weadopt also make good common sense. We can also use our knowledge of the"absolute value" concept as a guide.

Figure 6.1: Norm of a vector (the point (x1, x2) in the plane).

DEFINITION 6.1 (norm)
A real valued function || || : C^n → R is called a norm on C^n iff

1. ||x|| ≥ 0 for all x in C^n, and ||x|| = 0 if and only if x is the zero vector

2. ||αx|| = |α| ||x|| for all x in C^n and all α in C

3. ||x + y|| ≤ ||x|| + ||y|| for all x, y in C^n (triangle inequality).

It turns out that there are many examples of norms on C". Here are just afew. Each gives a way of talking about the "size" of a vector. Is it "big" or isit "small"? Our intuition has been formed by Euclidean notions of length, butthere are many other ways to measure length that may seem strange at first.Some norms may actually he more convenient to use than others in a givensituation.

Example 6.1
Let x = (x1, x2, ..., xn) be a vector in C^n.

1. Define ||x||_1 = Σ_{i=1}^{n} |x_i|. This is a norm called the ℓ1 norm, sum norm, or even the taxicab norm (can you see why?).

2. Define ||x||_2 = (Σ_{i=1}^{n} |x_i|²)^{1/2}. This is a norm called the ℓ2 norm or Euclidean norm.

3. More generally, define ||x||_p = (Σ_{i=1}^{n} |x_i|^p)^{1/p}, where 1 ≤ p < ∞. This is a norm called the ℓp norm, or just the p-norm for short, and it includes the previous two examples as special cases.

4. Define ||x||_∞ = max_{1≤i≤n} {|x_i|}. This norm is called the ℓ∞ norm or the max norm.

5. Suppose B is a matrix where B = S² with S = S* invertible. Then define ||x||_B = (x*Bx)^{1/2}. This turns out to be a norm also.
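For readers who want to experiment, the norms of Example 6.1 are one-liners in NumPy (an assumption of this sketch, not something the text relies on); the test vector is the one from Exercise Set 25, problem 8.

    import numpy as np

    x = np.array([1.0, 1.0, 4.0, 3.0])
    for p in (1, 2, np.inf):
        print(p, np.linalg.norm(x, ord=p))    # 9.0, 5.196..., 4.0

    def p_norm(x, p):
        # the p-norm written out by hand, for comparison
        return np.sum(np.abs(x) ** p) ** (1.0 / p)
    print(p_norm(x, 3))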

There are two subsets that are interesting to look at for any norm, their unitball and unit sphere.

DEFINITION 6.2 (unit ball, unit sphere)
Let || || be a norm on C^n. Then the set B1 = { x ∈ C^n | ||x|| ≤ 1 } is called the unit ball for the norm, and S1 = { x ∈ C^n | ||x|| = 1 } is called the unit sphere for the norm.

It may be helpful to visualize some of these sets for various p-norms in R2(see Figure 6.2).

THEOREM 6.1 (basic facts about all norms)
Let || || be a norm on C^n with x and y in C^n. Then

1. ||x|| = ||−x|| for all x in C^n.

2. | ||x|| − ||y|| | ≤ ||x − y|| ≤ ||x|| + ||y||.

3. For any invertible matrix S in C^{n×n}, the function ||x||_S = ||Sx|| defines a norm || ||_S on C^n.

4. The unit ball is always a closed, bounded, convex set that contains the origin of C^n.

Figure 6.2: Unit spheres in various norms (p = 1, 1.5, 2, 7, ∞).

5. Every norm on C^n is uniformly continuous. In other words, given any ε > 0 there exists δ > 0 such that |x_i − y_i| < δ for 1 ≤ i ≤ n implies | ||x|| − ||y|| | < ε, where x = (x1, x2, ..., xn) and y = (y1, y2, ..., yn).

PROOF The proofs are left as exercises. 0

Recall that a subset K of C" is called convex if x and y in K implies thattx + (I - t )y also belongs to K for any real t with 0 < t < 1. In other words, ifx and y are in K, the line segment from x toy lies entirely in K. We remark thatgiven any closed, bounded, convex set K of C" that contains the origin, there isa norm on C" for which K is the unit ball. This result is proved in Householder[ 1964]. This and Theorem 6.1(3) suggest that there is an enormous supply ofnorms to choose from on C". However, in a sense, it does not matter which oneyou choose. The next theorem pursues this idea.

THEOREM 6.2 (the norm equivalence theorem)Let 1111 and 11 11' be any two norms on C". Then, there are real constants C2

C, > 0 such that C, 11x11 < Ilxil' < C2 IIx11 for all x in C".


PROOF Define the function f (x) = Ilxll' from C" to R. This is a continuousfunction on C". Let S = {y E C"

IIIxII = 1). This is a closed and bounded

subset of C". By a famous theorem of Weierstrass, a continuous function ona closed and bounded subset of C" assumes its minimum and its maximumvalue on that set. Thus, there exists a Ymin in S and a Ymax in S such thatf (Ym,n) < f (Y) < f (Ymax) for ally in S. Note ymin $ -6 since IlYmin II = 1SO f (Ynun) > 0. Now take any x in C" other than ', and form Note

IIxII .

that this vector is in S, so Ilxll' = IlxllI I IIxII

II > IIxII f(Yn,,,,). In fact, the

inequality IIxII' > IIxII f(Ymin) holds even if x = V since all it says then isx

' < IIxII f(Ymax) Putting this together,0 > 0. Similarly, IIxII' = IIxII IIIIxII

we have IIxII f(Ymin) Ilxll' < IIxII f(Ymax) Clearly, f(ymin) -< f(Ymax) TakeCI = f (Yrmn) and C2 = f (ymax) 0

Since we do not really need this result, you may ignore its proof altogetherbecause it uses ideas from advanced calculus. However, it is important to knowthat whenever you have a norm - that is, a notion of length - you automaticallyhave a notion of "distance."

DEFINITION 6.3 (distance function)Let 11 11 be a norm on C". For x and y in C", define the distance from x to y

byd(x,y)= IIY - xll.

THEOREM 6.3 (basic facts about any distance function)Let d be the distance function derived from the norm 11 11. Then

I. d(x, y) > O for all x, y in C" and d(x, y) = 0 iff x = y.

2. d(x, y) = d(y, x) for all x, y in C" (symmetry).

3. d(x, z) < d(x, y) + d(y, z) for all x, y, z in C" (triangle inequalityfor distance).

4. d(x, y) = d(x + z, y + z) for all x, y, z in C" (translation invariance).

5. d(ax, ay) = foal d (x, y) for all x, y in C" and all a E C .

PROOF The proofs here all refer back to basic properties of a norm and sowe leave them as exercises. 0


A huge door has just been opened to us. Once we have a notion of distance,all the concepts of calculus are available. That is because you now have a wayof saying that vectors are "close" to each other. However, this is too big a storyto tell here. Besides, our main focus is on the algebraic properties of matrices.Even so, just to get the flavor of where we could go with this, we look at theconcept of a convergent sequence of vectors.

DEFINITION 6.4 (convergent sequence of vectors)Let (XI, x2, ... ) be a sequence of vectors in C". This sequence is said

to converge to the vector x in C" with respect to the norm 11 11 if for each e > 0there is a natural number N > 0 such that if n > N, then lix,, - xii < e.

One final remark along these lines. In view of the norm equivalence theorem,the convergence of a sequence of vectors with respect to one norm implies theconvergence with respect to any norm. So, for convergence ideas, it does notmatter which norm is used. Once we have convergence, it is not difficult toformulate the concept of "continuity." How would you do it?

The astute reader will have noticed we did not offer any proofs that the normsof Example 6.1 really are norms. You probably expected that to show up in theexercises. Well, we next give a little help establishing that the EP norms reallyare norms. When you see what is required, you should be thankful. To getthis proved, we need a fundamental result credited to Ludwig Otto Holder(22 December, 1859-29 August, 1937).

THEOREM 6.4 (Hölder's inequality)
Let a1, a2, ..., an and b1, b2, ..., bn be any complex numbers and let p and q be real numbers with 1 < p, q < ∞. Suppose 1/p + 1/q = 1. Then

Σ_{i=1}^{n} |a_i b_i| ≤ (Σ_{i=1}^{n} |a_i|^p)^{1/p} (Σ_{i=1}^{n} |b_i|^q)^{1/q}.

PROOF If either all the a_i's are zero or all the b_i's are zero, the inequality above holds trivially. So let's assume that not all the a_i's are zero and not all the b_i's are zero. Recall from calculus that the function f(x) = ln(x) is always concave down on the positive real numbers (look at the second derivative f''). Thus, if α and β are positive real numbers and λ is a real number with 0 ≤ λ ≤ 1, then λ ln(α) + (1 − λ) ln(β) ≤ ln(λα + (1 − λ)β) (see Figure 6.3). To see this, note that the equation of the chord from (α, ln(α)) to (β, ln(β)) is

y − ln(β) = [(ln(β) − ln(α))/(β − α)](x − β),

so if x = λα + (1 − λ)β, then y = ln(β) + λ(ln(α) − ln(β)) = λ ln(α) + (1 − λ) ln(β). From the picture it is clear that λ ln(α) + (1 − λ) ln(β) ≤ ln(λα + (1 − λ)β).

Figure 6.3: Proof of Hölder's Theorem (the chord of y = ln(x) from (α, ln(α)) to (β, ln(β)) lies below the curve).

Now, by laws of logarithms, ln(α^λ β^{1−λ}) ≤ ln(λα + (1 − λ)β). Then, since ln(x) is strictly increasing on the positive reals (just look at the first derivative f'), we get α^λ β^{1−λ} ≤ λα + (1 − λ)β. Now choose

α_i = |a_i|^p / Σ_{j=1}^{n} |a_j|^p,   β_i = |b_i|^q / Σ_{j=1}^{n} |b_j|^q,   and   λ = 1/p.

Let A = Σ_{j=1}^{n} |a_j|^p and B = Σ_{j=1}^{n} |b_j|^q. Note that to even form α_i and β_i, we need the fact that not all the a_j and not all the b_j are zero.

Then using α_i^λ β_i^{1−λ} ≤ λα_i + (1 − λ)β_i with λ = 1/p and 1 − λ = 1/q, we get

(|a_i|^p/A)^{1/p} (|b_i|^q/B)^{1/q} ≤ (1/p)(|a_i|^p/A) + (1/q)(|b_i|^q/B),

or

|a_i| |b_i| / (A^{1/p} B^{1/q}) ≤ (1/p)(|a_i|^p/A) + (1/q)(|b_i|^q/B).

This is true for each i, 1 ≤ i ≤ n, so we may sum on i and get

(1/(A^{1/p} B^{1/q})) Σ_{i=1}^{n} |a_i b_i| ≤ (1/p)(1/A) Σ_{i=1}^{n} |a_i|^p + (1/q)(1/B) Σ_{i=1}^{n} |b_i|^q = 1/p + 1/q = 1.

By clearing denominators, we get

Σ_{i=1}^{n} |a_i b_i| ≤ A^{1/p} B^{1/q} = (Σ_{j=1}^{n} |a_j|^p)^{1/p} (Σ_{j=1}^{n} |b_j|^q)^{1/q}.

Wow! Do you get the feeling we have really proved something here? Actually,though the symbol pushing was intense, all we really used was the fact that thenatural logarithm function is concave down.
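A quick numerical sanity check of Hölder's inequality is easy to run (this sketch is not from the text; it assumes NumPy and uses randomly generated complex data):

    import numpy as np

    rng = np.random.default_rng(1)
    a = rng.standard_normal(6) + 1j * rng.standard_normal(6)
    b = rng.standard_normal(6)
    p = 3.0; q = p / (p - 1.0)                   # conjugate exponents: 1/p + 1/q = 1
    lhs = np.sum(np.abs(a * b))
    rhs = np.sum(np.abs(a) ** p) ** (1/p) * np.sum(np.abs(b) ** q) ** (1/q)
    print(lhs <= rhs + 1e-12)                    # True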

As a special case, we get a famous inequality that apparently dates backto 1821 when Augustin Louis Cauchy (21 August 1789-23 May 1857) firstproved it. This inequality has been generalized by a number of mathematicians,notably Hermann Amandus Schwarz (25 January 1843-30 November 1921).

COROLLARY 6.1 (Cauchy-Schwarz inequality in C^n)
Take p = q = 2 above. Then

Σ_{i=1}^{n} |a_i b_i| ≤ (Σ_{i=1}^{n} |a_i|²)^{1/2} (Σ_{i=1}^{n} |b_i|²)^{1/2}.

COROLLARY 6.2
The function || ||_p : C^n → R defined by ||x||_p = (Σ_{i=1}^{n} |x_i|^p)^{1/p} is a norm on C^n, where x = (x1, x2, ..., xn) and 1 ≤ p < ∞.

PROOF We must establish that this function satisfies the three defining properties of a norm. The first two are easy and left to the reader. The challenge is the triangle inequality. We shall prove ||x + y||_p ≤ ||x||_p + ||y||_p for all x, y in C^n. Choose any x, y in C^n. If x + y = 0, the inequality is trivially true, so suppose x + y ≠ 0. Then ||x + y||_p ≠ 0. Thus

||x + y||_p^p = Σ_{i=1}^{n} |x_i + y_i|^p = Σ_{i=1}^{n} |x_i + y_i| |x_i + y_i|^{p−1} ≤ Σ_{i=1}^{n} (|x_i| + |y_i|) |x_i + y_i|^{p−1}
= Σ_{i=1}^{n} |x_i| |x_i + y_i|^{p−1} + Σ_{i=1}^{n} |y_i| |x_i + y_i|^{p−1}
≤ (Σ |x_i|^p)^{1/p} (Σ |x_i + y_i|^{(p−1)q})^{1/q} + (Σ |y_i|^p)^{1/p} (Σ |x_i + y_i|^{(p−1)q})^{1/q}
= (||x||_p + ||y||_p) (Σ |x_i + y_i|^p)^{1/q}
= (||x||_p + ||y||_p) (||x + y||_p)^{p/q}.

Dividing both sides by ||x + y||_p^{p/q}, we get ||x + y||_p = ||x + y||_p^{p − p/q} ≤ ||x||_p + ||y||_p, which is what we want.

Notice we needed ||x + y||_p ≠ 0 to divide above. The reader should be able to explain all the steps above. It may be useful to note that pq − q = (p − 1)q = p follows from 1/p + 1/q = 1.

This last inequality is sometimes called the Minkowski inequality after Hermann Minkowski (22 June 1864-12 January 1909).

Exercise Set 25

1. Give direct arguments that 21 and e«, are norms on C".

-jT2. For x in C", define Ilxll = I if x540 if x= -6 .IsthisanormonC"?


3. Let d he a function satisfying (1), (2), and (3) of Theorem 6.3. Doesllxll = d(x, 6) define a norm on C"? Supposed satisfies (1), (3), and(5). Do you get a norm then?

4. A vector x is called a unit vector for norm II II if J x 11 = 1. Suppose IIx ll 1

and x # V. Can you (easily) find a unit vector in the same directionas x?

5. If you have had some analysis, how would you define a Cauchy sequenceof vectors in C" ? Can you prove that every Cauchy sequence in C"converges? How would you define a convergent series of vectors in C" ?

6. Define 11 11 : C" -* R by Ilxll = Ixi 14- Ix21 + - + "'- Ix,, I. Show that 11 11

is a norm on C" and sketch the unit ball in R2.

7. Let x be any vector in C". Show that Ilxlla, = viim Ilxllt, (Hint: n'1't' -*00

I as p -o oc and Ilxll,, < n'/t' IIXII. )

8. Compute the p-norm for the vectors (1, 1, 4, 3) and (i, i, 4 - 2i, 5) forp = 1, 2, and oo.

9. Argue that a sequence (x,,) converges to x in the oo-norm if each com-ponent in x converges to the corresponding entry in x. What is the limitvector of the sequence (1, 4 + 71 !'2M in the oo-norm?

n is

10. Prove llxllc,, 11x112 < n Ilxll,,, for all x in C". Why does this saythat if a sequence converges in the oc-norm, then the sequence mustalso converge to the same result in the 2-norm? Also argue that llxlli

lIX112 and Ilxlli < n Ilxll... for all x in C".

11. Discuss how the 2-norm can he realized as a matrix multiplication. (Hint:If x is in C"' 1, IN 112 = "'X-'X.)

12. Argue that x -). x iff the sequence of scalars llx,, - xll -+ 0 for anynorm.

2 p,

13. Argue that C>xi < nEXi , where the x; are all real.,_i i=i

14. Determine when equality holds in Holder's inequality.

15. Can IN - Y112 = IN + YII2 for two vectors x and y in C"?

16. Argue that I Ilxll - IIYII I min(IIx - YII , IN + YII) for all x and yin C".

17. How does the triangle inequality generalize ton vectors?


18. Argue Ilxll,0 < lIX112 -< 11x111 for all x in C".

19. Prove all the parts of Theorem 6.1.

20. Prove all the parts of Theorem 6.3.

21. Prove if p < q, then n-v IIxII,, < Ilxlly < Ilxlli, for any x E C".

22. Let (0, 0) and (1, 2) be points in the (x, y)-plane and let Li be the line thatis equidistant from these two points in the el, e2, and e,, norms. Drawthese lines.

23. Compute the 1, 2, and oo distances between (-10, 11, 12i) and (4, 1 +i, -35i).

24. Prove that if a sequence converges in one norm, it also converges withrespect to any norm equivalent to that norm.

25. Can you find a nonzero vector x in C2 such that Ilxll,,. = lIX112 = IIxII1?

26. Suppose Ilxll,,, IlXlln are norms on C". Argue that max{llxll,,, IlXlln} isalso an norm on C". Suppose X1 and X2 are positive real numbers. Arguethat K1 IIxII" + K2 IlXlln is also a norm.

27. Could the p-norm 11x11 , be a norm if 0 < p < 1?

28. Consider the standard basis vectors e; in C". Argue that for any norm II II

on C', IIxII < IxiI Ileill,where x=(x1,X2,... ,xn).f=1

29. Suppose U is a unitary matrix(U* = U-1). Find all norms from theexamples given above that have the property IlUxll = IIxII for all x =(XI, x2, ... , Xn) In 01.

30. A matrix V is called an isometry for a norm 11 11 iff II Vxll = IIxII forall x = (X1, x2, ... , x,,) in C". Argue that the isometrics for a norm onC" form a subgroup of GL(n, Q. Determine the isometry group of theEuclidean norm, the sum norm, and the max norm.

31. Statisticians sometimes like to put "weights" on their data. Supposew1, w2, ... , W. is a set of positive real numbers (i.e., weights). Does

IIxII = (Ewi Ix;l")1 tN define a norm on C" where I < p < oo? Howi=1

about I I x I I = max{w, Ixi l , W2 Ix2l, ... , wn Ixnl}?

32. Verify that Example (6.1)(5) actually defines a norm.


33. Argue that the intersection of two convex sets is either empty or again aconvex set.

34. Prove that for the 2-norm and any three vectors zi, z2, z; I1z1 - z21122 +Ilz2 -23112+11?, -2,112+IjZ, +z2 +2,112 = 3(IIzjII'+IIz2112+112.,1121.

35. There is another common proof of the key inequality in Holder's in-equality. We used a logarithm, but you could define a function f (t) =>\(t - 1) - tn` + 1, 0 < X < 1. Argue that f(1) = 0 and f '(t) > 0.Conclude that t?` < Xt + (I - X). Assume a and b are nonnegative real

numbers with a > b. Put t = a , A = 1 . If a < b, put t = b and k =b p a q

where 1 + - = 1. In each case, argue that a T b < a +b-.

p q p q

36. As a fun problem, draw the lines in the plane 1R2 that are equidistant from(0, 0) and (1, 2) if distance is defined in the 1, 2 and oo norms. Try (0, 0)and (2, 1) and (0, 0) and (2, 2).

37. Prove that 2 11x - y11 ? (11x11 + Ilyll)

x and y.

x y

Ilxll Ilyllfor nonzero vectors

Further Reading

[Hille, 1972] Einar Hille, Methods in Classical and Functional Analysis,Addison-Wesley Publishing Co., Reading, MA, (1972).

[House, 1964] Alston S. Householder, The Theory of Matrices in Numer-icalAnalysis, Dover Publications, Inc., New York, (1964).

[Schattsch, 1984] D. J. Schattschneider, The Taxicab Group, TheAmerican Mathematical Monthly, Vol. 91, No. 7, 423-428, (1984).

matrix norm, multiplicative norm, Banach norm,induced norm, condition number

6.2 Matrix Norms

We recall that the set of all m-by-n matrices C^{m×n} with the usual addition and scalar multiplication is a complex vector space, the "vectors" being matrices. It


is natural to inquire about norms for matrices. In fact, a matrix can be "strungout" to make a "long" vector in C"' in various ways, say by putting one rowafter the other. You could also stack the columns to make one very "tall" columnvector out of the matrix. This will give us some clues on how to define normsfor matrices since we have so many norms on Cl'. In particular, we will see thatall the vector norms of the previous section will lead to matrix norms. Thereara, however, other norms that have also been found useful.

DEFINITION 6.5 (matrix norm)
A real valued function || || : C^{m×n} → R is called a matrix norm iff it satisfies

1. ||A|| ≥ 0 for all A ∈ C^{m×n}, and ||A|| = 0 iff A = 0

2. ||αA|| = |α| ||A|| for all α ∈ C and A ∈ C^{m×n}

3. ||A + B|| ≤ ||A|| + ||B|| for all A, B ∈ C^{m×n}. (Triangle inequality)

We say we have a strong matrix norm or a Banach norm iff we have in addition n = m and

4. ||AB|| ≤ ||A|| ||B|| for all A, B in C^{n×n}.

The terms multiplicative and submultiplicative are also used to describe (4).

As with vector norms, there are many examples of matrix norms.

Example 6.2

1. ||A||_1 = Σ_{i=1}^{m} Σ_{j=1}^{n} |a_ij|. This is a norm called the sum norm.

2. ||A||_c = max_{1≤j≤n} Σ_{i=1}^{m} |a_ij|. This is a norm called the maximum column sum norm.

3. ||A||_F = (Σ_{i=1}^{m} Σ_{j=1}^{n} |a_ij|²)^{1/2} = (tr(A*A))^{1/2}. This is a norm called the Frobenius norm.

4. ||A||_p = (Σ_{i=1}^{m} Σ_{j=1}^{n} |a_ij|^p)^{1/p}. This is a norm called the Minkowski p-norm or Hölder norm. It generalizes (1) and (3).

5. ||A||_∞ = max_{i,j} {|a_ij|}. This is a norm called the max norm.

6. ||A||_r = max_{1≤i≤m} Σ_{j=1}^{n} |a_ij|. This is a norm called the maximum row sum norm.

The reader is now invited to make up more norms motivated by the examplesabove.

We remark that unit balls and unit spheres are defined in the same way as forvector norms: B, =(A E C'"xn I IIAII -< l} and S, =(A E C"' III All=]).Also, most of the facts we derived for vector norms hold true for matrix norms.We collect a few below.

THEOREM 6.5 (basic facts about matrix norms)Let 11 11 be a norm on C"'x" and A, B E C'nxn

Then

1. 111 A 11 - IIB111:5 II A - B II< IIAII + IIBII for all A, B E C'nxn

2. The unit ball in Cm xn is a closed, bounded, convex subset of C" x" con-taining the zero matrix.

3. Every matrix norm is uniformly continuous.

4. If 11 11 and 11 11' are matrix norms, there exist constants C2 > C, > 0 suchthat C, IIAII < IIAII < C2 II A II for all A in C"' xn (norm equivalencetheorem).

5. Let S be invertible in Cmx"'. Then IlAlls = IISAS-' II defines anothermatrix norm on C"' In'.

PROOF The proofs are left as exercises. El

Now we have a way to discuss the convergence of matrices and the distancebetween matrices. We have a way to address the issue if two matrices are "close"or not. In other words, we could develop analysis in this context. However, wewill not pursue these ideas here but rather close our current discussion looking atconnections between matrix norms and vector norms. It turns out vector normsinduce norms on matrices, as the following theorem shows. These inducednorms are also called operator norms.

Page 268: Matrix Theory

6.2 Matrix Norms 247

THEOREM 6.6Let II Ila and II Ilb be vector norms on C' and C", respectively. Then the function

II Ila.b on C ,,n defined by II A Ila,b = maxIIAXIIa = max IIAxIIa is a matrix

x16-e IIxIIb IlXIIh=I

norm on C"" Moreover, IIAxIIa < IIA Ila,bI xllb for all x in C" and all A inC""n. Finally, if n = m, this norm is multiplicative.

PROOF We sketch the proof. The fact that IIA IIa. b exists depends on afact from advanced calculus, which we shall omit. For each nonzero vector x,IIAxIIa > 0 so the max of all these numbers must also be greater than or equalIlxlIb

to zero. Moreover, if II A lla.b = 0, then Ax = ' for every vector x, whichwe know implies A = 0. Of course, if A = 0, it is obvious that IIA lla.b = 0

For the second property of norm, IlaAlla.b = max IIaAxlla = lal max IIAxIIa =x IlxlIb -0-d' IlxlIb

Ial IIAIIa.b Finally, for x nonzero, IIA + Blla,b = max II(A + B)xlla

x06 IlxlIbIIAx+Bxlla < max IIAXIIa + IIBxIIa < maxIIAXIIa + max IIBxIIamax _ =

x# ' IlxlIb x#T IlxlIb x0-d IlxlIb -0-6 IlxlIb

IlAlla.b+IlBlla,b.Moreover, foranynonzerovector x, IIAIlxXlIb IIa

<m ax

IIAXIIa

IlxlIb

IIAIIa.b So IIAXIIa _< 1IA Ila,bll x11b. Finally, IIABxIIa _< IIAIIa.b IIBxIIa <_

IIAila.b IIBIIa.b IIxIIb Thus, IIABIIa b = IIABxIIa < IlAlla.b IIBIla.b 0IIxIIb

DEFINITION 6.6 (induced norm)The matrix norm II Ila.b in the above theorem is called the norm induced by

the vector norms II Ila and II IIb If n = m and II Ila = II IIb , we simply say thematrix norm ll II is induced by the vector norm II Ila.

We remark that the geometric interpretation for an induced matrix norm 11 11 isthat II A II is the maximum length of a unit vector (i.e., a vector in S1) after it wastransformed by A. This is clear from the formula IIAII = max IIAxIIa Here

IlxII,=Ithe terms "length" and "unit" must be understood in terms of the underlyingvector norms. Next, we look at some examples of induced norms.

Example 6.3We will only consider the case where m = n and where the two vector normsare the same. We present the examples in Table 6.2.

Page 269: Matrix Theory

248 Norms

TABLE 6.2.1:Vector Norm

el : 11X111 = > Ixilr=I

n 1l1/2

e2 : lIX112 = Ix;12/

Ilxll00 = max IxilI«<n

Induced Matrix Norm

II A II I.1 = max F, I aji l = IIAII,,,,Is/sni_I

IIAII2.2 = max IIAxII211x112=I

II A

n

= maxi > I aii I =11 A II,=

You might have thought that the e2 vector norm would induce the Frobeniusnorm, but that is not so. Generally, II A II F 0 II A 112,2 Next we investigate whatis so nice about induced norms.

THEOREM 6.7 (basic properties of induced norms)

1. Let II Ila.b be the matrix norm on C" "' induced by the vector norms II Ilaon cm and 1111 b on C". Let N be any other matrix norm on C' x" such that

IIXIIa N(A) Il Xll bfor all x in C", all A in Cmxn. Then IIA Ila.b N(A)for all A E C""".

2. Let II Ila , 11 Ilb , 11111 be vector norms on C", C', and GN respectively andlet II Ila,h be the matrix norm on (Cmxn induced by II IIa and 1111b , 11I1b,I

be the matrix norm on CPx"' induced by II IIb and I1II1. Finally let ll Ila,,be the matrix norm on C'' x" induced by II Ila and 11111. Then II AB II,,., <-

11 A IIa,b 11 B 11b., for all A E C'. xn B E Cl""". The particular case wherein = n = p and all vector norms are the same is particularly nice. Inthis case, the result reads, after dropping all the subscripts.

IIABII IIAII IIAII

3. Let 11 11 be a Banach norm on C" x". Then

(i) IIABII < IIAII II Bll forall A, B in C"x"(ii) 11111 = 1

(iii) IIA"II _< IIAII"forA E Cnxn

(iv) IIA-III >IIAII

for all invertible A in C"".

PROOF The proofs are left as exercises. U

Of course, we have only scratched the surface of the theory of norms. Theyplay an important role in matrix theory. One application is in talking about

Page 270: Matrix Theory

6.2 Matrix Norms 249

the convergence of series of matrices. A very useful matrix associated to asquare matrix A is eA. We may wish to use our experience with Taylor series

00 I

in calculus to write eA = E -A". For this to make sense, we must be able-on!

to deal with the question of convergence, which entails a norm. In appliednumerical linear algebra, norms are important in analyzing convergence issuesand how well behaved a matrix or system of linear equations might be. There isa number called the condition number of a matrix, which is defined as c(A) =be if A is singular

and measures in some sense "how close" aA-' 1111 A 11 if A is nonsingular

nonsingular matrix is to being singular. The folks in numerical linear algebra areinterested in condition numbers. Suppose we are interested in solving the systemof linear equations Ax = b where A is square and invertible. Theoretically, thissystem has the unique solution x = A-'b. Suppose some computer reportsthe solution as z. Then the error vector is e = x - z. If we choose a vectornorm, we can measure the absolute error Ilell = IIx - ill and the relative error11ell

II II .

But this supposes we know x, making it pointless to talk about the errorx

anyway. Realistically, we do not know x and would like to know how goodan approximation i is. This leads us to consider something we can compute,namely the residual vector r = b - Az. We will consider this vector again laterand ask how to minimize its length. Anyway, we can now consider the relative

IIb - AnIIresidual

IIbII,which we can compute with some accuracy. Now choose

a matrix norm that is compatible with the vector norm we are already using.Thenr=b-Ax=Ax-Ax=Ae so IIrII = IlAell IIAII Ilellande=A-'rand so Hell IIA-' 11 110. Putting these together we get

11r1l

IIAII < hell IIA-' II IIrII

Similarly,

Il-lll `Ilxll < IIA ' II Ilbll.

From these it follows that

I IIrII < hell< IIAII IIA-' II

IIrII

IIAII IIA-' II Ilbll Ilxll Ilbll

or

I IIrII hell IIrII

c(A) Ilbll - IIxII- c(A)

Ilbll

Now we have the relative error trapped between numbers we can compute (ifwe make a good choice of norms). Notice that if the condition number of a

Page 271: Matrix Theory

250 Norms

matrix is very close to 1, there is not much difference between the relative errorand the relative residual.

To learn more about norms and their uses, we point the reader to the referenceson page 252.

Exercise Set 26

1 . Let 11 11 be a matrix norm on C" x" such that 11 A B 11 < I I A I I1 1 1 1B1Show thatthere exists a vector norm N on 0" such that N(Ax) < IIA All N(x) for allA EC"'x"and all x in C.

2. Show IIAII2 < IIA11I,1 IIAII,,.. for all A E Cm '

3. Show that the matrix norm 1111 on Cnxn induced by the vector normII Ilp , I < p < oo satisfies

F

IIAll E(E I aij 1`')yi=1 j=1

for all A E C"".

4. Let 1111 P and II Ily be matrix norms on C"xn of Holder type with i + i = IShow that for p > 2, we have 11ABII,, < min(IIAII,,11Blly , II1IIy I1BII,,)

5. Prove IIA + B112 + IIA - BIIF = 2 IIAIIF + 2 IIBIIF .

8426. Find the I -norm and oo-norm of - 13 7

191

7. Does IIAII = max { Ia;j I } define a Banach norm for n-by-n matrices?1:50 <n

8. Argue directly that I I A II2 < IIAIIF II II2and IIABIIF IIAIIF IIBIIF

9. Prove that when A is nonsingular, min IlAxll =I

measures howIIXII=l IA-111

much A can shrink vectors on the unit sphere.

10. Argue that IIA112.2 = IIA*112.2and IIAII, = IIA*IIF. Also, IIA*AII2.2 =IIAIl2,2

Page 272: Matrix Theory

6.2 Matrix Norms 251

11.

12.

Prove that if SS* = / and T*T = I , then IIS*AT112,2 = IIA112,2

Argue that I I A I I F = I I AT I

IF.

13. Prove that if A = AT, then 11Alloo = IIAl11.

14. Argue that 11Axll . < fn- IIA112 IIxII for all x in C".

15. Prove that IlAxII2 <- ,/ IIAlI,,, lIX112 for all x in C".

16. Argue that f IIAIl2 < IIAII. <-,/n IIAIl2

17. Suppose A is m-by-n and Q is m-by-m with Q-1 QT. Argue that

IIQAIIF = IIAIIF

18. Argue that there cannot exist a norm 1111 on C" such that IIAxFI < IIAIIF IIXIIfor all x in C".

19. Suppose U and V are unitary matrices, U* = U-' and V* = V-'. Forwhich norms that we have discussed does II U A V II = IIA II . For all A inf,nxn7

20. Suppose 11 11 is a Banach norm. Suppose 0 E = E2.

(a) Prove that 11 Ell > 1.(b) Prove that if A is singular, II! - All > 1.

I I >IIIII(c) Prove that if A is nonsingular, IIA'11 A11

(d) Prove that if 111 - A 11 < 1, then A is nonsingular.(e) Prove that if A is nonsingular, A + B nonsingular the IIA-'-

(A+B)-'II < IIA 'II II(A+B)-'II IIBII.

21. Argue that II ll is not a Banach norm but II II' where 11 A II' = m II A II. ison (Cntxm

22. Suppose S2 is a positive definite m-by-m matrix . Argue that the Maha-lanobis norm IlAlln = tr(A*QA) is in fact a norm on C,nxm

23. Prove that for an m-by-n matrix A, IIA 112 < II A II i < mn 11 A ll 2

24. Prove that for an m-by-n matrix A, IIA Iloo < II A II i < mn IIA Il .

25. Prove that for an m-by-n matrix A, II All... < II A Ill <_ mn II A II,.

26. Prove Holder's inequality for m-by-n matrices. Suppose I < p, q < 00

and - + 1 = 1. Then IIABII1 <- IlAllt, IIBIIq.p q

Page 273: Matrix Theory

252 Norms

Further Reading

[B&L, 1988] G. R. Belitskii and 1. Yu. Lyuhich, Matrix Norms and TheirApplications, Oper. Theory Adv. Appl., Vol. 36, Birkh5user, (1988).

[H&J, 1985] Roger A. Horn and Charles R. Johnson, Matrix Analysis,Cambridge University Press, (1985).

[Ld, 1996] HelmutLutkepohl, Handbook of Matrices, John Wiley & Sons,New York, (1996).

[Noble, 1969] Ben Noble, Applied Linear Algebra, Prentice Hall, Engle-wood Cliffs, NJ, (1969).

6.2.1 MATLAB Moment

6.2.1.1 Norms

MATLAB has built in commands to compute certain vector norms. If v =(v1, v2, ... , vn) is a vector in Cn, recall that the p-norm is

1/n

I <p<oo

and

Ilvlloo = max {Ivjl}.

1<j<11

In MATLAB, the function is

norm(v, p).

Note that p = -inf is acceptable and norm(v,-int) returns min {Iv1I}.I<j<n

For example, let's generate a random vector in V.

Page 274: Matrix Theory

6.2 Matrix Norms 253

v= I0*rand(5,1)+i* 10*rand(5, 1)

V =

6.1543 + 4.0571 i

7.9194 + 9.3547i

9.2181 + 9.16901

7.3821 + 4.10271

1.7627 + 8.9365i

We can make a list of various norms by

> > [norm(v, 1), norm(v,2), norm(v,3), norm(v,-int),norm(v,inf)]

ans =

50.1839 22.9761 17.9648 7.3713 13.0017

For fun, let's try the vector w = (i, i, i, i, i).

> > w=[iiiii]w=

Column I through Column 4

0 + 1.0000i 0 + 1.0000i 0 + 1.0000i 0 + 1.0000i

Column 5

0 + 1.0000i

[norm(w,1), norm(w,2), norm(w,3), norm(w,-int),norm(w,int)]

ans =5.0000 2.2361 1.7100 1.0000 1.0000

MATLAB can also compute certain matrix norms. Recall that the p-norm of amatrix A is

IIAIIt, = maxIIAvIIv,

where I < p < oo.IIVIIp

Also recall thatn

maxE laijI1 <i <n j=1

which is the maximum row sum norm. Of course, we cannot forget the Frobeniusnorm

II All F- = (1:1: 1aij12 l = tr(A*A) 1/2.

/

Page 275: Matrix Theory

254 Norms

Let's look at an example with a randomly generated 4-by-5 complex matrix.The only matrix norms available in MATLAB are the p-norms where p is 1, 2,inf, or "fro"

>> A=10*rand(4,5)+i*10*rand(4,5)

A=

Column I through Column 4

9.5013 + 0.5789i 8.9130 + 1038891 8.2141 + 2.72191 9.2181 + 4.451(1

2.3114+3.5287i 7.6210 + 2.20277i 4.4470 + 1.9881 i 7.3821 + 9.3181 i

6.0684 + 8.1317i 4.5647 + 1.98721 6.1543 + 0.1527i 1.7627 + 4.65991

4.8598 + 0.09861 0.1850 + 6.0379i 7.9194 + 7.4679i 4.0571 + 4.18651

Column 5

9.3547 + 8.4622i

9.1690 + 5.2515i

4.1027 + 2.0265i

8.9365 + 6.72141

> > [norm(A,I), norm(A,2), norm(A,-inf),norm(A,int),norm(A,'fro')]

ans =

38.9386 35.0261 50.0435 50.0435 37.6361

The 2-norm of a very large matrix may be difficult to compute, so it can beestimated with the command

normest(A, tol),

where toll is a given error tolerance. The default tolerance is 10-6.Recall that the condition number of a nonsingular matrix A relative to some

p-norm is defined as

c,,(A) = IIAII, IIA-' 11P .

This number measures the sensitivity to small changes in A as they affectsolutions to Ax = b. A matrix is called "well conditioned" if c,,(A) is "small"and "ill conditioned" if cp(A) is "large" The MATLAB command is

cond(A, p),

Page 276: Matrix Theory

6.2 Matrix Norms 255

where p can be 1, 2, inf, or `fro.' Rectangular matrices are only accepted forp = 2, which is the default. For example, for the matrix A above

> > cond(A, 2)

ans =8.4687

Again, for large matrices, computing the condition number can he difficultso MATLAB offers two ways to estimate the 1-norm of a square matrix. Thecommands are rcond and condest. To illustrate, we shall call on a well-knownill conditioned matrix called the Hilbert matrix. This is the square matrix whose

(i, j) entry is1

. + . + I . It is built into MATLAB with the call hilb(n). We will

use the rat format to view the entries as fractions and not decimals. We will alsocompute the determinant to show just how small it is making this matrix very"close" to being singular.

> > format rat

> > H = hilb(5)

H=11111

2 3 4 5

' 1 1 1 1

2 3 4 5 6

1 1 1 1 1

3 4 5 6 71 I I I I

4 5 6 7 8

I I I I I

5 6 7 8 9

Page 277: Matrix Theory
Page 278: Matrix Theory

Chapter 7

Inner Products

Hermitian inner product, parallelogram law, polarizationidentity, Appolonius identity, Pythagorean theorem

7.1 The Inner Product Space C"In this section, we extend the idea of dot product from W to (r". Let's go

back to R2, where we recall how the notion of dot product was motivated. Theidea was to try to get a hold of the angle between two nonzero vectors. We havethe notion of norm, Ilxii = ',/x - + x- , and thus of distance, d(x, y) = IIy - xii .Let's picture the situation:

(XI, Xz)

Figure 7.1: The angle between two vectors.

257

Page 279: Matrix Theory

258 Inner Products

By the famous law of cosines, we have IIY - x112 = 11x112 + IIY112 - 2 Ilxll IIYIIcos 0. That is, (y, - x, )2 + (y2 - X2)2 = x2 + x2 + y; + y2 - 2 IN 11 11Y11 cos 0.

Using the good old "foil" method from high school, we get y; - 2x, yi +x; +y2 - 2x2y2 + x2 = x; + x2 + y; + y2 - 2 Ilxll IIYII cos 0. Now cancel and getx, y, + x2y2 = I I x 1 1 IIYII cos 0 or

X1 Y1 + x2)'2cos0 =

Ilxll IIYII

Now, the length of the vectors x and y does not affect 0, the angle betweenthem, so the numerator must be the key quantity determining the angle. Thisleads us to define the dot product of x and y as x y = x, y, + x2y2. This easilyextends to R". Suppose we copy this definition over to C". In R2, we have(x,, x2) (x,, x2) = xi +x2; if this dot product is zero, x, = x2 = 0, so (x,, x2)is the zero vector. However, in C2, (1, i) (1, i) = 12 + i 2 = 0; but we dotted anonzero vector with itself! Do you see the problem'? We have seen it before. AFrench mathematician by the name of Charles Hermite (24 December 1822-14 January 1901) came up with an idea on how to solve this problem. Usecomplex conjugates! In other words, he would define(I, i) (1, i) = I 1 T+ T =

1 I + i(-i) = I - i2 = 2. That feels a lot better. Now you can understandwhy the next definition is the way it is.

DEFINITION 7.1 (Hermitian inner product)LetxandybeinC",where x=(X,,X2,... ,x")andy=(y,,Y2,...y").Thenthe Hermitian inner product of x and y is (x I y) = x, y, +x2y2 + +x,, y" =

01

:x1)'1.1='

We are using the notation of our physicist friends that P. A. M. Dirac (8 Au-gust 1902-22 October 1984 ) pioneered. There is a notable difference however.We put the complex conjugation on the ys, whereas they put it on the xs. Thatis not an essential difference. We get the same theories but they are a kind of"mirror image" to each other. In matrix notation, we have

x,X2

(x I Y) = [7 172...Yn]

Xn

Note that we are viewing the vectors as columns instead of n-tuples. The nexttheorem collects some of the basic computational facts about inner products.The proofs are routine and thus left to the reader.

Page 280: Matrix Theory

7.1 The Inner Product Space C"

THEOREM 7.1 (basic facts about inner product)Letx,y,zbeinC",a, (3 E C. Then

1. (x+ylz)=(xIZ)+(Ylz)

2. (ax I z) = a (x l z)

3. (xIY)=(YIx)

4. (xlx)0and (xIx)=0iffx

5. (xII3Y)=R(xIY)

259

Notice that for the physicists, scalars come out of the first slot decorated witha complex conjugate, whereas they come out of the second slot unscathed. Next,using the properties established in Theorem 7.1, we can derive additional com-putational rules. See if you can prove them without mentioning the componentsof the vectors involved.

THEOREM 7.2Let x, y, z be in C", a, 0 E C. Then

1.

2.

3.

4.

5.

6.

7.

(XIY+Z)=(xIY)+(xlz)

(-6 IY)=\xl-6 >=Oforanyx,yinC"

(x I Y) = O for all y in C" implies x =

(X I z) (y I z) for all z in C" implies x = y

7n

),"

F-ajxj IY)=

Haj(Xj IY)j=1 j=1

x I E13kyk/ _ >Rk(X I Yk)k=1 k=1

gym` p m P

`ajXj I EQky ') = EajQk (Xj I Yk)j=1 k=1 j=lk=1

8. (x-ylz)=(XIZ)-(ylz)9. (xly-Z)(XIY)-(Xlz).

Page 281: Matrix Theory

260 Inner Products

Next, we note that this inner product is intimately connected to the h normon C". Namely, for x = (x1, x2, ... x I

x X1x,12 + Ixz12 + + Ix" 12 = 11X112. Thus 11X112 = (x I x). Note this makesperfect sense, since (x I x) is a positive real number.

We end with four facts that have a geometric flavor. We can characterize theperpendicularity of two vectors through the inner product. Namely, we definex orthogonal to y, in symbols x I y, iff < x I y >= 0.

THEOREM 7.3Let x, y, z be in C". Then

1. IIx + y112 + IIX - y112 = 211x112 + 2 IIY112 (parallelogram law)

2. (x I y) = I/4(Ilx+y112 - IIX-Y112 + i IIX+iyI12 - i IIX-iY1121(polarization identity)

3. IIZ -x112 + IIZ - YII'` = z IIX - y112 + 2 IIz -Z

(x + Y)112 (Appoloniusidentity)

4. If'x 1 y, then IIx+Y112 = IIXII2 + 11Y112 (Pythagorean theorem).

PROOF The norm here is of course the 12 norm. The proofs are computationaland left to the reader. 0

Exercise Set 27

1. Explain why (x I y) = y*x.

2. Establish the claims of Theorem 7.1, Theorem 7.2, and Theorem 7.3.

3. Argue that I< x I y >I < < x I x > < y I y > . This is the Cauchy-Schwarz inequality. Actually, you can do a little better. Prove that

I<xIY>1< Ix,yrl<<xlx><YIY>.Make upanexample

where both inequalities are strict.

4. Let x = (1, 1, 1, 1). Find as many independent vectors that you can suchthat <xIy>=0.

5. Argue that < Ax I y >=< x I A*y > for all x, y in C".

Page 282: Matrix Theory

7. I The Inner Product Space C" 261

6. Prove that <xlay +(3z>=ax <xIy>+(3<xlz>for all x,y,zin C" and all a, R E C.

7. Argue that (x I -6 ) = 0 = (-6 l y) for all x, y in C".

8. Prove that if (x I y) = 0, then IIX ± YII2 = IIXI122 + 1ly112

9. Argue that if (Ax I y) = (x I By) for all x, y in C", then B = A*.

10. Prove that Ax = b is solvable iff b is orthogonal to every vector inNull(A*).

11. Let z = (Z I, Z2) E C2 and w = (w1, w2) E C2. Which of the followingdefines an inner product on C2?

(a) (z I w) = zIW2(h) (zlw)=ziw2-z2wi(c) (z l w) = zi wi + z2w2(d) (z I w) =2zIWi +i(z2wi -ziw2)+2z2w2.

12. Suppose llx + yll = 9, Ilx - YII = 7, and IIXII = 8. Can you determineIlyll?

13. Suppose (z I w) = w*Az defines an inner product on C". What can yousay about the matrix A? (Hint: What can you say about A* regarding thediagonal elements of A?)

14. Suppose f is a linear map from C" to C. Argue that that there is a uniquevector y such that f (x) = (x I y) .

15. Prove 4 (Ax I y) _ (A(x + y) I x + y) - (A(x - y) I x - y) + i (A(x+iy) I x + iy) - (A(x - iy) I x - iy) .

16. Prove that for all vectors x and y, l1x+yl12-i Ilix+y1l2 =11X112+IIy112-i(IIXII2 + IIYIIZ) + 2 (x I y)

Further Reading

[Rassias, 1997] T. M. Rassias, Inner Product Spaces and Applications,Chapman & Hall, Boca Raton, FL, (1997).

Page 283: Matrix Theory

262 Inner Products

[Steele, 2004] J. Michael Steele, The Cuuchy-Schwarz Muster Class: AnIntroduction to the Art of Mathematical Inequalities, Cambridge Univer-sity Press, Cambridge, and the Mathematical Association of America,Providence, RI, (2004).

orthogonal set of vectors, M-perp, unit vector, normalized,orthonormal set, Fourier expansion, Fourier coefficients,Bessel's inequality, Gram-Schmidt process

7.2 Orthogonal Sets of Vectors in Cn

To ask that a set of'vectors be an independent set is to ask much. However, inapplied mathematics, we often need a stronger condition, namely an orthogonalset of vectors. Remember, we motivated the idea of dot product in an attempt toget a hold of the idea of the angle between two vectors. What we are saying is that

the most important angle is a right angle. Thus, if you believe cos 0 =x y

IIxII IIYIIin R" then, for 0 = 90°, we must have x y = 0. This leads us to the nextdefinition.

DEFINITION 7.2 (orthogonal vectors and 1)Let x, y E C". We say x and y are orthogonal and write x 1 y iff (x I y) = 0.That isy*x = 0.ForanysubsetM C C", define M1 ={x E C" I x 1 mforallm in M). M1 is read "M perp." A set of vectors {xj } in C" is called anorthogonal set iff < xj I xk > = 0 if j # k.

As usual, there are some easy consequences of the definitions and, as usual.we leave the proofs to the reader.

THEOREM 7.4Let x,yEC", aEC,M,NCC".Then

1. xIyiffy.1x

2. x 1 0 for all x in C"

3. xlyiffxlayforall ain C

4. M1 is always a subspace of C"

Page 284: Matrix Theory

7.2 Orthogonal Sets of Vectors in C"

5. Ml = (span(M))1

6. McM"7. M111 = M1

8.

9. (_)1 = c"

10.

M C N implies Nl a Ml.

263

Sometimes, it is convenient to have an orthogonal set of vectors to be of unitlength. This part is usually easy to achieve. We introduce additional languagenext.

DEFINITION 7.3 (unit vector, orthonormal set)Let X E C". We call x a unit vector if (x I x) = 1. This is the same as sayingIIxII = 1. Note that any nonzero vector x in C" can be normalized to a unit

vector u = X.

A set of vectors {xi } in C" is called an orthonormal set iffIIxII

(xi I xk)0 if j 54 k. In other words, an orthonormal set is just a set ofI fj = =

pairwise orthogonal unit vectors. Note that any orthogonal set can be madeinto an orthonormal set just by normalizing each vector. As we said earlier,orthogonality is a strong demand on a set of vectors.

THEOREM 7.5Let D be an orthogonal set of nonzero vectors in C". Then D is an independentset.

PROOF This is left as a nice exercise.

The previous theorem puts a significant limit to the number of mutuallyorthogonal vectors in a set; you cannot have more than n in C". For example,(i, 2i, 2i), (2i, i, -2i), and (2i, -2i, i) form an orthogonal set in C3 and being,necessarily, independent, form a basis of C3. The easiest orthonormal basisfor C" is the standard basis T e i = ( 1 , 0, ... , 0), - e 2 = (0, 1, 0, ... , 0), ... ,-'e " = (0, 0.... , 0, 1). One reason orthonormal sets are so nice is that theyassociate a particularly nice set of scalars to any vector in their span.

If you have had some analysis, you may have heard of Fourier expansionsand Fourier coefficients. If you have not, do not worry about it.

Page 285: Matrix Theory

264 Inner Products

THEOREM 7.6 (Fourier expansion)Let {u 1 , u2, ... , U,,,) be an orthonormal set of vectors. Suppose x = u I u I +a2uz + ... + an,um. Then aj = (x I uj) for all j = 1, 2, ... , m.

PROOF Again, this is a good exercise. a

DEFINITION 7.4 (Fourier coefficients)Let {ej } be an orthonormal set of vectors in C" and x a vector in C". The setof scalars { (x I ej) } is called the set of Fourier coefficients of x with respectto this orthonormal set.

In view of Theorem 7.3, if you have an orthonormal basis of C" and if youknow the Fourier coefficients of a vector, you can reconstruct the vector.

THEOREM 7.7 (Bessel's inequality)Let ( e 1 , e2, ... , e,,,) be an orthonormal set in C". If x is any vector in C", then

I.in

x-1: (xIek)ekk=1

2 "I

=11x112-EI (xIek)12k=I

2. (x_>1 (x I ek) ek /1 ej for each j = 1, ... , m

k=1

3. I(x l ek)IZ < IIx112.k=I

PROOF We will sketch the proof of (1) and leave the other two statements asexercises. We will use the computational rules of inner product quite intensively:

in

x - E(xIek)ekk=1

2 m

k=1

x-(xIek)ek jk=1 /

(xl=( xx)-ek)ek \\)- Iek)ekIk=1/

((xk= 1

m m

+ ((xIeL)eklE (xlek)ek\k=1 k=1

M m

=(xlx)-E(xIek)(xIek)-E(xIek)(ek Ix)k=1 k=1

Page 286: Matrix Theory

7.2 Orthogonal Sets of Vectors in C" 265

i" m

+J:J: (xIek)(ek Iej)(xIej)k=1 j=1

=IIXII2-21: 1(XIek)I2+EI(XIek)12k=1 k=1

m

= IIXI12 - I (X I ek) 12 .

k=I a

The reader should be sure to understand each of the steps above.Again there is some nice geometry here. Bessel's inequality may be inter-

preted as saying that the sum of the squared magnitudes of the Fourier coeffi-cients (i.e., of the components of the vector in various perpendicular directions)never exceeds the square of the length of the vector itself.

COROLLARY 7.1Suppose {uI , u2, ... , u" } is an orthonormal basis of C". Then for any x in C",we have

I. IIXI12 = F_ I(X I uk)12 (Parseval's identity)k=1

2. x = j (x I uk) Uk (Fourier expansion)k=I

3. (xIY)=E (XIuk)(uk IY) forallx,yinC".k=I

Our approach all along has been constructive. Now we address the issue ofhow to generate orthogonal sets of vectors using the pseudoinverse. Begin withan arbitrary nonzero vector in C", call it x1. Form x*. Then we seek to solve thematrix equation x, x2 = 0. But we know all solutions of this equation are of the

form x2 = [I - xlxi ] v1, where vI is arbitrary. We do not want x2 to be 0, sowe must choose v, specifically to be outside span(xi). Let's suppose we havedone so. Then we have x I, x2, orthogonal vectors, both nonzero. We wish a thirdnonzero vector orthogonal to x, and x2. The form of x2 leads us to make a guessfor x3, namely x3 = [I - x1 xI - x2x? ] v2. Then x3 = V2* [/ - x,xt - x2X ] .

Evidently, x3x, and xiX2 are zero, so x, and x2 are orthogonal to x3. Again weneed to take our arbitrary vector v2 outside span {x,,x2} to insure we have not

taken x3 = '. The pattern is now clear to produce additional vectors that arepairwise orthogonal. This process must terminate since orthogonal vectors areindependent and you cannot have more than n independent vectors in C".

Page 287: Matrix Theory

266 Inner Products

To get an orthonormal set of vectors, simply normalize the vectors obtainedby the process described above. Let's illustrate with an example.

Example 7.1i

In C3, take xi = i . Then xi =i

xi

X *J x,

-i

3

[-i/3 -i/3 -i/3]. Then ! - xixt = 13 -Ii]

[-i/3 -i/3 -i/3]i

1/3 1/3 1/3 2/3 -1/3 -1/3

= 13 - 1/3 1/3 1/3 = -1/3 2/3 -1/31/3 1/3 1/3 -1/3 -1/3 2/3

1

Evidently, v, = 0 is not in span

J), So0

2/3 -1/3 -1/3X2 = [ I 0 0 ] -1/3 2/3 -1/3 =

-1/3 -1/3 2/3

=You may verify x, I x2 if you are skeptical. Next, x2x22/3 [ 2/3 -1/3 -1 /3

2/3

-1/32/3

-1/3 [I -1/2 -1/2]-1/3 -1/32/3 -1/3 -1/3 0 0 0

-1/3 1/6 1/6 , so13-x,xi -x2x2 = 0 1/2 -1/2-1/3 1/6 1/6 0 -1/2 1/2

0 i2/3

Now v2 = 0 is not in span i , -1/3 (why not?), so1 -1/3

0 0X3 =

-1/2 . (Note, we could have just as well chosen v2 = 1 )

1/2 0i +2/3 0

Therefore i , -1/3 , 1/2 is an orthogonal set of vectors in

i -1/3 1/2

Page 288: Matrix Theory

7.2 Orthogonal Sets of Vectors in C" 267

C3 and necessarily a basis for C3. Note, the process must stop at this point since

x,x+1 + x2xz + x3x3

1/3 1/3 1/ 3 2/3 -1/3 - 1/3 0 0 01/3 1/3 1/ 3 + -1/3 1/6 1/6 + 0 1/2 -1/21/3 1/3 1/ 3 -1/3 1/6 1/6 0 -1/2 1/2

1 0 0

0 1 0 = I3.

0 0 1

We note a matrix connection here. Form the matrix Q using our-i -i -i

orthogonal vectors as columns. Then Q* Q = 2/3 -1/3 -1/30 -1/2 1/2

i 2/3 0 3 0 0

i -1/3 -1/2 = 0 2/3 0 is a diagonal matrix with posi-i -1/3 1/2 0 0 1/2

Live entries on the diagonal. What happens when you compute QQ*?Finally, let's create an orthonormal basis by normalizing the vectors obtained

i/above: i / 3- , - 3 z , -1 /. . Now form U =i 3f Ilk

i/ 2/f 0

-1 /. and compute U*U and UU*.

Next, we tackle a closely related problem. Suppose we are given yi, Y29Y3, . , independent vectors in C". We wish to generate an orthonormal se-

}quence q, , q2, q3 ... so that span { q, , q2, , qj } = span {Y,, y2, ... , Y j

for all j. Begin by setting q, = YiYi. Being independent vectors, none of

Y

the ys are zero. Clearly span(q,) = span(y1) and q1 q, = Y, Yi = 1,y- Y, YfYi

so q, is a unit vector.Following the reasoning above, set qz = vi [I - q,qi ] = v*, [I - q,q*1].

A good choice of vi is * Y2 + 2. For one thing, we know Y2(y2 [I - gtgt ] Y2)

span(y1), so any multiple of Y2 will also not be in there. Clearlyy2* [I - giq*,]=q2q, = 0, so we get our orthogonality. Moreover, g2g2

(yz [I - q'qi ] Y2)T

[I - q. q, ] Y2 _ Yz [I - q i q, ] Y2 = 1. Now the pattern should be clear;(Y2* [I - gigi ] Y2)2 Yi [I - gigi] Y2

Page 289: Matrix Theory

268 Inner Products

choose q3 = Yi [I - gigi - q2q? yNote y, span {y1, yz). General)

(yi [I - q qi - q:qi ] Y3)2rr

k-1

Yk I I - Egigi* L 1_1then gk= ,.

k-1

(y[k

I- j=-1 gigj Yk)

This is known as the Gram-Schmidt orthogonalization process, named afterErhardt Schmidt (13 January 1876-6 December 1959), a German mathemati-cian who described the process in 1907, and Jorgen Peterson Gram (27 June1850-29 April 1916), a Danish actuary. It produces orthogonal vectors startingwith a list of independent vectors, without disturbing the spans along the way.

Exercise Set 28

1. Let x = 1l . Find a basis for {x}1.

-I2. Let x1, x,, ..., x, be any vectors in C". Form the r-by-r matrix G by

G = [(xj I xk )] for j, k = 1, 2, ..., r. This is called the Gram matrix afterJ. P. Gram, mentioned above. Argue that G is Hermitian. The determinantof G is called the Grammian. Prove that x1 , x2, ..., xr form an independentset if the Grammian is positive and a dependent set if the Grammian iszero.

3. For complex matrices A and B, argue that AB* = 0 iffCol(A) 1 Col(B).

4. Argue that C" = M ® M1 for any subspace M of C". In particular,prove that dim(M1) = n - dim(M). Also, argue that M = M11 for anysubspace M of C.

5. Prove that every orthonormal subset of vectors in C" can be extended toan orthonormal basis of C".

6. Apply the Gram-Schmidt processes to {(i, 0, -1, 2i), (2i, 2i, 0, 2i),(i, -2i, 0, 0)} and extend the orthonormal set you get to a basis of C4.

7. Prove the claims of Theorem 7.3.

Page 290: Matrix Theory

7.3 QR Factorization 269

8. Suppose M and N are subspaces of C". Prove that (M nN)1 = M1+N'and (M + N)1 = M1 n N.

7.2.1 MATLAB Moment

7.2.1.1 The Gram-Schmidt Process

The following M-file produces the usual Gram-Schmidt process on a set oflinearly independent vectors. These vectors should be input as the columns ofa matrix.

I function GS = grmsch(A)2 [m n] = size(A);3 Q(:, 1)=A(:, 1)/norm(A(:, I ));4 fori=2:n5 Q(:,i) = A( i) Q(:,1:i I)*Q(:,I:i-1)'*A(:,i);6 Q(:,i) = Q(:,i)/norm(Q(:,i));7 end

8 Q

You can see that this is the usual algorithm taught in elementary linear algebra.There is a better algorithm for numerical reasons called the modified Gram-Schmidt process. Look this up and make an M-file for it. In the meantime, tryout the code above on some nice matrices, such as

I 1 1

_ 1 1 0A

1 0 0

1 0 0

QR factorization, unitary matrix, Kung's algorithm,orthogonal full rank factorization

7.3 QR FactorizationThe orthogonalization procedures described in the previous section have

some interesting consequences for matrix theory. First, if A E C"'x", thenA* E cnxm, where A* = AT. If we look at A*A, which is n-by-n, and partition

Page 291: Matrix Theory

270 Inner Products

A by columns, we see

A*A =

aTiT2-T

a,,

a2 a" I = [a,Tal] = [(ai I ae)]

In other words, the matrix entries of A*A are just the inner products of thecolumns of A. Similarly, the matrix entries of AA* are just the inner productsof the rows of A. It should now be easy for the reader to prove the followingtheorem.

THEOREM 7.8Let A E C"' 1". Then

I. The columns of A are orthogonal iff A*A is a diagonal matrix.

2. A* is a left inverse of A (i.e., A*A = 1) iff the columns of A are ortho-normal.

3. The rows of A are orthogonal iff AA* is a diagonal matrix.

4. A* is a right inverse of A (i.e., AA* = 1) iff the rows of A are ortho-normal.

From this, we get the following.

COROLLARY 7.2Let U E C"x" Then the following are equivalent.

1. The columns of U are orthonormal.

2. U*U = 1,,.

3. U*=U-1.

4. UU* = I.

5. The rows of U are orthonormal.

This leads us to a definition.

DEFINITION 7.5 (unitary matrix)The square matrix U E C"111 is called unitary if'U satisfies any one (and henceall) of the conditions in Corollary 7.2 above.

Page 292: Matrix Theory

7.3 QR Factorization 271

Unitary matrices have many useful features that we will pursue later. But firstwe want to have another look at the Gram-Schmidt procedure. It leads to a usefulfactorization of matrices. Suppose, for the purpose of illustration, we have threeindependent vectors a,, a2, a3 that can be considered the columns of a matrix A

1 1

(i.e., A = [a, 1212 1 a3]). Say, to be even more concrete, A1 0=1 0 01 0 0

We begin the process by taking q, =a,

=a,

Then q, is a unit vector,'Vra Ila, 11

* a, a,qTq,

=asa, aa = I. But the new wrinkle here is we can solve back for

V l i

a*a = aa,a , . Namely, a, = a,a,q, = Ila, II qi But note, a, q, _ ,

a ,a ,

Ila, 11, so a, = (a*gl)glIn our numerical example, q, _ and=

2

a, = 2q, sincea,q, I 1 1 1 ] = 2 = a, a, = Ila, 11. Note

2a,q, is real and positive. By the way, we can now write A = [2q, 1212 1 a3].

Next, we solve back for az. Let C2 = a2 - (a2q,)q, = a2 - (az 1 qi) qiC2 C2

and q2 =cz IIc2Il

Since cz = a2 - (a29i)9, = a2 - (a29i)9,

we see c*2 2q = aZgl - (a*gi)q,gi. But q, q, = I so c*qi = 0 = (q, I cz)making cz I. q,. Therefore, q2 1 q1, since qz is just a scalar multiple of cz.

In our numerical example, c2 = 1 - 1 = 21 so gz =0 --0

2

i

z, I . Now we do the new thing and solve for az. We see a2 = cz +

(azgi)gi = c2c92 + (aigi)gi = Ilczll qz + (a2 I qi) qi and so a2g2 =c2c2g2g2 + (a2q, )q,qz = c2cz = Ilcz II . Again, a2g2 is real and positive,

and a2 = (a2q, )q, + (a2g2)g2 = (a2 19i) qi + Ilcz 11 qz In our numericalexample, az = Lq, + L q2 so A = [2qi I qi + qz I a31. Finally we take c3 =a3-(aigi)gi -(a2g2)g2 = 1313 -(a3 I qi) qi -(a3 19z) 92. Wecomputec3q, =

Page 293: Matrix Theory

272 /cuter Products

(a3-(a*ql)q*-(a*g)g2)q1 =a*sqi-a*qi-' 'andc;q2 =a3gz-aigz = and conclude c31gl , q2. Take q3 =

C3 = c3and get q3 1

11C3 11

11 1

i 1 2

0 _ -2q1, q2 In our example, c3 = 0 - 1 /2 T -1/2 0

0 i 0,f2-

2fso q3 - -O2 Solving for a3, we see a3 = C3 + (a*gl)g1 + (aigz)gz =

0

c3c3g3 + (a3*g1)g1 + (a*g2)g2 = 11C3 11 q3 + (a3 I qi) qi + (a3 I q2) q2 and

a;q3 = cjc3 = 11c311. So finally, a3 = (a'391)91 + (a392)g2 + (a3g3)g3 =(a3 I q1) 91 + (a3 I q2) q2 + 11C3 11 q3. In our example

a3 =

1 1 1 f00 =(z) +(2) 2 +(Z ) 02

Z ZL0 -0A = [2q1 I q1 +qz I Zq1+Zq2+ z q3] _ [q1 I q2 I q31

so

1 1 f2 2 2 1

1 1 2_ 2 2 0 1 = QR, whe re Q* Q = 1, since t he

1 0

00 0

2

columns of Q are orthonormal and R is upper triangular with positive realentries on the main diagonal. This is the famous QRfactorization. Notice thatthe columns of A were independent for this to work. We could generalize theprocedure a little bit by agreeing to allow dependencies. If we run into one,agree to replace that column with a column of zeros in Q. For example,

2 6 1 40 i 0 f 2

11 3 1 4 12 2 2 0 0 0 0 0

1 3 1 4 0 0 I 0 2_A 2 2 0 0 1 4

1 3 0 0 0 2

1 3 0 0 0 20

20 0

0 0 0 0 0

20

20 0

0 0 0 0 ,f2-

2

We lose a bit since now Q* Q is diagonal with zeros and ones but R is stillupper triangular square with nonnegative real entries on the main diagonal. Wesummarize our findings in a theorem.

Page 294: Matrix Theory

7.3 QR Factorization 273

THEOREM 7.9 (QRfactorization)Let A E C'' ", with n < m. Then there is a matrix Q E C"' "" with orthonor-mal columns and an upper triangular matrix R in C", such that A = QR.Moreover, if n = ni, Q is square and unitary. Even more, if A is square andnonsingular R may be selected so as to have positive real numbers on thediagonal. In this case, the factorization is unique.

PROOF Suppose A has full column rank n. Then the columns form an inde-pendent set in C"'. Apply the Gram-Schmidt procedure as previously illustratedand write A = QR, with Q* Q = I and R upper triangular with positive real en-tries on the main diagonal. If the columns of A are dependent, we proceed withthe generalization indicated immediately above. Let A = [a1 I a2 I ... I a"j. If

a1 = 6, take q, = "; otherwise takeat al

q1 =atal TV- al II

Next compute C2 = a2 - (a2 gt )ql = a2 - (a2 191) 91 If c2 = ', whichhappens if a2 depends on a1, set q2 = 1. If c2 0 take q2 =

c2=

c C2

C2 k-1Generally, for k = 2, 3, ... , n, compute Ck = ak - E (akgi)gi =

IIC2II j=1

ak - E (ak I qj ) qi. If ck = '(iff ak depends on the previous a's), setj=1

qk = '. Else, take qk = Ck = CkThis constructs a list of vec-

Ce C, II Ck 11

tors q1, q2, ... , q, that is an orthogonal set consisting of unit vectors and thezero vector wherever a dependency was detected. Now, by construction, eachqj is a linear combination of a,, ... , aj, but also, each aj is a linear combina-tion of q I, ... , qj. Thus, there exist scalars ak% f o r j = 1 , 2, ... , n such that

iai = jakigk.

k=1

Indeed akj = (akqj) = (ak I qj). To fill out the matrix R we take ak1 = 0 ifk > j. W e also take a;j = 0 for all j = 1, 2, ... , n for each i where q1 Inthis way, we have A = Q R where the columns of Q are orthogonal to each otherand R is upper triangular. But we promised more! Namely that Q is supposedto have orthonormal columns. All right, suppose some zero columns occurred.Take the columns that are not zero and extend to a basis of C'. Say we getqi, q2, ... , q' as additional orthonormal vectors. Replace each zero columnin turn by q1, q2, ... , q'. Now we have a new Q matrix with all orthonormal

columns. Moreover, QR is still A since each new q' matches a zero row of R.This gives the promised factorization.

If A is nonsingular, m = n = rank(A), so Q had no zero columns. Moreover,Q*A = R and Q necessarily unitary says R cannot have any zero entries on its

Page 295: Matrix Theory

274 Inner Products

main diagonal since R is necessarily invertible. Since the columns of Q form anorthonormal basis for C"', the upper triangular part of 'R is uniquely determinedand the process puts lengths of nonzero vectors on the main diagonal of R. Theuniqueness proof is left as an exercise. This will complete the proof. Q

One final version of QR says that if we have a matrix in C"'"" of rank r, wecan always permute the columns of A to find a basis of the column space. Thenwe can apply QR to those r columns.

COROLLARY 7.3Suppose A E C;"". Then there exists a permutation matrix P such thatA P = [QR I M], where Q* Q = I,, R is r-by-r upper triangular with positiveelements on its main diagonal.

We illustrate with our matrix from above.

I I 1 3 4

AP= I 1 0 3 4

1 0 0 3 0

1 0 0 3 0

f 2 1 6 4z z 0 0

0 1 0 4

Ti 2

z2

0 00 0 2 0 0

z1

0 0 00 0

2

0 0 01 _I

20 0 0

0 0 0 0 0

So, what good is this QR factorization? Well, our friends in applied math-ematics really like it. So do the folks in statistics. For example, suppose youhave a system of linear equations Ax = b. You form the so-called "normalequations" A*Ax = A*b. Suppose you are lucky enough to get A = QR. ThenA*Ax = R*Q*Qx = A*b so R*Rx = R*Q*b. But R* is invertible, so we arereduced to solving Rx = Q*b. But this is an upper triangular system that canbe solved by back substitution.

There is an interesting algorithm due to S. H. Kung [2002] that gives a wayof finding the QR factorization by simply using elementary row and columnoperations. We demonstrate this next.

7.3.1 Kung's Algorithm

Suppose A III. Then A* E C" and A*A E C""". We see A*A issquare and Hermitian and its diagonal elements are real nonnegative numbers.

Page 296: Matrix Theory

7.3 QR Factorization

and E =

Suppose A has n linearly independent columns. Then A*A is invertible. Thus wecan use ' "2 1) pairs of transvections (T,j (c)*, T;j (c))to "clean out" the off diago-nal elements and obtain (AE)*(AE) = E*A*AE = diag(di, d2,..., D,a diagonal matrix with strictly positive elements. This says the columns ofAE are orthogonal. Let C = diag[ d , ..., Then Q = AEC-' has or-thonormal columns. Also, E is upper triangular, so E-' is also as is R = CE-'.Finally note, A = QR.

i 0 3

Let's work through an example. Suppose A =2 0 00 2 i 0

Then-4i 0 0

121 0 -3i 21 0 0

A*A = 0 5 0 I and Ti 21 )A*ATi3(3) = 0 5 021

3i 0 9 0 0 60

21 0 0

Thus, C =

AEC-' _

0 0

0 0 Vz1 0 2 zi

4_1 0 0550 0

L a, 0 _ 2

21 105

275

21

0 1 0 Now Q

0 0 1

21 0 -iVandR_CE` = 0 f 0

0 0 2

The reader may verify that A = QR.We finish this section with an important application to full rank factorizations.

THEOREM 7.10Every matrix A E C111111 has a full rank factorization A = FG, where F* = F+.

PROOF Indeed take any full rank factorization of A, say A = FG. Then thecolumns of F form a basis for Col(A). Write F = QR. Then A = (QR)G =Q(RG) is also a full rank factorization of A. Take F1 = Q, G, = RG. ThenFi* F, = Q* Q = 1, but then Fi = (Fl* F1) ' Fi = Fl*. O

DEFINITION 7.6 (orthogonal full rank factorizations)If A = FG is a full rank factorization and F* = F+, call the factorization anorthogonal full rank factorization.

Page 297: Matrix Theory

276 !niter Products

Exercise Set 291 1 0

1. Find an orthogonal full rank factorization for1 0 1

1 1 0

1 0 1

Further Reading

[Kung, 2002] Sidney H. Kung, Obtaining the QR Decomposition byPairs of Row and Column Operations, The College Mathematics Journal,Vol. 33, No. 4, September, (2002), 320-321.

7.3.2 MATLAB Moment

7.3.2.1 The QR Factorization

If A is an m-by-n complex matrix, then A can be written A = QR, where Qis unitary and R is upper triangular the same size as A. Sometimes people usea permutation matrix P to permute columns of A so that the magnitudes of thediagonal of R appear in decreasing order. Then AP = QR.

MATLAB offers four versions of the QR-factorization, full size or economysize, with or without column permutations. The command for the full size is

[Q, R] = qr(A).

If a portion of R is all zeros, part of Q is not necessary so the economy QR isobtained as

[Q, R] = qr(A, 0).

To get the diagonal of R lined up in decreasing magnitudes, we use

[Q. R, P] = qr(A).

Page 298: Matrix Theory

7.3 QR Factorization 277

Let's look at some examples.

>> A=pascal(4)A=

I l 1 I

1 2 3 4

I 3 6 10

1 4 10 20>> [Q,R]=9r(A)Q=

-0.5000-0.5000-0.5000-0.5000

R=

0.67080.2236

-0.2236-0.6708

0.5000 0.2236-0.5000 -0.6708-0.5000 0.67080.5000 -0.2236

-2.000 -5.000 -10.000 -17.5000000

>> format rat

-2.236100

-6.7082 -14.08721.000 3.5000

0 -0.2236

>>[Q,R]=qr(A)Q=

-1/2-1/2 6-1/2 --1/2 -

R=

646/96346/2889646/2889646/963

1/2 646/2889-1/2 -646/963-1/2 646/963

1 /2 -646/2889

-2 -5 -10 -35/20 -2889/1292 -2207/329 -4522/321

0 0 1 7/2

0 0 0 -646/2889To continue the example,>>[Q,R,Pl=9r(A)Q=

-202 /4593 -1414/2967 1125/1339 -583/2286-263 /1495 -1190/1781 -577/3291 1787/2548

-26 3/598 -217/502 -501/1085 -621/974-26 3/299 699/1871 975/4354 320/1673

R=-4693/202 -1837/351 -1273/827 -1837/153

0 -2192/1357 -846/703 -1656/12370 0 357/836 -281/12950 0 0 -301/4721

Page 299: Matrix Theory

278 Inner Products

P-0 0 1 0

0 1 0 0

0 0 0 1

1 0 0 0

Note that as a freebee, we get an orthonormal basis for the column space ofA, namely, the columns of Q.

7.4 A Fundamental Theorem of Linear AlgebraWe have seen already how to associate subspaces to a matrix. The rank-plus-

nullity theorem gives us an important formula relating the dimension of thenull space and the dimension of the column space of a matrix. In this section,we develop further the connections between the fundamental subspaces of amatrix. We continue the picture of a matrix transforming vectors to vectors.More specifically, let A be an m-by-n matrix of rank r. Then A transformsvectors from Cn to vectors in (C' by the act of multiplication. A similar viewapplies to A, A*, AT, and At We then have the following visual representationof what is going on.

Cn

Amxn, A

FA-,AT,A'=AT

Figure 7.2: The fundamental subspaces of a matrix.

(Cm

The question now is, how do the other subspaces fit into this picture'?To answer this question we will use the inner product and orthogonality ideas.

Recall (x I y) = y*x = >x; y; and M1 denote the set of all vectors that

Page 300: Matrix Theory

7.4 A Fundamental Theorem of Linear Algebra 279

are orthogonal to each and every vector of M. Recall that M = M11 andbf ® M1 = C" for all subspaces M of C". Also, if N CM, theniy1 C N1.

We will make repeated use of a formula that is remarkably simple to prove.

THEOREM 7.11Let A be in Cmxn X E C", Y E Cm. Then

(Ax I y) = (x I A*y).

PROOF We compute (x I A*y) = (A*y)*x = (y*A**)x

= (y*A)x = y*(Ax) = (Ax I y).

Next, we develop another simple fact - one with important consequences.First, we need some notation. We know what it means for a matrix to multiply asingle vector. We can extend this idea to a matrix, multiplying a whole collectionof vectors.

DEFINITION 7.7 Let A E C"" and let M be a subset of C"LetA(M)=(AxI xEM}. Naturally, A(M)CC'

This is not such a wild idea since A(C") = Col(A) and A(Null(A)) _

{-6} . Now for that simple but useful fact.

THEOREM 7.12LetAEC'"x",M CC",and N CC"'.Then

A(M) c N iU A*(N1) C M1.

PROOF We prove the implication from left to right. Suppose A(M) _g N.We need to show A*(N1) c M1. Take a vector y in A*(N1). We must arguethat y E M1. That is, we must showy 1 m for all vectors m in M. Fix a vectorm E M. We will compute (y I m) and hope to get 0. But y = A*x for somex E N1 so (y I m) = (A*x I m) = (x I Am). But X E N1, and Am E N, sothis inner product is zero, as we hoped. To prove the converse, just apply theresult just proved to A*. Conclude that A**(M11) c N11 (i.e., A(M) C N).This completes the proof. 0

Now we get to the really interesting results.

Page 301: Matrix Theory

280

THEOREM 7.13Let A E C"". Then

1. Null(A) = Col(A*)1

2. Null(A)1 = Col(A*)

3. Null(A*) = Co!(A)1

4. Null(A*)1 = Col(A).

Inner Products

PROOF It is pretty clear that if we can prove any one of the four state-ments above by replacing A by A* and taking "perps," we get all the others.We will focus on (1). Clearly A*(Null(A*)) c (0 ). So, by 7.12, A((_6)1) c_A(ull(A*)1. But (-iT)J = C"; so this says A(C") C lVull(A*)1, However,A(C") = Col(A), as we have already noted, so we conclude Col(A) _cNull(A*)1. We would like the other inclusion as well. It is a triviality thatA(C") C A( C"). But look at this the right way: A(C") S; A(C") = Col(A).Then again, appealing to 7.12, A*(Col(A)1) C Ci1 = (-(I). Thus, Col(A)1 C_Null(A*), which gets us NUII(A*)1 C Col(A). 0

COROLLARY 7.4Let A E C' x 1. Then

I. Null(A) = Col(AT)1 = Row(A)1

2. Col(A) = Null(AT)1

3. Null(A)1 = Row(A)

4. Null(AT) = Col(A)1.

We now summarize with a theorem and a picture.

THEOREM 7.14 (fundamental theorem of linear algebra)Let A E C; X" Then

1. dim(Col(A)) = r.

2. dim(Null(A)) = n - r.

3. dim(Co!(A*)) = r.

4. dim(Null(A*)) = in - r.

Page 302: Matrix Theory

7.4 A Fundamental Theorem of Linear Algebra

Cn

Amxn, A

4A+, AT, A'=AT

Figure 7.3: Fundamental theorem of linear algebra.

Cm

281

5. Mull(A) is the orthogonal complement ofCol(A*).

6. NUll(A*) is the orthogonal complement ofCol(A).

7 iVull(A) is the orthogonal complement of Row(A).

There is a connection to the problem of solving systems of linear equations.

COROLLARY 7.5Consider the system of linear equation Ax = b. Then the following are equiv-alent:

1. Ax = b has a solution.

2. b E Col(A) (i.e., b is a linear combination of the columns of A).

3. A*y = -6 implies b*y = 0.

4. b is orthogonal to every vector that is orthogonal to all the columnsof A.

Exercise Set 30

1. If A E C"1x", B E C^xP, and C E Cnxt', then A*AB = A*AC if andonly if AB = AC.

2. Prove thatA(ull(A*A) = Null(A).

Page 303: Matrix Theory

282 Inner Products

3. Prove that Col(A*A) = Col(A*).

4. Fill in the details of the proof of Corollary 7.4.

5. Fill in the details of the proof of Theorem 7.14.

Further Reading

[Strang, 1988] Gilbert Strang, Linear Algebra and its Applications, 3rdEdition, Harcourt Brace Jovanovich Publishers, San Diego, (1988).

[Strang, 1993] Gilbert Strang, The Fundamental Theorem of LinearAlgebra, The American Mathematical Monthly, Vol. 100, No. 9, (1993),848-855.

[Strang, 20031 Gilbert Strang, Introduction to LinearAlgebra, 3rd Edition,Wellesley-Cambridge Press, Wellesley, MA, (2003).

7.5 Minimum Norm Solutions

We have seen that a consistent system of linear equations Ax = b can havemany solutions; indeed, there can be infinitely many solutions and they forman affine subspace.

Now we are in a position to ask, among all of these solutions, is there ashortest one'? That is, is there a solution of minimum norm? The first questionis, which norm? For this section, we choose our familiar Euclidean norm, II x 112 =tr(x`x)=.

DEFINITION 7.8 (minimum norm)We say xo is a minimum norm solution of Ax = b if xo is a solution and11xoII _< IlxiI for all solutions x of Ax = b.

Recall that 1-inverses are the "equation solvers." We have established theconsistency condition: Ax = b is consistent iff AGb = b for some G E A{ 1).In this case, all solutions can he described by x = Gb + (I - GA)z, where z is

Page 304: Matrix Theory

7.5 Minimum Norm Solutions 283

Figure 7.4: Vector solution of various length (norm).

arbitrary in C. Of course, we could use A+ for G. The first thing we establishis that there is, in fact, a minimum norm solution to any consistent system oflinear equations and it is unique.

THEOREM 7.1 SSuppose Ax = b is a consistent system of linear equations (i.e., b E Col(A)).Then there exists a unique solution of Ax = b of minimum norm. In fact, it liesin Col(A*).

PROOF For existence, choose xo = A+b. Take any solution x of Ax =b. Then x = A+b + (1 - A+A)z for some z. Thus, IIx112 = (x I x) =(A+b + (I - A+A)z I A+b + (I - A+A)z) = (A+b I A+b) + (A+b I (I-A+A)z) + ((I - A+A)z I A+b) + ((I - A+A)z I (/ - A+A)z) IIA+bII2 +((1 - A+A)A+b I z)+(z I (/ - A+A)A+b) +II(/ - A+A)zII2 = IIA+bII2+0+0+ 11(1 - A+A)zll2 > IIA+bII2, with equality holding iff II(/ - A+A)zjI =

0 iff (I - A+A)z = iff x = A+b. Thus, A+b is the unique minimum normsolution. Since A+ = A*(AA*)+, we have A+b E Col(A*). 0

Once again, we see the prominence of the Moore-Penrose inverse. It turnsout that the minimum norm issue is actually intimately connected with 11,41-inverses. Recall, G E A{ 1, 4} iff AGA = A and (GA)* = GA.

Page 305: Matrix Theory

284 Inner Products

THEOREM 7.16LetGEA{1,4}.Then

1. GA=A+A

2. (1 - GA)* = (I - GA) = (I - GA)2

3. A(I - GA)* =

4. (I -GA)A* =O.

PROOF Compute that GA = GAA+A = (GA)*(A+A)* = A*G*A*A+* _(AGA)*A+* = A*A+* = (A+A)* = A+A. The other claims are now easy andleft to the reader. 0

So, { 1,4}-inverses G have the property that no matter which one you choose,GA is always the same, namely A+A. In fact, more can be said.

COROLLARY 7.6G E A{ I, 4} iff GA = A+ A. In particular, if G E A{ I, 4}, then Gb = A+bfor any b E Col(A).

Thus { 1,4 }-inverses are characterized by giving the minimum norm solutions.

THEOREM 7.17Suppose Ax = b is consistent and G E A 11, 4). Then Gb is the unique solutionof minimum norm. Conversely, suppose H E C"" and, whenever Ax = b isconsistent, A Hb = b and II Hb II < llz ll for all solutions z other than Hb; thenHEA( I,4).

PROOF The details are left to the reader. 0

Exercise Set 31

1. Suppose G E A(l, 4). Argue that A{1, 4) _ {H I HA = GA).

2. Argue that A{1, 4} = {H I HA = A' A).

3. Suppose G E A{ 1, 4). Argue that A(1, 4) _ {G + W(I - AG) I Warbitrary l.

Page 306: Matrix Theory

7.6 Least Squares

4. Suppose G E A{1, 3} and HE A{1,4}. Argue that HAG = A+.

5. Let u and v be in Col(A*) with Au = Av. Prove that u = v.

285

7.6 Least SquaresFinally, the time has come to face up to a reality raised in the very first chapter.

A system of linear equations Ax = b may not have any solutions at all. Therealities of life sometime require us to come up with a "solution" even in thiscase. Again, we face a minimization problem. Once again, we use the Euclideannorm in this section. Suppose Ax = b is inconsistent. It seems reasonable toseek out a vector in the column space of A that is closest to b. In other words,if b ¢ Col(A), the vector r(x) = Ax - b, which we call the residual vector, isnever zero. We shall try to minimize the length of this vector in the Euclideannorm. Statisticians do this all the time under the name "least squares."

DEFINITION 7.9 (least squares solutions)A vector x0 is called a least squares solution of the system of linear equations

Ax = b i./f IlAxo - bII IIAx - bII for all vectors x.

Remarkably, the connection here is with { 1, 3)-inverses.

THEOREM 7.18Let AEC > andGEA{1,3}.Then

1. AG=AA+

2. (I - AG)* = I - AG = (I - AG)2

3. (I - AG)*A =

4. A'(I - AG) = 0.

PROOF For (1), we compute AG = AA+AG = AA+*(AG)* = A+* x

A*G*A* = A+*(AGA)* = A+*A* = (AA+)* = AA+. The other claims are

now clear. 0

COROLLARY 7.7GEA{1,3}iffAG=AA+

Page 307: Matrix Theory

286 Inner Products

THEOREM 7.19Suppose G E A( 1, 3). Then x0 = Gb is a least squares solution of the linearsystem Ax = b.

PROOF Suppose G E A (1, 3). We use the old add-and-subtract trick.IAx - bl12 = IIAx - AGb - b + AGb112 = IIA(x - A+b)) + (AA+b - b)II2= (A(x - A+b)) + (AA+b - b) I A(x - A+b)) + (AA+b - b)) = (A(x-A+b)) I A(x - A+b)))+ (A(x - A+b) I (AA+b - b)) + ((A A+b - b) I A(x-

A+b))) + ((AA+b - b) I (AA+b - b)) = IIA(x - A+b))II2 + II(AA+b-b)112 > II(AA+b - b)II2 = II(AGb - b)112, and equality holds iff

IIA(x - A+b))II2 = 0 iff x - A+b E Null(A). 0

THEOREM 7.20Let G be any element of A{ 1, 3}. Then xi is a least squares solution of the linearsystem Ax =biff 1lAx, -bll = llb - AGbll

PROOF Suppose IIAx, - bll = llb - Axoll, where xo = Gb. By our theo-rem above, Ilb - Axoll < llb - AxII for all x, so IIAxi - bll = Ilb - Axolllb - Ax II for all x, making x, a least squares solution. Conversely, suppose x, is

a least squares solution of Ax = b. Then, by definition, II Ax, - bil < ll Ax - bllfor all choices of x. Choose x = Gb. Then Il Ax, - bil < IIAGb - bll. But Gbis a least squares solution, so IIAGb - bll < IIAx - bll for all x, so if we takex, for x, IIAGb - bll < IlAx, - bll. Hence, equality must hold. 0

THEOREM 7.21Let G be any element of A{ I , 3). Then, xo is a least squares solution of Ax = b

ff Axo = AGb = AA+b.

PROOF Note AG(AGb) = (AGA)(AGb) = AGb, so the system on theright is consistent. Moreover, IIAxo - bll = II AGb - bll, so xo is a least squaressolution of the left-hand system. Conversely, suppose xo is a least squaressolution. Then IIAxo - bll = Ilb - AGbII. But IIAxo - b112 = IIAxo - AGb+AGb-b112 = IIAxo - AGbll2+llb - AGb112, which says IIAxo - AGbll = 0.Thus, Axo = AGb. 0

As we have noted, a linear system may have many least squares solutions.However, we can describe them all.

Page 308: Matrix Theory

7.6 Least Squares 287

THEOREM 7.22Let G be any element of A{ I , 3). Then all least squares solutions of Ax = bare of the form Gb + (I - G A)z for z arbitrary.

PROOF Let y = Gb + (1 - GA)z and compute that A(I - GA)y = -6so Ay = AGb. Hence, y is a least squares solution. Conversely, suppose x isa least squares solution. Then Ax = AGb so 6 = -GA(x - Gb) whencex = x-GA(x-Gb) = Gb+x-Gb-GA(x-Gb) = Gb+(1-GA)(x-Gb).Take z = x - Gb to get the desired form. 0

It is nice that we can characterize when a least squares solution is unique.This often happens in statistical examples.

THEOREM 7.23Suppose A E C'""". The system of linear equations Ax = b has a unique leastsquares solution iff rank(A) = n.

PROOF Evidently, the least squares solution is unique iff I - GA = 0if GA = 1. This says A has a left inverse, which we know to be true ifrank(A) = n (i.e., A has full column rank). 0

Finally, we can put two ideas together. There can be many least squaressolutions to an inconsistent system of linear equations. We may ask, among allof these, is there one of minimum norm? The answer is very nice indeed.

THEOREM 7.24 (Penrose)Among the least squares solutions of Ax = b, A+b is the one of minimum norm.Moreover, if G has the the property that Gb is the minimum norm least squaressolution for all b, then G = A+.

PROOF The proof is left to the reader. 0

Isn't the Moore-Penrose inverse great? When Ax = b has a solution, A+b isthe solution of minimum norm. When Ax = b does not have a solution, A+bgives the least squares solution of minimum norm. You cannot lose computingA+b!

We end this section with an example. Consider the linear system

I xt+x2= 1xi-x2=0.

3x2= 4

Page 309: Matrix Theory

288 Inner Products

In matrix notation, this system is expressed as

1 I r l I

Ax -1[

XI1 = 0 =b.

0 1 X2 3

A has rank 2, so a full rank factorization trivially is A = FGEvidently,

r1 -1

["J so A+ = (F*F)-IF*

02 0

3 JI [

1 -1 0 ]0 1 [

II 01 1 0

0[ - I ].WecomPuteAA+b= I -1 [I

t _I

=3 3 3 0 1 3 3

44 I 2 1 II I

b 0 = # b = 0 Thus, the system is3

6 6 4 12 4I 1

inconsistent. The best approximate solution is xo = A+b = . Note[12 J

that one can compute a measure of error: JlAxa - bil = IIAA+b - bll =II

012 12 12 TU T2-

12 4

Note that the error vector is given by E = [I - A+A]b.We have chosen this example so we can draw a picture to try to gain some

intuition as to what is going on here.

x2 = 3/4

1/2

Figure 7.5: A least squares solution.

xl

The best approximate solution is (1, 7

Page 310: Matrix Theory

7.6 Least Squares 289

Exercise Set 321 0 2 -l

1 -21. Consider the system

22

2 0X2 =

2Verify that

1 2 -2 X3I

this system is inconsistent. Find all least squares solutions. (Hint: Therewill be infinitely many.)

2. We have previously seen that G = A+ + (! - A+A)W + V(1 - AA+)is a 1-inverse of A for any choice of V and W of appropriate size. If youdid not complete exercise 9 on page 205, do it now. Argue that choosingV = 0 makes G a 3-inverse as well. If instead we choose W = ®, arguethat we get a 4-inverse as well.

3. Argue that if G E AA*{ 1, 2), then A*G is a (1, 3)-inverse of A.

4. Argue that if H E A*A(1, 2), then HA* is a 11, 4)-inverse of A.

5. LetG E A{1, 3, 4). Argue that A{1, 3,4) = (G+(!-GA)W(1-AG) IW arbitrary of appropriate size}.

6. LetH E A( 1, 2,4). Argue that A( 1, 2,4) _ {H+HV(I -AH) I Varbitrary of appropriate size}.

7. LetKEA(l,2,3).Argue that A{1,2,3)=(K+(I-KA)WK I Warbitrary of appropriate size).

8. Argue that xp is the minimum norm least squares solution of Ax = b ifa) IIAxo - b11 < 11 Ax - b11 and b)IIxoII < llxll for any x 0 xo.

9. Suppose A = FG is a full rank factorization of A. Then Ax = b ifFGx = b iff a) Fy = b and b) Gx = y. Argue that y = F+b =(F*F)-i F*b and IIFy - bll is minimal. Also argue that Gx = F+b isalways consistent and has minimum norm.

10. Argue that xo is a least squares solution of Ax = b iff xo is a solutionto the always consistent (prove this) system A*Ax = A*b. These latterare often called the normal equations. Prove this latter is equivalent toAx - b E Null(A*).

11. Suppose A = FG is a full rank factorization. Then the normal equationsare equivalent to F*Ax = F*b.

Page 311: Matrix Theory

290 Inner Products

Further Reading

[Albert, 19721 A. Albert, Regression and the Moore-Penrose Pseudoin-verse, Academic Press, New York, NY, (1972).

[Bjorck, 1967] A. Bjorck, Solving Linear Least Squares Problemsby Gram-Schmidt Orthogonalization, Nordisk Tidskr. Informations-Behandling, Vol. 7, (1967), 1-21.

Page 312: Matrix Theory

Chapter 8

Projections

idempotent, self-adjoint, projection, the approximation problem

8.1 Orthogonal ProjectionsWe begin with some geometric motivation. Suppose we have two nonzero

vectors x and y and we hold a flashlight directly over the tip of y. We want todetermine the shadow y casts on x. The first thing we note is

,-.

z

Figure 8.1: Orthogonal projection.

the shadow vector is proportional to x, so must be of the form ax for somescalar a. If we can discover the scalar a, we have the shadow vector, moreformally the orthogonal projection of y onto x. The word "orthogonal" comesfrom the fact that the light was held directly over the tip of y, so the vector zforms a right angle with x. Note, ax + z = y, so z = y - ax and z 1 x. Thus,

0=<ZIx>=<y-axJx>=<yix> -a <xIx>soa= <yIx>.This is<xIx>great! It gives a formula to compute the shadow vector.

DEFINITION 8.1 Let x, y be vectors in C" with x # -6. We define the

orthogonal projection of y onto x by yIx > X.< xIx >

291

Page 313: Matrix Theory

292 Projections

x y xFirst, we note that the formula can he written as P,,(y) = kx = x#yxx xx(xx+)y. Here comes the Moore-Penrose (MP) inverse again! This suggests thatPx can be viewed as the matrix xx+, and orthogonal projection of y onto x canbe achieved by the appropriate matrix multiplication. The next thing we note isthat Px is unchanged if we multiply x by a nonzero scalar. Thus Px depends onthe "line" (i.e., one dimensional subspace) span(x) and not just x.

LEMMA 8.1For any nonzero scalar 0, Ppx(Y) = P,,(Y)

PROOF< l x> < Ix> <Ylx>Ppx(Y) = <PxIPx> x

=<xlx> x = Px(Y) 0

So, from now on, we write Pp(,,) instead of P, indicating the connectionof P,y,(x) with the one dimensional subspace span(x). Next, to solidify thisconnection with sp(x), we show the vectors in sp(x) are exactly those vectorsleft fixed by

LEMMA 8.2{Y1 Pcp(x)(Y) = Y} = sp(x)

PROOF If y = P,y,(x)(y), then y = ` Ylx > x, a multiple of x so y E sp(x).< xIx >

Conversely, if y E sp(x), then y = (3x for some scalar 0, so P,y,(x)(y) _

Psn(x)(Rx) _` (3xJx >x

=R < xIx > x

= (3x = Y. 0< xIx > < xIx >

Next, we establish the geometrically obvious fact that if we project twice, wedo not get anything new the second time.

LEMMA 8.3For all vectors y, P,y,(x)(P,y,(x)(Y)) = P,p(x)(Y)-

CPROOF Pvt,(x)(P,t)(x)(Y)) = P,./,(.)< Yix >

x< xIx ><x<ylx>z> < xIx > < yix >

x = x = P1,(x)(Y)<xix> <xIx>

/ < lx> xlx1\ <xIx> /- =<xix> x

0

Page 314: Matrix Theory

8. 1 Orthogonal Projections 293

Taking the matrix point of view, P,p(x) = xx+, we have (P,("))2 = xx+xx+= xx+ = P,p(x). This says the matrix Pp(,,) is idempotent. Let's make itofficial.

DEFINITION 8.2 A matrix Pin C' x" is called idempotent iff P2 = P.

Next, we note an important relationship with the inner product.

LEMMA 8.4For ally, z in C", < P,,,( )(z)ly > = < zlP+p(x)(Y) >.

PROOF We compute both sides:

< Pp(x)(Z)lY >=< zlx >

xiy =< zlx > < xly >, but also,

<XIX> > <xlx><zIP,p(x)(Y)>=<zl<Ylx>x>=<zlx> <Ylx> _ <zlx>

< xlx > < xlx > < xlx >< xly > . 0

Okay let's make another definition.

DEFINITION 8.3 (self-adjoin[)A matrix P is self adjoint iff < Pxly >=< xl Py > for all x, y.

In view of our fundamental formula < Axly >=< xlA*y > for all x, y, wesee self-adjoint is the same as Hermitian symmetric (i.e., P = P*). This pro-perty of Pp(x) is obvious because of the MP-equation: (P,.p(x))* = (xx+)* _

Next, we establish the "linearity" properties of Pp(x).

LEMMA 8.5

1. P,,,(x)(Xy) = XPp(x)(y) for X any scalar and y any vector.

2. P,p(x)(Yi + Y2) = P11p(x)(Yi) + P,p(x)(Y2) for any vectors yi and Y2.

PROOFBy now, it should be reasonable to leave these computations to the reader. 0

Next, we note a connection to the orthogonal complement of sp(x), sp(x)l.

Page 315: Matrix Theory

294 Projections

LEMMA 8.6

1. y - P,y,(x)(y) 1 P,,(,,)(y) jor all vectors y.

2. Y - P,p(x) (Y) E sp(x)' for all vectors y.

PROOF Again we leave the details to the reader.

Finally, we end with a remarkable geometric property of orthogonal projec-tions. They solve a minimization problem without using any calculus. Given avector y not in sp(x), can we find a vector in sp(x) that has the least distanceto x?

That is, we want to minimize the distance d(y, z) = llY - zil as we let zvary through sp(x). But lit - z112 = 11(Y - P,y,(X)(Y)) + (P 1,(%)(Y) - z)ll2 =

,,lly -Pt tx,(y)ll` + ll P,t,(x)(y) -zll2

by the Pythagorean theorem. Note(PP,,(x)(Y) - y) 1 (y - PNtx, (z)) since y - P,p(x)(z) E sp(x)1. Now concludeII P,p(x)(y) - yll < lly - zll. So, among all vectors z in sp(x), P,N(x)(y) is closestto y. We have proved the following.

LEMMA 8.7d (Pcp(%)(Y), y) d (z, y) for all z E sp(x).

We can generalize this idea.

DEFINITION 8.4 (the approximation problem)Let M be a subspace of C" and let x be a vector in C. Then by the approx-

imation problem for x and M we mean the problem of finding a vector mo inM such that IIx - moll < IIx - mll for all m E M.

We note that if x is in M. then clearly x itself solves the approximationproblem for x and M. The next theorem gives some significant insight into theapproximation problem.

THEOREM 8.1

1. A vector mo E M solves the approximation problem for x and M if andonly if (x - mo) E M1.

Moreover,

2. If m1 and m2 both solve the approximation problem for x and M, thenm1 = m2. That is, if the approximation problem has a solution, thesolution is unique.

Page 316: Matrix Theory

8.1 Orthogonal Projections 295

PROOF

1. First, suppose that mo c M solves the approximation problem for x andM. We show z = x - mo is orthogonal to every m in M. Without loss ofgenerality, we may assume Ilml = 1. For any K E C, we have

IIz-km112=<z-kmlz-Km>=<zlz> - <zlm> i-K <mlz>+k<mlm>),=IIz112-I <zlm> 12+I <zlm> 12-<zlm> IIzIlZ- 1<zim>12+(<zIm> -K)(<z I m> -k)= IIZ112-<zlm> 12 + I <zlm> _K12.

We may now choose K to be what we wish. We wish k = < zlm >.Then, our computation above reduces to IIz - ]tm112 = IIZ112 - I <zlm> 12. Nowz-km=(x-mo)-hm=x-(mo+Km).Since moand m lie in M, so does mo + km, and so, by assumption, Ilx - mollIIx - (mo + km)ll. We can translate back to z and get llzll < Ilz - kmll ,so IlzllZ < Ilx - km112 Therefore, we can conclude IIZI12 < IIZI12 - 1 <zlm > 12. By cancellation, we get 0 < -1 < zlm > 12. But the only waythis can happen is that I < zlm > I = 0, hence < zim >= 0. Thus,z 1 m (i.e., (x - mo) 1 m), as we claim.

Conversely, suppose mo is a vector in M such that x - mo E Ml.We claim mo solves the approximation problem for x and M. By thePythagorean theorem, II(x - mo) + m112 = Ilx - moll2 + IIm112IIx - mo112 for any m E M. This says IIx - moll' IIx - mo + m112Let n E M and take m = mo - n which, since M is a subspace, stillbelongs to M. Then IIx - moll < II(x - mo) + (mo - n)II = IIx - n1l.Since n was arbitrary from M, we see mo solves the approximationproblem for x and M.

2. Now let's argue the uniqueness by taking two solutions of the approxima-tion problem and showing they must have been equal all along. Supposem1 and m2 both solve the approximation problem for x and M. Then bothx - mi E Ml and x - m2 E M1. Let m3 = m1 - m2. We hope of coursethat m3 = '. For this, we measure the length of m3. Then 11m3112 =< m31m3 >=< m31(x-m2)-x+mi >= < m31(x-m2)-(x-MI) >=< m3l(x-m2) > - < m31(x-mi) >= 0-0 = 0. Great! Now conclude11m311 = 0 so m3 = -6; hence, mi = m2 and we are done. a

In particular, this theorem says Pp(x)(y) is the unique vector in sp(x) ofminimum distance from y. We remark that the approximation problem is of greatinterest to statisticians. It may not be recognized in the form we stated it above,but be patient. We will get there. Also, you may have forgotten the problem setout in Chapter I of finding "best" approximate solutions to systems of linear

Page 317: Matrix Theory

296 Projections

equations that have no solutions. We have not and we have been working towardthis problem since we introduced the notion of inner product. Finally, note thatthe approximation problem is always solvable in C". That is, let x E C" and M heany subspace. Now, we showed awhile back that C" = M®M1, sox is uniquelyexpressible x = m + n, where m E M, n E M. But then, x - m = n E Ml,so by our theorem, m solves the approximation problem for x and M.

We have seen how to project onto a line. The question naturally arises, canwe project onto a plane (i.e., a two-dimensional subspace). So now take atwo-dimensional subspace M of C". Take any orthonormal basis (ul, U2) forM. Guided by the matrix formula above, we guess PM = [u11u2][ullu2]+.From the MP equations, we have p2 = PM = PM. For any matrix A, recallFix(A) = (xIAx = x). Clearly, 0 is in Fix(A) for any matrix A. We claimM = Fix(PM) = Col(PM). Let Y E Fix(PM). Then y = PMy E Col(P,1t).Conversely, if y E Col(PM), then y = PM(x) for some x. But then PM(y) =PMPM(x) = PM(x) = y. This establishes Fix(PM) = Col(PM). To get thatthese are M, we need a clever trick. For any matrix A, A+ = A*(AA*)+.Then PM = [111 112][[111 1U21+ = [U11112]([ui 1u21`([ul1u2][u} U ]')+ = [u11u21

([] ([1111112] [° ])+) _ [ullu2]([ ] (ului +u2u)+).However, (ului+

U2U*2 )2 (11111*1 +11211,*)(UtU*I +U2U*2) = (U1U*1U1U*1 +UII U*U2U2*+U2U2*U1U1 +

U2LJ U2u,* = 111111 + u2u2*. Also, (ulu* + u2u',)* = utui + u2u2*. Therefore,

(utuT +u2uZ)+ = 111111 +u2uZ. Thus, P M = ([1111112] [! ]) (111111 +u2uZ) _

(UI u7 + 11211;)2 = uI u + u2u;. This rather nice formula implies PM(uI) = uIand PM(u2) = u2, so PMx = x for all x in M since PM fixes a basis. Thisevidently says M C_ Fix(PM). But, if x E Fix(PM), then x = PM(x) =(111111 + u2uZ)x = uIUix + U2UZx = (111x)111 + (u2*x)u2 E sp{u1,U2) = M.This completes the argument that M = Fix(PM) = Col(PM). Finally, weshow that PM(x) solves the approximation problem for x and M. It sufficesto show x - PM(x) E M1. So take m E M and compute the inner product< x-PM(x)Im >_ < xIm > - < PM(X)IM >_ < xIm > - < xI PM(m) >=< xIm > - < xIm >= 0. Note that we used PM = PM and M = Fix(PM) inthis calculation.

Now it should be clear to the reader how to proceed to projecting on a three-dimensional space. We state a general theorem.

THEOREM 8.2Let M be an m-dimensional subspace of C. Then there exists a matrix P5,

such that

1. PM=PM=PM

2. M = Fix(PM) = Col(PM)

3. For any x E C", PM(x) solves the approximation problem for x and M.

Page 318: Matrix Theory

8. 1 Orthogonal Projections 297

Indeed, select an orthonormal basis {u1, u2, ... , u,n} for M and form thematrix PM = [Ul IU2I ... IUn,][UI IU2I ... I U,n]+ = UIUI +U2U*2 + +UnnUM* .

Then for any x E C", PM(x) = < xlul > ul+ < xlu2 > u2 + ... +

< xlu > Urn.

PROOF The details are an extension of our discussion above and are left to the

reader. LI

Actually a bit more can be said here. If Q2 = Q = Q* and Fix(Q) =Col(Q) = M, then Q = PM. In particular, this says it does not matter whichorthonormal basis of M you choose to construct PM. For simplicity say PM =

[ullu2][ullu2l+. Then QPM = Q[ullu2][ullu2]+ = [QuIIQu2][ullu2]+ =[Ul lu2][uI lu2]+ = PM. Thus, QPM = PM since Q leaves vectors in M fixed.Also PMQ = [Ul Iu2][ul lu21+Q = [ului +u2u2*]Q = [utui +u2u][gl Iq2] _[ulu*gl + u2u*2gllUluig2 + u2u*g2] = [< gllul > UI+ < q, 11112 > U21 <g2lul > UI+ < g21u2 > u2] = [gllg2] = Q. Thus PMQ = Q. Therefore,Q = Q* = (PMQ)* = Q*P,y = QPM = PM. Now it makes sense to call PMthe orthogonal projection onto the subspace M.

Exercise Set 33I

1. Let v = 2 . Compute the projection onto sp(v) and sp(v)1.-2

2. Compute the projection onto the span of { (1, 1, 1), (- 1, 0, 1)).

3. Prove that if UU* _ /, then U*U is a projection.

4. Let PM.N denote the projector of C" onto M along N. Argue that(PM,N)* = PN1,M' .

5. Argue that P is a projection iff P = P P*.

6. Suppose C" = M ® N. Argue that N = Ml if PM.N = (PM,N)*.

7. Argue that for all x in C", 11 PMx11 < IIxi1, with equality holding iff x E M.

8. (Greville) Argue that G E A(2) if G = (EAF)+ for some projectionsE and F.

9. (Penrose) Argue that E is idempotent iff E = (FG)+ for some projectionsF and G.

Page 319: Matrix Theory

298 Projections

10. (Greville) Argue that the projector PM.N is expressible with projectionsthrough the use of the MP-inverse, namely, PM.N = (PNI PM)+ = ((IPN)PM)+.

1 1. A typical computation of the projection PM onto the subspace M is toobtain an orthonormal basis for M. Here is a neat way to avoid that:Take any basis at all of M, say (b1, b2, ... , b,,, ), and form the matrixF = [b, I b2 I I b",]. Argue that PM = FF+ = F(F*F)-I F*.

12. Suppose E is a projection. Argue that G E E{2, 3, 4} iff G is a projectionand Col(G) C Col(E).

13. Let M and N be subspaces of C". Prove that the following statementsare all equivalent:

(i) PM - PN is invertible(ii) C"=M®N(iii) there is a projector Q with Col(Q) = M and Xull(Q) = N.

In fact, when one of the previous holds, prove that (PM - PN)-' _Q+Q*-I.

14. Suppose P = P2. Argue that P = P* iffArull(P) = (Fix(P))-.

Cos2(0) Sin(x)Cos(0)15. Argue that Q = is a projection of the

sin(x)Cos(0) sin2(0)plane R2. What is the geometry behind this projection?

Further Reading

[Banerjee, 20041 Sudipto Banerjee, Revisiting Spherical Trigonometrywith Orthogonal Projections, The College Mathematics Journal, Vol. 35,No. 5, November, (2004), 375-381.

[Gross, 1999] Jdrgen Gross, On Oblique Projection, Rank Additivity andthe Moore-Penrose Inverse of the Sum of Two Matrices, Linear and Mul-tilinear Algebra, Vol. 46, (1999), 265-275.

Page 320: Matrix Theory

8.2 The Geometry of Subspaces and the Algebra of Projections 299

8.2 The Geometry of Subspaces and the Algebraof Projections

In the previous section, we showed that, starting with a subspace M of C",we can construct a matrix PM such that P,y = PM = P,, and M = Col(PM) =Fix(PM). Now we want to go farther and establish a one-to-one correspondencebetween subspaces of C" and idempotent self-adjoint matrices. Then we canask how the relationships between subspaces is reflected in the algebra of thesespecial matrices. We begin with a definition.

DEFINITION 8.S (orthogonal projection matrix)A matrix P is called an orthogonal projection matrix (or just projection,

for short) if it is self-adjoint and idempotent (i.e., P2 = P = P'). Let P(C"x")denote the collection of all n-by-n projections.

We use the notation Lat(C") to denote the collection of all subspacesof C".

THEOREM 8.3There is a one-to-one correspondence between ?(C",") and Lat(C") given

as follows: to P in P(C"x"), assign 9(P) = Fix(P) = Col(P) and to M inLat((C"), assign Js(M) = PM.

PROOF It suffices to prove tpp(P) = P and p i(M) = M. We note thati is well defined by the discussion at the end of the previous section. First,*p(P) = 4)(Col(P)) = Pc1,r(p) = PM, where M = Col(P). But then, P =PM since PM is the only projection with M = Fix(PM) = Col(PM). Next,0(ti[) = <p(PM) = Col(PM) = M. 0

Now this correspondence between subspaces and projections opens up awhole world of questions. For example, how can you tell if one subspace iscontained in another one? How does PM.. relate to PM? How do PM+N andPMnN relate to PM and PN?

THEOREM 8.4For any subspace M of C", PMl = I - PM.

Page 321: Matrix Theory

300 Projections

PROOF Co!(I - PM) = Fix(I - PM) = {xI(I - PM)x = x} = {xIPMx61 = Null(PM) = Col(PM)1 = Col(PM)1 = M1. Note I - PM is a pro-

jection since(I-P5.r)2=I-P",-Pir,+P,;=I- P,,, and (I-P",)*=/*-Pm=I -P,,,.Thus, PMi =PcoIU-P.)=/-PM.

As a shorthand, write P1- = I - P when P E 1P(C""11).

THEOREM 8.5M C N if PM = PN PM iff PM = PM PN

PROOF SupposeM C N.ThenPMX E M C N = Fix(PN),soPN(PMx)=PMx. Since this is true for all x in C", we get PNPM = PM. Conversely, ifPNPM = PM, then Co!(PM) C Col(PN). That is, M C N. The second 'iff'follows by taking *. 0

DEFINITION 8.6 (partial order)For P, Q in 1P(Cl ' ") define P < Q iff P = P Q.

This gives us a way of saying when one projection is "smaller" than anotherone.

THEOREM 8.6Let P, Q, and R be projections.

1. For all P, P < P.

2. IfP<QandQ<P,thenP=Q.

3. IfP<QandQ<R,thenP<R.

4. P < Q iffCol(P) C Col(Q).

5. 0 and I are projections and ®< P < I for all P in 1P(C"")

6. P<QiffP=QP.

7. If P E P(C"'), then I - P E P(0"').

8. PQEJP(C'") iff PQ = QP.

9. P+QE1P(C0")iffPQ=QP=®.

10. P < Q iff Q1 < P1.

Page 322: Matrix Theory

8.2 The Geometry of Subspaces and the Algebra of Projections 301

PROOF The proofs are routine and left to the reader. 0

THEOREM 8.7Let M and N be subspaces of C".

I. PM+N is the projection uniquely determined by (i) PM < PM+N andPN < PM+N and (ii) if PM < Q and PN < Q, then PM+N < Q, whereQ is a projection.

2. PMnN is the projection uniquely determined by (i) PMnN < PM andPMnN < PN and if Q < PM and Q < PN, then Q _S PMnN.

PROOF

I . First M, N CM+N, so PM, PN < PM+N. Now suppose Q is a projectionwith PM < Q and PN < Q. Then, M C Col(Q) and N C Col(Q), soM + N C Col(Q), making PM+N < Q. For the uniqueness,suppose H is a projection satisfying the same two properties PM+N does.That is, PM, PN < H and if PM _< Q and PN < Q, then H < Q.Then, with H playing the role of Q, we get PM+N < H. But with PM+Nplaying the role of Q, we get H < PM+N. Therefore, H = PM+N.

2. The proof is analogous to the one above.a

Our next goal is to derive formulas for PM+N and PMnN for any subspaces Mand N of C. First, we must prepare the way. The MP-equations are intimatelyconnected with projections. We will explore this more fully in the next sectionbut, for now, recall (MPI) AA+A = A and (MP3) (AA+)* = AA+. Thefirst equation says AA+AA+ = AA+. In other words MPI and MP3 implyP = AA+ is a projection. But what does P project onto? We need Fix(P) =Fix(AA+) = {xIAA+x = x}. But if y E Col(A), then y = Ax for some x,so AA+y = AA+Ax = Ax = y. Thus we see Col(A) c_ Fix(AA+). On theother hand, if y E Fix(AA+), then A(A+y) = y so y E Col(A). We concludeFix(AA+) = Col(A), so that AA+ is the projection onto the column space ofA. Let's record our findings.

LEMMA 8.8For any matrix A E C"I'll, AA+ E C"'> "' is the projection onto the column

space of A.

Next, we need the following Lemma.

Page 323: Matrix Theory

302 Projections

LEMMA 8.9For any matrix A, Col(A) = Col(AA*)

PROOF If X E Co!(AA*) then x = AA*y for some y, so x = A(A*y) isevidently in Col(A). Conversely, if x E Col(A), then x = Ay for some y. Nowx = AA+x = A(A*(AA*)+)x.Inotherwords, AA*z = x, wherez = (AA*)+x.This puts x in Col(AA*). 0

Another fact we will need is in the following Lemma.

LEMMA 8.10

Let A E C"' x", B E C"'" and M = [A: B], the augmented matrix in Cl""'+A).Then Col(M) = Col(A) + Col(B).

PROOF The proof is left as an exercise. 0

Now we come to a nice result that is proved in a bit more generality than weneed.

THEOREM 8.8Let AEC"""' and B E C". Then

1. Col(AA* + BB*) = Col(A) + Col(B)

2. Null(AA* + BB*) = Arull(AA*) n Null(BB*).

PROOF

1. Let M = [A:B] be the m-by-(n +k) augmented matrix. Then Col(M) =Col(A) + Col(B) by (2.10). But also, Col(M) = Col(MM*) by (8.9).

A*

However, MM* = [A:B] .. = AA* + BB*. Hence, Col(AA* +B*

BB*) = Col(MM*) = Col(M) = Col(A) + Col(B).

2. Let X E Null(AA*) nNull(BB*). Then AA*x = -6 and BB*xso (AA* + BB*)x = 6, putting x in NUl1(AA* + BB*). Now let x ENUIl(AA*+BB*). Then [AA*+BB*](x) = 6 so AA*x+BB*x = -.

Page 324: Matrix Theory

8.2 The Geometry of Subspaces and the Algebra of Projections 303

Thus x*AA*x + x*BB*x = '. That is, IIA*x112 + IIB*x112 = 0 soIIA*xll = 0 = IIB*xll. We conclude x E NUll(A*) = Null(AA*) andx E NUll(B*) = NUll(BB*).Therefore,x E NUll(AA*)nArull(BB*).

0

Now we apply these results to projections. Note that if P and Q are projec-tions, then P + Q = PP* + QQ*. Also note that while P + Q is self-adjoint,it need not be a projection.

COROLLARY 8.1Let P and Q be in P(C' "). Then

1. Col(P + Q) = Col(P) + Col(Q)

2. Null(P + Q) = Null(P) nArull(Q).

Let M = Col(A) for some matrix A. We have noted above that AA+ is theprojection onto M. Thus, if M and N are any subspaces, [ PM + PN ] [ PM + PN ]+ isthe projection on Col (PM + PN) = Col(PM)+Col(PN) = M +N using (8.8)(1).This is part of the next theorem. Recall our shorthand notation P1 = I - P fora projection P.

THEOREM 8.9Let M and N be subspaces of C". Let P = Pit and Q = PN. Then all the

following expressions are equal to the orthogonal projection onto M + N.

1. PM+N

2. [P + Q][P + Q]+

3. [P + Q]+[P + Q]

4. Q + [(PQ1)+(PQ1)]

5. P + [(QP1)+(QP1)]

6. P+P1[P1Q]+

7. Q + Q1[Q'P]+

PROOF There is clearly lots of symmetry in these formulas, since M + N =N + M. That (2) equals (3) follows from the fact that P + Q is self-adjoint. Takethe * of (2) and you get (3), and vice-versa. That (1) and (2) are equal follows

Page 325: Matrix Theory

304 Projections

from the discussion just ahead of the theorem. We will give an order theo-retic argument that (1) equals (4). The equality of (1) and (5) will then followby symmetry. Let H = Q + (PQ1)+(PQ') First note (PQ1)+(PQ1)Q =0, so that H is in fact a projection. Also, H Q = Q + 0 = Q so Q <H. Next, PH = P(Q + (PQ')+(PQ')] = PQ + P(PQ')+(PQ1)PQ + P[I - (1 - (PQ J)+(PQI)] = PQ + P - P(1 - (PQ1)+(PQ1))P+[PQ-P(I -(PQ1)+(PQ'))]. But (D = (PQ1)[I -(PQ1)+(PQ1)]P(1 - Q)[I - (PQ1)+(PQl)] _ [P - PQ][I - (PQ1)+(PQ1)] = P(I(PQ1)+(PQ1)) - PQ, since Q < I - (PQ1)+(PQ1). Thus, PQ = P(I(PQ1)+(PQ')) and, consequently, PH = PQ + P - PQ = P. Thus,P < H. Now let K be any projection with K > P and K > Q. Then KH =K[Q+(PQ -)+(PQ1)] = KQ+K(PQ1)+(PQ')) = Q+K(PQ1)+(PQ1).But Q < K so QK1 = 0 whence PQ1K' = PK1 = 0, so K1 =[I - (PQ1)+(PQl)]K1. This says K1 < I - (PQ1)+(PQ1) or, equiva-lently, (PQ')+(PQ1) < K. Thus, K(PQ1)+(PQ1) = (PQ1)+(PQ1)

andso KH = H putting H < K. Therefore, H = PM+N by (8.7)(1).

Now we have (1) through (5) all equal. So let's look at (6). Let U = P1 Q.Then UQ = Q so U+UQ = U+U. This says U+UQP-- = U+UP1. Bytaking the * of both sides, we get P1QU+U = P'U+U so U = UU+U =P1U+U. Thus, UU+ = P1U+UU+ = P--U+. Therefore, UU+ = P'U+;that is, (P'Q)(P1Q) = P1(P1Q)+. But (P1Q)(P'Q)+ = [(P1Q)(P1

Q)+]* = (P1Q)+`(P1Q)` = (QP1)(QP')so(QP')+(QP1) = Pl(P--Q)+.Now (6) follows from (5). Of course (7) follows by symmetry. 0

Thus our first goal is accomplished; namely, in finite dimensions, the projec-tion onto the linear sum of two subspaces is computable in many ways in termsof the individual projections.

We now turn to our second goal of representing the projection on the inter-section of two subspaces in terms of the individual projections.

THEOREM 8.10Let M and N be subspaces of C'. Let P = PM and Q = PN. Then all the

following expressions are equal to the orthogonal projection onto m fl N.

1 I. PMn N

2. 2Q(Q + P)+P

3. 2P(Q + P)+Q

4. 2[P - P(Q + P)+P]

5. 2[Q - Q(Q + P)+Q]

Page 326: Matrix Theory

8.2 The Geometry of Subspaces and the Algebra of Projections 305

6. P - (P - QP)+(P - QP) = P - (Q1P)+(Q1P)

7. Q - (Q - PQ)+(Q - PQ) = Q - (P'Q)+(P1Q)

8. P - P(PQ')+

9. Q - Q(QP1)+

PROOF As before. we have much symmetry since m fl N = N fl M. Webegin by showing (2) equals (3). To do this, we first show Q(Q + P)+P -P(Q + P)+Q =0. But Q(Q + P)+P = P(Q + P)+Q = Q(Q + P)+P +Q(Q + P)+Q - Q(Q + P)+Q + P(Q + P)+Q = Q(Q + P)+(Q + P) -(Q + P)(Q + P)+ Q. By (2.12), Col(Q) S Col(Q) + Col(P) = Col(Q + P),so Q(Q + P)+ (Q + P) = Q = (Q + P)(Q + P)+ Q. Thus, Q(Q + P)+P -P(Q+P)+Q=Q-Q=0,and so Q(Q+P)+P=P(Q+P)+Q, and itfollows that (2) equals (3).

Next, we argue that (2), and hence (3), also equals (1). We use the uniquenesscharacterization of PMnN. Let H = Q(Q + P)+P + P(Q + P)+Q = 2Q(Q +P)+P =2P(Q+P)+Q.NowHP = [2Q(Q+P)+P]P =2Q(Q+P)+P2 =2Q(Q + P)+P = H and, similarly, HQ = H. Thus, Col(H) C- M fl N.But also H = HPMnN = [Q(Q + P)+P + P(Q + P)+Q]PMnN = Q(Q +P)+PPMnN+P(Q+P)+QPMnN = Q(Q+P)+PMnN+P(Q+P)+PMnN =[Q(Q + P)+ + P(Q + P)+]PMnN = (Q + P)(Q + P)+PMnN = PMnN. Thislast equality follows because M fl N C_ Col(Q + P) and so we concludeH = PMnN

Next, we show (4) and (5) equal (1) by showing P(Q + P)+Q = P -P(Q + P)+ P. The argument that Q(Q + P)+P = Q - (Q + P)+Q is similarand will be left as an exercise. Now P(Q + P)+Q - (P - P(Q + P)+ P) _P(Q + P)+ Q - P + P(Q + P)+ P = P(Q + P)+ Q + P(Q + P)+ P - P =P(Q+P)+(Q+P)-P=P-P=0.

To see that (6) and (7) equal (1), note first that M fl N C M and m fl N C N,so PPMnN = PMnN and QPMnN = PMnN and Q P PMnN = PMnN Nowlet S = P - (P - QP)+(P - QP). We claim S is a projection. ClearlyS' = S and SP = S. It follows PS = S. In particular, Col(S) C Col(P) =M. Next, Sz = P2 - P(P - QP)+(P - QP) - (P - QP)+(P - QP)P+((P - QP)+(P - QP))2 = PEP - (P - QP)+(P - QP)] = PS = S.Now P - QP = (P - QP)(P - QP)+(P - QP) = P(P - QP)+(P -QP) - QP(P - QP)+(P - QP) so S = PS = P[P - (P - QP)+(P -QP)]=P-P(P-QP)+(P-QP)=QP-QP(P-QP)+(P-QP)=Q[P - P(P - QP)+(P - QP)] = QPS = QS. Thus, QS = S and SQ = S,so Col(S) C Col(Q) = N. Therefore, Col(S) c M fl N and so PMnNS = S.ButSPMnN =[P-(P-QP)+(P-QP)]PMnN = PPMnN-(P-QP)+(P-Q P)PMnN = PMnN - 0 = PMnN Thus, S = PMnN. The argument for (8) and

Page 327: Matrix Theory

306 Projections

(9) are similar to the ones given for (6). They will be left as an exercise. Thiscompletes our theorem. 0

Next, we look at some special cases whose proofs are more or less immediate.

COROLLARY 8.2Let A E Cmxn and B E Cmxk. Then

1. [A:B][A:B]+ = PC(,!(A)+cot(B)

2. 2AA+[AA+ + BB+]+BB+ = Pco1taux;ouB)-

COROLLARY 8.3Let M and N be subspaces of C" with P = PM and Q = PN. Then if PQ =Q P,

I. PM+N = P -I- Q- P Q

2. PMnN=PQ=QP.

In particular, if PQ = 0 (i.e., if P±Q),

1. Pet+N = P + Q

2. PMnN = 0.

We end this section with an example. Suppose M = span{a,, a,,... , a,]and N = span{b,, b2, ... , b, ] are subspaces of C". We form the matricesA = [a,

I a2 I ... I a,.] and [b, Ib2 I ...

Ib,], which are n-by-r

and n-by-s, respectively. Of course, AA+ = PM and BB+ _Pc,,uB) = PN. Now form the augmented matrix M = [A:B], which is n-by-(r + s) and, with a left multiplication by a suitable invertible matrix R, we pro-duce the row reduced echelon form of M; say RREF(M) = RM = R[A:B] _

E11 E12 E12 E12

[RA:RB] _ 0 E22 . Now B = R-' E22 = R- 0 +

® E + E12 Ell E12R-' E22 and 0 ® _ ® [E 0 0] ®

(D (D (D oE12 E12 E12

® . Thus, R-1 0 has columns in m fl N. If we let R-'

then W W+ is the projection on M fl N. For example, suppose

Page 328: Matrix Theory

8.2 The Geometry of Subspaces and the Algebra of Projections 307

M = span {(1, 1, 1, 1, 0), (1,2,3,4,0), (3, 5, 7, 9, 0), (0, 1, 2, 2,0)) and N =span J(2, 3, 4, 7,0), (1, 0, 1,0,0), (3, 3, 5, 7, 0), (0, 1, 0, 3, 0)}. Then RM =

0 2 0 -1 0

z-1 -z 1 0

-1 1 1 -1 0Z 1 z 0 00 0 0 0 1

1 1 3 0: 2 1 3 0

1 2 5 1 : 3 0 3 1

1 3 7 2: 4 1 5 0

1 4 9 2: 7 0 7 3

0 0 0 0 0 0 0 0

1 0 1 0 -1 0 -1 -1

0 1 2 0: 3 0 3 2

0 0 0 1 : -2 0 -2 -2 1= RREF(M).

10 0 0 0 0 1 1 -1

L 0 0 0 0 0 0 0 01 1 0 1 0 -1 0 -1 -1

E12 1 2 1 0 0 3 0 3 2

Now W= R-1 = 0 1 3 2 1 0 -2 0 -2 -2 =0 1 4 2 0 0 0 0 0 0

0 0 0 0 1 0 0 0 0-2 0 2 1 1 r3 0 3 1

-1/12 1/12 1/4 -1/12 0

4 0 4 1 and W+ =0 0 0 0 0

The7 0 7 3

-1/12 1/12 1/4 -1/12 01/2 -1/3 -7/6 2/3 0

L O 0 0 0

1/6 0 -1/6 1/3 00 1/6 1/3 1/6 0

projection onto m fl N is W W+ _ -1/6 1 /3 5/6 0 0 . The1/3 1/6 0 5/6 0

0 0 0 0 0trace of this projection is 2, which gives the dimension of m fl N. A basis forMflNis{(2,3,4,7,0),(1, 1, 1,3,0)}.

Exercise Set 34

1. Prove that tr((PM + PN)(PM + PN)+) = tr(PM)+tr(PM) - tr(PMnN).

2. (G. Trenkler) Argue that A is a projection iff 3tr(A*A)+tr(AAA*A*) _2Re(tr(AA + AA*A)).

Page 329: Matrix Theory

308 Projections

3. Suppose PM and PN are projections. Argue that PM + PN is a projectioniIT MIN.

4. Suppose P and Q are projections. Argue that PQ is a projection ifPQ = QP. Indeed, argue that the following statements are equivalentfor projections P and Q:

(a) PQ is idempotent(b) tr(PQPQ) = tr(PQ)(c) PQ is self-adjoint(d) PQ = QP.

5. Suppose PM and PN are projections. Argue that PM PN = PN PM iffM = (MnN)®(MnNl).

6. Suppose PM and PN are projections. Argue that the following statementsare all equivalent:

(a) PM - PN is a projection(b) PN < PM(c) IIPNxII < IIPMxII for all x(d) N C M(e) PM PN = PN(f) PNPM=PN

7. Suppose P is an idempotent. Argue that P is a projection if II Pxll Ilxll

for all x.

Further Reading

[P&O&H, 1999] R. Piziak, P. L. Odell, and R. Hahn, Constructing Pro-jections on Sums and Intersections, Computers and Mathematics withApplications, Vol. 37, (1999), 67-74.

[S&Y, 1998] Henry Stark and Yongyi Yang, Vector Space Projections,John Wiley & Sons, New York, (1998).

Page 330: Matrix Theory

8.3 The Fundamental Projections of a Matrix 309

8.3 The Fundamental Projections of a MatrixWe have seen how to take a matrix A and associate with it four subspaces

we called "fundamental": Col(A), Mull(A), Col(A*) and Null(A*). All thesesubspaces generate projections. Remarkably, all these subspaces are related tothe MP-inverse of A.

THEOREM 8.11Let A E C"' x" Then

1. AA+ = PCOI(A) = PA(.//(A')'

2. A+A = PNuu(A)1

3. 1 - AA+ = PC,,I(A)1 =

4. 1 - A+A = PNuiuA).

PROOF We have already argued (1) in the previous section. A similar argu-ment shows A+A is a projection. Then I - AA+ and I - A + A are projections.The only real question is what does A+A project onto? Well A+A = A+ A++so, by (1), A+A projects onto Col(A+). So the only thing left to show isthat Col(A+) = Col(A*). This follows from two identities. First, supposex E Col(A*). Then, x = A*z for some z. But A* = A+AA*, so x = A*z =A+AA*z = A+(AA*z), putting x E Col(A+). Next, suppose x E Col(A+).Then x = A+z for some z. But A+ = A*A+*A+, so x = A+z = A*(A+*A+z),putting x in Col(A*). 0

COROLLARY 8.4For any matrix A E 011 , Col(A*) = Col(A+) and Null(A) = Col(A+)'.

Next, we relate the fundamental projections to a full rank factorization of A.

THEOREM 8.12Let A = FG be a full rank factorization of A E C;' x". Then

1. AA+ = FF+

2. A+A=G+G

Page 331: Matrix Theory

310

3.I,,,-AA+FF+4. I - A+A = I - G+G.

Projections

PROOF The proof is easy and left to the reader.

Let's look at an example.

Example 8.113 6 13 3 13 r 1

Let A= 2 4 9 = FG = 2 9L 0 0 1

. As before,

1 2 3 1 3 1

+ - 1 /5 0 - -3/26 -11/13 79/26we compute G 2/5 0 and

F+

0 1

1/13 3/13 -9/13 ,.

1/5 2/5 0

Then G+G = 2/5 4/5 0 is a rank 2 projection onto Col(A*);0 0 1

4/5 -2/5 0I - G+G = -2/5 1 /5 0 is a rank I projection onto Null (A). Next

0 0 017/26 6/13 3/26

FF+ = 6/13 5/13 -2/13 is a rank 2 projection onto Col(A),3/26 -2/13 25/26

9/26 -6/13 -3/26and I - FF+ _ 6/13 8/13 2/13 is a rank I projection onto

-3/26 2/13 1/26Mull(A*).

Next, we develop an assignment of a projection to a matrix that will proveuseful later. The most crucial property is (3), which we use heavily later. Thisproperty uniquely characterizes A:

DEFINITION 8.7 (the prime mapping)For any matrix A E CmX", we assign the projection A' = I - A+A E

P(C' "') That is, A' is the projection onto the null space of A.

We next collect some formulas involving A'.

Page 332: Matrix Theory

8.3 The Fundamental Projections of a Matrix 311

THEOREM 8.13Let A E C,nxn Then

1. AA' _ 0; (A*A)' = A'; (A *)'A = 0.

2. If P E p(Cnxn), then P'= P1 = I - P.

3. AB = 0 iff B = A'B. In fact, A' is the unique projection with thisproperty.

4. If B B* and AB = BA, then AB' = B'A and A'B = BA' andA'B BA'.

5. IfP,QElin(Cnxn),then PAQ=(Q'P)'PandPVQ=(P'AQ')'.

6. If P c Q, thenQ= PV(QAP').

PROOF

1. AA'= A(I - A+A) = A - AA+A = 0.

2. Suppose P is a projection. Then P+ = P so P' = I - P+P = I - P2 =I-P=P1.

3. Suppose AB=O. Then A'B=(I-A+A)B=B-A+AB=B-0=B. Conversely, if B = A'B, then AB = AA'B = OB = 0. Now let P beany projection with the property AB = 0 iff B = PB. Then AA' = 0,so A' = PA', making A' < P. But also, P = PP so AP = 0 whenceA+AP=0, soA'P=(I-A+A)P=P-A+AP=P, soP<A'.Thus, P = A'.

4. Suppose B = B* and AB = BA. Now BY = 0, so ABB' = 0, soBAB' = 0, so AB' = B'(AB'). Similarly, A*B = BA*, so A*B' =B'A*B'. Taking *, we get B'A = B'AB'. Thus, AB' = B'AB' = B'A.Also, AA'= O, so B A A' = O, so A B A' = O, so B A' = A'BA'. Taking* we get A'B = A'BA'. Therefore, A'B = BA'.

5. (Q'P)'P = (Q1P)'P = (I -(Q1P)+(Q'P))P = P-(Q1P)+(Q1P)= P A Q by (8.10)(6). Now (P'A Q')' = ((Q"P')'P')' = ((QP1)'P1)' =((I - (QP1)+(QP1))P1)' = (P1 - (QP1)+(QP1))' = I - (P1 -(QP1)+(QP1)) = P + (QP1)+(QP1) = P V Q by (8.9)(5). Notewe have used that P1 - (QP1)+(QP1) is a projection since P1 -(QP1)+(QP1) < P1

Page 333: Matrix Theory

312 Projections

6. Suppose P < Q. Then P V (Q A P) = P V (Q'P')'P' = P V I(1 -Q)(I-P)]'P1=Pv[I-Q-P+PQJ'(I-P)=Pv(Q(I-P))=Pv(Q-QP)=Pv(Q-P).ButP(Q-P)=OsoPV(Q-P)=P+(Q-P)=Q. 0

Well if one prime is so good, what happens if you prime twice? It must betwice as good don't you think? Again A" is a projection we assign to A.

THEOREM 8.14

1. A" = A+A =

P E 1P(C""') then P" = P.

3. A=AA"=(A*)"A.

4. If AB = A, then A"<B".

5. If P is a projection, AP = A iff A" < P. That is, A" is the smallestprojection P such that A P = A.

6. (A B)" < B".

7 (AB)" _ (A"B)

8. (A*A)" - A"; (AA*)" - A*".

9. ((AB)'B*)" < A'.

10. AB* = 0 iff A"-LB" iff A"B" = 0.

/ /. If AB = AC, then A"B = A"C.

Exercise Set 35

1. Fill in the proofs of Theorem 8.12.

2. Let a = IJ .

Compute a+ and P = as . Verify that P is an orthogonalL

iprojection. Onto what does P project?

3. Fill in the proofs of Theorem 8.14.

Page 334: Matrix Theory

8.4 Full Rank Factorizations of Projections 313

Further Reading

[Greville, 1974] T. N. E. Greville, Solutions of the Matrix Equation XAX= X and Relations Between Oblique and Orthogonal Projectors, SIAMJ. Appl. Math., Vol. 26, No. 4, June, (1974), 828-831.

[B-I & D, 1966] A. Ben-Israel and D. Dohen, On Iterative Computation ofGeneralized Inverses and Associated Projections, J. SIAM Numer. Anal.,III, (1966), 410-419.

8.3.1 MATLAB Moment

8.3.1.1 The Fundamental Projections

It hardly seems necessary to define M-files to compute the fundamental pro-jections. After we have input a matrix A, we can easily compute the four pro-jections onto the fundamental subspaces:

A*pinv(A) the projection onto the column space of Apinv(A)*A the projection onto the column space of A*eye(m)-A*pinv(A) the projection onto the null space of A*eye(n)-pinv(A)*A the projection onto the null space of A.

Since "prime mapping" plays a crucial role later, we could create a file asfollows:

I function P=prime(A)2 [m n] = size(A)3 P = eye(n)-pinv(A)*A.

Experiment with a few matrices. Compute their fundamental projections.

8.4 Full Rank Factorizations of ProjectionsWe have seen that every matrix A in C;'"" with r > 0 has infinitely many fullrank factorizations A = FG, where F E and G E (Cr"". The columnsof F form a basis for column space of A. Applying the Gram-Schmidt process,we can make these columns of F orthonormal. Then F*F = J. But then,it is easy to check that F* satisfies the four MP-equations. Thus, F* = F+,

Page 335: Matrix Theory

314 Projections

which leads us to what we called an orthogonal full rank factorization, aswe delined in (7.6) of Chapter 7. Indeed, if U is unitary, A = (FU)(U*G) isagain an orthogonal full rank factorization if A = FG is. We summarize witha theorem.

THEOREM 8.15Every matrix A in C;'"" with r > 0 has infinitely many orthogonal full rankfactorizations.

Next, we consider the special case of a projection P E C"r "". We have alreadynoted that P+ = P. But now take Pin a full rank factorization that is orthogonal.P = FG, where F+ = F*. Then P = P* = (FG)* = G*F* = G*F+. ButP = P+ = (FG)+ = G+ F+ = G+F* and GP = GG+F* = F*. But thenP = PP = FGP = FF*. This really is not a surprise since FF* = FF+ isthe projection onto Col(P). But that is P! Again we summarize.

THEOREM 8.16Every projection P in C"" has a full rank factorization P = FG whereG=F*=F+.

9/26 -6/13 -3/26For example, consider the rank I projection 6/13 8/ 13 2/ 13

-3/26 2/ 13 1/269/26_1

[ 1 -4/3 - 1/3 ]. This is a full rank factorization but it-3/26

is not orthogonal. But Gram-Schmidt is easy to apply here. Just normalize9/26 -6/13 -3/26 3/,/2-6

the column vector and then 6/13 8/13 2/ 13 = 4/ f2-6-

-3/26 2/13 1/26 -1/26

[3/ 26 -4/ 26 -1/,/-2-6] . We can use this factorization of a projectionto create invertible matrices. This will he useful later. Say F is m-by-r of rank r,that is, F has full column rank. Then F+ _ (F*F)-1 F*, as we have seen. Thissays F+ F = Ir. Now write I - F F+ = F, Fj in orthogonal full rank factoriza-

F+tion. Form them -by-m matrix S = . . . . We claim S-t = [ F: F, ]. We corn-

F1+

puce SSF+

=F.+

[FF,] F+F + F, Ir +=

F1+F ] - [ F1+F Im-r

Page 336: Matrix Theory

8.5 Affine Projections 315

hIrsince Fi = F+(FI F+) = F+(I - FF+) so F, +F =

® /,,,-rF+(I - FF+)F = 0 and F, = FI Fi F, = (I - FF+)FIas well.

To illustrate, let F =3

2

1

13

9 , which has rank 2. We3

soF+F, = 0

saw above / -

3

26 F+FF+ = 26 [

3 -426 26 ] = FI F+ so S =I

-1 F+26 1

3 - 11 7 9

26 13 2 6 3 13 3

I 3 - 9 26

13 13 1 3 i s invertible with inverse S-1 = 2 9 6

26 26 2 6 1 3 -1

Similarly, if G has full row rank, we write 1 = G+G = F2 F2 in orthogonalG

full rank factorization and form T G+ F2 I. Then T-1 =F2

Exercise Set 361 1

1. Find an orthogonal full rank projection ofz

8.5 Affine Projections

We have seen how to project (orthogonally) onto a linear subspace. In thissection, we shall see how to project onto a subspace that has been moved awayfrom the origin. These are called affine subspaces of C".

DEFINITION 8.8 (affine subspace)By an affine subspace of C", we mean any set of vectors of the form M(a, U){a + ul u E U), where U is a subspace of C" and a is a fixed vector from C".

The notation M(a, U) = a + U is very convenient.

Page 337: Matrix Theory

316 Projections

We draw a little picture to try to give some meaning to this idea.

Figure 8.2: Affine subspace.

The following facts are readily established and are left to the reader.

1. a E M(a, U).

2. M(U)=U.3. M(a,U)=UiffaEU.

4. M(a, U) c M(b, W) iff U C W and a - b E W.

5. M(a, U) = M(a, W) iff U = W.

6. M(a, U) = M(b, U) if a - b E U.

7. M(a, U) is a convex subset of C".

8. M(a,U)nM(b, W)00iffa-bE U+W.

9. If Z E M(a, U) n M(b, W) then M (a, U) n M(b, w) = M(z, u n W).

In view of (5), we see that the linear subspace associated with an ahnesubspace is uniquely determined by the affine subspace. Indeed, given the affinesubspace, M, U = (y - aly E M) is the uniquely determined linear subspacecalled the direction space of M. We call the affine subspaces M(a, U) andM(b, W) parallel if U C W. If M(a, U) and M(b, W) have a point c in

Page 338: Matrix Theory

8.5 Affine Projections 317

common, then M(a, U) = M(c, U) c M(c, W) = M(b, W). Thus, parallelaffine subspaces are either totally disjoint or one of them is contained in theother. Note that through any point x in C", there is one and only one affinesubspace with given direction U parallel to W, namely x + U. Does all thissound familiar from geometry'? Finally, we note by (8) that if U ® W = C",then M(a, U) fl M(b, W) is a singleton set.

Next, we recall the correspondence we have made between linear subspacesand orthogonal projections; U H Pu = Pu = Pu, where U = Col(Pu) =Fix(Pu ). We should like to have the same correspondence for affine subspaces.The idea is to translate, project, and then translate back.

DEFINITION 8.9 (affine projection)For x in C", define fM(a,u)x = a + Pu(x - a) = a + Pu(x) - Pu(a) _

Pu(x) + Pul (a) as the projection of x onto the affine subspace M(a, U).

The following are easy to compute.

1. nM( U)(x) = fM(a,u)(x) for all x in C".

2. M(a, u) = {yly = nM(a.u)(x)} for all x in C".

3. M(a, U) = {xlnu(a,u)(x) = x}.

4. nL(a,ul)(x) = X nM(a.u)(x)

5. fM(a.u)(') = a - Pu(a) = Pul(a).

6. If a E U, f M(a.U)(x) = Pu(x)

7. 11 fs(a.u)(y) - nM(a.u)(Z)II = II Pu(z) - Pu(y)11.

8. 11nM(a.U)(x)MM2 = IIPu(x)112 + IIPui(a)112.

9. X = Il m(a.u)(x) = (x - a) - Pu(x - a) = Pul(x - a).

As a concrete illustration, we take the case of projecting on a line. Let b 0 'be in C" and let U = sp(b) the one dimensional subspace spanned by b.We shall compute the affine projection onto M(a, sp(b)). From the definition,

< x - alb > _ b*(x - a)b1IM(a.,v(b)(x) = a + < bib > b = a + b*b = a + bb+(x - a)

>a - x xlb

b =(I - bb+)(a) + bb+(x) and nM(a.,p(b),() = (x - a)W +<

< >b1b(x - a) + bb+(a - x) = (I - bb+)x - (I - bb+)a = (I - bb+)(x - a). Wesee that, geometrically, there are several ways to resolve the vector nM(a,u)(x)as the algebraic formulas have indicated.

Page 339: Matrix Theory

318 Projections

Now, if A is a matrix in C""', then we write M(a, A) for the affine subspaceM(a, Co!(A)). Consider a line L of slope in in R2. Then L = {(x, mx + yo)Ix E

118} e R2. We associate the matrixI

1 0 I of rank I with this line. Then

( 1 0 1 x 0 lL = Sl m 0 /I + fix, y E ]E81. We can then use the pscu-

y yodoinverse to compute the orthogonal projection onto the direction space of thisline, which is just the line parallel passing through the origin. The affine projec-

x 0 1 0 1 0tion can then be computed as 171L

=+

y yo in 0 in 0I ,

I}m I x + 0 -M n12

I TM2 -IT n-17

xSo, for example, say y = -3x +4. Then 11L _

y

ISO 11L

x-3v+1210

-3x+9y+410

10+30+12 1

10 J-30+90+4

10

.8

E L.6.4

Next, consider a plane in R3 given by ax + by + cz = d with c # 0. We1 0 0

associate the rank 2 matrix I 0 1 0 I with this plane. Then the plane

0I _u -hhr c

is described by the set of vectors

0 0

1I;]+[]XYZER}.1

0 z

x

We compute the affine projection 11 y

z

0 0 I 0 U + xI o 0 1 0 y

-)' 0 h 0 zc r c

Page 340: Matrix Theory

8.5 Affine Projections 319

11 0 0The pseudoinverse above is easily computed and the product 0 1 0

a (3 0

1 0 0 +

0 1 0 =of 03 0

1+p2

a a1+a p2 1+az+p2

1 taeI+

I+al+l321 0 0

For example, to the plane 3x+2y+6z = 6, we associate 0 1 0-I -I 02 3

so the plane consists of the vectors in1 0 0 x 00 1 0 y + 0 Ix, y, z E R . The affine projection

z 31 0 z 1

11 0 0x

n Y o 1 0Z

2 3!0

21 310 Z

+1 0 0 x 00 1 0 y o +

1

40 -6 -1849 49 49 X o

0 -6 L5 ZL2- 49 49 49 Y + 0 =L1 -18 -12 13 Z - 1 1

49 49 4940 -649 X 49 y-6 45T9- X 49Y

-18X -1249 X 49 Y

simply compute II

I

wish to project 1

5

-18 T9-(Z

_ 2(Z492(z - 1) . So, to project [ 11 1 onto this plane, we3

49(Z-1)+I-2

1 = IS

349

As one last illustration, suppose we

onto the line in R3 given by L(t) = (1, 3, 0)+t(1, -1, 2).

1 0 0

We associate the matrix -1 0 0 of rank I to this line so the line be-2 0 0

comes the set of vectorsl 0 0 x

-1 0 0 y +2 0 0 z

1

3

0

Ix,y,ZER .

I -I 2

6 6X

The affine projection onto this line is II y = 6Z 2

1 -26 6

-2 4

6 6 6

Page 341: Matrix Theory

320 Projections

X

I 1 '-Y (x - I)

Y ([1_rn+1]3 = 6(x-I>Z 0 0 6(x - I)

1 3

So, for example, n 1 = 1

5 4

.:6'(y - 3) z + I'(y-3) 6 z+3

6(y-3)6z

Since orthogonal projections solve a minimization problem so do affine pro-jections. If we wish to solve the approximation problem for x and M(a, U) -that is, we wish to minimize Ilx - yll for y E M(a, U) - we see that this isthe same as minimizing ll(x - a) - (y - a)Ii, as y - a runs through the linearsubspace U. But this problem is solved by PU(x-a). Thus Ilx - fM(a.U)(x)II =ll(x - a) - PU(x - a)ll solves the minimization problem for M(a, U). To

1

illustrate, suppose we want to minimize the distance between I and the3

plane 3x + 2y + 6z = 6. What vector should we use? n I of course!3

Then 1 -n I

3 3I I

=49L1-9- E= . Similarly,

I

the minimum distance from I to the line L(t) _ (1, 3, 0) + t(I , -1, 2) is5

1 I 1 3 -2I' n 1 = I - 1 = 0 5.5 5 5 4 1

Our last task is to project onto the intersection of two affine subspaces whenthis intersection is not empty. You may recall it was not that easy to projectorthogonally onto linear subspaces. It takes some work for affine subspaces aswell. We offer a more computational approach this time.

Suppose matrices A and B determine affine subspaces that intersect(i.e., M(a, A) fl M(b, B) # 0). By (8), we have b - a E Col(A) + Col(B)

xiCol([AB]). Thus, b - a = Ax1 + Bx2 = [AB] . One useful fact

X2

about pseudoinverses isY+ _ (y*y)+y*, so we compute that [A:B]+ _\

B*]

[B*

JI = I A*(AA* + BB*)+ 1

Let D = AA*+B*(AA* + BB*)+ J

BB*. Then the projection onto Col(A) + Col(B) is [A:B][A:B]+ _

Page 342: Matrix Theory

8.5 Affine Projections 321

r A* D+[A:B] I

BA = AA*D++BB*D+ = DD+. So M(a, A) fl M(b, B) # 0

iff b - a = D D+ (b - a). Next, b - a = D D+(b - a) iff b - a = A A * D+(b -a) + BB*D+(b - a) if AA*D+(b - a) + a = -BB*D+(b - a) + b. AlsoCol(A), Col(B) c Col(A) + Col(B) so DD+A = A and DD+B = B. ButDD+A = A implies (AA* + BB*)D+A = A, so BB*D+A = A - AA*D+Awhence BB*D+AA* = AA* - AA*D+AA*. But this matrix is self-adjoint,so BB*D+AA* = (BB*D+AA*)* = AA*D+BB*.

THEOREM 8.17M(a, A) fl M(b, B) 0 0 i.,fb - a = DD*(b - a), where D = AA* + BB*. Inthat case, M(a, A) fl M(b, B) = M(c, C), where c = AA*D+(b - a) + a andC = [BB*D+A:AA*D+B].

PROOF Let Y E M(a, A) fl M(b, B). Then y = Ax, + a and y = Bx2 + bfor suitable x, and x2. Then Ax, + a = Bx2 + b so b - a = Ax, - Bx2. In

x,matrix notation we write this as [A - B] . . . = b - a. The solution set

r 1

X2

to this equation isL

XZ J= [A: - B]+(b - a) + [I - [A: - B][A: - B]+]z =

A*D+(b - a) I- A*D+A A*D+B z, l-B+D+(b - a)

I+ [ B*D+A I - B*D+B J

[ z2 J, where z =

x ].ThenY=Axi+a=B_BB*D+B]z=B[AA*D(b_a)+a]+L X2

[A - DD*D+A:A*D+B]z, = c + [BB*D+A:AA*D+B]z,. This puts y inM(c, C).

Conversely, suppose y E M(c, C). Then y = c+[BB*D+AAA*D+B]z for

some z E Col(C). But then y = (AA*D+(b - a) + a) + [A - AA*D+A:AA*

D+B]z = A[A*D+(b - a)] + A[I - A*D+A:A*D+B]z + a = Awl + a E

M(a, A).Similarly,y =(-BB*D+(b-a)+b)+[BB*D+AB-BB*D+B]z =

B[-B*D+(b - a) + B[B*D+A:1 - B*D+B]z + b = Bw2 + b E M(b, B).This puts y E M(a, A) fl M(b, B) and completes the proof.

Before we write down the projection onto the intersection of two affinesubspaces, we obtain a simplification.

Page 343: Matrix Theory

322 Projections

THEOREM 8.18

[BB*D+A.AA*D+B][BB*D+A:AA*D+B]+ = [BB*D+AA*][BB*D+AA*]+.

PROOF ComputeCC+ = [BB*D+A.AA*D+B][BB*D+A:AA*D+B]+CC*(CC*)+ = CC*[BB*D+A:AA*D+B][BB*D+A:AA*D+B]+ _CC*[BB*D+AA*D+BB* + AA*D+BB*D+AA*]+ =CC*[(BB* + AA*)D+BB*D+AA*]+ = CC*[DD+BB*D+AA*]+ _CC*[BB*D+AA*]+ = [BB*D+AA*][BB*D+AA*]+, which is whatwe want. 0

We can now exhibit a concrete formula for the affine projection onto the inter-section of affine subspaces: fI M(a.A)nM(b.B)(x) = [B B* D+AA*] [ B B* D+AA* ]+(x - c) + c, where c = AA*D+(b - a) + a. We note with interest thatBB*D+AA* = BB*[AA* + BB*]+AA* is just the parallel sum of BB* andAA*. In particular the orthogonal projection onto Col(A) flCol(B) can be com-puted [BB*D+AA*][BB*D+AA*]+. This adds yet anotherformula to our list.

We illustrate with an example. Consider the plane in R3, 3x - 6y - 2z = 15,1 0 0 0

to which we associate the matrix A = 0 1 0 and vector a = 0

3 -3 0 -;5

Also consider the plane 2x + y - 2z = 5, to which we associate the matrix1 0 0 0

B = 0 1 0 and vector b = 0 . We shall compute the projec-t

10 ZS

1 0 z

tion onto the intersection of these planes. First, AA* = 0 1 -33 -3 45z 2

1 0 13 -11 -1

4 5

BB* = 0 1 z , and D+ = (AA* + BB*)+ _ -1 3 1

4 4 5

I 1 s -I I

252 4 5 5

Now DD+ = 1 ensuring that these planes do intersect without appealing to di-I 1 49 7 215 1 I 100 100 40 I

mension considerations. Next, c = I 57 and C =100

1L -i40 andJ

L 3

J

21 3 940 40 16

Page 344: Matrix Theory

8.5 Affine Projections 323

425 (x - s) + 44255)

+ 85 (z + 3) + 5

85(x - 5) + 85(y + 5) + 17(z + 3) - 3

14t

2t , where t = is (x -s)

+ 4i5 (y +s)

+ 4z5 (z + 3).

15t

-3

The formula we have demonstrated can be extended to any number of affinesubspaces that intersect nontrivially.

Exercise Set 371 0 1 1 0 1 1 0 1 0 1 1

0 1 1 1 1 0 0 1 0 1 0 1

1. Consider A =1 1 0 1 0 1 and B = 0 1 1 0 1 0

1 1 0 0 1 1 1 1 0 1 0 1

1 1 0 0 1 1 1 1 0 1 0 0LO 0 1 0 1 1 0 0 1 0 0 1

1 00 1

n nLet a = 0 and b = 0 . Compute AA*, BB *, and D + _

0 01 1

(AA* + BB*), and c. Finally, compute fl(x, y, z), the projection ontothe intersection of M(a, A) and M(b, B).

Further Reading

[P&O, 2004] R. Piziak and P. L. Odell, Affine Projections, Computersand Mathematics with Applications, 48, (2004), 177-190.

L96-s)+ 2(Y+5)+85(z+3)+s425

=z

[Rock, 1970] R. Tyrrell Rockafellar, Convex Analysis, PrincetonUniversity Press, Princeton, NJ, (1970).

Page 345: Matrix Theory

324 Projections

8.6 Quotient Spaces (optional)In this section, we take a more sophisticated look at affine subspaces. Thissection may be omitted without any problems. Let V be a vector space and Ma subspace of V. Let VIM (read "VmodM") denote the collection of all affinesubspaces of V generated by M

VIM= IV +M IV E V).

Our goal is to make this set VIM into a vector space in its own right usingthe same scalars that V has. In other words, affine subspaces can be viewed asvectors. First, we review some basics.

THEOREM 8.19Let M be a subspace of V Then TA.E.:

1. u+M=v+M

2. vEU+M

3. UEV+M

4. U-VEM.

PROOF The proof is left as an exercise. 0

Let's talk about the vector operations first. There are some subtle issues here.Suppose u + M and v + M are affine subspaces. (By the way, some peoplecall these sets cosets.) How would you add them? The most natural approachwould be

(u+M)®(v+M):= (u+v)+M.

The problem is that the vector u does not uniquely determine the affine spaceu + M. Indeed, u + M could equal u, + M and v + M could equal v, + M.How do we know (u + v) + M = (u, + v,) + M? The fancy language is, howdo we know ® is "well defined"? Suppose u + M = u, + M and v + M =v, + M. Then u - u, E M and v - v, E M. But M is a subspace, so the sum(u-u,)+(v-v,) E M. A little algebra then says (u+v)-(u, +v,) E M whence(u+v)+M = (u, +v,)+M. This says (u+M)®(v+M) _ (u, +M)®(v, +M).In other words, it does not matter what vectors we use to determine the affinespace; the sum is the same.

Page 346: Matrix Theory

8.6 Quotient Spaces (optional) 325

The next issue is scalar multiplication. The same issue of "well definedness"must be confronted. Again, the most natural definition seems to be

a(u + M) = au + M for a any scalar.

Suppose u + M = v + M. Then u - v E M, so a(u -V) E M since M isa subspace. But then au - ON E M, so au + M = av + M, and again thisdefinition of scalar multiplication does not depend on the choice of vector torepresent the affine subspace. Note, the zero vector in VIM is V + M = Mitself! We can visualize VIM as being created by a great collapsing occurringin V where whole sets of vectors are condensed and become single vectors ina new space.

V= M

u3+MU1 +M u2+M 1 -+ V/M =

THEOREM 8.20Let V be a vector space and M a subspace. Then, with the operations definedabove, VIM becomes a vector space in its own right.

PROOF The hard part, proving the well definedness of the operations, wasdone above. The vector space axioms can now be checked by the reader. 0

There is a natural mapping T1 from V to VIM for any subspace M of Vdefined by -q(u) = u + M.

THEOREM 8.21With the notation above, -9 : V -+ VIM is a linear transformation that is onto.Moreover, ker(i) = M.

PROOF We supply a quick proof with details left to the reader. First, compute''(au+0v) = (au+(3v)+M = (au+M)®({3v+M) = a(u+M)®{3(v+M) =cog(u) + (3q(v) and v E ker(q) iff 9(v) _ if v + M = M if v E M soker('q) = M. 0

In finite dimensions, we can get a nice relationship between the dimensionof V and that of VIM.

THEOREM 8.22Suppose V is a finite dimensional vector space and M is a subspace of V If{m1, m2, ... , mk} is a basis for M and {vl+ M, ... , v,, +M) is a basis forVIM, thef.3={mj,m2,...,mk,vi,...,v}isabasisforV

Page 347: Matrix Theory

326 Projections

PROOF Suppose first that {vi +M,... , is a linearly independent setof vectors. Then, if r- a,v, = 0 , we have -'0 v/,t: = Tl(-6) = q(ra,v,) _E il(a,v,) = r a,(v, + M). Thus, by independence, all the as are zero. Thissays n < dim(V) so that VIM must he finite dimensional. Moreover, consider alinear combination of the ms and vs, say a,v, + (3j mj. Then q(> a, v, +

(3imj) = E a,(v, + M), which implies all the as are zero. But then,we are left with E 13jmi = -6. Again, by the assumption of independence,all the (3s are zero as well. Hence we have the independence we need. Forspanning, let v he in V. Then q(v) is in VIM, so there must exist as such thatv+ M = E a,(v, + M) _ (E a,v,)+ M, and so v - (1: a,v,) E M. But sincewe have a basis for M, there must exist (3's with v - (E a,v,) = (3imi.Therefore, v = E a,v, + E 13jmj, which is in the span of 5. This completesthe argument. 0

COROLLARY 8.5Infinite dimensions, dim(V)=dim(M) + dim(VIM) for any subspace M of V.

PROOF The proof is left as an exercise. 0

We end with a theorem about linear transformations and their relation toquotient spaces.

THEOREM 8.23Suppose T : V -a V is a linear transformation and M is an invariant subspace

n nfor T. Then T : VIM -* VIM, defined by T(v + M) = T(v) + M, is a well-defined linear map whose minimal polynomial divides that of T.

PROOF The proof is left as an exercise. 0

Exercise Set 38

1. Prove Theorem 8.19.

2. Finish the proof of Theorem 8.20.

3. Let M be a subspace of V. For vectors v, u in V, define u " v iff u - v EM. Argue that ' is an equivalence relation on V and the equivalenceclasses are exactly the affine subspaces generated by M.

4. Prove Corollary 8.5.

5. Fill in the details of Theorem 8.21 and Theorem 8.23.

Page 348: Matrix Theory

8.6 Quotient Spaces (optional) 327

6. Suppose T : V -+ W is a linear map and K is a subspace of V containedin ker(T). Prove that there exists a unique linear map T so that T: V/K -

W, with the property, that T o Tj = T. Moreover, ker(T) = (v + Kv E ker(T)) and i m(T) = im(T). (Hint: Begin by proving that T is welldefined.) Then compute im(T) and ker(T).

7. (first isomorphism theorem) Let T : V -- W be a linear map. Arguethat T:V/ker(T) W defined by T(v + ker(T)) = T(v) is not onlylinear but is actually one-to-one. Conclude that V/ker(T) is isomorphicto im(T).

8. Suppose V = M ® N. Argue that N is isomorphic to V/M.

9. Suppose T : V -* W is a linear map. Prove that dim(ker(T)) +ditn(im(T)) = dim(V), where V is finite dimensional.

10. (second isomorphism theorem) Let V be a vector space with subspacesM and N. Prove that (M + N)/N is isomorphic to M/(M fl N). (Hint:Define T : M+N -+ M/(Mf1N) by T(m+n) = m+(MnN). Prove Tis well defined and determine the kernel. Then use the first isomorphismtheorem.)

11. (third isomorphism theorem) Let V be a vector space with subspaces Mand N, and suppose M C_ N. Prove that (V/M)/(N/M) is isomorphic toV/N. (Hint: Define T : V/M -> V/N by T(v + M) = v + N. Prove Tis well defined and determine the kernel. Then use the first isomorphismtheorem.)

12. Suppose you have a basis b1, ... bk of a subspace M of V. How couldyou use it to construct a basis of V/M?

Page 349: Matrix Theory
Page 350: Matrix Theory

Chapter 9

Spectral Theory

eigenvalue, eigenvector, eigenspace, spectrum,geometric multiplicity, eigenprojection, algebraic multiplicity

9.1 EigenstuffIn this chapter, we deal only with square matrices. Suppose we have a matrix

A in C11" and a subspace M e C""" Recall, M is an invariant subspacefor A if A(M) c M. That is, when A multiplies a vector v in M, the resultAv remains in M. It is easy to see, for example, that the null space of A isan invariant subspace for A. For the moment, we restrict to one-dimensionalsubspaces. Suppose v is a nonzero vector in C" and sp(v) is the one-dimensionalsubspace spanned by v. Then, to have an invariant subspace of A, we wouldneed to have A(sp(v)) g sp(v). But v is in sp(v), so Av must be in sp(v) also.Therefore, Av would have to be a scalar multiple of v. Let's say Av = Xv.Conversely, if we can find a nonzero vector v and a scalar K with Av = Xv,then sp(v) is an invariant subspace for A. So the search for one-dimensionalinvariant subspaces for a matrix A boils down to solving Av = Xv for a nonzerovector v. This leads to some language.

DEFINITION 9.1 (eigenvalue, eigenvector, eigenspace)A complex number K is called an eigenvalue of A in C""" if and only if

there is a nonzero vector v in C" with Av = Xv. The vector v is called aneigenvector. The set of all possible eigenvalues of A is called the spectrum ofA and is denoted X(A). Fork, an eigenvalue of A, we associate the subspaceEig(A, K) = Mt, = Null(Xl - A) and call Mx the eigenspace associatedwith X. The dimension of M,, is called the geometric multiplicity of X. Finally,Mx, being a subspace, has a projection PMA associated with it called the Xtheigenprojection. In fact, PM. = (XI - A)'.

329

Page 351: Matrix Theory

330 Spectral Theory

Eigenvalues were used in 1772 by Pierre Simon Laplace (23 March 1749-5 March 1827) to discuss variations in planetary motions. Let's look at someexamples to help understand what is going on.

Example 9.1

1. Consider the identity matrix I,,. Then v for all v, so I is the onlyeigenvalue o f I,,. T h e r e f o r e , X 1 } . Also, Eig(I, 1) = M, =Null (l I, -1") =Null(O) = C", so the eometric multiplicity of 1 is n.Similarly, the zero matrix 0 has 0v = F = Ov for all v so X(0) = {0}and Mo = Null(01 - 1) = Null(I) The geometric multiplicityof0is0.

1 0 0

2. For a slightly more interesting example, consider A = 0 1 00 0 0

Then I is an eigenvalue of A since1 0 0 1 I

0 1 0 0 =00 0 0 0

1

1 0 . We see an eigenvectorfor I is0

0 0I = 1 = 10 0

0 0I ,so I

0 0

l

pendent of the other eigenvector 00

0 0 0-x, 0 0

x l X32 = 0 = j [ 0 I I Of E C and dim (MO) =

0 0 0

0 0 0 x, 0 (3

xI 0 0 0 X2 = 0 = y0 0 1 x3 0 0

dim(M,)=2.

1. Meanwhile, M, = Null(1 - A) = Null 0 0 0 =0 0 1

1 1 0 00 . But also, 0 I 0

0 0 0 0

is an eigenvector for 1, inde-

1 0 0 0

. Also 0 1 0 0 =0 0 0 1

so 0 is an eigenvalue for A as well. Indeed, Mo =

-1 0 0 x, 0xl 0 -1 0 x2 = 0

X3 0

IR,yEC so

Page 352: Matrix Theory

9.1 Eigenst uff 331

1 0 0 1 0 0 x2

3. Let A = 0 2 0 . Then Av = Xv iff 0 2 0 x2 =0 0 3 0 0 3 x3

xi XI Xx' I

A x2 iff 2x2 = 1,x2 . Thus 0 has eigenvalue A _

X3 3X3 AX3 0

0 01, 1 hash = 2, and 0 has A = 3. Thus X (A) = 11, 2, 31 and

0each eigenvalue has geometric multiplicity equal to 1. Can you generalizethis example to any diagonal matrix?

1 2 3 1 2 3

4. Let A = 0 4 5 . Then Av = Xv iff 0 4 5

0 0 6 0 0 6

x, x1 xi + 2x2 + 3x3 A xI

X2 = 1\ x2 if 4X2 + 5X3 = A x2 iffX3 X3 6X3 A X3

(1 - K)xl + 2X2 + 3X3 = 0(4 - )Ox2 + 5x3 = 0 . Solving this system we find that z is

(6-X)x3 =0 1

2 1

an eigenvector for A = 6, 3 has eigenvalue A = 4, and 0 has0 0

eigenvalue A = 1. Thus, A (A) = 11, 4, 6} . Do you see how to generalizethis example?

5. (Jordan blocks) Fork E C and m any integer, we can form the m-by-mmatrix formed, by taking the m-by-m diagonal matrixdiag(A, A, ... , A) and adding to it the matrix with all zeros except ones

r 1on the superdiagonal. So, for example, J, (k) = [>\], JZ()\) =

x 1

L 0 X J ,

A 1 01 0 0

J3(A) = 0 A 1 , JA) = 0 0 X

0

0 0 A0 0 0 A

, and so on. It is

relatively easy to see that x is the only eigenvalue of Jm(A).

THEOREM 9.1Let A E Cnx". Then these are all equivalent statements:

1. A E \ (A).

2. Null( Al - A) is not the trivial subspace.

Page 353: Matrix Theory

332 Spectral Theory

3. A/ - A is not an invertible matrix.

4. The system of equations ( Al - A)v = has a nonzero solution.

5. The projection ( XI - A)' is not 0.

6. det(XI -A)=0.

PROOF The proof is left as an exercise. II

We note that it really does not matter whether we use Al - A or A - AIabove. It is strictly a matter of convenience which we use.

For all we know, the spectrum of a matrix A could be empty. At this point,we need a very powerful fact about complex numbers to prove that this neverhappens for complex matrices. We use the polynomial connection.

THEOREM 9.2Every matrix A in C" x" has at least one eigenvalue.

PROOF Take any nonzero vector v in C" and consider the vectors v, Av,A2v, ... , A"v. This is a set of n + I vectors in a vector space of dimension n,hence they must he dependent. This means there exist scalars a0, I ,.-- . , an,not all zero, so that aov + aoAv + + a,A"v = 6. Choose the largestsubscript j such that aj j4 0. Call it in. Consider the polynomial p(z) =ao + aiZ + ... + a,nz"'. Note a # 0 but a,,,+1 = am+2 = ... = an = 0by our choice of m. This polynomial factors completely in C[z]; say p(z) _y (r, - z) (r2 - z) . (r,,, - z) where y # 0. Then 6 = (aol + a1 A +-..+

(v) = [y (r1l -A)(r2/ -A)...(rn,I - A)] v. IfNull(r,,,I - A)(W), then r,,, is an eigenvalue and we are done. If not, (r,,,I - A)v #If r,,,_1 / - A has a nontrivial nullspace, then r,,,_1 is an eigenvalue and weare done. If not, (r,,,_11 - A)(rn,I - A)v # 6. Since we ultimately get thezero vector, some nullspace must be nontrivial. Say (rj1 - A) has nontrivialnullspace, then ri is an eigenvalue and we are done. a

Now we know the spectrum of a complex matrix is never empty. At theother extreme, we could wonder if it is infinite. That is also not the case, as thefollowing will show.

THEOREM 9.3The eigenvectors corresponding to distinct eigenvalues of A E C" xn are linearlyindependent.

Page 354: Matrix Theory

9.1 Eigenstuff 333

PROOF Suppose X1, 1\2. ... .Xn, are a set of distinct eigenvalues of A withcorresponding eigenvectors V1, V2, ... , v,,,. Set aivi + a2v2 + + a,,,v,n =6. Ourhope isthat at a] I +re zero.

+a v n

Let) iA

eq A vi + A) + amA1vm =Then A, - A

-ai()\2-1\1)(1\3-XI)...(X,,,-XI)vi+W+ +-6, so a I must equal zero.In a similar fashion we show all the a's are zero. 0

COROLLARY 9.1Every matrix in C""" has at most n distinct eigenvalues. In particular, thespectrum of an n-by-n complex matrix is a finite set of complex numbers.

PROOF You cannot have more than n distinct eigenvalues, else you wouldhaven + I independent vectors in C". This is an impossibility. 0

COROLLARY 9.2If A E C"11' has it distinct eigenvalues, then the corresponding eigenvectorsform a basis of C.

We end this section with a remark about another polynomial that is associatedwith a matrix. We have noted that K is an eigenvalue of A iff the system ofequations (XI - A)v = 6 has a nontrivial solution. This is so iff det(XI -A) = 0. But if we write out det(XI - A), we see a polynomial in X. Thispolynomial is just the characteristic polynomial. The roots of this polynomialare the eigenvalues of A. For small textbook examples, this polynomial is aneat way to get the eigenvalues. For matrices up to five-by-five, we can intheory always get the eigenvalues from the characteristic polynomial. But whowould try this with an eight-by-eight matrix'? No one in their right mind! Nowthe characteristic polynomial does have interesting theoretical properties. Forexample, the number of times an eigenvalue appears as a root of the characteristicpolynomial is called its algebraic multiplicity, and this can be different from thegeometric multiplicity we defined above. The coefficients of the characteristicpolynomial are quite interesting as well.

Exercise Set 39

1. Prove that k E K(A) iff K E \(A*).

2. Argue that A is invertible iff 0 X(A).

3. Prove that if A is an upper (lower) triangular matrix, the eigenvalues ofA are exactly the diagonal elements of A.

Page 355: Matrix Theory

334 Spectral Theory

4. Argue that if S is invertible, then X(S-'AS) = k(A).

5. Prove that if A = A*, then k(A) c R.

6. Argue that if U* = U-', then X E X(U) implies IXI = 1.

7. Prove that if X E X(A), then k" E X(A").

8. Argue that il' A is invertible and X E X(A), then X-' E k(A-').

9. Suppose X and µ are two distinct eigenvalues of A. Let v be an eigenvectorfor A. Argue that v belongs to Col (X l - A).

10. As above, if v is an eigenvector for p., then v is an eigenvector for M - A,with eigenvalue X - p..

1 1 . Suppose A is an n-by-n matrix and {v, , v2, ... , v" } is a basis of C"consisting of eigenvectors of A. Argue that the eigenvectors belongingto k, one of A's eigenvalues, form a basis of NUll(XI - A), while thosenot belonging to X form a basis of Col (X l - A).

12. Prove Theorem 9.1.

13. Prove that if A is nilpotent, then 0 is its only eigenvalue. What does thatsay about the minimum polynomial of A?

14. Solve example (9.1.4) in detail.

15. Find explicit formulas for the eigenvalues ofa bc d

16. If A is a 2-by-2 matrix and you know its trace and determinant, do youknow its eigenvalues?

17. Prove that if U E C"X" is unitary, the eigenvectors of U belonging todistinct eigenvalues are orthogonal.

18. Argue that if the characteristic polynomial XA(x) = det(xl - A) hasreal coefficients, then the complex eigenvalues of A occur in complexconjugate pairs. Do the eigenvectors also occur this way if A has realentries?

19. Suppose A and B have a common eigenvector v. Prove that v is also aneigenvector of any matrix of the form aA + {3B, a, {3 E C.

20. Suppose v is an eigenvector belonging to h j4 0. Argue that v E Col(A).Conclude that Eig(A, X) g Col(A).

Page 356: Matrix Theory

9.1 Eigenstuff 335

21. If X $ 0 is an eigenvalue of AB, argue that X is also an eigenvalue ofBA.

22. Suppose P is an idempotent. Argue that X(P) c {0, 1).

23. Suppose p is a polynomial and k E X(A). Argue that p(X) E X(p(A)).

24. How would you define all the eigenstuff for a linear transformation T:C" -+ C"?

25. Suppose A = All A iz Argue that XA = XA XA Conclude[ ® A22

that X(A) = X(A i,) U X(Azz)

26. Find an explicit matrix with eigenvalues 2, 4, 6 and eigenvectors (l, 0,0), (1, 1, 0), (1, 1, 1).

27. Argue that the product of all the eigenvalues of A E Cn " is the determi-nant of A and the sum is the trace of A.

28. Suppose A E C""" has characteristic polynomial XA(x) _ 2c;x;. If youi=o

are brave, argue that c, is the sum of all principal minors of order n-r of Atimes (-1)"-'. At least find the trace and determinant of A among the cs.

29. Do A and AT have the same eigenvalues? How about the same eigenvec-tors? How about A and A*?

30. Suppose B = S-'AS. Argue that XA = XB Do A and B necessarilyshare the same eigenvectors?

31. Suppose v is an eigenvector of A. Argue that S-lv is an eigenvector ofS-'A.

32. Here is another matrix/polynomial connection. Suppose p(x) = x" +a,,_,x"-i + + alx + ao. We associate a matrix to this polynomialcalled the companion matrix as follows: C(p(x)) =

0 l 0 0

0 0 1 0 .Whatisthe

1

-ao -a, -612 ... ... ... -an_2 -a._Icharacteristic polynomial of C(p(x))? What are the eigenvalues?

33. Suppose A is a matrix whose rows sum up to m. Argue that m is aneigenvalue of A. Can you determine the corresponding eigenvector?

Page 357: Matrix Theory

336 Spectral Theory

34. Suppose C = C(x" - I). Suppose A = a,I + a2C + + a,,C"-I

Determine the eigenvalues of A.

35. Argue that A B and BA have the same characteristic polynomial and hencethe same eigenvalues. More generally, if A is m-by-n and B is ti-by-m,then AB and BA have the same nonzero eigenvalues counting multiplic-

ity.(Hint: [0 ! ] '[ B ®] [ ® ! ] - [ B BA so

36. Argue that if A B = BA, then A and B have at least one common eigen-vector. Must they have a common eigenvalue?

37. Suppose M is a nontrivial subspace of C", which is invariant for A. Arguethat M contains at least one eigenvector of A.

38. Argue that Eig(A, A) is always an invariant subspace for A.

39. (J. Gross and G. Trenkler) Suppose P and Q are orthogonal projectionsof the same size. Prove that P Q is an orthogonal projection iff all nonzeroeigenvalues of P + Q are greater or equal to one.

40. Argue that A E X(A) implies ale E X(aA).

41. Argue that if X E X(A) with eigenvector v, then X E X(A) with eigenvectorv.

42. Prove that if X E X(A ), then k + T E X(A + TI).

43. Prove that h E X(A) if A E A(S-'A S), where S is invertible.

I N-I 0 0 0 0N

0

N2 N-2 0 0 ... 0N N

0

44. Consider the matrix PN = 0 0 k N-k 0N N

0

0 0 0 0 N-I I

0 0 ...N

0

N

1

Find all the eigenvalues of PN and corresponding eigenvectors.

Page 358: Matrix Theory

9.1 Eigenstuff 337

Further Reading

[A-S&A, 2005] Rhaghib Abu-Saris and Wajdi Ahmad, Avoiding Eigen-values in Computing Matrix Powers, The American MathematicalMonthly, Vol. 112, No. 5., May, (2005), 450-454.

[Axler, 1996] Sheldon Axler, Linear Algebra Done Right, Springer,New York, (1996).

[B&R, 1986(2)] T. S. Blyth and E. F. Robertson, Matrices and VectorSpaces, Vol. 2, Chapman & Hall, New York, (1986).

[Holland, 1997] Samuel S. Holland, Jr., The Eigenvalues of the Sum ofTwo Projections, in Inner Product Spaces and Applications, T. M. Rassias,Editor, Longman, (1997), 54-64.

[J&K, 1998] Charles R. Johnson and Brenda K. Kroschel, Clock HandsPictures for 2 x 2 Real Matrices, The College Mathematics Journal,Vol. 29, No. 12, March, (1998), 148-150.

[Ol9, 2003] Gregor Olgavsky, The Number of 2 by 2 Matrices over Z/pZwith Eigenvalues in the Same Field, Mathematics Magazine, Vol. 76,No. 4, October, (2003), 314-317.

[Scho, 1995] Steven Schonefeld, Eigenpictures: Picturing the EigenvectorProblem, The College Mathematics Journal, Vol. 26, No. 4, September,(1995), 316-319.

[Tr&Tr,2003] Dietrich Trenkler and Gotz Trenkler, On the Square Rootof aaT + bbT, The College Mathematics Journal, Vol. 34, No. 1, January,(2003), 39-41.

[Zizler, 1997] Peter Zizler, Eigenpictures and Singular Values of a Matrix,The College Mathematics Journal, Vol. 28, No. 1, January, (1997), 59-62.

9.1.1 MATLAB Moment

9.1.1.1 Eigenvalues and Eigenvectors in MATLAB

The eigenvalues of a square matrix A are computed from the function

eig(A).

Page 359: Matrix Theory

338 Spectral Theory

More generally, the command

[V, D] = eig(A)

returns a diagonal matrix D and a matrix V whose columns are the correspond-ing eigenvectors such that A V = V D. For example,

> > B = [1 +i2+2i3+i; 2+214+4i91; 3+3i6+6i8i]B =

1.0000 + 1.0000i 2.0000 + 2.0000i 3.0000 + 1.0000i2.0000 + 2.0000i 4.0000 + 4.00001 0 + 9.0000i3.0000 + 3.0000i 6.0000 + 6.00001 0 + 8.00001

> > eig(B)

ans =6.3210+ 14.1551i0.0000 - 0.00001-1.3210- 1.1551i

> > [V, D] = eig(B)V =

0.2257 - 0.1585i 0.8944 0.68390.6344 + 0.06931 -0.4472 - 0.00001 -,5210 + 0.35641

0.7188 0.0000 + 0.00001 -0.0643 - 0.3602iD=

6.3210+ 14.1551i 0 00 0.0000 + 0.0000i 00 0 -1.3210- 1.15511

A really cool thing to do with eigenvalues is to plot them using the "plot"command. Do a "help plot" to get an idea of what this command can do.Or, just try

plot(eig(A),' o'), grid on

and experiment with a variety of matrices.

9.2 The Spectral TheoremIn this section, we derive a very nice theorem about the structure of certain

square matrices that are completely determined by their eigenvalues. Ourapproach is motivated by the theory of rings of operators pioneered by IrvingKaplansky (22 March 1917-25 June 2006). First, we present some definitions.

Page 360: Matrix Theory

9.2 The Spectral Theorem 339

DEFINITION 9.2 (nilpotent, normal)A matrix A E C"'n is called nilpotent iff there is some power of A that

produces the zero matrix. That is, An = O for some n E N. The least such n iscalled the index of nilpotency.

A matrix A is called normal iff AA* = A*A. That is, A is normal iff itcommutes with its conjugate transpose.

Example 9.2

1. If A = A* (i.e., A is self-adjoint or Hermitian), then A is normal.

2. If U* = U-l (i.e., U is unitary), then U is normal.

3. If A* = -A (i.e., A is skew Hermitian), then A is normal.

0 1 0

4. N = 0 0 1 is nilpotent. What is its index?0 0 0

Recall that we have proved that AA* = 0 if A = 0 for A E Cnxn The firstfact we need is that there are no nonzero normal nilpotent matrices.

THEOREM 9.4If A is normal and nilpotent, then A = 0.

PROOF First, we show the theorem is true for A self-adjoint. The argumentis by induction. Suppose A = A*. Suppose A' = 0 implies A = 0 as long asi < n. We want the same implication to hold for it. We look at two cases: n iseven, say n = 2m. Then 0 =A" = A2in = A'A' = A'"A*m = Am(Am)*. Butthen, Am = 0, so applying induction (m < n), we conclude A = 0. On theother hand, if n is odd other than 1, say n = 2m + 1, then 0 =A" = An+' =A2m+2 = Am+' Am+l = Am+l (A*)m+l = (Am+i)(A`"+l )* so again An'+l = 0.

Applying induction, A = 0 since in + 1 < n.Now, if A is normal and A" = 0, then (AA*)" = An(A*)" = 0 so, since

AA* is self-adjoint, AA* = 0. But then A= 0 and our proof is complete. 0

Next, recall that for any matrix A, we defined A' = I - A+A; that is, A' isthe projection onto the null space of A. This projection was characterized asthe unique projection P that satisfies AX = 0 iff X = PX. Finally, recall theminimal polynomial p A(x), which is the monic polynomial of least degree suchthat PA (A) = 0.

Page 361: Matrix Theory

340 Spectral Theory

THEOREM 9.5Let A be normal. Then A' is a polynomial in A and hence

0=AA'=A'A.

PROOF We first dispose of some trivial cases. If A = 0, 0' - 1, soA' is trivially a polynomial in A. Let's assume A # 0. Suppose AA(x) _ao + a,x + a2x2 + + x"'. If ao $ 0, then by a result from Chapter 1, wehave A-' exists so A' _ 0, again a trivial case. Now let's assume ao = 0. ThusPA(x) = aix+aZx2+ +x"'. Of course, at could be zero oroo could be zeroalong with at. But this could not go on because if a2 = a3 = a,,,-i were allzero, we would have A"' = 0 hence A"' = 0, whence A = 0 by (9.4) above.So some al in this list cannot be zero. Take the least j with at # 0. Call thisj,i.Then ai 0but ai-i =0.Note i <m-1 soi <m.

Thus, AA(x) = aix'+ai+ix'+i+ +x'. Form E = l+Ei=, A = 1+ai

ai+i A+L+-22 A2+ .+ A"'. Now E isa polynomial in A, so clearly AE _ai ai ai

EA. Also E is normal, EE* = E*E. Next, suppose AX = 0 for some matrixX. Then EX = X since X kills off any term with an A in it. Conversely, supposeX = EX for some matrix X. Now 0 =AA(A) = A'E, so (AE)i = A'E' =(A' E)Ei-' = 0, so by (9.4) above AE = 0. Note AE is a polynomial in thenormal matrix A, so is itself normal. But then AX = AEX = 0. Hence wehave the right property: AX = 0 if X = EX. The only question left is if E is aprojection.

But AE = 0 so EE = E, making E idempotent. Moreover, (A E*)(A E*)* _AE*EA* = E*AEA* = 0, using that E* is a polynomial in A* and Acommutes with A*. Thus AE* = 0, so E* = EE*. But now we have it sinceE* = EE* = (EE*)* = (E*)* = E. Thus E is a projection and by uniquenessmust equal A'. 0

Let us establish some notation for our big theorem. Let A E C"". Let(X1 - A)' denote the Xth eigenprojection of A, where X is any complex

number. Recall X is in the spectrum of A, X(A), iff P, # 0.

THEOREM 9.6Suppose A is normal. Then

1. A Px = Px A = k PP.

2. (Al - A)P5 = (p. - X)Px.

Page 362: Matrix Theory

9.2 The Spectral Theorem

3. {PXJX E X(A)} is a finite set of pairwise orthogonal projections.

4. If k(A) = {0}, then A = 0.

341

PROOF

1. Since (XI - A)P, = (XI - A)(X! - A)' = 0, we have APB, = \P,.Also, A! - A is normal so by (9.5), PP(XI - A) = (Al - A)PP soPx - PA = - AP. Canceling XP,, we get PA = AP.

2.

3.

(µl -A)Px µPx-APt\=VP),-APx=(p.-k)PP.

We know by Theorem 9.3 that k(A) is a finite set. Moreover, if k, # k2 in

X(A), then A,, =(A - ]\i 1)

PX2. Thus, Pa, Pat = Pa, (A-X 11) Pat =1\1 - X2

(P A - XI P. Pk2 = 0, making P -..Pa, whenever Ai and A2 areX2-1\1

distinct eigenvalues.

4. Suppose 0 is the only eigenvalue of A. Look at the minimum polynomialof A, p A (x) = ao + a l x + + x'. Since we are working over complexnumbers, we can factor µA completely as linear factors µA(x) = (x -Ri)(x - 132) . (x - (3). If all the 13s are zero, IJA(A) = 0 =A', soA' = 0. By Theorem 9.4, A = 0 and we are done. Suppose not allthe 13s are zero. Relabeling if necessary, suppose the first k, 13s are notzero. Then p.A(x) = (x - R1)(x - R2)...(x - Rk)xm-k. Then 0 =

ILA(A)=(A-Ril)...(A-Rkl)A'"-k.Now13, #OsoQi X(A)soPp, _ 0. But (A - 13i l)[(A - 021) ... (A - (3k/)A'"-k] = ® implies

[(A-R21)...(4-Rk1)Am-k] =(A-R1!)'[(A-R2l)..Am-k] = 0.

The same is true for 02 so (A -1331) .. (A - Rk l )A"'-k = ®. Proceedinginductively, we peal off each A -131 term until we have only A'"-k =But then using (9.4) again we have A = 0. 0

We need one more piece of evidence before we go after the spectral theorem.

THEOREM 9.7Suppose A is normal and F is a projection that commutes with A (i.e.,

AF = FA). Then

1. A F is normal.

2. (Al-AF)'=(Al-A)'F=F(A!-A)'ifA54 0.

Page 363: Matrix Theory

342 Spectral Theory

PROOF

1. First (AF)(AF)* = AFF*A* = AF2A*=FAA*F = F*A*AF(A F)*(AF).

2. Suppose k # 0 and X is any matrix. Then (Al - FA)X = 0 iff AX =FAX iff X = F(,\-' AX) iff X = FX and X = \-I AX iff X = FXand \X = AX iff X = FX and (A/ - A)X = 0 if X = FX andX = (Al - A)'X iff X = F(AI - A)'X. By uniqueness, F(Al - A)'= (Al - FA)'. Also, (Al - A)' is a polynomial in Al - A and hence Fcommutes with (,\/ - A). 0

Now we are ready for the theorem. Let's establish the notation again. LetA E C"'" and suppose the distinct eigenvalues of A are A 1 , A2, ... , A,,,.

Suppose the eigenspaces are Mt10*12, ... , Mi,,,, and the eigenprojections arePx, , Pat, ... , Ph.

THEOREM 9.8 (the spectral theorem)With the notations as above, the following statements are equivalent:

1. A is normal.

2. A= A i Pa, +1\2812 + ... +'\"I pX 811 + ....+ Ph,, = /,and Pa, 1 Px,for all \i j41\j.

3. The eigenspaces M),, are pairwise orthogonal and M,,, ® *12 ® . . . ®

Al. = CI I.

PROOF Suppose A is normal. Let F = I - E"', P. Then F* = (I -E" P\,)* = 1* - >" P* = I P,, = F, so F is self-adjoint. Next,

F2 = F(l PP,) = F - >"'I FPr, . But FPa, = P,\, - m P,,pa,

p2 = 0 for each j, so F2 = F, using that the eigenprojections areorthogonal. Thus, F is a projection. Moreover, A F = FA by (9.7) (1) so(X/ - AF)' _ (Al - A)'F = F(A/ - A)' by (9.6) for all \ # 0. Thus,for all A A 0, (A/ - A)'(/ - F) = J :(XI - A)'(PP) = ((A/ - A)')2 =(A - Al)'. Then (Al - FA)' = (A/ - A)'F = 0 for all k # 0. Thissays that the only eigenvalue possible for FA is zero so FA = AF = 0by (9.6)(4). Hence F = A'F = PoF = 0 for, if 0 E MA), then PoF =FPo = 0 and if 0 ¢ A(A), then Po = 0. Thus, in any case, I

Page 364: Matrix Theory

9.2 The Spectral Theorem 343

Next, A = Al = A(j I Px,) AP.,, X P), by (9.7)(1).Thus, we have the hard part proved (i.e., (1) implies (2)). Next, we show (2)implies (1). Suppose (2) is true. Then AA*

Ein I1,\,12 P.\, and A*A = X1P )(r" I X Px,) _ rm IX1I2Px,. Thus

AA* = A*A. 0

Again suppose (2) is true. The eigenspaces are orthogonal pairwise becausetheir projections are. Moreover if V E C", then v = Ix = (Px, + + P,.)(v) =Px,v+Px2v+...+PX,,V E Mx,®1Mx2®1...®1Mx,,,,SO Cn = M, \ ,

M , \ . .xm . Conversely, if we assume (3), then since Mx,-LM?,, we have Px, I Px,when i j. Moreover if v E C", then v = v1 +v2+ +v,n, where v1 E M.Then ='I VI+X2v2+

+k,nvn,. But P,\, vi = v; since Fix(Px,) = Mx,. Px,v; = 6 fori # j. ThusPx,v; = v; soAv= X1Px,v+1\2Px2v+...+k11,P.,,,V=(XIPx,+...+X,nP),.)y.

Px, + + PPM)v and so (2) follows. This completes the proof of the spectraltheorem.

There is uniqueness in this spectral representation of a normal matrix, as thenext theorem shows.

THEOREM 9.9Suppose A is a nonempty finite set of complex numbers and ( Qx IX E A) is

an orthogonal set of projections such that EXEA Qx = 1. If A = XQx,then A is normal, Qx = (X! - A)' for all K E A, and X(A) c A. Moreover,A=X(A)iffQx AO for eachnonzeroX E A.

PROOF We compute A*Qx = (IIEA XQx)*Qx,, = (>xeA XQx)Qx =`xEA XQ.Qx,, = AflQx and AQx0 = XQx)Qx =

[L.XEA XQxQ1, =

XoQx,,. Thus, A*A = A*(EXQx) = EXA*Qx = EXXQx = EXAQx =A is normal. We adopt the convention that Qx = 0

if K E A. Since for any µ E C, µl - A = µl - EXQx = E(µ - X)Qx,(µ! - A)Qµ = ExEA(µ - X)QxQ . = O. Then, for all µ E C with X APx < Q. Then, for any µ E C, Q' = 1 - Q R = ExENlµl Qx P.Thus, P,, < Q . for each µ E C so that P. _ (Al - A)' = Q . for allµ E C. Now, if µ 0 A, then P,,, = Q1, = 0, which says µ X(A). Thus,X(A) C A. If µ E A such that Q. A 0, then Pµ = Q1, 54 0, which meansµ E X(A). Thus, A = X(A) if Q , 0 for every µ E A, which completesour proof. 0

Page 365: Matrix Theory

344 Spectral Theory

One of the nice consequences of the spectral theorem is that certain functionsof a matrix are easily computed.

THEOREM 9.10Let A be normal with spectral representation A = a EX(A) X, Ph,. Then

1. AZ = L.A,EX(A) ki Ph,

2. A"= FX,EX(A)kP),,.

3. If q(x) is any polynomial in C[x], then q(A) _ E),,Ea(A)9(,\r)P.,

Moreover, if X E X(A), there exists a polynomial Px(x) in C[x] such that& = Pk(A).

PROOF

1. Compute A2 = (EXPx)(EPµ)=Ea r \µPaPµ = EX2Pi' = EX2P),.

2. Use induction.

3. Compute µA"+vA"' ovP\+ µa",Pa = E"(fLk"+UA,")P),

so clearly, q(A) _ EX q(k)PP. O

To get the polynomial P),(x), we recall the Lagrange interpolation polynomial:

PA (x) -(X k )(.l-A2). (A A,_I)(A' /+I) (a Then Pa,(Xi) = bij, the

/ (a/-ki)(A -k2). (k,-k _i)(X,-A,+i).. (A,-k,,)

ICronecker S. Then P,\, (A) _ ; Pa,(k,)P), = P), _ (XjI - A)'.

Recall that if k is a complex number, X+ '\ For0 if X=0*

normal matrices, we can capture the pseudoinverse from its eigenvalues andeigenprojections.

THEOREM 9.11Suppose A is normal with spectral representation A =EXEX(A) XP),; then A+ = >XEX(A) i+P),. In particular A+ is normal also.

PROOF The proof is left as an exercise. U

Page 366: Matrix Theory

9.2 The Spectral Theorem 345

Exercise Set 40

1. Give an example of a matrix that is not self-adjoint, not unitary, and notskew adjoint and yet is normal.

2. The spectral theorem is sometimes formulated as follows: A is normal ifA = >2 X u, u*, where the u; are an orthonormal basis formed from theeigenspaces of A. Prove this form.

3. Suppose A is normal. Argue that Ak is normal for all k E N; moreover,if p(x) E C[x], argue that p(A) is normal.

4. Argue that A is normal iff the real part of A commutes with the imaginarypart of A. (Recall the Cartesian decomposition of a matrix.)

5. Suppose A E C' . Define Ra = (Al - A)-' fork 0 k(A). Prove thatR,, - RN, = (FL - X) R,\ RI,.

6. Argue that an idempotent P is normal iff P = P*.

7. Argue that A is normal iff A* is normal and the eigenspaces associatedwith the mutually conjugate eigenvalues of A and A* are the same.

8. Suppose A is normal and k is an eigenvalue of A. Write A = a + bi,where a and b are real. Argue that a is an eigenvalue of the real part ofA and b is an eigenvalue for the imaginary part of A.

9. Argue that a normal matrix is skew-Hermitian iff all it eigenvalues arepure imaginary complex numbers.

10. Write a matrix A in its Cartesian decomposition, A = B + Ci, where B

and C have only real entries. Argue that it A is normal, so isB -C

[C BCan you discover any other connections between A and this 2-by-2 blockmatrix?

11. Prove that A is normal iff every eigenvector of A is also an eigenvectorof A*.

12. Argue that A is normal iff A* is expressible as a polynomial in A.

13. Prove that A is normal iff A commutes with A*A.

14. Argue that A is normal if A commutes with A + A* iff A commuteswith A - A* if A commutes with AA*-A*A.

Page 367: Matrix Theory

346 Spectral Theory

15. Prove the following are all equivalent statements.

(a) A is normal(b) A commutes with A + A*(c) A commutes with A - A*(d) A commutes with AA* - A*A

16. Prove that a matrix B in C'""" has full row rank iff the mapping A HAB* + B*A takes C0"" onto the set of m-by-m Hermitian matrices.

17. Let A E C0"' be normal and suppose the distinct eigenvalues of A areX 1 , X2, ... , X,,. Suppose XA(x) = Hk" i(x - Xk)'. Argue that PA (X)fl'T_i(x - Xk) and dim(Eig(A, Xk)) = dk.

18. Give an example of 2-by-2 normal matrices A and B such that neitherA + B nor AB is normal.

Group ProjectA long-standing problem on how the eigenvalues of Hermitian matrices A andB relate to the eigenvalues of A + B has been solved. Read about this and writea paper. See [Bhatia, 20011.

Further Reading

[Bhatia, 20011 Rajendra Bhatia, Linear Algebra to Quantum Cohomol-ogy: The Story of Alfred Horn's Inequalities, The American MathematicalMonthly, Vol. 108, No. 4, (2001), 289-318.

[Brown, 1988] William C. Brown, A Second Course in Linear Algebra,John Wiley & Sons, New York, (1988).

[Fisk, 2005] Steve Fisk, A Very Short Proof of Cauchy's InterlaceTheorem, The American Mathematical Monthly, Vol. 112, No. 2,February, (2005), 118.

Page 368: Matrix Theory

9.3 The Square Root and Polar Decomposition Theorems 347

[Hwang, 2004] Suk-Geun Hwang, Cauchy's Interlace Theorem forEigenvalues of Hermitian Matrices, The American MathematicalMonthly, Vol. 111, No. 2, February, (2004), 157-159.

[J&S, 2004] Charles R. Johnson and Brian D. Sutton, Hermitian Matrices,Eigenvalue Multiplicities, and Eigenvector Components, SIAM J. MatrixAnal. Appl., Vol. 26, No. 2, (2004), 390-399.

[Zhang 1999] Fuzhen Zhang, Matrix Theory: Basic Results andTechniques, Springer, New York, (1999).

9.3 The Square Root and Polar Decomposition Theorems

The spectral theorem has many important consequences. In this section, wedevelop two of them. We also further our analogies with complex numbers andsquare matrices.

THEOREM 9.12Let A E C""". If A A*= X P where P is a projection and A 0 0, then k is

a positive real number.

PROOF Suppose AA* = XP, where Pisa projection and A # 0. Then AA*is not 0, so there must exist v E C" with AA*v # 6. Then Pv # -6 either.But IIA*v112 =< A*vIA*v >= < AA*vlv >=< XPvly >= X < Pvly >=

X < Pvl Pv >= X II Pv112. Now A*v # 0 so k =IIA*y11211 Pv112 is positive and real. 0

COROLLARY 9.3The eigenvalues of AA* are nonnegative real numbers.

PROOF Suppose A # 0. Since A A * is normal, it has a spectral representationAA* _ Es E AA*)' for any p. in X(AA*).Then (EA)(EA)* = EAA*E = EX Ea(AA*) X; E(XiI - AA*)'E = p.E. Now ifEA = 0 then µE _ 0, so p = 0 since E 0 because µ E K(AA*). Finally,if EA 54 0, then p. > 0 by Theorem 9.12 above. 0

Page 369: Matrix Theory

348 Spectral Theory

THEOREM 9.13 (the square root theorem)For each A E C" x", there exists B = B* such that B2 = AA*.

PROOF By the spectral theorem, AA* = I:A Ea(AA*) ki(Xil -- AA*)'. ByCorollary 9.3,wehaveX >_ OforallX E h(AA*).PutB = ,X,EO(AA*)

AA*)'. Then clearly B = B* and B22 = AA*. 0

One of the useful ways of representing a complex number z = a + bi isin its polar form z = reie. Here r > 0 is real, ere has magnitude 1, and(ere)* = e-ie = (eie)-'. The analog for matrices is the next theorem.

THEOREM 9.14 (polar decomposition theorem)For any A E Cnx", there exists B = B* in C"'" and U with U = U+ suchthat A = BU. Moreover, U" = A", U*" = A*", and B22 = AA*.

PROOF By the square root theorem, there exists B = B* in C"," such thatB2 = AA*. Let U = B+A, Then BU = BB+A = (BB+)*A = B+*B*A =(B*+B*)A = B*"A = B"A = (B*B)"A = (B2)"A = (AA*)"A = A*"A = A.Also, since B*" = (BB*)" _ (AA*)" = A*", U" = (B+A)" (B+"A)" =(B*"A)" _ (A*"A)" = A" and U* _ (A*B+*) = (A*B*+)" (A*B+)', _(A*"B+)l" _ (B"B+)" = [(B+*)"B+] = B+ = B` = A*", Finally, we

claim U* = U+. But UU* = (B+A)(A*B+) = B+(AA*)B+ = B+B2B+ =(B+B)(BB+) = B"B" = B*"B*" = B*" = A*" = U*". Also, U*U =(A*B+)(B+A) = A*(B+)2A = A*[(B*B)+B*B+]A = A*[(B2)+BB+]A =

A*[(B2)+B*,']A = A*[(B2)+B ]A = A*(B2)+B l]A = A*[(B+)2(B"A)] _A*(B+)2A = A*(AA*)+A = A*A*+ = A" = U". Finally, UU*U = UU" _U and U*UU* = U"U* _ (U+*)"U* = U*. Thus, U* satisfies the fourMoore-Penrose equations, and so U* = U+. 0

We end this section with an interesting result about Hermitian matrices.

THEOREM 9.15Suppose A = A *, B = B *, and AB = BA. Then there exists C = C* and polynomialsp and q with real coefficients such that A = p(C) and B = q(C).

PROOF Use the spectral theorem to write A = X, E, + + X, E, and B =µ, F, +.. + µs F,,. Now AB = BA implies BEj = Ej B for all j, so Ej Fk =Fk Ej for all j, k. Let C = >c jk Ej Fk. Note C = C*. Choose polynomials

j,k

such that p(cjk) = Xj and q(cjk) = µk for all j, k. Then C" _ > cilkEj Fk and

Page 370: Matrix Theory

9.3 The Square Root and Polar Decomposition Theorems 349

P(C)=EXJEjFk=F_ (F_XiEj) Fk=>AFk=A1: Fk=Aandq(C)=E µk EJ Fk = E (>2 EJ (µk Fk) = B. 0

Exercise Set 41

1. Write A in its polar form A = BU. Argue that A is normal if B and Ucommute.

2. Write A in its polar form A = BU. Argue that h is an eigenvalue ofiff \2 is an eigenvalue of B.

3. Write A in its polar form A = BU. Suppose X is an eigenvalue of Awritten in its polar form X = reie. Argue that r is an eigenvalue of B andei0 is an eigenvalue of U.

4. Prove that A is normal if in a polar decomposition A = BU and AU =UAiffAB=BA.

Further Reading

[Higham, 1986a] N. J. Higham, Computing the Polar Decomposition-With Applications, SIAM J. Sci. Statist. Comput., Vol. 7, (1986), 1160-1174.

[Higham, 1986b] N. J. Higham, Newton's Method for the Matrix SquareRoot, Math. Comp., Vol. 46, (1986), 537-549.

[Higham, 1987] N. J. Higham, Computing Real Square Roots of a RealMatrix, Linear Algebra and Applications, 88/89, (1987), 405-430.

[Higham, 1994] N. J. Higham, The Matrix Sign Decomposition and ItsRelation to the Polar Decomposition, Linear Algebra and Applications,212/213, (1994), 3-20.

Page 371: Matrix Theory
Page 372: Matrix Theory

Chapter 10

Matrix Diagonalization

10.1 Diagonalization with Respect to EquivalenceDiagonal matrices are about the nicest matrices around. It is easy to add,

subtract, and multiply them; find their inverses (if they happen to be invertible);and find their pseudoinverses. The question naturally arises, is it possible toexpress an arbitrary matrix in terms of diagonal ones? Let's illustrate just how

1 1 2

easy this is to do. Consider A =1

2 2 . Note A is 4-by-3 with rank 2.

L 1 2 2

1 1

First, write A in a full rank factorization; A1 2 1 0 2 _1 1 0 1 01 2

FG. Now the columns of F and the rows of G are independent, so there arenot be any zero columns in F or zero rows in G. We could use any nonzeroscalars really, but just to he definite, take the lengths of the columns of F,normalize the columns, and form the diagonal matrix of column lengths. So A =

Z ioZ 2 2 0 1 0 2

Now do the same with the rows ofz 0 10 0 1 0

io

1 2

IOri

G. Then A =2 io

2-3-

l0 ][ 0 1][ 0 1 0[ 0io

2 io

351

Page 373: Matrix Theory

352 Matrix Diagonalization

1

2 10

1 2f 0 L 0 22 to f

[ 0 0 ] [ 0 10 It is clear that you can do

z to1

2

this with any m-by-n matrix of rank r: A = F, DG1, where F, has independentcolumns, D is r-by-r diagonal with nonzero entries down the main diagonal,and G 1 has independent rows. All this is fine, but so what? The numbers alongthe diagonal of D seem to have little to do with the original matrix A. Only thesize of D seems to be relevant because it reveals the rank of A. All right, supposesomeone insists on writing A = F, DG1, where F, and G, are invertible. Well,this imposes some size restrictions; namely, F, and G, must be square. If A is inC!' , then F, would have to be m-by-m of rank m, G 1 would have to be n-by-nof rank n, and D would have to be m-by-n of rank r, meaning exactly r elementson the diagonal of D are nonzero. Now theory tells us we can always extendan independent set to a basis. Let's do this for the columns of F, and the rowsof G1 . This will get us invertible matrices. Let's do this for our example above:

2 ,o

2 io

2

a, b,

a2 b2

a3 b3

a4 b4

245- 0 0 f 0 f0 ]0 0 0 1 00 0 0 ... ... ...0 0 0 Cl C2 C3

= A Notice it does not matter how we completed those bases..

1 2 2

Note the rank of A is revealed by the number of nonzero elements on thediagonal of D. In essence, we have found invertible matrices S and T such thatSAT = D, where D is a diagonal matrix. The entries on the diagonal of Das yet do not seem to be meaningfully related to A. Anyway, this leads us to adefinition.

DEFINITION 10.1 (diagonalizable with respect to equivalence)Let A E Cr"". We say A is diagonalizable with respect to equivalence iff

there exist invertible matrices S in C"'x"' and T in Cnxn such that SAT =

D'0 (D®

, where D, E c;xr is diagonal of rank r. Note that if r = m, we

D,write SAT = [D,®], and if r = n, we set SAT =

Page 374: Matrix Theory

10.1 Diagonalization with Respect to Equivalence 353

THEOREM 10.1Let A E C"'. Then A is diagonalizable with respect to equivalence if andonly if there exists nonsingular matrices S E Cmxm and T E Cnxn such that

SAT = Ir

PROOF Suppose first such an S and T exist. Choose an arbitrary diago-

nal matrix D, in C;xr. Then D = I D' ®J is nonsingular and so isL 0 Im-r

rDS. Let SI = DS and T, = T. Then SIATI = DSAT = D I Ir ® _

L

Dr00

Conversely, suppose A is diagonalizable with respect to equiva-0 1.11

lence. Then there exist S1, TI invertible with SI AT,D,0 00

J=

r / ®1 f ® ®1 = D ® ® Choose S = DI SI and T0

=m-r J 11 J LL

r 11 r 1

TI.Then SAT = D-I SI ATI = D-1 DL

Jr ®J =.

®0 This com-

pletes the proof. U

This theorem says there is no loss in generality in using Ir in the definition ofdiagonalizability with respect to equivalence. In fact, there is nothing new here.We have been here before. This is just matrix equivalence and we proved inChapter 4 (4.4) that every matrix A E C01 " is equivalent to a matrix of the form

I,®® O ], namely rank normal form, and hence, in our new terminology, that

every matrix A E C;'x" is diagonalizable with respect to equivalence. What wedid not show was how to actually produce invertible matrices S and T that bringA to this canonical form where only rank matters from a full rank factorizationof A.

Let A E Cr x" and let A = FG be a full rank factorization of A. ThenFixm

A[G+ : (In - G +G) W2WI [Im - FF+]mxm nxr nxn nx(n-r)

(m-r)xm

FixmFG[G+:(I - G+G)W2] _

WI [IM - FF+]F+AG+ F+A(I - G+G)W2 Ir 0

.W,(Im - FF+)AG+ WI(I - FF+)A(I - G+G)W2 0 0l

Note that the arbitrary matrices W1 E C(m-r)xmand W2 in Q`nx(n-r) were needed

Page 375: Matrix Theory

354 Matrix Diagonulization

to achieve the appropriate sizes of matrices in the product. At the moment, theyare quite arbitrary. Even more generally, take an arbitrary matrix M E Cr"'

F+ r 1

Then J A[G+M : (I - G+G)W21 =L

® ®J. We see

W,(1 - FF+)that every full rank factorization of A leads to a "diagonal reduction"of A.

Of course, there is no reason to believe that the matrices flanking A aboveare nonsingular. Indeed, if we choose W, = 0 and W2 = 0, they definitelywill not he invertible. We must choose our W's much more carefully. We againappeal to full rank factorizations. Begin with any full rank factorization of A,say A = FG. Then take positive full rank factorizations of the projectionsI,,, - FF+ = FIG, and I, - G+G = F2G2. That is, F1* = G, and F2 = G2.Then we have the following:

1. F,=Gt

2. G1 F,+

3. F2 G+

4. G2 F2

5. G,F=G,Gi G,F=G,(I - FF+)F=G

6. F+Gi = F+F1 = F+F, F,+F, = F+F,G, F, = F+(I - FF+)F, =0

7. G2G+ = G2G+ G2G+ = G2F2G2G+ = G2(I - G+G)G+ =

8. GGZ =GF2=GF2F2 F2=GF2G2F2=G(I-G+G)F2=®.

F+Now we shall make invertible by a judicious choice

W1(I - FF+)F+

of W,. Indeed, we compute that . . . [F : (I - FF+)Wi*1 =W1(I - FF+)

F+F F+(I - FF+)Wj - Ir

W1(I - FF+)F W1 (I - FF+)Wi ] [0 W,(I - FF+)W1We badly need W1(I - FF+)Wl* All right, choose W, = F1+ _G,. Then W1 (I - FF+)Wl = W, FIG,Wi = F 1+F, G1FI+* = IG,FI+*

F+GIG*, = I. Thus, if we take S = , then S is invertible and

F1+ (I - FF+) J

Page 376: Matrix Theory

10.1 Diagonalization with Respect to Equivalence 355

S-' = [F : (1 - FF+)Fi *]. But F,+(l - FF+) = Fi F1G1 = G1 = Fl +,F+

so we can simplify further. If S = .. , then S-' = [F : F1]. Sim-

F

ilarly, if T = [G+ : (1 - G+G)W2], we choose W2 = F2 and find thatG

T-1 _ . These invertible matrices S and T are not unique. TakeI F2F++W3(1 -FF+)

S =I.

Then S-1 = [F : (1 - FW3)F1], as the

LF' J

reader may verify. Here W3 is completely arbitrary other than it has to be

the right size. Similarly for T = [G+ + (I - G+G)W4 : F2 ], we haveG

T-1 _ . . . , where again W4 is arbitrary of appropriate size.

F2(1 - W4G)Let us summarize our findings in a theorem.

THEOREM 10.20

Every matrix A in Cr` is equivalent to1r

® ® . If A = FG is any1 ]

full rank factorization of A and I - F F+ = F1 F1 and 1 - G + G = F2 F2*r F+ + W3(1 - FF+) 1

are positive full rank factorizations, then S = I I and

L F1 J

T = [G+ + (1 - G+G)W4 : FF ] are invertible with S-' = [F : (I - FW3)F1 ]C

rand T-' _ and SAT =

L

1` ®JF2(1 - W4G)

Let's look at an example.

Example 10.11 2 2 1 2

LetA= 7 6 10 =FG= 7 6 ( 1 0 1 1

4 4 6 4 4 0 1 i in C4.3. First,2

1 0 1 1 0 2

Page 377: Matrix Theory

356 Matrix Diagonalization

r 17 I -10 16 138 38 T8 T8

1 9 -14 -838 38 38 381-FF+= . Next, we compute a positive full

-10 -14 26 5

38 38 38 38

16 -8 4 2438 38 38 38 J

rank factorization of this projection as described previously: I - FF+-1 4

s7-1 -2

f 57

2 I

f 57

0 657

-1 -I 2

a 16- f6

= FI FI . Then we find FI+ _57 57 57 57

-32 16 -876 76 76

37 -9 14

76 76 762 16

76 76

37 -976 76

S =

0

57 57 57 57 J

28

76

30

67

Now we can put S together;

-8 2876 76

14 -3076 76

-1 -1 2f f v4 -2 I 6

and S-1 =

r1 2

7 6

4 42 I

f 57

1 0 0 657

T is computed in a similar manner. We find T =

1 0 0

SAT =0 0 0

as the reader can verify.

0 0 0

5 -2 -69 9 9

2 8 -3i9 9

4 2 6

9 9 9

:.-_ and9

Actually, a little more is true. The matrix S above can always be chosen tobe unitary, not just invertible. To gain this, we begin with an orthogonal fullrank factorization of A, A = FG, where F* = F+. That is, the columnsof F form an orthonormal basis of the column space of A. Then SS* _

I I(F+)*.(1-FF+)Wil =WI (1 - FF+)

r F+F

Ir 0As before, select WI = FI+ and get SS* = /.

0 W1(1 - FF+)WI

Page 378: Matrix Theory

10.2 Diagonalization with Respect to Similarity 357

Exercise Set 42

1. Verify the formulas labeled (1) through (8) on page 354.

Further Reading

[Enegren, 1995] Disa Enegren, On Simultaneous Diagonalization ofMatrices, Masters Thesis, Baylor University, May, (1995).

[Wielenga, 1992] Douglas G. Wielenga, Taxonomy of Necessary andSufficient Conditions for Simultaneously Diagonalizing a Pair of Rect-angular Matrices, Masters Thesis, Baylor University, August, (1992).

similar matrices, principal idempotents, minimal polynomialrelative to a vector, primary decomposition theorem

10.2 Diagonalization with Respect to Similarity

In this section, we demand even more of a matrix. In the previous section,every matrix was diagonalizable in the weak sense of equivalence. That willnot be the case in this section. Here we deal only with square matrices.

DEFINITION 10.2Two matrices A, B E C"' are similar (in symbols A - B) if there exists

an invertible matrix S E C""" with S -' A S= B. A matrix A is diagonalizablewith respect to similarity if A is similar to a diagonal matrix.

Let's try to get a feeling for what is going on here. The first thing we note isthat this notion of diagonalizability works only for square matrices. Let's justlook at a simple 3-by-3 case. Say S-' AS = D or, what is the same, AS =

Xi 0 0SD, where D = 0 k2 0 and S = [sIIs21s3. Then A[s11s21s3] =

0 0 k3

Page 379: Matrix Theory

358 Matrix Diagonalization

X, 0 0

[si Is21s31 0 1\2 0 or [As, IAs2IAs31 = [Xisi 1X2s21X3ss1. This says0 0 1\3

As, = #\,s,, As2 = X2s2 and As3 = X3s3. Thus, the diagonal elements ofD must be the eigenvalues of A and the columns of S must be eigenvectorscorresponding to those eigenvalues. Moreover, S has full rank, being invertible,so these eigenvectors are independent. But then they form a basis of C3. Thenext theorem should not now surprise you.

THEOREM 10.3Let A E Cnx". Then A is diagonalizable with respect to similarity iff A hasn linearly independent eigenvectors (i.e., C" has a basis consisting entirely ofeigenvectors of A).

PROOF The proof is left as an exercise. Just generalize the exampleabove. 0

Thus, if you can come up with n linearly independent eigenvectors of A EC""" you can easily construct S to bring A into diagonal form.

THEOREM 10.4Let A E C""" and let Xi , X2, ... X, be distinct eigenvalues o f A. Sup-

pose v,, v2, ... , vs. are eigenvectors corresponding to each of the Xs. Then{v,, V2.... , v,} is a linearly independent set.

PROOF Suppose to the contrary that the set is dependent. None of theseeigenvectors is zero. So let t be the largest index so that vi, V2, ... V, is anindependent set. It could be t = 1, but it cannot be that t = s. So I _<

t < s. Now vi, ... v,, v,+, is a dependent set by definition of how we choset so there exists scalars a, , a , ... , a, , a,+, , not all zero with a, v, + a2 V2 +

+ a,v, + a,+i v,+i = . Multiply by A and get a, X, v, + a2X2v2 ++ a,+IXf+,vt+, = Multiply the original dependency by X,+, and get

ai X,+i v, + ... + a,+, Xt+, v,+i = -6. Subtracting the two equations yieldsoil (XI-X,+i)vi+...+az(X,-X,+,)v, ='.Butv,,... ,v,isanindependentset, so all the coefficients must be zero. But the Xs are distinct so the only wayout is for all the as to be zero. But this contradicts the fact that not all the asare zero. This completes our indirect proof. 0

COROLLARY 10.1If A E C" x" has n distinct eigenvalues then A is diagonalizable with respect

to similarity.

Page 380: Matrix Theory

/0.2 Diagonalization with Respect to Similarity 359

Matrices that are diagonalizable with respect to a similarity have a particularlynice form. The following is a more general version of the spectral theorem.

THEOREM 10.5Suppose A E C01 "1 and h 1, A2, ... , hk are its distinct eigenvalues. Then A isdiagonalizable with respect to a similarity iff there exist unique idempotentsE1, E2, ... , Ek, such that

1. E, E j= 0 whenever i 96 j

k

2. EEi=1,.

k

3. A= > X1 E1.1=1

Moreover, the E are expressible as polynomials in A.

PROOF Suppose first such idempotents E; exist satisfying (1), (2), and (3).Then we know from earlier work that these idempotents effect a direct sumdecomposition of C", namely, C" = Col(EI) ® Col(E2) ® . . . ® Col(Ek).Let ri = dim(Col(Ei)). Choose a basis of each summand and union theseto get a basis of the whole space. Then create an invertible matrix S by

k k

arranging the basis vectors as columns. Then AS = (>XiEi)S = > X1E;S =i=1 i=1

S

Xk 1ri

. Thus A is diagonalizable with respect to

a similarity. Conversely, suppose AS = SX21rz

L Xkl,. JDefine Ej = [®... ®/3i®... ®]S-1, where Bi is a collection of column vec-tors from a basis of Col(E,). Then the Ej satisfy (I), (2), and (3), as weask the reader to verify. For uniqueness, suppose idempotents Fi satisfy (1),(2), and (3). Then E;A = AE, = X E; and FjA = AF/ = XjFj. ThusE(AF) = kj E1 Fj and (Ei A) Fj = hi Ej Fj, which imply Ej Fj = 0 wheni # j. Now E, = E; E Fj = E,Fi = (E Ej)F1 = F1. Finally, consider the

Page 381: Matrix Theory

360 Matrix Diagonalization Pi(A)IComputepolynomial p;(x) = n(x - Xi). Note pi(,\i) 54 0. Let F;

i#'that F; Ei = E; if i = j and 0 if i 0 j. Then F; = F; (E Ei) - E,. 0

The E, s above are called the principal idempotents of A. For example, theyallow us to define functions of a matrix that is diagonalizable with respect to asimilarity; namely, given f and A, define f (A) = E f (k; )E; .

The minimum polynomial of a matrix has something to say about diagonal-izability. Let's recall some details. Let A E C""". Then 1, A, A2.... , A"2 isa list of n2 + 1 matrices (vectors) in the n2 dimensional vector space C""",These then must be linearly dependent. Thus, there is a uniquely determinedinteger s < n2 such that 1, A, A2, ... , A''-' are linearly independent but1, A, A2.. , A'-', A' are linearly dependent. Therefore, A' is a linear combi-nation of the "vectors" that precede A' in this list, say A' _ (3o1 + [3, A +... +R,._, A". The minimum polynomial of A is p A W = x' - R, _, x , (3o

in C[x].

THEOREM 10.6Let A E Cnx". Then

I. pA(x)00inC[x]yetpA(A)=0inC`2.

µA is the unique monic polynomial (i.e., leading coefficient I) of leastdegree that annihilates A.

3. If p(x) E C[x] and p(A) = 0, then p.A(x) divides p(x).

PROOF

I. The proof fellows from the remarks preceding the theorem.

2. The proof is pretty clear since if p(x) were another monic polynomialwith deg(p(x)) = s and p(A) = 0, then p.A(x) - p(x) is a polynomialof degree less than s, which still annihilates A, a contradiction.

3. Suppose p(x) E C[xJ and p(A) = 0. Apply the division algorithm inC[x] and write p(x) = p. (x)q(x) + r(x) where r(x) = 0 or degr(x) <s = deg p A(x). Then r(A) = p(A) - ILA(A)q(A) = 0. This contradictsthe definition of µA unless r(x) = 0. Thus, p(x) = p A(x)q(x), as was tohe proved. 0

Let's look at some small examples.

Page 382: Matrix Theory

10.2 Diagonulization with Respect to Similarity 361

Example 10.2

1. Let A = [ a d J. In theory, we should compute 11, A, A2, A3, A') and

look for a dependency. But A'- = a2 + be ab + bdca + cd cb + d ]

f a2+be b(a+d2 so A2-(a+d)A= f be - ad 0c(a + d) cb + d J L 0 be - ad

so A2 - tr(A)A + det(A)l = 0. Thus, the minimum polynomial of A,RA(x)divides p(x) = x2-tr(A)x+det(A). Hence either pA(x) = x-kor JA(x) = x2 - tr(A)x + det(A). In the first instance, A must be ascalar multiple of the identity matrix. So, for 2-by-2 matrices, the situ-

ation is pretty well nailed down: A = L 0 51 has minimal polyno-

mial I A(x) = x - 5, while A =L

55 J

has minimal polynomial

RA(x) = x2 - lOx + 25 = (x - 5)2. Meanwhile, A = f . 2J

has3 4

minimum polynomial VA (x) = x2 - 5x - 2. L

a b c2. The 3-by-3 case is a bit more intense. Let A = d e f . Then

g h jA2=[a2+bd+cg

a(a2 + bd + cg)+

A3= d(a2+bd+cg)+ soA3-tr(A)A2+(ea+ja+je-g(a2 + bd + cg)+

db - cq - f h)A - det(A)I = 0. Thus, the minimum polynomial µA isof degree 3, 2, or I and divides (or equals)

x3-tr(A)x2-(ea+ja+je-db-cg- fh)x - det(A) = 0.

Well, that is about as far as brute force can take us. Now you see why we havetheory. Let's look at another approach to computing the minimum polynomialof a matrix.

Let A E C". For any v E C", we can construct the list of vectorsv, Av, A2v, .... As before, there is a least nonnegative integer d withv, Av,... , Adv linearly dependent. Evidently, d < n and d = 0 iff v = -6 andd = 1 iff v is an eigenvector of A. Say Adv = Rov + 13, Av + + 13d_, Ad-IV.

Then define the minimal polynomial of A relative to v as µA.,,(x) = xd -Rd_,xd-I - - (3,x - 130. Clearly, V E JVull(µA,(A)) so VA.-, is theUnique monic polynomial p of least degree so that v E Null(p(A)). Now,

Page 383: Matrix Theory

362 Matrix Diagonalization

if PA.v,(x) and µA.V,(x) both divide a polynomial p(x), then v, and v2 belongto Null(p(A)). So if you are lucky enough to have a basis VI, v2, , v anda polynomial p(x) divided by each µA,,,(x), then the basis (v,, ... , v,,)Null(p(A)) soAIull(p(A)) = C" whence p(A) = 0.

THEOREM 10.7Let A E C""" and let (b,, b2, ... , be a basis of C". Then µA(x)

LCM(µA.b,(x), ... , That is, the minimum polynomial of A is theleast common multiple of the minimum polynomials of A relative to a basis.

PROOF Let p(x) = LCM(µA.b,(x), µA.b,(x), , Then p(A)0 as we noted above. Thus AA(x) divides p(A). Conversely, for each j, 1 <j < n, apply the division algorithm and get µA(x) = yi(x)µA.b,(x) + ri(x),where ri(x) = 0 or degri(x) < deg(µA.bjx)). Then = µA(A)bi =9i(A)(.A.b,(A)bi) + ri(A)bi = rj (A)bj.

By minimality of the degree of AA.b, we must have rj(x) = 0. This saysevery µA.b, divides p.A(x). Thus the LCM of the µA.b,(x) divides AA(x) butthat is p(x). Therefore, p(x) = AA(x) and our proof is done. 0

-I -l 2

For example, let A = - I 0 1 . The standard basis is probably0 -1 1

-1 -1 2 l -1the easiest one to think of, so Ae, -1 0 I 0 = - I

0 -I 1 0 0

-1 -I 2 -I 2 -1 -1 2

A22e, _ -1 0 I -l = I , A3e1 = -1 0 1

0 -1 1 0 I 0 -1 1

-1 I = Ae,, so (A3 -0

A)e, = -0 . Thus l.A.ei (x) = x3 -X =

-1 -I 2 0 -1x(x - I)(x + 1). Ae2 = -1 0 1 I = 0 , A2e2 =

0 -1 1 0 -1-1 -1 2 -1 -1

-1 0 1 0 = 0 = Ae2, so p.A,,(x) = x2 - x =0 -1 1 -I -I

-1 -1 2 0 2

x(x - 1). Finally, Ae3 = - I 0 1 0 = I , A2e3 =0 -I I I I

I

Page 384: Matrix Theory

10.2 Diagonalization with Respect to Similarity 363

-I -1 2 2 -I -1 -1 2 -I-1 0 1 1 = -1 , A3e3 = -1 0 1 -I =

0 -1 1 1 0 0 -1 1 02

1 Ae3, SO µA,,,,(x) = x3 - x = x(x - 1)(x + 1) and pA(x)1

LCM(x3 - x, x2 - x) = x(x - l)(x + I) = x3 - x. Did you notice wenever actually computed any powers of A in this example?

The next theorem makes an important connection with eigenvalues of a matrixand its minimum polynomial.

THEOREM 10.8Let A E CI ". Then the eigenvalues of A are exactly the roots of the minimumpolynomial VA(X)-

PROOF Suppose k is a root of p. (x). Then ILA(x) = (x - k)q(x) wheredegq < degp.A But 0 = (A - k/)q(A) and q(A) 0. Thus there exists avector v, necessarily nonzero, such that q(A)v # 6. Let w = q(A)v. Then6 = (A - k/)w so k is an eigenvalue of A.

Conversely, suppose k is an eigenvalue of A with eigenvector v. Then A2v =AKv = KAv = k2v. Generally, Aiv = kiv. So if µA(x) = x` - a,_,x'-' -...-aix+aothen = µA(A)v= (A' -a,-,A'-'>\'v - a,-i k`-'v - ... akv - aov = ()\' - 0t,-,1\'- I - ao)v = WA(k)V.Buty# V soILA(X)=0. 0

THEOREM 10.9Let A E Cn"n with distinct eigenvalues X1, k2, ....,. Then the following areequivalent:

1. A is diagonalizable with respect to similarity.

2. There is a basis of C"consisting of eigenvectors of A.

3. ILA(x)=(x-ki)(z-K2)...(x-K,).

4. GCD(µA, A) = I

5. µA and µA do not have a common root.

PROOF We already have (I) is equivalent to (2) by (10.3). (4) and (5)are equivalent by general polynomial theory. Suppose (2). Then there is a ba-sis b,, b2, ... , b of cigenvectors of A. Then RA(x) = LCM(RA,b, (x), ... ,

But ILA.b,(x) = x - Ro since the b; are eigenvectors and (3o is someeigenvalue. Thus (3) follows. Conversely, if we have (3), consider b, , b2, , b,

Page 385: Matrix Theory

364 Matrix Diagonalization

eigenvectors that belong to X1, X2, ... , X respectively. Then (b1, b2, , b,}is an independent set. If it spans C", we have a basis of eigenvectors and (2)follows. If not, extend this set to a basis, {b1, , b w,+,, , W11). If, byluck, all the ws are eigenvectors of A, (2) follows again. So suppose some wis not an eigenvector. Then µA,W, (x) has degree at least 2, a contradiction. Q

Now we come to a general result about all square matrices over C.

THEOREM 10.10 (primary decomposition theorem)
Let A ∈ C^(n×n). Suppose λ1, λ2, ..., λs are the distinct eigenvalues of A. Let µ_A(x) = (x - λ1)^e1 (x - λ2)^e2 ··· (x - λs)^es. Then

C^n = Null((A - λ1 I)^e1) ⊕ Null((A - λ2 I)^e2) ⊕ ··· ⊕ Null((A - λs I)^es).

Moreover, each Null((A - λi I)^ei) is A-invariant. Also, if χ_A(x) = (x - λ1)^d1 (x - λ2)^d2 ··· (x - λs)^ds, then dim(Null((A - λi I)^ei)) = di. Moreover, an invertible matrix S exists with S^(-1)AS = BlockDiagonal[A1, A2, ..., As], where each Ai is di-by-di.

PROOF Let qi(x) = (x - λ1)^e1 ··· (x - λ_{i-1})^{e_{i-1}} (x - λ_{i+1})^{e_{i+1}} ··· (x - λs)^es. In other words, delete the ith factor in the minimum polynomial. Then deg(qi(x)) < deg(µ_A(x)) and the polynomials q1(x), q2(x), ..., qs(x) are relatively prime (i.e., they have no common prime factors). Thus, there exist polynomials p1(x), p2(x), ..., ps(x) with 1 = q1(x)p1(x) + q2(x)p2(x) + ··· + qs(x)ps(x). But then I = q1(A)p1(A) + q2(A)p2(A) + ··· + qs(A)ps(A).

Let hi(x) = qi(x)pi(x) and let Ei = hi(A), i = 1, ..., s. Already we see I = E1 + E2 + ··· + Es. Moreover, for i ≠ j, EiEj = qi(A)pi(A)qj(A)pj(A) = µ_A(A) times (something) = 0. Next, Ei = Ei·I = Ei(E1 + ··· + Es) = EiEi, so each Ei is idempotent. Therefore, already we have C^n = Col(E1) + Col(E2) + ··· + Col(Es). Could any of these column spaces be zero? Say Col(Ei) = {0}. Then C^n = Col(E1) + ··· + Col(E_{i-1}) + Col(E_{i+1}) + ··· + Col(Es). Now qi(A)Ej = qi(A)qj(A)pj(A) = 0 for j ≠ i, so, since every vector in C^n is a sum of vectors from the column spaces Col(E1), ..., Col(Es) without Col(Ei), it follows that qi(A) = 0. But this contradicts the fact that µ_A(x) is the minimum polynomial of A, so Col(Ei) ≠ {0} for i = 1, ..., s. Thus, we have a direct sum C^n = Col(E1) ⊕ ··· ⊕ Col(Es). To finish we need to identify these column spaces. First, (A - λi I)^ei Ei = (A - λi I)^ei qi(A)pi(A) = µ_A(A)pi(A) = 0, so Col(Ei) ⊆ Null((A - λi I)^ei). Next, take v ∈ Null((A - λi I)^ei) and write v = E1v1 + E2v2 + ··· + Esvs, where Ejvj ∈ Col(Ej), j = 1, ..., s.

Then 0 = (A - λi I)^ei v = (A - λi I)^ei E1v1 + ··· + (A - λi I)^ei Esvs = E1(A - λi I)^ei v1 + ··· + Es(A - λi I)^ei vs.

Note the E's are polynomials in A, so we get to commute them with (A - λi I)^ei, which is also a polynomial in A. Since the sum is direct, each piece must be zero; in other words, (A - λi I)^ei Ejvj = 0. Thus for j ≠ i we also have (A - λj I)^ej Ejvj = 0. Now GCD((x - λi)^ei, (x - λj)^ej) = 1, so there exist polynomials a(x) and b(x) with 1 = a(x)(x - λi)^ei + b(x)(x - λj)^ej. Therefore, Ejvj = I·Ejvj = (a(A)(A - λi I)^ei + b(A)(A - λj I)^ej)Ejvj = a(A)(A - λi I)^ei Ejvj + b(A)(A - λj I)^ej Ejvj = 0 + 0 = 0. Thus v = E1v1 + ··· + Esvs = Eivi, putting v in the column space of Ei. Now, by taking a basis of each Null((A - λi I)^ei), we construct an invertible matrix S so that S^(-1)AS = BlockDiagonal[A1, A2, ..., As].

Let µ_Ai(x) be the minimum polynomial of Ai. We know that µ_A(x) = µ_{S^(-1)AS}(x) = LCM(µ_A1(x), ..., µ_As(x)) and χ_A(x) = χ_{S^(-1)AS}(x) = χ_A1(x)χ_A2(x)···χ_As(x). Now (A - λi I)^ei = 0 on Null((A - λi I)^ei), so µ_Ai(x) divides (x - λi)^ei and so the µ_Ai(x)'s are relatively prime. This means that µ_A(x) = µ_A1(x)···µ_As(x) = (x - λ1)^e1 (x - λ2)^e2 ··· (x - λs)^es. Thus, for each i, µ_Ai(x) = (x - λi)^ei. Now χ_Ai(x) must equal (x - λi)^ri, where ri ≥ ei. But (x - λ1)^r1 (x - λ2)^r2 ··· (x - λs)^rs = χ_A(x) = (x - λ1)^d1 (x - λ2)^d2 ··· (x - λs)^ds. By the uniqueness of the prime decomposition of polynomials, ri = di for all i. That wraps up the theorem. □
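Here is a small numerical illustration of the theorem (a sketch, not part of the proof), assuming MATLAB: for a 4-by-4 matrix with distinct eigenvalues 2 and 3, bases of the two null spaces Null((A - 2I)²) and Null((A - 3I)²) assemble into an S that block-diagonalizes A.

A  = [2 2 3 4; 0 2 5 6; 0 0 3 9; 0 0 0 3];
N2 = null((A - 2*eye(4))^2);   % basis of the primary component for eigenvalue 2
N3 = null((A - 3*eye(4))^2);   % basis of the primary component for eigenvalue 3
S  = [N2 N3];
S \ A * S                      % block diagonal: a 2-by-2 block for 2 and one for 3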

This theorem has some rather nice consequences. With notation as above, let D = λ1E1 + λ2E2 + ··· + λsEs. We claim D is diagonalizable with respect to similarity. Now, since I = E1 + E2 + ··· + Es, any v ∈ C^n can be written as v = v1 + v2 + ··· + vs, vi ∈ Col(Ei) = Fix(Ei). But Dvi = DEivi = (λ1E1 + ··· + λsEs)Eivi = λiEi²vi = λivi. Thus every nonzero vector in the column space of Ei is an eigenvector and every vector is a sum of these. So the eigenvectors span, and hence a basis of eigenvectors of D can be selected. Moreover, A = AE1 + AE2 + ··· + AEs, so if we take N = A - D = (A - λ1I)E1 + ··· + (A - λsI)Es, we see N² = (A - λ1I)²E1 + ··· + (A - λsI)²Es, N³ = (A - λ1I)³E1 + ··· + (A - λsI)³Es, and so on. Eventually, since (A - λiI)^ei Ei = 0, we get N^r = 0, where r ≥ all the ei's. That is, N is nilpotent. Note that both D and N are polynomials in A, so they commute (i.e., DN = ND). Thus we have the next theorem that holds for any matrix over C.


COROLLARY 10.2 (Jordan decomposition)
Let A ∈ C^(n×n). Then A = D + N, where D is diagonalizable with respect to similarity and N is nilpotent. Hence there exists S invertible with S^(-1)AS the sum of a diagonal matrix and a nilpotent matrix.

PROOF The details are left to the reader. 0

COROLLARY 10.3
Let A ∈ C^(n×n). The D and the N of (10.2) commute and are unique. Indeed, each of them is a polynomial in A.

PROOF Now D = λ1E1 + λ2E2 + ··· + λsEs and all the Ei's are polynomials in A, and so N = A - D is a polynomial in A. Suppose A = D1 + N1, where D1 is diagonalizable with respect to similarity, N1 is nilpotent, and D1N1 = N1D1. We shall argue D = D1, N = N1. Now D + N = A = D1 + N1, so D - D1 = N1 - N. Now D and D1 commute (D1 commutes with A = D1 + N1 and hence with any polynomial in A, in particular with D) and so are simultaneously diagonalizable; hence D - D1 is diagonalizable. Also N and N1 commute and so

N1 - N is nilpotent. To see this, look at (N1 - N)^s = Σ_{k=0}^{s} C(s, k)(-1)^k N1^{s-k} N^k by the binomial expansion, which you remember works for commuting matrices. By taking an s large enough — say, s = 2n or more — we annihilate the right-hand side. Now this means D - D1 is a diagonalizable nilpotent matrix. Then S^(-1)(D - D1)S is a diagonal nilpotent matrix. But this must be the zero matrix, so D = D1 and N = N1. □

We could diagonalize all matrices in C^(n×n) if it were not for that nasty nilpotent matrix that gets in the way. We shall have more to say about this representation later when we discuss the Jordan canonical form. It is of interest to know if two matrices can be brought to diagonal form by the same invertible matrix. The next theorem gives necessary and sufficient conditions.

THEOREM 10.11
Let A, B ∈ C^(n×n) be diagonalizable with respect to similarity. Then there exists S invertible that diagonalizes both A and B if and only if AB = BA.

PROOF Suppose S exists with S^(-1)AS = D1 and S^(-1)BS = D2. Then A = SD1S^(-1) and B = SD2S^(-1). Then AB = SD1S^(-1)SD2S^(-1) = SD1D2S^(-1) = SD2D1S^(-1) = SD2S^(-1)SD1S^(-1) = BA, using the fact that diagonal matrices commute. Conversely, suppose AB = BA. Since A is diagonalizable, its minimal polynomial is of the form µ_A(x) = (x - λ1)(x - λ2)···(x - λs), where the


λi's are the distinct eigenvalues of A. By the primary decomposition theorem, C^n = Null(A - λ1I) ⊕ ··· ⊕ Null(A - λsI). Now for any vi in Null(A - λiI), ABvi = BAvi = B(λivi) = λiBvi. This says each Null(A - λiI) is B-invariant. We can therefore find a basis of Null(A - λiI), say B_i, consisting of eigenvectors of B (the restriction of B to this B-invariant subspace is again diagonalizable, since its minimal polynomial divides that of B). Then the union of the B_i, i = 1, ..., s, is a basis of C^n consisting of eigenvectors of both A and of B. Forming this basis into a matrix S creates an invertible matrix that diagonalizes both A and B. □
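A quick MATLAB sketch of the "if" direction, under the simplifying assumption (ours, not the theorem's) that A has distinct eigenvalues, so that eig already produces a diagonalizing matrix:

A = [1 2; 2 1];            % distinct eigenvalues 3 and -1
B = A^2 + A;               % a polynomial in A, so AB = BA
[S, D] = eig(A);
S \ A * S                  % diagonal
S \ B * S                  % also diagonal, as Theorem 10.11 predicts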

It follows from the primary decomposition theorem that any square complex matrix is similar to a block diagonal matrix. Actually, we can do even better. For this, we first refine our notion of a nilpotent matrix, relativizing this idea to a subspace.

DEFINITION 10.3 (Nilpotent on V) Let V be a subspace of C^n and let A ∈ C^(n×n). We say that A is nilpotent on V iff there exists a positive integer k such that A^k v = 0 for all v in V. The least such k is called the index of A on V, written Ind_V(A).

Of course, A^k v = 0 for all v in V is equivalent to saying V ⊆ Null(A^k). Thus, Ind_V(A) = q iff V ⊆ Null(A^q) but V ⊄ Null(A^(q-1)). For example, consider

C^4 = V ⊕ W = span{e1, e2} ⊕ span{e3, e4}

and let A = [0 1 0 0; 0 0 0 0; 0 0 1 0; 0 0 0 1]. Note that

Ae1 = 0, Ae2 = e1, Ae3 = e3, Ae4 = e4

and A² = [0 0 0 0; 0 0 0 0; 0 0 1 0; 0 0 0 1], so A is nilpotent of index 2 on V = span{e1, e2}.

Now we come to the crucial theorem to take the next step.

THEOREM 10.12
Suppose C^n = V ⊕ W, where V and W are A-invariant for A ∈ C^(n×n), and suppose dim(V) = r. If A is nilpotent of index q on V, then there exists a basis of V, {v1, v2, ..., vr}, such that

Av1 = 0
Av2 ∈ span{v1}
Av3 ∈ span{v1, v2}
⋮
Av_(r-1) ∈ span{v1, v2, ..., v_(r-2)}
Avr ∈ span{v1, v2, ..., v_(r-1)}.

PROOF The case when A is the zero matrix is trivial and left to the reader. By definition of q, V ⊄ Null(A^(q-1)), so there exists v ≠ 0 in V such that A^(q-1)v ≠ 0. Let v1 = A^(q-1)v; then v1 ≠ 0 but Av1 = A^q v = 0. Now if span{v1} = V, we are done. If not, span{v1} ⊊ V. Look at Col(A) ∩ V. If Col(A) ∩ V ⊆ span{v1}, choose any v2 in V \ span{v1}. Then {v1, v2} is independent and Av1 = 0, Av2 ∈ Col(A) ∩ V ⊆ span{v1}. If Col(A) ∩ V ⊄ span{v1}, then we first notice that Col(A^q) ∩ V = {0}. The proof goes like this. Suppose x ∈ Col(A^q) ∩ V. Then x ∈ V and x = A^q y for some y. But y = v + w for some v ∈ V, w ∈ W. Thus x = A^q(v) + A^q(w) = A^q(w) ∈ V ∩ W = {0}. Therefore x = 0. Now we can consider a chain

Col(A) ∩ V ⊇ Col(A²) ∩ V ⊇ ··· ⊇ Col(A^q) ∩ V = {0}.

Therefore, there must be a first positive integer j such that Col(A^j) ∩ V ⊄ span{v1} but Col(A^(j+1)) ∩ V ⊆ span{v1}. Choose v2 ∈ (Col(A^j) ∩ V) \ span{v1}. Then in this case, we have achieved Av1 = 0 and Av2 ∈ span{v1}. Continue in this manner until a basis of V of the desired type is obtained. □

This theorem has some rather nice consequences.

COROLLARY 10.4
Suppose N is an n-by-n nilpotent matrix. There is an invertible matrix S such that S^(-1)NS is strictly upper triangular. That is,

S^(-1)NS =
[ 0  a12  a13  ...  a1n        ]
[ 0   0   a23  ...  a2n        ]
[ ⋮              ⋱             ]
[ 0   0    0   ...  a_(n-1)n   ]
[ 0   0    0   ...   0         ]
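Anticipating the MATLAB schur command of Section 10.3, here is a minimal numerical sketch: for a nilpotent matrix, the Schur form is (up to roundoff) exactly such a strictly upper triangular matrix. The particular 2-by-2 example is ours.

N = [2 4; -1 -2];              % N^2 = 0, so N is nilpotent
[S, T] = schur(N, 'complex')   % T is strictly upper triangular up to roundoff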

We can now improve on the primary decomposition theorem. Look at the matrix Ai - λi I_di. Clearly this matrix is nilpotent on Null((A - λi I)^ei), so we can get a basis of Null((A - λi I)^ei) and make an invertible matrix Si with

Si^(-1)(Ai - λi I_di)Si =
[ 0  a12  a13  ...  a1di        ]
[ 0   0   a23  ...  a2di        ]
[ ⋮             ⋱               ]
[ 0   0    0   ...  a_(di-1)di  ]
[ 0   0    0   ...    0         ].

But then Si^(-1)AiSi - λi I_di is this strictly upper triangular matrix, so

Si^(-1)AiSi =
[ λi  a12  a13  ...  a1di        ]
[ 0   λi   a23  ...  a2di        ]
[ ⋮              ⋱               ]
[ 0   0     0   ...  a_(di-1)di  ]
[ 0   0     0   ...    λi        ],

an upper triangular matrix with λi everywhere on its diagonal. By piecing together a basis for each primary component in this way, we get the following theorem.

THEOREM 10.13 (triangularization theorem)
Let A ∈ C^(n×n). Suppose λ1, λ2, ..., λs are the distinct eigenvalues of A. Let µ_A(x) = (x - λ1)^e1 (x - λ2)^e2 ··· (x - λs)^es. Also let χ_A(x) = (x - λ1)^d1 (x - λ2)^d2 ··· (x - λs)^ds. Then there exists an invertible matrix S such that S^(-1)AS is block diagonal with upper triangular blocks. Moreover, the ith block has the eigenvalue λi on its diagonal. In particular, this says that A is similar to an upper triangular matrix whose diagonal elements consist of the eigenvalues of A repeated according to their algebraic multiplicity.

It turns out, we can make an even stronger conclusion, and that is the subjectof the next section.

Exercise Set 43

1. Argue that similarity is an equivalence relation on n-by-n matrices.

2. Prove if A is similar to 0, then A = 0.

3. Prove that if A is similar to P and P is idempotent, then A is idempotent.Does the same result hold if "idempotent" is replaced by "nilpotent"?

4. Suppose A has principal idempotents Ei. Argue that a matrix B commuteswith A iff it commutes with every one of the Ei.


5. Suppose A has principal idempotents Ei. Argue that Col(Ei) = Eig(A, λi) and Null(Ei) is the direct sum of the eigenspaces of A not associated with λi.

6. Suppose A ~ B and A is invertible. Argue that B must also be invertible and A^(-1) ~ B^(-1).

7. Suppose A ~ B. Argue that A^T ~ B^T.

8. Suppose A ~ B. Argue that A^k ~ B^k for all positive integers k. Also argue then that p(A) ~ p(B) for any polynomial p(x).

9. Suppose A ~ B. Argue that det(A) = det(B).

10. Suppose A ~ B and λ ∈ C. Argue that (A - λI) ~ (B - λI) and so det(A - λI) = det(B - λI).

11. Find matrices A and B with tr(A) = tr(B), det(A) = det(B), and rank(A) = rank(B), but A is not similar to B.

12. Suppose A ~ B. Suppose A is nilpotent of index q. Argue that B is nilpotent with index q.

13. Suppose A ~ B and A is idempotent. Argue that B is idempotent.

15. Give an example of two 2-by-2 matrices that have identical eigenvaluesbut are not similar.

16. Argue that the intersection of any family of A-invariant subspaces ofa matrix A is again A-invariant. Why is there a smallest A-invariantsubspace containing any given set of vectors?

17. Let A ∈ C^(n×n) and v ≠ 0 in C^n. Let µ_A,v(x) = x^d - β_(d-1)x^(d-1) - ··· - β_1 x - β_0.

(a) Prove that {v, Av, ..., A^(d-1)v} is a linearly independent set.

(b) Let K_d(A, v) = span{v, Av, ..., A^(d-1)v}. Argue that K_d(A, v) is the smallest A-invariant subspace of C^n that contains v. Moreover, dim(K_d(A, v)) = deg(µ_A,v(x)).

(c) Prove that K_d(A, v) = {p(A)v | p ∈ C[x]}.

(d) Let A = [0 -1 1; 1 0 1; 0 0 2]. Compute K_d(A, v), where v = e1.


(e) Extend the independent set {v, Av, ..., A^(d-1)v} to a basis of C^n, say {v, Av, ..., A^(d-1)v, w1, ..., w_(n-d)}. Form the invertible matrix S = [v | Av | ... | A^(d-1)v | w1 | ... | w_(n-d)]. Argue that

S^(-1)AS = [ C  ? ; 0  ? ],   where C =
[ 0  0  0  ...  0  β0      ]
[ 1  0  0  ...  0  β1      ]
[ 0  1  0  ...  0  β2      ]
[ ⋮           ⋱            ]
[ 0  0  0  ...  1  β_(d-1) ].

(f) Suppose µ_A,v(x) = (x - λ)^d. Argue that v, (A - λI)v, ..., (A - λI)^(d-1)v is a basis for K_d(A, v).

18. Suppose A ∈ C^(n×n). Prove that the following statements are all equivalent:

(a) There is an invertible matrix S such that S^(-1)AS is upper triangular.
(b) There is a basis {v1, ..., vn} of C^n such that Avk ∈ span{v1, ..., vk} for all k = 1, 2, ..., n.
(c) There is a basis {v1, ..., vn} of C^n such that span{v1, ..., vk} is A-invariant for all k = 1, 2, ..., n.

Further Reading

[Abate, 1997] Marco Abate, When is a Linear Operator Diagonalizable?, The American Mathematical Monthly, Vol. 104, No. 9, November, (1997), 824-830.

[B&R, 2002] T. S. Blyth and E. F. Robertson, Further Linear Algebra,Springer, New York, (2002).

unitarily equivalent, Schur's lemma, Schur decomposition

10.3 Diagonalization with Respect to a Unitary

In this section, we demand even more. We want to diagonalize a matrix usinga unitary matrix, not just an invertible one. For one thing, that relieves us ofhaving to invert a matrix.


DEFINITION 10.4 Two matrices A, B ∈ C^(n×n) are called unitarily similar (or unitarily equivalent) iff there exists a unitary matrix U (U* = U^(-1)) such that U^(-1)AU = B. A matrix A is diagonalizable with respect to a unitary iff A is unitarily similar to a diagonal matrix.

We begin with a rather fundamental result about complex matrices that goesback to the Russian mathematician Issai Schur (10 January 1875-10 January1941).

THEOREM 10.14 (Schur's lemma, Math. Annalen, 66, (1909), 488-510)
Let A ∈ C^(n×n). Then there is a unitary matrix U such that U*AU is upper triangular. That is, A is unitarily equivalent to an upper triangular matrix T. Moreover, the eigenvalues of A (possibly repeated) comprise the diagonal elements of T.

PROOF The proof is by induction on the size of the matrix n. The result is clearly true for n = 1. (Do you notice that all books seem to say that when doing induction?) Assume the theorem is true for matrices of size k-by-k. Our job is to show the theorem is true for matrices of size (k + 1)-by-(k + 1). So suppose A is a matrix of this size. Since we are working over the complex numbers, we know A has an eigenvalue, say λ1, and an eigenvector of unit length w1 belonging to it. Extend w1 to an orthonormal basis of C^(k+1) (using Gram-Schmidt, for example). Say {w1, w2, ..., w_(k+1)} is an orthonormal basis of C^(k+1). Form the matrix W with columns w1, w2, ..., w_(k+1). Then W is unitary and

W*AW = W*[Aw1 | Aw2 | ... | Aw_(k+1)] = [λ1  * ... * ; 0 ... C ],

where C is k-by-k (the first column is (λ1, 0, ..., 0)^T since Aw1 = λ1w1 and the wi are orthonormal). Now the induction hypothesis provides a k-by-k unitary matrix V1 such that V1*CV1 = T1 is upper triangular. Let V = [1 0; 0 V1] and note that V is unitary. Let U = WV. Then

U*AU = V*W*AWV = [λ1  * ... * ; 0  V1*CV1] = [λ1  * ... * ; 0  T1],

which is an upper triangular matrix, as we hoped. □

DEFINITION 10.5 (Schur decomposition)
The theorem above shows that every A ∈ C^(n×n) can be written as A = UTU*, where U is unitary and T is upper triangular. This is called a Schur decomposition of A.

THEOREM 10.15
Let A ∈ C^(n×n). Then the following are equivalent:

1. A is diagonalizable with respect to a unitary.

2. There is an orthonormal basis of C" consisting of eigenvectors of A.

3. A is normal.

PROOF Suppose (1). Then there exists a unitary matrix U with U^(-1)AU = U*AU = D, a diagonal matrix. But then AA* = UDU*UD*U* = UDD*U* = UD*DU* = (UD*U*)(UDU*) = A*A, so A has to be normal. Thus (1) implies (3).

Next suppose A is normal. Write A in a Schur decomposition A = UTU*, where T is upper triangular. Then A* = UT*U* and so TT* = U*AUU*A*U = U*AA*U = U*A*AU = U*A*UU*AU = T*T. This says T is normal. But if T is normal and upper triangular it must be diagonal. To see this, compute TT* and T*T, where

T = [ t11  t12  ...  t1n ; 0  t22  ...  t2n ; ... ; 0  0  ...  tnn ],

and compare the diagonal entries:

|t11|² + |t12|² + ··· + |t1n|² = (1,1) entry of TT* = (1,1) entry of T*T = |t11|²
|t22|² + |t23|² + ··· + |t2n|² = (2,2) entry of TT* = (2,2) entry of T*T = |t12|² + |t22|²
⋮
|tnn|² = (n,n) entry of TT* = (n,n) entry of T*T = |t1n|² + |t2n|² + ··· + |tnn|²

We see that |tij|² = 0 whenever i ≠ j. Thus T is diagonal, and (3) implies (1). If (1) holds there exists a unitary U such that U^(-1)AU = D, a diagonal matrix. Then AU = UD and the usual argument gives the columns of U as eigenvectors of A. Thus the columns of U yield an orthonormal basis of C^n consisting entirely of eigenvectors of A.

Conversely, if such a basis exists, form a matrix U with these vectors as columns. Then U is unitary and diagonalizes A. □
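As a numerical aside (assuming MATLAB), the complex Schur form of a normal matrix is already diagonal up to roundoff, which is exactly the content of (3) implies (1); the Hermitian example here is ours.

A = [2 1i; -1i 2];             % Hermitian, hence normal
norm(A*A' - A'*A)              % zero: A is normal
[U, T] = schur(A, 'complex');
T                              % diagonal up to roundoff, with the eigenvalues on its diagonal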

COROLLARY 10.5
Let A ∈ C^(n×n).

1. If A is Hermitian, A is diagonalizable with respect to a unitary.

2. If A is skew Hermitian (A* _ -A), then A is diagonalizable with respectto a unitary.

3. If A is unitary, then A is diagonalizable with respect to a unitary.

It is sometimes of interest to know when a family of matrices over C canbe simultaneously triangularized by a single invertible matrix. We state thefollowing theorem without proof.

THEOREM 10.16
Suppose F is a family of n-by-n matrices over C that commute pairwise. Then there exists an invertible matrix S in C^(n×n) such that S^(-1)AS is upper triangular for every A in F.


Exercise Set 44

1. Is the 2-by-2 complex matrix displayed here (its entries are of the form ±1/2 ± i/2) a unitary matrix?

2. Argue that U is a unitary matrix iff it transforms an orthonormal basis toan orthonormal basis.

3. Find a unitary matrix U that diagonalizes A = [1 1 1 1; 1 1 1 1; 1 1 1 1; 1 1 1 1], the 4-by-4 matrix of all ones.

4. Argue that unitary similarity is an equivalence relation.

5. Prove that the unitary matrices form a subgroup of GL(n,C), the groupof invertible n-by-n matrices.

6. Argue that the eigenvalues of a unitary matrix must be complex numbers of magnitude 1 - that is, numbers of the form e^(iθ).

7. Prove that a normal matrix is unitary iff all its eigenvalues are of magni-tude 1.

8. Argue that the diagonal entries of an Hermitian matrix must be real.

9. Let A = A* and suppose the eigenvalues of A are lined up as λ1 ≥ λ2 ≥ ··· ≥ λn, with corresponding orthonormal eigenvectors u1, u2, ..., un. For v ≠ 0, define ρ(v) = (Av | v)/(v | v). Argue that λn ≤ ρ(v) ≤ λ1. Even more, argue that max_{v≠0} ρ(v) = λ1 and min_{v≠0} ρ(v) = λn.

10. Prove the claims of Corollary 10.5.

11. Argue that the eigenvalues of a skew-Hermitian matrix must be pureimaginary.

12. Suppose A is 2-by-2 and U*AU = T = [t1 *; 0 t2] is a Schur decomposition. Must t1 and t2 be eigenvalues of A?

13. Use Schur triangularization to prove that the determinant of a matrix isthe product of its eigenvalues and the trace is the sum.

14. Use Schur triangularization to prove the Cayley-Hamilton theorem forcomplex matrices.


Further Reading

[B&H 1983] A. Bjorck and S. Hammarling, A Schur Method for the Square Root of a Matrix, Linear Algebra and Its Applications, 52/53, (1983), 127-140.

[Schur, 1909] I. Schur, Über die charakteristischen Wurzeln einer linearen Substitution mit einer Anwendung auf die Theorie der Integralgleichungen, Math. Ann., 66, (1909), 488-510.

10.3.1 MATLAB Moment

10.3.1.1 Schur Triangularization

The Schur decomposition of a matrix A is produced in MATLAB as follows:

[Q, T] = schur(A)

Here Q is unitary and T is upper triangular with A = QTQ*. For example,

>> B = [1+i 2+2i 3+i; 2+2i 4+4i 9i; 3+3i 6+6i 8i]

B =
   1.0000 + 1.0000i   2.0000 + 2.0000i   3.0000 + 1.0000i
   2.0000 + 2.0000i   4.0000 + 4.0000i        0 + 9.0000i
   3.0000 + 3.0000i   6.0000 + 6.0000i        0 + 8.0000i

> > [U,T] = schur(B)

U =
   0.2447 - 0.1273i   0.7874 + 0.4435i  -0.0787 + 0.3179i
   0.6197 + 0.1525i  -0.2712 - 0.3003i  -0.1573 + 0.6358i
   0.7125 + 0.0950i   0.1188 - 0.0740i   0.1723 - 0.6588i
T =
   6.3210 +14.1551i   2.7038 + 1.3437i   1.6875 + 5.4961i
        0             0.0000 - 0.0000i  -3.0701 - 0.0604i
        0                  0            -1.3210 - 1.1551i

> > format rat

> > B = [1+i 2+2i 3+i; 2+2i 4+4i 9i; 3+3i 6+6i 8i]

B =
   1+1i   2+2i   3+1i
   2+2i   4+4i   0+9i
   3+3i   6+6i   0+8i

> > [U,T] = schur(B)

U =
     185/756 - 1147/9010i      663/842 + 149/336i       -227/2886 + 659/2073i
    1328/2143 + 329/2157i     -739/2725 - 2071/6896i    -227/1443 + 1318/2073i
     280/393 + 130/1369i       113/951 - 677/9147i       339/1967 - 421/639i
T =
    6833/1081 + 4473/316i     2090/773 + 1079/803i     20866/12365 + 709/129i
         0                         0                   -5477/1784 - 115/1903i
         0                         0                   -1428/1081 - 1311/1135i

10.4 The Singular Value Decomposition

In the previous sections, we increased our demands on diagonalizing a matrix. In this section we will relax our demands and in some sense get a better result. The theorem we are after applies to all matrices, square or not. So suppose A ∈ C^(m×n). We have already seen how to use a full rank factorization of A to get SAT = [I_r 0; 0 0], where S is unitary and T is invertible. The natural question arises as to whether we can get T unitary as well.

So let's try! Let's begin with an orthogonal full rank factorization A = FG with F+ = F*. We will also need the factorizations I - FF+ = F1F1+ = F1F1* and I - G+G = F2F2+ = F2F2*. Then

[ F* ; F1* ] A [ G+D : (I - G+G)W2 ] = [ D 0 ; 0 0 ].

The matrix U* = [F*; F1*] is unitary. At the moment, W2 is arbitrary. Next, consider V = [G+D : (I - G+G)W2]. We would like to make V unitary, and we can fiddle with D and W2 to make it happen.


But V*V = [ (G+D)* ; ((I - G+G)W2)* ] [ G+D : (I - G+G)W2 ]
= [ D*G+*G+D   D*G+*(I - G+G)W2 ; W2*(I - G+G)G+D   W2*(I - G+G)W2 ]
= [ D*G+*G+D   0 ; 0   W2*(I - G+G)W2 ].

We have already seen that by choosing W2 = F2 we can achieve I_(n-r) in the lower right. So the problem is, how can you get I_r in the upper left by choice of D? By the way, in case you missed it, G+* = (G+GG+)* = G+*(G+G)* = G+*G+G, so that D*G+*(I - G+G)W2 = 0. Anyway, our problem is to get D*G+*G+D = I_r. But G = F+A = F*A, so D*G+*G+D = D*(GG*)^(-1)D = D*(F*AA*F)^(-1)D, since G+ = G*(GG*)^(-1) and so G+*G+ = (GG*)^(-1). So to achieve the identity, we need F*AA*F = DD*, where D is invertible. Equivalently, we need AA*F = FDD*. Let's take stock so far.

THEOREM 10.17
Let A ∈ C^(m×n) and let A = FG be an orthogonal full rank factorization of A. If there exists an invertible D ∈ C^(r×r) with GG* = DD*, then there exist unitary matrices S and T with SAT = [D 0; 0 0].

One way to exhibit such a matrix D for a given A is to choose for the columns of F an orthonormal basis consisting of eigenvectors of AA* corresponding to nonzero eigenvalues. We know the eigenvalues of AA* are nonnegative. Then AA*F = FE, where E is the r-by-r diagonal matrix of real positive eigenvalues of AA*. Let D be the diagonal matrix of the positive square roots of these eigenvalues. Then D*[F*AA*F]^(-1)D = I.

Thus we have the following theorem.

THEOREM 10.18 (singular value decomposition)
Let A ∈ C^(m×n) and A = FG, where the columns of F are an orthonormal basis of Col(A) = Col(AA*) consisting of eigenvectors of AA* corresponding to nonzero eigenvalues. Suppose I - FF+ = F1F1+ = F1F1* and I - G+G = F2F2+ = F2F2*. Then there exist unitary matrices U and V with

U*AV = [ Dr 0 ; 0 0 ],

where Dr is an r-by-r diagonal matrix whose diagonal entries are the positive square roots of the nonzero eigenvalues of AA*. The matrices U and V can be constructed explicitly from U* = [F*; F1*] and V = [G+Dr : (I - G+G)F2].


It does not really matter if we use the eigenvalues of AA* or A*A, as the nexttheorem shows.

THEOREM 10.19
Let A ∈ C^(m×n). The eigenvalues of A*A and AA* differ only by the geometric multiplicity of the zero eigenvalue, which is n - r for A*A and m - r for AA*, where r = r(A*A) = r(AA*).

PROOF Since A*A is self-adjoint, there is an orthonormal basis of eigenvectors v1, v2, ..., vn corresponding to eigenvalues λ1, λ2, ..., λn (not necessarily distinct). Then (A*Avi | vj) = λi(vi | vj) = λi δij for 1 ≤ i, j ≤ n. So (Avi | Avj) = (A*Avi | vj) = λi δij. Thus (Avi | Avi) = λi for i = 1, 2, ..., n. Thus Avi = 0 iff λi = 0. Also AA*(Avi) = A(A*Avi) = λi(Avi), so for λi ≠ 0, Avi is an eigenvector of AA*. Thus, if a nonzero λ is an eigenvalue of A*A, then it is an eigenvalue of AA*. A symmetric argument gives the other implication and we are done.

Let's look at an example next.

Example 10.3
Consider A = [1 0 1 1; 0 1 -1 0; 1 1 0 1]. Then AA* = [3 -1 2; -1 2 1; 2 1 3], with eigenvalues 5, 3, and 0 and corresponding eigenvectors (1, 0, 1)^T, (-1, 2, 1)^T, and (1, 1, -1)^T. With the help of Gram-Schmidt, we get

F = [ 1/√2  -1/√6 ; 0  2/√6 ; 1/√2  1/√6 ].

Then F*AA*F = [5 0; 0 3], so D = [√5 0; 0 √3] and D*[F*AA*F]^(-1)D = [1 0; 0 1].
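Assuming MATLAB is available, the numbers in this example are easy to confirm: the nonzero eigenvalues of AA* are 5 and 3, and the singular values of A are their square roots.

A = [1 0 1 1; 0 1 -1 0; 1 1 0 1];
eig(A*A')      % 0, 3, 5 in some order
svd(A)         % sqrt(5), sqrt(3), 0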

The singular value decomposition really is quite remarkable. Not only is it interesting mathematically, but it has many important applications as well. It deserves to be more widely known. So let's take a closer look at it. Our theorem says that any matrix A can be written as A = UΣV*, where U and V are unitary and Σ is a diagonal matrix with nonnegative diagonal entries. Such a decomposition of A is called a singular value decomposition (SVD) of A. Our proof of the existence rested heavily on the diagonalizability of AA* and the fact that the eigenvalues of AA* are nonnegative. In a sense, we did not have much choice in the matter since, if A = UΣV*, then AA* = UΣV*VΣ*U* = UΣ²U*, so AA*U = UΣ², which implies that the columns of U are eigenvectors of AA* and the eigenvalues of AA* are the squares of the diagonal entries of Σ. Note here U is m-by-m, while A*A = VΣ*U*UΣV* = VΣ²V* says the columns of V, which is n-by-n, form an orthonormal basis for C^n. Now since permutation matrices are unitary, there is no loss of generality in assuming the diagonal entries of Σ are arranged in decreasing order, and since A and Σ have the same rank we have σ1 ≥ σ2 ≥ ··· ≥ σr > 0 = σ_(r+1) = σ_(r+2) = ··· = σn. Some more language: the columns of U are called left singular vectors and the columns of V are called right singular vectors of A.

So what is the geometry behind the SVD? We look at A ∈ C^(m×n) as taking vectors from C^n and changing them into vectors in C^m. We have

A = [u1 | u2 | ... | um] diag(σ1, ..., σr, 0, ..., 0) [v1 | v2 | ... | vn]*,

or A[v1 | v2 | ... | vn] = [σ1u1 | σ2u2 | ... | σrur | 0 | ... | 0], so Av1 = σ1u1, ..., Avr = σrur, Av_(r+1) = 0, ..., Avn = 0. So, when we express a vector v ∈ C^n in terms of the orthonormal basis {v1, v2, ..., vn}, we see A contracts some components and dilates others depending on the magnitudes of the singular values. Then the change in dimension causes A to discard components or append zeros. Note the vectors {v_(r+1), ..., vn} provide an orthonormal basis for the null space of A. So what a great pair of bases the SVD provides!

Figure 10.1: SVD and the fundamental subspaces.


Let's try to gain some insight as to the choice of these bases. If all you want is a diagonal representation of A, select an orthonormal basis of Null(A), {v_(r+1), ..., vn}, and extend it to an orthonormal basis of Null(A)⊥ = Col(A*) so that {v1, v2, ..., vn} becomes an orthonormal basis of C^n. Define {u1, ..., ur} = {Av1/||Av1||, ..., Avr/||Avr||}. Then extend with a basis of Null(A*) = Col(A)⊥. Then

AV = A[v1 | ... | vn] = [Av1 | ... | Avr | 0 | ... | 0] = [u1 | ... | ur | ...] diag(||Av1||, ||Av2||, ..., ||Avr||, 0, ..., 0).

But the problem is that the u's do not have to be orthonormal! However, this is where the eigenvectors of A*A come in to save the day. Take the eigenvalues of A*A written λ1 ≥ λ2 ≥ ··· ≥ λr > λ_(r+1) = ··· = λn = 0. Take a corresponding orthonormal basis of eigenvectors v1, v2, ..., vn of A*A. Define u1 = Av1/√λ1, ..., ur = Avr/√λr. The key point is that these u's are orthonormal. To see this, compute (ui | uj) = (1/√(λiλj))(Avi | Avj) = (λi/√(λiλj))(vi | vj) = δij. Now just extend the u's to an orthonormal basis of C^m. Then AV = UΣ, where the √λi and zeros appear on the diagonal of Σ and zeros appear everywhere else.

We note in passing that if A = A*, then A*A = A² = AA*, so if λ is an eigenvalue of A, then λ² is an eigenvalue of A*A = AA*. Thus the left and right singular vectors for A are just the eigenvectors of A and the singular values of A are the magnitudes of its eigenvalues. Thus, for positive semidefinite matrices A, the SVD coincides with the spectral diagonalization, and for any matrix A, the SVD of AA* is the same as its spectral (unitary) diagonalization. There is a connection between the SVD and various norms of a matrix.

THEOREM 10.20
Let A ∈ C^(m×n) with singular values σ1 ≥ σ2 ≥ ··· ≥ σr > 0. Then ||A||_2 = σ1, where ||A||_2 = max_{||v||_2 = 1} ||Av||_2, and ||A||_F = (σ1² + σ2² + ··· + σr²)^(1/2). If A is invertible, σn > 0 and ||A^(-1)||_2 = 1/σn. Moreover,

σk = min_{dim(M) = n-k+1}  max_{v ∈ M\{0}}  ||Av||_2 / ||v||_2.
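These identities are easy to observe numerically; the following MATLAB sketch uses a random matrix of our own choosing, so the particular values are unimportant.

A = randn(5, 3);
s = svd(A);
[norm(A),        s(1)]       % the 2-norm equals the largest singular value
[norm(A, 'fro'), norm(s)]    % the Frobenius norm equals sqrt(sum of squared singular values)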

There is yet more information packed into the SVD. Write

A = UΣV* = [U1 | U2] [ Σr 0 ; 0 0 ] [ V1* ; V2* ].

Then
(i) U1U1* = AA+ = the projection onto Col(A).
(ii) U2U2* = I - AA+ = the projection onto Col(A)⊥ = Null(A*).
(iii) V1V1* = A+A = the projection onto Null(A)⊥ = Col(A*).
(iv) V2V2* = I - A+A = the projection onto Null(A).

Indeed, we can write A = U1 Σr V1*. Setting F = U1Σr and G = V1*, we get a full rank factorization of A. In fact, A+ = V1 Σr^(-1) U1*. Even more, write

A = [u1 | ... | ur] diag(σ1, ..., σr) [ v1* ; ... ; vr* ] = σ1 u1 v1* + ··· + σr ur vr*,

which is the outer product form of matrix multiplication, yielding A as a sum of rank 1 matrices.

The applications of the SVD seem to be endless, especially when the matrices involved are large. We certainly cannot mention them all here. Let's just mention a few.

One example is in computing the eigenvalues of A*A. This is importantto statisticians in analyzing covariance matrices and doing something calledprincipal component analysis. Trying to compute those eigenvalues directlycan be a numerical catastrophe. If A has singular values in the range .001 to100 of magnitude, then A * A has eigenvalues that range from .000001 to 10000.So computing directly with A can have a significant advantage. With large datamatrices, measurement error can hide dependencies. An effective measure ofrank can be made by counting the number of singular values larger than the sizeof the measurement error.

The SVD can help solve the approximation problem. Consider Ax - b = UΣV*x - b = U(ΣV*x) - U(U*b) = U(Σy - c), where y = V*x and c = U*b.


Also ||Ax - b||_2 = ||U(Σy - c)||_2 = ||Σy - c||_2, since U is unitary. This second minimization is easy. Indeed, y = Σ+c = (c1/σ1, ..., cr/σr, 0, ..., 0)^T, so x = VΣ+U*b = A+b gives the minimum length solution, as we already knew. There is a much longer story to tell, but we have exhausted our allotted space for the SVD.
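Here is a minimal MATLAB sketch of this least-squares recipe; the matrix A and vector b are just illustrative, and A is chosen with full column rank so that Σ+ is simply Σ^(-1).

A = [1 1; 1 2; 1 3];  b = [1; 2; 2];
[U, S, V] = svd(A, 0);            % economy-size SVD
x = V * (S \ (U' * b));           % the minimum-norm least-squares solution
norm(x - A\b)                     % agrees with backslash here, since A has full column rank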

Exercise Set 45

1. Argue that the rank of a matrix is equal to the number of nonzero singularvalues.

2. Prove that matrices A and B are unitarily equivalent iff they have thesame singular values.

3. Argue that U is unitary iff all its singular values are equal to one.

4. For matrices A and B, argue that A and B are unitarily equivalent iff A*A and B*B are similar.

5. Suppose A = UΣV* is a singular value decomposition of A. Argue that ||A||_F = ||ΣV*||_F = ||Σ||_F = (|σ1|² + |σ2|² + ··· + |σn|²)^(1/2).

6. Compare the SVDs of A with the SVDs of A*.

7. Suppose A = UΣV* is an SVD of A. Argue that ||A||_2 = σ1.

8. Suppose A = QR is a QR factorization of A. Argue that A and R havethe same singular values.

9. Argue that the magnitude of the determinant of a square matrix A is theproduct of its singular values.

10. Argue that if a matrix is normal, its singular values are just the magnitudesof its eigenvalues.

11. Argue that the singular values of a square matrix are invariant underunitary transformations. That is, the singular values of A, AU, and UAare the same if U is unitary.


12. Suppose A = UΣV* is a singular value decomposition of A. Prove that x = VΣ+U*b minimizes ||b - Ax||_2.

13. Suppose A = UΣV* is a singular value decomposition of A. Argue that σn||v||_2 ≤ ||Av||_2 ≤ σ1||v||_2.

14. Suppose A is an m-by-n matrix of rank r and 0 < k < r. Find a matrix of rank k that is closest to A in the Frobenius norm. (Hint: Suppose A = UΣV* is a singular value decomposition of A. Take UΣkV*, where Σk = diag(σ1, ..., σk, 0, ..., 0) is Σ with all but the k largest singular values replaced by zero.)

15. Suppose A is an m-by-n matrix. Argue that the 2-norm of A is its largestsingular value.

16. Suppose A is a matrix. Argue that the condition number of A relative tothe 2-norm is the ratio of its largest singular value to its smallest singularvalue.

17. Argue that BB* = CC* iff there exists U with UU* = I such thatC = BU.

18. Suppose A = U [D 0; 0 0] V* is the singular value decomposition of A. Argue that B = V [D^(-1) E; F G] U* is a 1-inverse of A and that all 1-inverses look like this, where E, F, and G are arbitrary.

Further Reading

[B&D, 1993] Z. Bai and J. Demmel, Computing the Generalized Singular Value Decomposition, SIAM J. Sci. Comput., Vol. 14, (1993), 1464-1486.

[B&K&S, 1989] S. J. Blank, Nishan Krikorian, and David Spring, A Geometrically Inspired Proof of the Singular Value Decomposition, The American Mathematical Monthly, Vol. 96, March, (1989), 238-239.


[C&J&L&R, 2005] John Clifford, David James, Michael Lachance, and Joan Remski, A Constructive Approach to Singular Value Decomposition and Symmetric Schur Factorization, The American Mathematical Monthly, Vol. 112, April, (2005), 358-363.

[M&R, 1998] Colm Mulcahy and John Rossi, A Fresh Approach to the Singular Value Decomposition, The College Mathematics Journal, Vol. 29, No. 3, (1998), 199-206.

[Good, 1969] I. J. Good, Some Applications of the Singular Decomposi-tion of a Matrix, Technometrics, Vol. 11, No. 4, Nov., (1969), 823-831.

[Long, 1983] Cliff Long, Visualization of Matrix Singular Value Decomposition, Mathematics Magazine, Vol. 56, No. 3, May, (1983), 161-167.

[M&M, 1983] Cleve Moler and Donald Morrison, Singular Value Analysis of Cryptograms, The American Mathematical Monthly, February, (1983), 78-86.

[Strang, 1980] G. Strang, Linear Algebra and Its Applications, AcademicPress, New York, (1980).

[T&B, 1997] L. N. Trefethen and D. Bau III, Numerical Linear Algebra,Society of Industrial and Applied Mathematicians, Philadelphia, (1997).

[G&L, 1983] G. H. Golub and C. Van Loan, Matrix Computations, JohnsHopkins University Press, Baltimore, (1983).

[H&O, 1996] Roger A. Horn and Ingram Olkin, When Does A*A = B*B and Why Does One Want to Know?, The American Mathematical Monthly, Vol. 103, No. 6, June-July, (1996), 470-481.

[Kalman, 1996] Dan Kalman, A Singularly Valuable Decomposition: The SVD of a Matrix, The College Mathematics Journal, Vol. 27, No. 1, January, (1996), 2-23.

[Koranyi, 2001] S. Koranyi, Around the Finite-Dimensional Spectral Theorem, The American Mathematical Monthly, Vol. 108, (2001), 120-125.

10.4.1 MATLAB Moment

10.4.1.1 The Singular Value Decomposition

Of course, MATLAB has a built-in command to produce the singular valuedecomposition of a matrix. Indeed, many computations in MATLAB are based


on the SVD of a matrix. The command is

[U, S, V] = svd(A).

This returns a unitary matrix U, a unitary matrix V, and a diagonal matrix Swith the singular values of A down its diagonal such that

A = USV'

up to roundoff. There is an economy size decomposition. The command is

[U, S, V] = svd(A, 0).

If all you want are the singular values of A, just type

svd(A).

Let's look at some examples.

> > A = vander(1:5)

A =
     1     1     1     1     1
    16     8     4     2     1
    81    27     9     3     1
   256    64    16     4     1
   625   125    25     5     1

> > [U,S,V] = svd(A)

U =
   -0.0018   -0.0739    0.6212   -0.7467   -0.2258
   -0.0251   -0.3193    0.6273    0.3705    0.6055
   -0.1224   -0.6258    0.1129    0.3594   -0.6720
   -0.3796   -0.6165   -0.4320   -0.4046    0.3542
   -0.9167    0.3477    0.1455    0.1109   -0.0732

S =
  695.8418         0         0         0         0
         0   18.2499         0         0         0
         0         0    2.0018         0         0
         0         0         0    0.4261         0
         0         0         0         0    0.0266

V =
   -0.9778    0.1996    0.0553    0.0294   -0.0091
   -0.2046   -0.8500   -0.3898   -0.2675    0.1101
   -0.0434   -0.4468    0.4348    0.6292   -0.4622
   -0.0094   -0.1818    0.6063    0.0197    0.7739
   -0.0021   -0.0706    0.5369   -0.7289   -0.4187


>> format rat

> > [U,S,V] = svd(A)

U =
     -18/10123     -476/6437      579/932      -457/612     -240/1063
    -298/11865     -692/2167      685/1092      379/1023     637/1052
     -535/4372    -2249/3594      297/2630     1044/2905    -465/692
     -976/2571    -3141/5095     -499/1155     -547/1352     559/1578
  -39678/43285      491/1412      657/4517      207/1867    -100/1367

S =
   13221/19            0             0             0             0
        0        33306/1825         0             0             0
        0             0         1097/548          0             0
        0             0             0          268/629          0
        0             0             0             0          23/865

V =
   -2029/2075      395/1979      157/2838     367/12477    -218/24085
    -593/2898    -2148/2527     -313/803     -811/3032       82/745
   -439/10117     -915/2048     1091/2509     246/391      -403/872
     -79/8430     -313/1722      596/983      644/32701    2091/2702
     -87/41879    -286/4053      792/1475     -277/380     -675/1612

> > svd(A)

ans =
   13221/19
   33306/1825
   1097/548
   268/629
   23/865

Now try the complex matrix B = [1+i 2+2i 3+i; 2+2i 4+4i 9i; 3+3i 6+6i 8i]. The SVD is very important for numerical linear algebra. Unitary matrices preserve lengths, so they tend not to magnify errors. The MATLAB functions rank, null, and orth are based on the SVD.

Chapter 11

Jordan Canonical Form

11.1 Jordan Form and Generalized Eigenvectors

We know that there are matrices that cannot be diagonalized. However, there is a way to almost diagonalize all complex matrices. We now develop this somewhat long story.

11.1.1 Jordan Blocks

There are matrices that can be considered the basic building blocks of allsquare complex matrices. They are like the prime numbers in N. Any n E N isuniquely a product of primes except for the order of the prime factors that canbe permuted. An analogous fact holds for complex matrices. The basic buildingblocks are the Jordan blocks.

DEFINITION 11.1 (λ-Jordan block)
For λ ∈ C, k ∈ N, define

Jk(λ) =
[ λ  1  0  0  ...  0  0 ]
[ 0  λ  1  0  ...  0  0 ]
[ 0  0  λ  1  ...  0  0 ]
[ ⋮              ⋱      ]
[ 0  0  0  0  ...  λ  1 ]
[ 0  0  0  0  ...  0  λ ]  ∈ C^(k×k).

This is a λ-Jordan block of order k.

For example, J1(λ) = [λ], J2(λ) = [λ 1; 0 λ], J3(λ) = [λ 1 0; 0 λ 1; 0 0 λ], and so on. More concretely, J2(5) = [5 1; 0 5] and J3(4) = [4 1 0; 0 4 1; 0 0 4].


Now we collect some basic facts about Jordan blocks. First we illustrate some important results.

Note that J3(λ)e1 = [λ 1 0; 0 λ 1; 0 0 λ](1, 0, 0)^T = (λ, 0, 0)^T = λe1, so λ is an eigenvalue of J3(λ) with eigenvector the standard basis vector e1. Moreover, (x1, x2, x3)^T ∈ Eig(J3(λ); λ) iff (x1, x2, x3)^T ∈ Null(J3(λ) - λI) iff [0 1 0; 0 0 1; 0 0 0](x1, x2, x3)^T = 0 iff x2 = 0 and x3 = 0. Thus Eig(J3(λ); λ) = span{e1} = Ce1, so the geometric multiplicity of λ is 1. Moreover, the characteristic polynomial of J3(λ) is

χ_J3(λ)(x) = det(xI - J3(λ)) = det [x-λ  -1  0; 0  x-λ  -1; 0  0  x-λ] = (x - λ)³.

Also, J3(λ) - λI = [0 1 0; 0 0 1; 0 0 0], which is nilpotent of index 3, so the minimal polynomial µ_J3(λ)(x) = (x - λ)³ as well. In particular, the spectrum of J3(λ) is {λ}. We extend and summarize our findings with a general theorem.

THEOREM 11.1
Consider a general Jordan block Jk(λ):

1. Jk(λ) is an upper triangular matrix.

2. det(Jk(λ)) = λ^k and Tr(Jk(λ)) = kλ.

3. Jk(λ) is invertible iff λ ≠ 0.

4. The spectrum λ(Jk(λ)) = {λ}; that is, λ is the only eigenvalue of Jk(λ), and the standard basis vector e1 spans the eigenspace Eig(Jk(λ); λ).

5. dim(Eig(Jk(λ); λ)) = 1; that is, the geometric multiplicity of λ is 1.

6. The characteristic polynomial of Jk(λ) is χ_Jk(λ)(x) = (x - λ)^k.

7. The minimal polynomial of Jk(λ) is µ_Jk(λ)(x) = (x - λ)^k.

8. Jk(λ) = Jk(0) + λI_k, and Jk(0) is nilpotent of index k.

9. rank(Jk(0)) = k - 1.

10. For n ∈ N, Jk(λ)^n is upper triangular; its (i, i+j) entry is C(n, j)λ^(n-j) for j = 0, 1, ..., k-1 (binomial coefficients, with C(n, j) = 0 when j > n).

11. If λ ≠ 0, Jk(λ)^(-1) = (1/λ)I - (1/λ²)Jk(0) + (1/λ³)Jk(0)² - ··· + ((-1)^(k-1)/λ^k)Jk(0)^(k-1); its first row is (1/λ, -1/λ², 1/λ³, ..., (-1)^(k-1)/λ^k).

12. Suppose p(x) ∈ C[x]; then p(Jk(λ)) is upper triangular with

p(Jk(λ)) =
[ p(λ)  p'(λ)  p''(λ)/2!  ...  p^(k-1)(λ)/(k-1)! ]
[ 0     p(λ)   p'(λ)      ...                    ]
[ ⋮                    ⋱                         ]
[ 0     0      0          ...  p(λ)              ].

13. Jk(0)Jk(0)^T = [ I_(k-1)  0 ; 0  0 ].

14. Jk(0)^T Jk(0) = [ 0  0 ; 0  I_(k-1) ].

15. Jk(0)+ = Jk(0)^T.

16. Jk(0)e_(i+1) = e_i for i = 1, 2, ..., k-1.

17. If λ ≠ 0, rank(Jk(λ)^m) = rank(Jk(λ)^(m+1)) = k for m = 1, 2, 3, ....

18. rank(Jk(0)^m) - rank(Jk(0)^(m+1)) = 0 if m ≥ k.

19. rank(Jk(0)^m) - rank(Jk(0)^(m+1)) = 1 for m = 1, 2, 3, ..., k-1.

PROOF The proofs are left as exercises.
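A few of these facts can be checked numerically; the following MATLAB sketch builds J3(4) directly (without the jordan M-file defined below) and verifies items 2, 6, and 9.

J = 4*eye(3) + diag([1 1], 1);   % the Jordan block J3(4)
[det(J), trace(J)]               % 4^3 = 64 and 3*4 = 12
poly(J)                          % coefficients of (x - 4)^3
rank(J - 4*eye(3))               % k - 1 = 2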


Exercise Set 46

1. Consider J = J6(4) = [4 1 0 0 0 0; 0 4 1 0 0 0; 0 0 4 1 0 0; 0 0 0 4 1 0; 0 0 0 0 4 1; 0 0 0 0 0 4]. What are the determinant and trace of J? Is J invertible? If so, what is J^(-1)? What is the spectrum of J and what is the eigenspace? Compute the minimal and characteristic polynomials. What is J^7? Suppose p(x) = x² + 3x - 8. What is p(J)?

2. Prove all the parts of Theorem 11.1.

3. Show that Jk(λ) = λI + Σ_{i=1}^{k-1} e_i e_{i+1}^T for k ≥ 2.

4. Argue that Jk(λ) is not diagonalizable for k > 1.

11.1.1.1 MATLAB Moment

11.1.1.1.1 Jordan Blocks It is easy to write an M-file to create a matrix thatis a Jordan block. The code is

function J = jordan(lambda, n)
% n-by-n Jordan block with eigenvalue lambda; nilp(n) is the book's
% nilpotent matrix with ones on the superdiagonal.
if n == 0, J = []; else
    J = nilp(n) + lambda*eye(n);
end

Experiment making some Jordan blocks. Compute their ranks and characteristicpolynomials with MATLAB.

11.1.2 Jordan Segments

Now we need to complicate our structure by allowing matrices that are block diagonal with Jordan blocks of varying sizes down the main diagonal. For the time being, we hold λ fixed.

DEFINITION 11.2 (λ-Jordan segment)
A λ-Jordan segment of length k is a block diagonal matrix consisting of k λ-Jordan blocks of various sizes. In symbols, we write J(λ; p1, p2, ..., pk) = BlockDiagonal[J_p1(λ), J_p2(λ), ..., J_pk(λ)]. Note J(λ; p1, p2, ..., pk) ∈ C^(t×t), where t = Σ_{i=1}^{k} pi. Let's agree to write the pi's in descending order. The sequence Segre(λ) = (p1, p2, ..., pk) is called the Segre sequence of J(λ; p1, p2, ..., pk). Clearly, given λ and the Segre sequence of λ, the λ-Jordan segment is completely determined.

Let's look at some examples for clarification.

J(5; 3, 2) = [5 1 0 0 0; 0 5 1 0 0; 0 0 5 0 0; 0 0 0 5 1; 0 0 0 0 5] ∈ C^(5×5),

J(7; 2, 1, 1, 1) = [7 1 0 0 0; 0 7 0 0 0; 0 0 7 0 0; 0 0 0 7 0; 0 0 0 0 7],

J(0; 3, 2, 2) = [0 1 0 0 0 0 0; 0 0 1 0 0 0 0; 0 0 0 0 0 0 0; 0 0 0 0 1 0 0; 0 0 0 0 0 0 0; 0 0 0 0 0 0 1; 0 0 0 0 0 0 0].

Note rank(J(0; 3, 2, 2)) = 4 and its nullity is 3.

To understand the basics about Jordan segments, we need some generalities about block diagonal matrices.

THEOREM 11.2
Let A = [B 0; 0 C], where B and C are square matrices.

1. A² = [B² 0; 0 C²]; more generally, A^k = [B^k 0; 0 C^k] for any positive integer k.

2. If p(x) ∈ C[x] and p(A) = 0, then p(B) = 0 and p(C) = 0. Conversely, if p(B) = 0 and p(C) = 0, then p(A) = 0.

3. The minimal polynomial µ_A(x) = LCM(µ_B(x), µ_C(x)); moreover, if (x - λ) occurs k_B times in µ_B(x) and k_C times in µ_C(x), then (x - λ) occurs max{k_B, k_C} times in µ_A(x).

4. The characteristic polynomial χ_A(x) = χ_B(x)χ_C(x); so if (x - λ) occurs k_B times in χ_B(x) and k_C times in χ_C(x), then (x - λ) occurs k_B + k_C times in χ_A(x).

5. λ(A) = λ(B) ∪ λ(C); that is, λ is an eigenvalue of A iff λ is an eigenvalue of either B or C.

6. The geometric multiplicity of λ in λ(A) equals the geometric multiplicity of λ in B plus the geometric multiplicity of λ in C (of course, one of these could be zero).


PROOF Once again, the details are left to the reader. 0

These properties extend to matrices with a finite number of square blocksdown the diagonal.

COROLLARY 11.1
Suppose A = BlockDiagonal[A1, A2, ..., Ak]. Then

1. µ_A(x) = LCM[µ_A1(x), µ_A2(x), ..., µ_Ak(x)]

2. χ_A(x) = Π_{j=1}^{k} χ_Aj(x)

3. λ(A) = ∪_{j=1}^{k} λ(Aj).

Now, we can apply these ideas to A-Jordan segments.

COROLLARY 11.2
Consider a λ-Jordan segment of length k, J = J(λ; p1, p2, ..., pk).

1. λ is the only eigenvalue of J(λ; p1, p2, ..., pk).

2. The geometric multiplicity of λ is k, with eigenvectors e1, e_(p1+1), ..., e_(p1+p2+···+p_(k-1)+1).

3. µ_J(x) = (x - λ)^(max{p1, p2, ..., pk}).

4. χ_J(x) = (x - λ)^t, where t = Σ_{j=1}^{k} pj.

5. rank(J(0; p1, p2, ..., pk)) = t - k.

6. index(J(0; p1, p2, p3, ..., pk)) = max{p1, p2, p3, ..., pk}.

PROOF Once again, the proofs are left as exercises. 0

To illustrate, consider

J(5; 3, 2, 2) = [5 1 0 0 0 0 0; 0 5 1 0 0 0 0; 0 0 5 0 0 0 0; 0 0 0 5 1 0 0; 0 0 0 0 5 0 0; 0 0 0 0 0 5 1; 0 0 0 0 0 0 5].

Clearly, the only eigenvalue is 5; its geometric multiplicity is 3 with eigenvectors e1, e4, e6; the minimal polynomial is µ(x) = (x - 5)³, and the characteristic polynomial is χ(x) = (x - 5)^7. Consider J(0; 3, 2, 2), displayed earlier. Zero is the only eigenvalue, with geometric multiplicity 3, eigenvectors e1, e4, e6, minimal polynomial µ(x) = x³, and characteristic polynomial χ(x) = x^7. The rank is 4 and the index is 3.

Exercise Set 47

1. Consider J = J(4; 3, 3, 2, 1, 1). Write out J explicitly. Compute theminimal and characteristic polynomials of J. What is the geometric mul-tiplicity of 4 and what is a basis for the eigenspace? What is the trace anddeterminant of J? Is J invertible? If so, find J-1.

2. Prove all the parts of Theorem 11.2.

3. Prove all the parts of Corollary 11.1 and Corollary 11.2.

11.1.2.1 MATLAB Moment

11.1.2.1.1 Jordan Segments We have to be a little bit more clever to writea code to generate Jordan segments, but the following should work:

function J = jordseg(lambda, p)
% lambda-Jordan segment whose block sizes are given by the vector p.
k = length(p);
J = jordan(lambda, p(k));
for i = k-1:-1:1
    J = blkdiag(jordan(lambda, p(i)), J);
end

We need to be sure to enter p as a vector. For example,

> > jordseg(3, [2, 2, 4])

ans =3 1 0 0 0 0 0 0

0 3 0 0 0 0 0 0

0 0 3 1 0 0 0 0

0 0 0 3 0 0 0 0

0 0 0 0 3 1 0 0

0 0 0 0 0 3 1 0

0 0 0 0 0 0 3 1

0 0 0 0 0 0 0 3


11.1.3 Jordan Matrices

Now we take the next and final step in manufacturing complex matrices upto similarity.

DEFINITION 11.3 (Jordan matrix)
A Jordan matrix is a block diagonal matrix whose blocks are Jordan segments. The notation will be J = J((λ1; p11, p12, ..., p1k(1)), (λ2; p21, p22, ..., p2k(2)), ..., (λs; ps1, ps2, ..., psk(s))) = BlockDiagonal[J(λ1; p11, p12, ..., p1k(1)), ..., J(λs; ps1, ps2, ..., psk(s))]. The data (λi, Segre(λi)) for i = 1, 2, ..., s is called the Segre characteristic of J and clearly completely determines the structure of J.

For example,

J = J((3; 2, 1, 1), (-2; 3), (4; 3, 2), (i; 1, 1)) = BlockDiagonal[J2(3), J1(3), J1(3), J3(-2), J3(4), J2(4), J1(i), J1(i)],

a 14-by-14 matrix.
The Segre characteristic of J is {(3; 2, 1, 1), (-2; 3), (4; 3, 2), (i ; 1, 1)}. Now,using the general theory of block diagonal matrices developed above, we candeduce the following.

THEOREM 11.3
Suppose J = J((λ1; p11, p12, ..., p1k(1)), (λ2; p21, p22, ..., p2k(2)), ..., (λs; ps1, ps2, ..., psk(s))). Then

1. the eigenvalues of J are λ1, λ2, ..., λs

2. the geometric multiplicity of λi is k(i)

3. the minimum polynomial of J is µ(x) = Π_{i=1}^{s} (x - λi)^(max{pi1, pi2, ..., pik(i)})

4. the characteristic polynomial of J is χ(x) = Π_{i=1}^{s} (x - λi)^(ti), where ti = Σ_{j=1}^{k(i)} pij.

PROOF The proofs are left as exercises. 0

Now the serious question is whether any given matrix A is similar to a Jordanmatrix and is this matrix unique in some sense. The remarkable fact is that thisis so. Proving that is a nontrivial task. We work on that next.

Exercise Set 48

1. Consider J = J((4; 5), (11; 2, 2, 1), (2+2i; 3)). What are the eigenvaluesof J and what are their geometric multiplicities? Compute the minimaland characteristic polynomials of J.

2. Prove all the parts of Theorem 11.3.

11.1.3.1 MATLAB Moment

11.1.3.1.1 Jordan Matrices Programming Jordan matrices is a bit of a chal-lenge. We are grateful to our colleague Frank Mathis for the following code:

function Jmat = jordform(varargin)
% Each input argument is a vector [lambda, p1, p2, ...] giving an eigenvalue
% and the block sizes of its Jordan segment.
n = nargin;
lp = varargin{n};
k = length(lp);
Jmat = jordseg(lp(1), lp(2:k));
for i = n-1:-1:1
    lp = varargin{i};
    k = length(lp);
    Jmat = blkdiag(jordseg(lp(1), lp(2:k)), Jmat);
end


Try

> > jordform([8, 4, 2, 2], [9, 3, 1])

Note the format of the entry.

11.1.4 Jordan's Theorem

There are many proofs of Jordan's theorem in the literature. We will try to present an argument that is as elementary and constructive as possible. Induction is always a key part of the argument. Let's take an arbitrary A in C^(n×n). We shall argue that we can get A into the form of a Jordan matrix by using similarities. That is, we shall find an invertible matrix S so that S^(-1)AS = J is a Jordan matrix. This matrix will be unique up to the order in which the blocks are arranged on the diagonal of J. This will be our canonical form for the equivalence relation of similarity. We can then learn the properties of A by looking at its Jordan canonical form (JCF).

The proof we present is broken down into three steps. The first big step is to get the matrix into triangular form with eigenvalues in a prescribed order down the main diagonal. This is established by Schur's theorem, which we have already proved. As a reminder, Schur's theorem states: Let A ∈ C^(n×n) with eigenvalues λ1, λ2, ..., λn (not necessarily distinct) be written in any prescribed order. Then there exists a unitary matrix U ∈ C^(n×n) with U^(-1)AU = T, where T is upper triangular and t_ii = λi for i = 1, 2, ..., n. Hence, we may assume U^(-1)AU is upper triangular with equal diagonal entries occurring consecutively down the diagonal. Suppose now λ1, λ2, ..., λs is a list of the distinct eigenvalues of A.

For the second stage, we show that transvections can be used to "zero out"portions of the upper triangular matrix T when a change in eigenvalue on thediagonal occurs. This can be done without disturbing the diagonal elements orthe upper triangular nature of T.

Suppose r < s and consider T_rs(a)^(-1) T T_rs(a), where a is any complex scalar. Notice that this similarity transformation changes entries of T only in the rth row to the right of the sth column and in the sth column above the rth row, and replaces t_rs by t_rs + a(t_rr - t_ss).

By way of illustration, consider

[1 -a 0; 0 1 0; 0 0 1] [t11 t12 t13; 0 t22 t23; 0 0 t33] [1 a 0; 0 1 0; 0 0 1]
= [1 -a 0; 0 1 0; 0 0 1] [t11  t11a + t12  t13; 0  t22  t23; 0  0  t33]
= [t11  t12 + a(t11 - t22)  t13 - at23; 0  t22  t23; 0  0  t33].

By choosing a_rs = -t_rs/(t_rr - t_ss), we can zero out the (r, s) entry as long as t_rr ≠ t_ss. Thus, working with the sequence of transvections corresponding to positions

(n-1, n), (n-2, n-1), (n-2, n), (n-3, n-2), (n-3, n-1), (n-3, n), (n-4, n-3), (n-4, n-2), ... etc.,

we form the invertible matrix Q (the product of these transvections) and note that Q^(-1)AQ = BlockDiagonal[T1, T2, ..., Ts], a block diagonal matrix where Ti = λiI + Mi and Mi is strictly upper triangular (i.e., upper triangular with zeros on the main diagonal). Before going on to the final stage, let us illustrate the argument above to be sure it is clear.

Consider A = [2 2 3 4; 0 2 5 6; 0 0 3 9; 0 0 0 3]. Being already upper triangular, we do not need Schur. We see a11 = a22 and a33 = a44, so we begin by targeting a23:

T23(-5) A T23(5) = [2 2 13 4; 0 2 0 -39; 0 0 3 9; 0 0 0 3].

The reader should now continue on with positions (2,4), (1,3), and (1,4) to obtain

[2 2 0 0; 0 2 0 0; 0 0 3 9; 0 0 0 3].

Now we go on to the final stage to produce JCF. In view of the previous stage, it suffices to work with matrices of the form λI + M, where M is strictly upper triangular. Say

T = [λ * * ... *; 0 λ * ... *; 0 0 λ ... *; ... ; 0 0 0 ... λ].

We induct on n, the size of the matrix. For n = 1, T = [λ] = J1(λ), which is already in JCF. That seemed a little too easy, so assume n = 2. Then T = [λ b; 0 λ]. If b = 0, again we have JCF, so assume b ≠ 0. Then, using a dilation, we can make b into 1:

D1(b^(-1)) T D1(b) = [b^(-1) 0; 0 1][λ b; 0 λ][b 0; 0 1] = [b^(-1) 0; 0 1][λb b; 0 λ] = [λ 1; 0 λ],

which is JCF. Now for the induction, suppose n > 2. Consider the leading principal submatrix of T, which is of size (n-1)-by-(n-1). By the induction hypothesis, this matrix can be brought to JCF by a similarity Q. Then

[Q^(-1) 0; 0 1] T [Q 0; 0 1] = T1 = [F * ; 0 ... 0 λ],

where F is a block diagonal matrix of λ-Jordan blocks. The problem is that there may be nonzero entries in the last column. But any entry in that last column opposite a row of F that contains a one can be zeroed out. We illustrate.

Say

T1 =
[ λ 1 0 0 0 0 a ]
[ 0 λ 0 0 0 0 b ]
[ 0 0 λ 1 0 0 c ]
[ 0 0 0 λ 1 0 d ]
[ 0 0 0 0 λ 1 e ]
[ 0 0 0 0 0 λ f ]
[ 0 0 0 0 0 0 λ ].

Now, for example, d lies in the (4,7) position, across from a one in the fourth row of the leading principal submatrix of T1 of size 6-by-6. We can use a transvection similarity to zero it out. Indeed,

T57(d) T1 T57(-d) =
[ λ 1 0 0 0 0 a    ]
[ 0 λ 0 0 0 0 b    ]
[ 0 0 λ 1 0 0 c    ]
[ 0 0 0 λ 1 0 d    ]
[ 0 0 0 0 λ 1 e+λd ]
[ 0 0 0 0 0 λ f    ]
[ 0 0 0 0 0 0 λ    ]  T57(-d) =
[ λ 1 0 0 0 0 a               ]
[ 0 λ 0 0 0 0 b               ]
[ 0 0 λ 1 0 0 c               ]
[ 0 0 0 λ 1 0 -d+d = 0        ]
[ 0 0 0 0 λ 1 -λd+e+λd = e    ]
[ 0 0 0 0 0 λ f               ]
[ 0 0 0 0 0 0 λ               ].

In like manner, we eliminate all nonzero elements that live in a row with a superdiagonal one in it. Thus, we have achieved something tantalizingly close to JCF:

T2 =
[ λ 1 0 0 0 0 0 ]
[ 0 λ 0 0 0 0 b ]
[ 0 0 λ 1 0 0 0 ]
[ 0 0 0 λ 1 0 0 ]
[ 0 0 0 0 λ 1 0 ]
[ 0 0 0 0 0 λ f ]
[ 0 0 0 0 0 0 λ ].

Now comes the tricky part. First, if b and f are both zero, we are done: we have achieved JCF. Suppose then that b = 0 but f ≠ 0. Then f can be made one


using a dilation similarity:

D7(f) T2 D7(f^(-1)) =
[ λ 1 0 0 0 0 0 ]
[ 0 λ 0 0 0 0 0 ]
[ 0 0 λ 1 0 0 0 ]
[ 0 0 0 λ 1 0 0 ]
[ 0 0 0 0 λ 1 0 ]
[ 0 0 0 0 0 λ 1 ]
[ 0 0 0 0 0 0 λ ],

and we have JCF.

Now suppose f = 0 but b ≠ 0. Then b can be made one by a dilation similarity as above:

D7(b) T2 D7(b^(-1)) =
[ λ 1 0 0 0 0 0 ]
[ 0 λ 0 0 0 0 1 ]
[ 0 0 λ 1 0 0 0 ]
[ 0 0 0 λ 1 0 0 ]
[ 0 0 0 0 λ 1 0 ]
[ 0 0 0 0 0 λ 0 ]
[ 0 0 0 0 0 0 λ ].

Now use a permutation similarity to create JCF: reorder the basis so that e7 immediately follows e2 (new order e1, e2, e7, e3, e4, e5, e6), and with P the corresponding permutation matrix,

P^(-1) D7(b) T2 D7(b^(-1)) P = BlockDiagonal[J3(λ), J4(λ)],

which is JCF.

Finally, suppose both b and f are nonzero. We will show that the entry opposite the smaller block can be made zero. Consider

D7(f) T15(-b/f) T26(-b/f) T2 T26(b/f) T15(b/f) D7(f^(-1)) = D7(f)
[ λ 1 0 0 0 0 0 ]
[ 0 λ 0 0 0 0 0 ]
[ 0 0 λ 1 0 0 0 ]
[ 0 0 0 λ 1 0 0 ]
[ 0 0 0 0 λ 1 0 ]
[ 0 0 0 0 0 λ f ]
[ 0 0 0 0 0 0 λ ]  D7(f^(-1)) =
[ λ 1 0 0 0 0 0 ]
[ 0 λ 0 0 0 0 0 ]
[ 0 0 λ 1 0 0 0 ]
[ 0 0 0 λ 1 0 0 ]
[ 0 0 0 0 λ 1 0 ]
[ 0 0 0 0 0 λ 1 ]
[ 0 0 0 0 0 0 λ ],

which is JCF. (The similarity by T26(b/f) replaces the entry b in position (2,7) by b - (b/f)f = 0 at the cost of introducing b/f in position (1,6); the similarity by T15(b/f) then removes that new entry, and the dilation D7(f) turns the remaining f into a one.)

This completes our argument for getting a matrix into JCF. Remarkably, after the triangularization given by Schur's theorem, all we used were elementary row and column operations. By the way, the Jordan we have been talking about is Marie Ennemond Camille Jordan (5 January 1838-22 January 1922), who presented this canonical form in 1870. Apparently, he won a prize for this work.

An alternate approach to stage three above is to recall a deep theorem we worked out in Chapter 3 concerning nilpotent matrices. Note that N = T - λI is nilpotent, and Theorem 81 of Chapter 3 says that this nilpotent matrix is similar to BlockDiagonal[Nilp[p1], Nilp[p2], ..., Nilp[pk]], where the total number of blocks is the nullity of N and the number of j-by-j blocks is rank(N^(j-1)) - 2 rank(N^j) + rank(N^(j+1)). If S is a similarity transformation that effects this transformation, then S^(-1)NS = S^(-1)(T - λI)S = S^(-1)TS - λI = BlockDiagonal[Nilp[p1], Nilp[p2], ..., Nilp[pk]], so S^(-1)TS = λI + BlockDiagonal[Nilp[p1], Nilp[p2], ..., Nilp[pk]] is in JCF. Finally, we remark that the ones on the superdiagonal of the JCF are not essential. Using dilation similarities, we can put any nonzero scalar on the superdiagonal for each block. What is essential is the number and the sizes of the Jordan blocks.

11.1.4.1 Generalized Eigenvectors

In this section we look at another approach to Jordan's theorem, where we show how to construct a special kind of basis. Of course, constructing bases is the same as constructing invertible matrices. We have seen that not all square matrices are diagonalizable. This is because they may not have enough eigenvectors to span the whole space. For example, take A = [0 0; 1 0]. Being lower triangular makes it easy to see that zero is the only eigenvalue. If we compute the eigenspace,

Eig(A, 0) = { (x1, x2)^T | A(x1, x2)^T = (0, 0)^T } = { (x1, x2)^T | (0, x1)^T = (0, 0)^T } = { (0, x2)^T | x2 ∈ C } = span{ (0, 1)^T },

which is one dimensional. There is no way to generate C² with eigenvectors of A. Is there a way to get more "eigenvectors" of A? This leads us to the next definition.

DEFINITION 11.4 (generalized eigenvectors)Let A be an eigenvalue of A E C"X". A nonzero vector x is a generalized

eigenvector of level q belonging to A (or just A-q-eigenvector for short) if andonly if (A - A! )g x =' but (A - A/ )g- I x # ". That is, X E NU ll ((A - A! )g )

but xVNull((A-A/)g-1).

Note that a I -eigenvector is just an ordinary eigenvector as defined previously.Also, if x is a A-q-eigenvector, then (A - A/)t'x = for any p > q. We couldhave defined a generalized eigenvector of A to be any nonzero vector x such

_iTthat (A - A!)kx = for some k E N. Then the least such k would be the levelof x. You might wonder if A could be any scalar, not necessarily an eigenvalue.But if (A - AI)kx ='for x 54 -6, then (A - AI)k is not an invertible matrixwhence (A - A/) is not invertible, making A an eigenvalue. Thus, there are no"generalized eigenvalues."

DEFINITION 11.5 (Jordan string)Let x be a q-eigenvector of A belonging to the eigenvalue A. We define the

Jordan string ending at x byXq = X Xq

Xq_, = (A - k!)X Axq = Axq + xq-1Xq-2 = (A - A!)2X = (A - A/)Xq_1 Axq_1 = AXq-1 +Xq-2

X2 = (A - AI )q-2X = (A - A!)x3x1 = (A - A!)q-IX = (A - A!)x2

AX3 = AX3 + X2Axe = Axe + x,Ax, = Ax1

Note what is going on here. If x is an eigenvector of A belonging to A, then(A - A!)x = V (i.e., (A - A!) annihilates x). Now if we do not have enougheigenvectors for A- that is, enough vectors annihilated by A - A! - then it isreasonable to consider vectors annihilated by (A - A1)2, (A - A/)3, and so on.If we have a Jordan string x j, x2, ... , xq , then (A - A! )x 1 = (A - A I )g x = _iT,

(A - A!)2 x2 = -6, ... , (A - A!)gxq = _. Note that x, is an ordinary eigen-vector and (A -XI) is nilpotent on span 1x,, x2, ... , xq }, which is A-invariant.The first clue to the usefulness of Jordan strings is given by the next theorem.

THEOREM 11.4The vectors in any Jordan string {x1, x2, ... , xq) are linearly independent.Moreover, each xi is a generalized eigenvector of level ifor i = 1, 2, ... , q.

Page 425: Matrix Theory

404 Jordan Canonical Form

PROOF With the notation as above, set a, x i + a2x2 + ... + ay_ 1 Xq _ 1 +

agXq = (. Apply (A - \1)y-l to this equation and get a,(A - \1)y-Ixi +a2(A - \I )y-' x2 + ... + ay(A - \I )q-1 xq = 6. But all terms vanish exceptaq(A - \I )q-1 Xq. Thus aq(A - AI )q-1 Xq = aq(A - 1\1)q-IX, SO aq = 0 since

#Inasimilar manner, each acanbeforced tohezero. Clearly,(A - AI)x, = -6, so x, is an eigenvector of A. Next, (A - AI)x2 = x, # (1and yet (A - \1)2x2 = (A -,\/)x, Thus, x2 is a generalized eigenvectorof level 2. Now apply induction. 0

We are off to a good start in generating more independent vectors corre-sponding to an eigenvalue. Next, we generalize the notion of an eigenspace.

DEFINITION 11.6 If \ is an eigenvalue of A E C""", define

G5(A)={xEC" 1(A-AI)t"x=-6 for some pEN}.

This is the generalized eigenspace belonging to A.

THEOREM 11.5For each eigenvalue A, G,\(A) is an A-invariant subspace of C. Indeed,G,\(A) = Null((A - \1)").

PROOF The proof is left as an exercise. El

Clearly, the eigenspace forA, Eig(A, A) = Null(A-AI) e G,,(A). Now theidea is something like this. Take a generalized eigenspace G,,(A) and look for ageneralized eigenvector x, say with (A - \I)'-'x # -6 but (A - \I)kx = -6

and such that there is no vector in G,\(A) that is not annihilated at least by(A - AI )k . This gives a generalized eigenvector of level k where k is the lengthof a longest Jordan string associated with A. Then x = xk, x,,_,, ... , xi is a list

of independent vectors in GX(A). If they span Gk(A), we are done. If not we lookfor another generalized eigenvector y and create another string independent ofthe first. We continue this process until (hopefully) we have a basis of G,,(A). Wehope to show we can get a basis of C" consisting of eigenvectors and generalizedeigenvectors of A. There are many details to verify. Before we get into the nitty-gritty, let's look at a little example. Suppose A is 3-by-3 and A is an eigenvalueof A. Say we have a Jordan string, x3 = x, x2 = (A - XI)x = Ax3 - Ax3 andx, = (A - \1)2x = (A - AI )x2 = Axe - Axe and necessarily (A - XJ)3x = 6and x, is an eigenvector for A. Then solving we see Ax3 = x2 + Ax3, Axe =x, + Axe and Ax, = Ax,, Then A[x, I X2 I x3] = [Ax, I Axe I Ax3] = [Ax,

Page 426: Matrix Theory

I 1.1 Jordan Form and Generalized Eigenvectors 405

K 1 0

X1 + #\X2 I X2 + Xx31 = [xi I X2 I X31 0 K I = [x1 I X2 I x3]J3(1\).0 0 K

Theorem 11.4 says (x1, x2, x3) is an independent set, so if S = [x1I X2 I x3],

then AS = SJ3(X) or S-' AS = J3()\). Thus, A is similar to J3(X). We call(x1, x2, x3} a Jordan basis. The general story is much more complicated thanthis, but at least you now have a hint of where we are going.

First, a little bad news. Suppose we have a Jordan string (A - Xl)xi =-6,(A-XI)x2 =xi,... ,(A-XI)xq =xq_I.Theny, =xt,y2 =x2+aixi,Y3 =x3+a1x2+a2x1,... ,yq =Xq+aixq_1+a2Xq_2+...+aq_Ixi isalsoa Jordan string and the subspace spanned by the ys is the same as the subspacespanned by the xs. Thus, Jordan bases are far from unique. Note that yy abovecontains all the arbitrary constants. So once yy is nailed down, the rest of thechain is determined. This is why we want generalized eigenvectors of a highlevel. If we start building up from an eigenvector, we have to make arbitrarychoices, and this is not so easy. Let's look at a couple of concrete examplesbefore we develop more theory.

5 1 0

Consider a Jordan block A = J3(5) = 0 5 1 . Evidently, 5 is the only0 0 5

x xeigenvalue of A. The eigenspace Eig(A, 5) = !

Iy

II (A - SI) I y 1 =

z I I z

0 x 0 1 0 x 0

0 = y 1 0 0 1 y = 0 =0 z 0 0 0 z 00 x I

0 = 0 IE = sp 0

]).

Thus Eig(A, 5) is one di-0 0 0

mensional, so we are short on eigenvectors if we hope to span C3. Note that0 1 0 0 0 1

(A-5!)= 0 0 1 ,(A-51)2= 0 0 0 ,and(A-51)3=®,0 0 0 0 0 0

a 0so we would like a 3-eigenvector x3 = b . Evidently (A -51 )3x3 = 0

c 00 0 1 a c

Now (A - 5!)2x3 = 0 0 0 b = 0 , so we must require0 0 0 c 0

0 1 0 a bc96 0.Sayc=1.Next, x2=(A-51)x3= 0 0 1 b = I

0 0 0 1 0

yI z =

0

Page 427: Matrix Theory

406 Jordan Canonical Forin

0 1 0 b

and x, = 0 0 I 1

0 0 0 0

I I I

So, choosing a = b = 1, x3 = I , x2 = 1 , and x, = 01 0 0

is a Jordan string, which is a Jordan basis of C3. Note Ax, = A 00

1 5 1 1 I 65 0 = 0 Ax e =A I = 0 + 5 I = 5 an d

0 L 0 0 0 0 0I 6 11 l 5 6 6

Ax3= I + 5 I

1=6 soA 0 1 I = 0 5 6 =

0 1 5 0 0 1 0 0 5

1 1 1 5 1 00 1 I 0 5 I . Here G5(A) = C3 and a Jordan basis is0 0 1 0 0 5

I 1 I

{ 0 , I 1 }. Actually, we could have just chosen the standard0 0 1

basis {e,, e2, e3}, which is easily checked also to be a Jordan basis. Indeed,Ae, = 5e,, Ae2 = 5e2 + e,, and Ae3 = 5e3 + e2. Moreover, it is not difficultto see that the standard basis {e,,e2, ... is a Jordan basis for the Jordanblock Furthermore, the standard basis is a Jordan basis for any Jordanmatrix where each block comes from a Jordan string. To see why more thanone Jordan block might be needed for a given eigenvalue, we look at anotherexample.

0 3 0 3i

Consider A = 20 03i 0

2i33

. We find the characteristic poly-

2i 0 2 0nomial of A is XA(x) = x4 and the minimum polynomial of A is lJA() = x'`since A2 = 0. Thus, 0 is the only eigenvalue of A. The eigenspace of 0 isEig(A, 0)-=

0 0 i

+d I IcdEC =I }

[']

0 [i]3 0

1 0 1 0The geometric multiplicity of X = 0 is 2. Since A22 _ 0, the highest a levelcan be for a Jordan string is 2. Thus, the possibilities are two Jordan strings of

Page 428: Matrix Theory

I 1. I Jordan Form and Generalized Eigenvectors 407

length two or one of length two and two of length 1. To find a 2-eigenvector we0

solve the necessary system of equations and find one choice is x2 =-i1 /2

Then x2

we find 141 1,x;

is a Jordan string of length 2. Similarly,

0Now (x,, x2, x3, x4} turns out to be an

L 1/3J L°Jindependent set, hence is a Jordan basis of C4. This must happen as the nexttheorem will show. In other words, G0(A) is four dimensional with a basisconsisting of two Jordan strings of length two.

0 0 i I -l 0 0 i i

Now -i - i 0 0A

-i -i 0 0

0 1/ 2 1 1 0 1/2 1 1

1 1 0 1/3 1 1 0 1/3

0 1 0 0

0 n n n

0 0 0 1

Note the right-hand matrix is not a Jordan block matri x but

0 0 0 0

is composed of two blocks in a block diagonal matrix, J2(0)[

0® J2 (0)

THEOREM 11.6Let S, = {x, , x2, ... , xy } and S2 = {Y1, y2, ... , y p } be two Jordan stringsbelonging to the same eigenvalue X . If the eigenvectorx, and y, are independent,then S, U S2 is a linearly independent set of vectors.

PROOF The proof is left as an exercise. O

We recall that eigenvectors belonging to different eigenvalues are indepen-dent. A similar result holds for generalized eigenvectors.

THEOREM 11.7Let S, = {X1, x2, ... , xy } , S2 = {Y1, y2, ... , yt, } be Jordan strings belong-ing to distinct eigenvalues k, and 1\2. Then S, U S2 is a linearly independent,set.

Page 429: Matrix Theory

408 Jordan Canonical Form

PROOF To show independence, we do the usual thing. Set a, x, +...+ayxq+

3iy[ + ... + (3,,y,, Apply (A - k, I)'. Remember (A - k, If x; _fori=Next, apply (A - k2I)"-1. Note (A - 1\2I)"-I(A - k, I)y = (A - k, I)N(Ak21)"-' being polynomials in A. Now (A - k2I)P-'y; for j = I_. ,

p - 1(A - k21)y, so (31,(X2 - k, )"y, Thus, 0,, = 0 sincek, # k2 and y, # V. Continuing in this manner, we kill off all the (3s, leavinga,x, + .. . + ayxy = it. But the xs are independent by Theorem 1 1.4, so allthe as are zero as well and we are done. 0

The next step is to give a characterization of a generalized eigenspace moreuseful than that of Theorem 11.5.

THEOREM 11.8Let A E C" xn with minimum polynomial µA (x) = (x - X, )e, (x - 1\2 )e2 ... (x -,\,)e, and characteristic polynomial XA (x) = (x - k, )d' (x _,\,)d, (x -,\,)d,.Then d i m(Gx, (A)) = d; so G,,, (A) = Null ((A - k; I )e, ).

PROOF Let X E Null(A - k;/)`,. Then (A - XjI)",x = -0 , so x is inG;,,(A).This says dim(Gh,(A)) > d; sinceNull(A - XI)e e Gh,(A),andwedetermined the dimension of Null(A - k, I )e, in the primary decompositiontheorem. Getting equality is surprisingly difficult. We begin with a Schur trian-gularization of A using a unitary matrix that puts the eigenvalue X; on the maindiagonal first in our list of eigenvalues down the main diagonal. More precisely,we know there exists U unitary such that U-I AU = T where T is upper tri-

0 k; * ... * W

0 0 k;angular and T = . Since the characteristic*

0 0 .. 0 k;

0 Rpolynomials of A and T are identical, X; must appear exactly d; times down themain diagonal. Note R is upper triangular with values different from X; on its

10 * * ... *

10 0 * ... * W

diagonal. Now consider T - k; I= I0 0 0 I, which

*0 0 ... 0 0

0

Page 430: Matrix Theory

II.! Jordan Form and Generalized Eigenvectors 409

has exactly di zeros down the diagonal, while k has no zeros on its main diag-onal. Now we look at the standard basis vectors. Clearly, e, E Null(T - Xi!)and possibly other standard basis vectors as well, but ed,+,, ... , e" do not.Any of these basis vectors when multiplied by T - lei! produce a column ofT - Xi! that is not the zero column since, at the very least, k has a nonzeroentry on its diagonal. Now (T - lei 1)2 adds a superdiagonal of zeros in thedi -by-di upper left submatrix (remember how zeros migrate in an upper trian-gular matrix with zeros on the main diagonal when you start raising the matrixto powers?), so e,,e2 E Null(T - \, 1)2 for sure and possibly other standardbasis vectors as well, but ed,+,, ... , e" do not. Continuing to raise T - Xi 1 topowers, we eventually find e,,e2, ... , ej, E Null(T - it !)di but ea,+,, ... , e,,

do not. At this point, the di-by-di upper left submatrix is completely filled withzeros, so for any power k higher than di we see dim (Null (T - Xi 1)1) = di.In particular then, dim(Null(T - lei!)") = di. However, Null((T - Xi!)") =G,,, (T). But G,,, (T) = G,\, (U-'AU) = U-'G,,, (A). Since U-' is invert-ible, dim(G,,,(A)) = di as well. 0

Now, the primary decomposition theorem gives the following.

COROLLARY 11.3Let A E C"" with minimum polynomial µA (x) _ (x - k, )r' (x - iz)r2 ... (x -,\'p and characteristic polynomial XA (x) = (x - X, )d' (x - X2)d2 ... (x - X,.Then C" = G,\, (A) ® G,,2(A) ® . . . ® G,,, (A). In particular C" has a basisconsisting entirely of generalized eigenvectors of A.

Unfortunately, our work is not yet done. We must still argue that we can linea basis of generalized eigenvectors up into Jordan strings that will make a basisfor each generalized eigenspace. This we do next. Evidently, it suffices to showwe can produce a Jordan basis for each generalized eigenspace, for then we canmake a basis for the entire space by pasting bases together.

THEOREM 11.9Consider the generalized eigenspace Gk(A) belonging to the eigenvalue it ofA. Then G,,(A) = Z, ® Z2 ® ® Z., where each Zi has a basis that is aJordan string.

Now comes the challenge. To get this theorem proved is no small task. Weshall do it in bite-sized pieces, revisiting some ideas that have been introducedearlier. We are focusing on this single eigenvalue It of A. Suppose XA(x) =(x - X)dg(x) and p.A(x) = (x - k)"h(x), where X is not a root of either g(x) orh(x). The construction of Jordan strings sitting over an eigenvalue is complicatedby the fact that you cannot just choose an arbitrary basis of Eig(A, 1t) =

Page 431: Matrix Theory

410 Jordan Canonical Form

NUll(A -XI) c GI,(A) and build a string over each basis cigenvector. Indeed,we need a basis of a very special form; namely,

((A - W/)"-'v1, (A - \1)m2-1 v2, ... , (A - v,),

where g = nlty((A - K1)) = dim(Eig(A, K)) is the geometric multiplicity ofKandm1 > m2 > > m,, l =d =dim(Ga(A)).Then the Jordan strings

VI V2

(A - \I)vl (A - K!)v2 ... vA

(A - \!)2v1 (A - \1)2v2 (A - lk!)v,

(A - h!)I-1v1 (A - X1 )1112-1 V2 (A - Xlynk-IVk

will form a basis of G,\(A) and each column corresponds to a Jordan blockfor K. The ith column will he a basis for Zi. Now that we know where we areheaded, let's begin the journey. There is a clue in the basis vectors we seek;namely,

(A - Xl) -lvi E JVull(A - k!) f1Col((A - I)III, -I).

This suggests we look at subspaces

Nk = NUll(A - X1) fl Col((A - K!)k-1).

This we will do, but first we look at the matrix (A - K!). We claim the indexof this matrix is e, the multiplicity of K as a root of the minimal polynomial.Clearly,

( ') e Null(A - K!) e ... e JVull(A - lk!)" e ,A(ull(A - \1)e+l c ....

Our first claim is that

Mull(A - \1)e = JVull(A - \/)e+I

Suppose W E Mull(A - \1)e+I. Then (A - \/)e+Iw = -6. But p (x) _(x - A)eh(x) where GCD((x - K)e, h(x)) = I. Thus, there exist a(x), b(x) inC[x] with

1 = a(x)(x - K)e + b(x)h(x) and so

(x - lk)e = a(x)(x - lk)2e + b(x)h(x)(x - K)e

= a(x)(X - K)2e + b(x)µA(x)

Page 432: Matrix Theory

I /. / Jordan Form and Generalized Eigenvectors

But then

(A - k)e = a(A)(A - \)2e + b(A)!LA(A) = a(A)(A - \)2e

and thus

(A - A)ew = a(A)(A -k)e_i(A - \)e+,W _

putting w in Mull(A - XI )e. We have proved

Null(A - \I)e = Null(A - \I)e+i

411

An induction argument establishes that equality persists from this point forwardin the chain (exercise). But could there be an earlier equality? Note that µA(x) =(x - X)eh(x) = (x - X)[(x - \)r-l h(x)] and (x - \)e-'h(x) has degree lessthan that of the minimal polynomial so that (A - X)e-' h(A) cannot be the zeromatrix. Hence there exists some nonzero vector v with (A - k)e-'h(A)v # 0whence h(A)v 54 r .Yet (A - \I)[(A - \)e-'h(A)v] =IIA(A)v so

JVull(A - \I)e-I C JVull(A - \I)e.

Thus the first time equality occurs is at the power e. This says the index of(A - XI) is e and we have a proper chain of subspaces

(')CNull(A-k1)C...CNull(A-XI)e=Xull(A-\I)e+i =

Though we have worked with null spaces, our real interest is with the columnspaces of powers of (A - XJ). We see

C' D Col(A - X1) Col((A - X1)2)... ? Col((A-\I)e-1) DCol((A-k1)e)=...

We can now intersect each of these column spaces with NUll(A - XI) to get

Arull(A - kl) 2 NUll(A - \1) flCol(A - X1) 22Null(A-Xl)f1Col((A-XI)e)=

In other words,

Ni 2N22...2Ne=Ne+1 =...,

where we can no longer guarantee the inclusions are all proper. There couldbe some equalities sprinkled about in this chain of subspaces. If we let ni =dim(N;) we have

n, >n2>...>ne

Page 433: Matrix Theory

412 Jordan Canonical Form

Next, we recall Corollary 3.2, which says

n, = dim(N1) = nlty((A - \I)') - nlty((A - XI)'-1

so that

nI

112

n3

nlty(A - XI)nlty(A - XI )2 - nlty(A -XI)nlty(A - \1)3 - nlty(A - \1)22

nr = nlty(A - \I)' - nlty(A - kl)r-I > In,+I = nlty(A - k/)`+I - nlty(A - iI)r = 0.

From this, it is clear what the sum of the n,s is; namely,

n1 + n2 + + n, = nlty(A - k/)r = dim(Null(A - kI )` )= dim(GI,(A)) = d.

Thus (n 1, n2, ... , ne) forms a partition of the integer d. This reminds us ofthe notion of a conjugate partition we talked about earlier. Recall the Ferrer'sdiagram:

n i . . ...

n2 ...

n3

Figure 11.1: Ferrer's diagram.

This diagram has e rows, and the first row has exactly g dots since n I is

the dimension of NUll(A - k/). It is time to introduce the conjugate partition(m I , m2, ... , in.). This means in I > n 2 > > in, > l and m 1 + +mx =d and in1 = e. It is also clear that in1 = m2 = = in,.

The next step is to build a basis ofNull (A-XI) with the help of the conjugatepartition. Indeed, we claim there is a basis of Null(A - XI) that looks like

((A - k/)m'-IVI, (A - k/)m'-1V2, ... , (A - kI)rnk-IVI)

To understand how this is, we start with a basis of N,, and successively extendit to a basis of N1. Let ( b 1 , b2, ... , be a basis of N. By definition, the bsare eigenvectors of the special form;

b1 = (A - \I)e-IVI, b2 = (A - iI)''-I V2, ... , b,,, = (A - k/)r-1V,i,.

Page 434: Matrix Theory

I /. I Jordan Form and Generalized Eigenvectors 413

But remember, e=ml

bl = (A - XI)mi-'VI, b2 = (A - 1\1)n"-IV2, ... , bn, = (A - U)""',-IVne.

Now if ne = ne_I, no additional basis vectors are required at this stage. How-ever, if ne_I > ne, an additional ne_I - ne basis vectors will be needed. Thecorresponding "m" values will be e - I for tie-] - ne additional basis vectorsso that

bn,+I = (A - \I)r-2Vn,+l = (A - XI)"' +'-I Vn,+I, and so on.

Continuing in this manner, we produce a basis ofNull(A-kl) of the prescribedform. Next, we focus on the vectors

VI, V2, ... , Vg.

The first thing to note is that

AA.,,(X)=(X-X)"'' fori = 1,2,... ,g.

B construction, (A - XI)"','vi belongs to Null(A - X1) so (A - XI)',v; _. This means P A.v, (x) divides (x - X)"" . However, being a basis vector,

(A - X/)'"-Iv; # 6 so we conclude p.A,v,(x) = (x - *\)".Next, we characterize the subspaces Z of G,\(A) that correspond to Jordan

strings, which in turn correspond to Jordan blocks, giving the Jordan segmentbelonging to k in the JCF. Consider

Z, = {p(A)v I p(x) E C[x}}

for any vector v. This is the set of all polynomial expressions in A acting onthe vector v. This is a subspace that is A-invariant. Moreover, if v # V,(v, Av,... , Ak-IV) is a basis for Z, where k In particular,dim(Z,,) = degp.A.,,(x). This can be established with the help of the division al-gorithm. Even more is true. If p.A.,,(X) = (x -X)k, then (v, (A -Ik1)v, ... , (A -X/)k-Iv} is a basis for Z,.

W e want to consider these subspaces relative to V1, v2, ... , vx constructedabove. First, we note

Z,, C G>,(A) fori = 1,2,... ,g.

From this we conclude

In view of pA,,,(x) = (x - it)"'' for i = 1, 2, ... , g, each Z,, has as itsbasis (vi, (A - X I )v; , ... , (A - X I)"', -I vi). There are two major obstacles to

Page 435: Matrix Theory

414 Jordan Canonical Form

overcome. We must establish that the sum (1) is direct and (2) equals C5(A).The latter we will do by a dimension argument. The former is done by induction.

Let's do the induction argument first. Assume we have a typical element ofthe sum set equal to zero:

p1(A)v1 + p2(A)v2 + ... + pg(A)vg

where each p;(A)v; E Z. We would like to conclude pi(A)v; = ( fori = 1, 2, ... , g. This will say the sum is direct. We induct on rn I = e. Supposem1 = 1. Then, since m1 > m2 > > mg > I, we must be in the case whereMI =m2=...=mg= 1. This says

((A - XI)mi-Ivl, (A - k!)nt:-I V2, ... , (A - k!)nml-IV, = V V , Vg

is an independent set of vectors. By the corollary of the division algorithmcalled the remainder theorem, we see

pi(x)=(x-k)9i(x)+ri(k)for j = 1,2,... g

where ri(k) is a scalar, possibly zero. Then,

pi(A)vj _ (A - k!)9i(A)vi +ri(k)vi= µA. v,, (A)c!i (A)vi + ri (X )vi= rj (k)vj.

Thus,

pI(A)v1 + p2(A)v2 + ... + pg(A)vg = -6

reduces to

rl(X)v1 + r2(X)v2 + ... + rg(k)vg = -6.

which is just a scalar linear combination of the vs. By linear independence, allthe rj(k) must be zero. Thus,

pi (x) = (x - \)9i(x) for j = 1, 2, ... , g.

Therefore,

pi(A)=(A-k!)9i(A)for j = 1,2,... ,g

and thus

pl(A)vj = (A - k!)gi(A)vj = 0 for j = 1, 2, ... , g,

as we had hoped.

Page 436: Matrix Theory

I/. 1 Jordan Form and Generalized Eigenvectors 415

Now let's assume the independence follows for m i - 1. We prove inde-pendence for m i. Rather than write out the formal induction, we will look at aconcrete example to illustrate the idea. Again, it boils down to showing µA,v, (x)divides pj(x). The idea is to push it back to the case m i = 1. Assume the resultholds for mi - I = k. It will be helpful to note µA,(A-a/)v;(x) = (x - xyn,-).For the sake of concreteness, suppose n i = 2, m2 = 2, m3 = 1, m4 = 1. Thenwe have ((A - AI)vl, (A - A!)v2, v3, v4) is an independent set. Suppose

Pi(A)vi + P2(A)v2 + p3(A)v3 + P4(A)v4 = -d-

Multiply this equation by (A - A/) and note we can commute this with anypolynomial expression in A:

pI(A)(A - AI)vi + p2(A)(A - A!)v2 + p.,(A)(A - XI)v3+ P4(A)(A - AI)v4 = .

Since V3 and v4 are eigenvectors, this sum reduces to

PI(A)(A - X1)vt + p2(A)(A - 1\ /)v2 = .

But µA,(A_a/)v,(x) = (x - A) and µA,(A-t,/)v2(x) = (x - A), so by the caseabove, we may conclude that (x - A) divides pi(x) and p2(x). Thus, we havepi(x) = (x - A)gI(x), p2(x) = (x - A)g2(x). Look again at our zero sum:

gt(A)(A - X1)vi + q2(A)(A - XI)v2 + P3(A)v3 + p4(A)v4 = -6

and note

(x - A), 1-A.(A-a1)12(x) = (x - A), AA.v,(x)= (x - A), PA.v,(x) = (x - A).

This is exactly the case we started with, so we can conclude the (x - A) dividesq, (x), q2(x), P3(x), and p4(x). Let's say q,(x) _ (x - A)h1(x), q2(x) = (x -X)h2(x) p3(x) = (x - A)h3(x), P4(x) = (x - A)h4(x). Then

pi(A)vt = gi(A)(A - AI)v, = ht(A)(A - AI)2viP2(A)v2 = q2(A)(A - A1)v2 = h2(A)(A - Al)2v2

P3(A)v3 = h3(A)(A - AI)v3 =

P4(A) = h4(A)(A - AI)v4 =

= -

The general induction argument is left to the reader. Hopefully, the idea is nowclear from the example why it works, and it is a notational challenge to writethe general argument.

Page 437: Matrix Theory

416 Jordan Canonical Form

The last step is toget the direct sum to equal the whole generalised eigenspace.This is just a matter of computing the dimension.

Zv, ®... ® ZvX) _ dim(Z,, )

_ deg(µA.,, (x))i=1

=d

= dim(G5(A)).

Now the general case should be rather clear, though the notation is a bit messy.The idea is to piece together Jordan bases of each generalized eigenspace to geta Jordan basis of the whole space. More specifically, suppose A E C" x" and Ahas distinct eigenvalues , . . . , X ... ,X,. Suppose the minimal polynomial of A isµA(x) = (x - X)" (x ->\)r= .. (x - X)r and gi is the geometric multiplicity of X,for i = I, 2, ... , s. That is, gi = dim(Null(A -XI)). The consequence of thediscussion above is that there exist positive integers mi j fori = 1, 2, ... , s, j =1, 2, ... , gi and vectors vij such that ei = mil > miz > > mig, >_ 1 and

µA,,,, (x) = (x - Xi)"'' such that

K,

C', = G)EDZV,/.

i=1 j=1

Indeed, if we choose the bases

Bij = ((A-Xi1)",-1vij,... ,(A-XXil)vij,vij)

for and union these up, we get a Jordan basis

V 9,

13=UU13,ji=1 j=I

for C". Moreover, if we list these vectors as columns in a matrix S, we haveA-1 AS = J, where J is a Jordan matrix we call a JCF or Jordan normal formof A(JNF). We will consider the uniqueness of this form in a moment but let'sconsider an example first.

Page 438: Matrix Theory

11.1 Jordan Form and Generalized Eigenvectors 417

0 1 -3i 0

Consider A =0

0 0 1 iOne way or another, we find the

0 3i 0 0characteristic polynomial and factor it:

XA(X) = (x - 3)2(x+3)2.

Hence, the eigenvalues of A are -3 and 3. Next, we determine the eigenspacesand the geometric multiplicities in the usual way we do null space calculation:

a a 0

Eig(A, 3) = Null(A - 31) = b I (A - 31) b =0

d d 0

-i

0= SP I

1-0.1

Thus, the geometric multiplicity of 3 is I. Similarly,

a a 0

Eig(A, -3) = Null(A + 31) = b I (A + 31)b = 0

d d 0

0=SP I

0

so the geometric multiplicity of -3 is also 1. Next, we seek a generalizedeigenvector for 3. This means solving

a aA b 3 b

c c

d d

-i

01

0}

We find a one parameter family of choices, one being

Page 439: Matrix Theory

418 Jordan Canonical Form

Thus

G3(A) = sp

0

0 -i

1 00 1

Similarly, we find

i 1 100 i

G-AA) = sp I , 0

0 1

Thus

-i 0 i 0 -I -i 0 i 0 3 1 0 00 -i 0 i 0 -i 0 i 0 3 0 0

1 0 1 0 A 1 0 1 0

=

0 0 -3 1

0 1 0 1 0 1 0 1 0 0 0 -3

Finally, we discuss the uniqueness of the JCR Looking at the example above,we could permute the block for 3 and the block for -3. Hence, there is somewiggle room. Typically, there is no preferred order for the eigenvalues of amatrix. Also, the ones appearing on the superdiagonal could be replaced byany set of nonzero scalars (see Meyers [2000]). Tradition has it to have ones.By the way, some treatments put the ones on the subdiagonal, but again this isonly a matter of taste. It just depends on how you order the Jordan bases. Let'smake some agreements. Assume all blocks belonging to a given eigenvalue staytogether and form a segment, and the blocks are placed in order of decreasingsize. The essential uniqueness then is that the number of Jordan segments andthe number and sizes of the Jordan blocks is uniquely determined by the matrix.Let's talk through this a bit more. Suppose J, and J2 are Jordan matrices similarto A. Then they are similar to each other and hence have the same character-istic polynomials. This means the eigenvalues are the same and have the samealgebraic multiplicity, so the number of times a given eigenvalue appears is thesame in J, and J2. Now let's focus on a given eigenvalue X. The geometricmultiplicity g = dim(JVull(A - Al)) determines the number of blocks in theJordan segment belonging to k. The largest block in the segment is e, where e isthe multiplicity of X as a root of the minimal polynomial. It seems tantalizinglyclose to be able to say J, and J2 are the same, except maybe where the segmentsare placed. However, here is the rub. Suppose the algebraic multiplicity is 4 andthe geometric multiplicity is 2. Then A appears four times and there are twoblocks. It could be a 3-by-3 and a 1-by-I or it could be two 2-by-2s. Luckily,there is a formula for the number of k-by-k blocks in a given segment. This

Page 440: Matrix Theory

I /. I Jordan Forin and Generalized Eigenvectors 419

formula only depends on the rank (or nullity) of powers of (A - XI). We haveseen this argument before when we characterized nilpotent matrices. It has beena while, so let's sketch it out again. Let's concentrate on the d-by-d k-segment.

k ?

X

0it

This segment is a block diagonal matrix with the Jordan blocks J., (k), wheremi > m2 > ... > mg > 1, say

Seg(k) = Block Diagonal [J,,,, (k), J,1,2(k), ... , J,nx(k)].

But then

Seg(k) - kid = Block Diagonal[Nilp[mi], Nilp[m2], ... , Nilp[mg]] - N.

Recall

nlty(Nk)-{ k if I <k<d-1l d ifk>d

and so

nltY(NA1

) - nlty(Nk-1) =

0

if I <k <difk>d

Now

g

nlty(Seg(k) - kid)k = Enlty(Nilp[m;]k)

and so

nity(Seg(k) - kId)k - nlty(Seg(k) - kid)'g

_ >nlty(Nilp[m;]k) - nlty(Nilp[m;]k-I)r-i

g

= 1

i=1k < m; .

Page 441: Matrix Theory

420 Jordan Canonical Form

This difference of nullities thus counts how many blocks have size at least k-hy-ksince the power k has not killed them off yet. Consequently, the difference

[nlty(Seg(X) - X1a)k - nlty(Seg(X) - X1d)k-'] - [nlty(Seg(X) - X/j+i-nlty(Seg(X) - X1d)k]

counts exactly the number of blocks that are of size k-by-k. This can he restatedusing ranks:

rank(Seg(X) - X Id)k-I) - 2rank(Seg(X) - X Id)k) + rank(Seg(X) - XId)'+i ).

Note that these computations did not depend whether we are in J, or J2, so thenumber and the sizes of the Jordan blocks in every segment must be the same.Up to ordering the segments, J, and J2 are therefore essentially the same.

Further Reading

[F&I&S, 1979] S. H. Friedberg, A. J. Insel, and L. E. Spence, LinearAlgebra, Prentice Hall, Englewood Cliffs, NJ, (1979).

[H&K, 1971 ] K. Hoffman and R. Kunze, Linear Algebra, 2nd Edition,Prentice Hall, Englewood Cliffs, NJ, (1979).

[MacDuffee, 1946] C. C. MacDuffee, The Theory of Matrices, ChelseaPublishing Company, New York, (1946).

[Perlis, 1958] S. Perlis, Theory of Matrices, 2nd Edition, Addison-Wesley,Reading, MA, (1958).

[Valiaho, 1986] H. Valiaho, An Elementary Approach to the JordanCanonical Form of a Matrix, The American Mathematical Monthly,Vol. 93, (1986), 711-714.

Exercise Set 49

1. How does XA -, relate to XA? (Hint: XA (X) _ (-xLXA(X).)det(A)

2. Consider A = r 0 ] and B = [ ' 1. Argue that XA = Xe but

A and B are not similar. LJ

Page 442: Matrix Theory

I l.I Jordan Form and Generalized Eigenvectors 421

3. Prove all the parts of Theorem 11.5.

4. Prove all the parts of Theorem 11.6.

5. Prove all the parts of Corollary 11.3.

6. Prove alI the parts of Theorem 11.9.

7. Fill in the details in all the computational examples in the text.

8. Prove the Cayley-Hamilton theorem using Jordan's theorem.

9. Say everything you can about a matrix A whose JCF is5 1 00 5 1

5

5 I

5

u I

a

10. Suppose A is a 4-by-4 matrix with eigenvalue 3 of multiplicity 4. Listall possible JCFs A might have. How many JCFs can an arbitrary 4-by-4matrix have? Exhibit them.

11. Prove the claim made about the lack of uniqueness of Jordan strings atthe top of page 405.

12. Argue that the standard basis {e1, e2, ... , e } is a Jordan basis for theJordan block Furthermore, the standard basis is a Jordan basis forany Jordan matrix where each block comes from a Jordan string.

13. Argue that a generalized eigenspace for A is A-invariant.

14. If S is invertible, prove that GI,(S-1 AS) = S-I(Gk(A)).

15. For a vector v, argue that the following two statements are equivalent:(1) there exists a positive integer k with (A - KI)1v = 0, and (2) thereis a sequence v1, v2, ... , vk = v such that (A - XI)Vk = vk_I, (A -XI)vk-I = Vk-2,...,(A - X!)v1 =

16. Suppose M = sp(xi, Axl, ... , Ad-1x1 ), where Adxl = for the firsttime. Suppose xI is a generalized eigenvector of level d for the eigenvalueX of A. Argue that wl = (A - X I )d-I x1, w2 = (A - kl )J-2x1, ... , wdx1 is a Jordan string that is a basis for M.

Page 443: Matrix Theory

422 Jordan Canonical Form

17. Do the induction argument indicated in the proof of Theorem 11.9.

18. Argue that Z,, = (p(A)v I p(x) E C[x]} is a subspace that is A-invariant.

19. Prove that Z,,, C Gx(A) f o r i = 1, 2, ... , g.

11.2 The Smith Normal Form (optional)

There is a more general approach to JCF that extends to more general scalarsthan complex numbers. For this, we need to be able to work with matrices thathave polynomial entries. In symbols, this is C[x]"'". The concepts of dealingwith scalar matrices extend naturally to matrices with polynomial entries. Wedefine matrix equivalence in the usual way. Two m-by-n matrices A(x) andB(x) in C[x]"''I are equivalent iff there exist matrices P(x) and Q(x) such thatB(x) = P(x)A(x)Q(x), where P(x) and Q(x) are invertible and of appropriatesize. Another way to say invertible is P(x) and Q(x) have nonzero scalar deter-minants. It is easy to see equivalence is indeed an equivalence relation. Whendealing with complex matrices, the fundamental result was rank normal form.This said every complex matrix was equivalent to a matrix with ones downthe diagonal, as many ones as rank, and zeros elsewhere. When dealing withmatrices in C[x]"'"", there is an analog called Smith normal form. This isnamed for Henry John Stephen Smith (2 November 1826-9 February 1883).Actually, Smith was a number theorist and obtained a canonical form for matri-ces having only integer entries. (Philos. Trans. Roy. Soc. London, 151, (1861),293-326).It was Frobenius who proved the analogous result for matrices withpolynomial entries. (Jour. Reine Angew. Math. (Crelle), 86, (1878), 146-208.)Here we have monic polynomials down the diagonal, as many as rank, eachpolynomial divides the next and zeros elsewhere.

THEOREM 11.10 (Smith normal form)Suppose A(x) is in C[x ]'m"' of rank r Then there is a unique matrix SNF(A(x))in C[x]'"" equivalent toA(x) where SNF(A(x)) is a diagonal, matrix with monicpolynomials s, (x), s2(x), ... s, (x) on the diagonal, where si (x) is divisible bys,_,(x)fori =2, ... , r.

PROOF We argue the existence first. Note that the transvections, dilations,and permutation matrices do the same work that they did for scalar matrices,even though we are now allowing polynomial entries in our matrices. Indeed,only the transvections will need polynomial entries in our proof. We note the

Page 444: Matrix Theory

11.2 The Smith Normal Form (optional) 423

determinant of all these matrices is a nonzero scalar so, just as before, theyare all invertible. The proof goes by induction on m and n. The case m =it = I is clear, so we consider the case m = 1, n > 1. In this case, A(x) =[al(x) a2(x) . If all the ai(x)s are zero, we are done, so let's assumeotherwise. Then there must be an a;(x) with minimal degree. Use an elementarycolumn operation if necessary and put that polynomial in the first position. Inother words, we may assume a, (x) has minimal degree. Now, using elementarymatrices, we can replace all other aj(x)s by zero. The key is the divisionalgorithm for polynomials. Take any nonzero aj(x) in A(x) other than ai(x).Divide aj(x) by al(x) and get aj(x) = qj(x)a,(x) + rj(x), where rj(x) is zeroor its degree is strictly less than deg(a, (x)). Multiply the first column by -q1 (x)and add the result to the jth column. That produces rj(x) in the jth position.Then, if r. (x) = 0, we are happy. If not, swap rj (x) to the first position. If therestill remain nonzero entries, go through the same procedure again (i.e., dividethe nonzero entry by rj(x) and multiply that column by minus the quotient andadd to the column of the nonzero entry producing another remainder). Again,if the remainder is zero we are done; if not, go again. Since the degrees of theremainders are strictly decreasing, this process cannot go on forever. It mustterminate in a finite number of steps. In fact, this process has no more thandeg(a,(x)) steps. This completes the induction since we have all the entrieszero except the first. A dilation may be needed to produce a monic polynomial.

The case m > I and n = I is similar and is left as an exercise.Now assume m and n are greater than I. Suppose the theorem is true for

matrices of size (m-1)-by-(n-1). We may assume that the (1, 1) entry of A(x) isnonzero with minimal degree among the nonzero entries of A(x). After all, ifA (x) is zero we are done. Then if not, row and column swaps can be used if neces-sary. Now, using the method described above, we can reduce A(x) using a finite

a(,,)(x) 0 ... 0

0 azn)(x)number of elementary matrices to A, (x) _

mn(x)O 42)(x)(X) ... [i")

We would like to get all the entries divisible by a(1)(x). Iffor some i,

that is not zero is not divisible by then add the ith row to the firstrow and apply the procedure above again. Then we get a matrix A2(x) _

a(i)(x) 0 ... 0

0 a(2)(x)22...

where the degree of al I)(x) is strictly less

(2) (0 an12(x) ... a z,(x)

than the degree of a(I 1)(x). If there is still an entry not divisible by a(, i)(x),repeat the process again. In a finite number of steps, we must produce a matrix

Page 445: Matrix Theory

424 Jordan Canonical Form

0 ... 0

O a22)(x) ... a;'3 )(x )A3(x) = where every entry is divisible by

0 a,,,2(x) ... )711(3) (X)

a i)(x). We can use a dilation if necessary to make a, (x) monic. Now the mduc-a22)(x) a,,1 (x)

tion hypothesis applies to the lower-right corner(3) (;)am2(x) a"In

and we are essentially done with the existence argument.

Before we prove the uniqueness of the Smith normal form, we recall the ideaof a minor. Take any m-by-n matrix A. A minor of order k is obtained by choos-ing k rows and k columns and forming the determinant of the resulting squarematrix. Now the minors of a matrix with polynomial entries are polynomials sowe can deal with their greatest common divisors (GCDs).

THEOREM 11.11Let gk(x) denote the GCD of the order k minors of A(x) and hA(x) denote theGCD of the order k minors of B(x). Suppose A(x) is equivalent to B(x). Thengk(x) = hk(x) for all k.

PROOF Suppose A(x) is equivalent to B(x). Then there exist invertible P(x)and Q(x) such that B(x) = P(x)A(x)Q(x). P(x) and Q(x) are just productsof elementary matrices, so we argue by cases.

Suppose B(x) = E(x)A(x), where E(x) is an elementary matrix. We considerthe three cases. Let R(x) be an i-by-i minor of A(x) and S(x) the i-by-i minorof E(x)A(x) in the same position. Suppose E(x) = Pig, a swap of rows. Theeffect on A(x) is (1) to leave R(x) unchanged or (2) to interchange two rows ofR(x), or (3) to interchange a row of R(x) with a row not in R(x). In case (1)S(x) = R(x); in case (2), S(x) = -R(x); in case(3), S(x) is except possiblyfor a sign, another i-by-i minor of A(x).

Next, suppose E(x) is a dilation Di(x) where a is a nonzero scalar. Theneither S(x) = R(x) or S(x) = aR(x). Lastly, consider a transvection E(x) =T,j(f(x)). The effect on A(x) is (I) to leave R(x) unchanged, (2) to increaseone of the rows of R(x) by f (x) times another of row of R(x), or (3) to increaseone of the rows of R(x) by f (x) times a row not of R(x). In cases (1) and (2),S(x) = R(x); in case (3), S(x) = R(x) ± f(x)C(x), where C(x) is an i-by-iminor of A(x).

Thus any i-by-i minor of E(x) is a linear combination of i-by-i minors ofA(x). If g(x) is the GCD of all i-by-i minors of A(x) and h(x) is the GCD of all

Page 446: Matrix Theory

11.2 The Smith Normal Form (optional) 425

i-by-i minors of E(x)A(x), then g(x) divides h(x). Now A(x) = E(x)-1 B(x)and E-1 (x) is a product of elementary matrices, so by a symmetric argument,h(x) divides g(x). Since these are inonic polynomials, g(x) = h(x).

Next, suppose B(x) = E(x)A(x)F(x), where E(x) and F(x) are productsof elementary matrices. Let C(x) = E(x)A(x) and D(x) = C(x)F(x). SinceD(x)T = F(x)T C(x)T and F(x)T is a product of elementary matrices, the GCDof all i-by-i minors of D(x)T is the GCD of all i-by-i minors of C(x)T. But theGCD of all i-by-i minors of D(x).The same is true for C(x)T and C(x) so theGCD of all i-by-i minors of B(x) = E(x)A(x)F(x) is the GCD of all i-by-iminors of A(x). 0

We are now in a position to argue the uniqueness of the Smith normal form.

THEOREM 11.12Suppose A(x) is in C[x]'nxn of rank r Let gk(x) denote the GCD of the order kminors of A(x). Let go(x) = I and diag[si(x), s2(x), ... , s,(x), 0, ... ,0] be aSmith normal form of A(x). Then r is the maximal integer with g,(x) nonzero

and si(x)= gi(x) for i = 1,2,...,r.gi-i(x)

PROOF We begin by arguing that A(x) is equivalent to diag[si(x),52(X), ... , Sr(X), 0, ... ,0]. These two matrices have the same GCD of mi-nors of order k by the theorem above. Except for the diagonal matrix, theseminors are easy to compute. Namely, g,, (x) = s, (x)s2(x) . . . se (x) fork = 1,2,... r. Thus, the si(x)'s are uniquely determined by A(x). o

The polynomials SO), s2(x), ... , s,(x) are called the invariant factors ofA(x).

THEOREM 11.13A(x) and B(x) in C[x I xn are equivalent if they have the same invariant factors.

PROOF The proof is left as an exercise. El

Before we go any further, let's look at an example.

Example 11. 1x x-l x+2

Let A(x) = x2 + x X2 x2 + 2xx2-2x x2-3x+2 x2+x-3

Page 447: Matrix Theory

426 Jordan Canonical Form

x - I x +2A(x)T12(-1) = x x22 x2

x-2 x22-3x+2 x2+x-3I x-1 x-12

T31(-x + 2)T21(-x)A(x)T17(-1) = 0 x 00 0 x + I

T31(-x + 2)T21(-x)A(x)T12(-l)T21(-x + 1)T31(-x - 2) _

1 0 00 x 00 0 x+l

T23(-I)T31(-x + 2)T21(-x)A(X)T12(-1)T21(-x + I)T31(-x - 2)

1 0 00 x -x-l0 0 x+l

T23(-1)T31 (-x + 2)T21(-x)A(x)T12(-1)T21(-x +I) T31(-x - 2)T23(1)

1 0 0

0 -1 -x - 10 x+1 x+1

T32(x + 1)T23(- I)T31(-x2)T23(1) =

1 0 00 1 x+10 0 -x22-x

1

+ 2)T21(-x)A(x)T12(- l)T21(-x + 1)T31(-x -

D2(-I)T32(x+ I)T23(-1)T31(-x+2)T71(-x)A(X)T12(-I )T21(-x+ I)T31(-x - 2)T23 T32(-x - 1)D3(-1)

1 0 0

0 1 0 = SNF(A(x)).0 0 x(x + I)

There are other polynomials that can be associated with A(x).

DEFINITION 11.7(elementary divisors)

Let A(x) E C[x]"X". Write each invariant factor in its prime factorizationover C, say si (x) = (x - X,1 )e 1(X - k2)e,2 ... (X - \,,, )e, l = 1, 2, ... , r.

However, some of the eij may be zero since si(x) divides si+1(x), ei+1J > eiii = 1, 2, ... , r - 1, j = 1, 2, ... , ki. The nontrivial factors (x - Aid )e J arecalled the elementary divisors of A(X) over C.

Page 448: Matrix Theory

11.2 The Smith Normal Form (optional)

Example 11.2Suppose

SNF(A(x)) _

1 0 0

0 1 00 0 (x - 1)(x2 + 1)0 0 000 0

427

0 00 00 0

(x - I)(x2 + I )2x 0

0 (x - 1)2(x2 + 1)2x2(x2 -5)

The invariant factors are s1(x) = I, s7(x) = 1, s3(x) = (x - 1)(x2 + 1),sa(x) = (x - 1)(x2 + 1)2x, and s5(x) = (x - 1)2(x2 + 1)2x2(x2 - 5). Now theelementary divisors over the complex field C are (x - 1)22, x - 1,x - 1, (x +i)2,(x+i)2,x+i,(x-i)2,(x-i)2,x-i,x2,x,x- 15,x+ 15.

Note that the invariant factors determine the rank and the elementary divisors.Conversely, the rank and elementary divisors determine the invariant factors and,hence, the Smith normal form. T o see how this goes, suppose X1, X2, ... , Avare the distinct complex numbers appearing in the elementary divisors. Let(x - X; )r,' , (x - be the elementary divisors containing Xi. Agreeto order the degrees e, I > e;k, > 0. The number r of invariant factorsmust be greater or equal to max(kI, ... , k1)). The invariant factors can then bereconstructed by the following formula:

P

si(x) _ fl(x - A1)e.r+I-i for j = 1, 2, ... , r=1

where we agree (x - \;)ei, = 1 when j > ki.We can learn things about scalar matrices by using the following device. Given

a scalar matrix A E C"", we can associate a matrix with polynomial entriesin C[x]"""; this is the characteristic matrix xl - A. So if A = [a;j ] E C""then

x - all -a12 ...

-a21 x - a27 ...xl-A inC[x]""". Of course,

x - an,,the determinant of x l - A is just the characteristic polynomial of A. The mainresult here is a characterization of the similarity of scalar matrices.

THEOREM 11.14For A and B in C"x", the following statements are equivalent:

1. A and B are similar.

Page 449: Matrix Theory

428 Jordan Canonical Form

2. xl - A and xI - B are equivalent.

3. xI - A and xI - B have the same invariant. factors.

PROOF Suppose A and B are similar. Then there exists an invertible matrixS with B = S-' A S. Then it is easy to see S-' (x! - A )S = x l - B. Conversely,suppose P(x) and Q(x) are invertible with P(x)(x! - A) = (xl - B)Q(x).By dividing (carefully), write P(x) = (xi - B)P,(x) + R, and Q(x) _Q,(x)(xl - A) + R2, where R, and R2 are scalar matrices. Then, by con-sidering degree, we conclude P, (x) - Q, (x) = 0. Therefore, R, = R2 andso R,A = BR,. It remains to prove R, is invertible. Suppose S(x) is the in-verse to P(x). Write S(x) = (xl - A)Q2(x) + C, where C is a scalar matrix.Now ! = (xl - B)Q3(x)+ R,C since R,A = BR, and P(x)S(x) = 1. NoteQ3(x) = P,(x)(x! - A)Q2(x) + P,(x)C + R,Q,(x). Now, by consideringdegrees, conclude Q3(x) is zero. Thus R,C = I and we are done.

We leave it as an exercise that Jordan's theorem follows from the existenceand uniqueness of the Smith normal form.

THEOREM 11.15 (Jordan's theorem)If A is a square complex matrix, then A is similar to a unique Jordan matrix (upto permutation of the blocks).

PROOF The proof is left as an exercise. The uniqueness comes from theuniqueness of the Smith normal form of xI - A. 0

Exercise Set 50

1. With the notation from above, argue that the number of elementary divi-r

sors of A(x) is Eki.i=

2. Suppose A(x) is invertible in C[x]" "", Argue that det(A(x)) is a nonzeroconstant and the converse. (Hint: Look at A(x)B(x) = 1 and take deter-minants of both sides.)

3. Argue that A(x) is invertible in C[x]n"" iff A(x) is a product of elementarymatrices in C[x]""

4. Prove that the characteristic polynomial of A E C""" is the product ofthe invariant factors of xl - A.

Page 450: Matrix Theory

11.2 The Smith Normal Form (optional) 429

5. Prove that the minimum polynomial of A E C"x" is the invariant factorof xI - A of highest degree.

6. Prove that A E Cn xn is similar to a diagonal matrix iff x I - A has linearelementary divisors in C[x].

7. Prove that if D is a diagonal matrix, the elementary divisors of x] - Dare its diagonal elements.

8. Argue that matrix equivalence in C[x]"mxn is an equivalence relation.

9. Prove Theorem 11.13.

10. Prove Theorem 11.15.

Further Reading

[Brualdi, 1987] Richard A. Brualdi, The Jordan Canonical Form: An OldProof, The American Mathematical Monthly, Vol. 94, No. 3, 257-267,(1987).

[Filippov, 19711 A. F. Filippov, A Short Proof of the Theorem on Re-duction of a Matrix to Jordan Form, Vestnik, Moscow University, No. 2,18-19,(197 1). (Also Moscow University Math. Bull., 26,70-71,(197 1).)

[F&S, 1983] R. Fletcher and D. Sorensen, An Algorithmic Derivation ofthe Jordan Canonical Form, The American Mathematical Monthly, Vol.90, No. 1, 12-16, (1983).

[Gantmacher,1959] F. R. Gantmacher, The Theory of Matrices, Vol. 1,Chelsea Publishing Company, New York, (1959).

[G&W, 1981 ] A. Galperin and Z. Waksman, An Elementary Approach toJordan Theory, The American Mathematical Monthly, Vol. 87, 728-732,(1981).

[G&L&R, 1986] I. Gohberg, P. Lancaster, and L. Rodman, InvariantSubspaces of Matrices with Applications, John Wiley & Sons, New York,(1986).

Page 451: Matrix Theory

430 Jordan Canonical Form

[H&J, 1986] R. Horn and C. R. Johnson, Introduction to Matrix Analysis,Cambridge University Press, Cambridge, (1986).

[Jordan, 1870] C. Jordan, Traite des Substitutions et des EquationsAlgebriques, Paris, (1870), 125.

[L&T, 1985] Peter Lancaster and Miron Tismenetsky, The Theory ofMatrices: With Applications, 2nd Edition, Academic Press, Orlando,(1985).

[Noble, 1969] Ben Noble, Applied Linear Algebra, Prentice Hall,Englewood Cliffs, NJ (1969).

[Sobczk, 1997] Garret Sobczyk, The Generalized Spectral Decomposi-tion of a Linear Operator, The College Mathematics Journal, Vol. 28,No. 1, January, (1997), 27-38.

[Strang, 1980] Gilbert Strang, Linear Algebra and Its Applications, 2ndEdition Academic Press, New York, (1980).

[T&A 1932] H. W. Turnbull and A. C. Aitken, An Introduction to theTheory of Canonical Matrices, Blackie & Son, London, (1932).

Page 452: Matrix Theory

Chapter 12

Multilinear Matters

bilinear map, bilinear form, symmetric, skew-symmetric,nondegenerate, quadratic map, alternating

12.1 Bilinear Forms

In this section, we look at a generalization of the idea of an inner product.Let V1, V2, and W be vector spaces over I8 or C, which we denote by IF whenit does not matter which scalars are being used.

DEFINITION 12.1 (bilinear map)A bilinear map cp is a function cp : V, x V2 -- W such that

1. cp(x + y, z) = cp(x, z) + p(y, z) for all x, y E V, all z E V2

2. p(ax, z) = ap(x, z) for all a E IF for all x E V, all z E V2

3. cp(x, y + z) = p(x, y) + p(x, z) for all x E V1, all y, z E V2

4. cp(x, (3y) = p(x, y)P for all R E F, all x E V1, ally E V2.

In particular, if V, = V2 and W = IF, we traditionally call p a bilinear form.We denote L2(V, , V2; W) to be the set of all bilinear maps on V, and V2 withvalues in W. We write L2(V; W) for L2(V, V; W).

We note that the name "bilinear" makes sense, since a bilinear map is linear ineach of its variables. More precisely, if we fix yin V2, the map dP(y) : V, -* Wdefined by d,p(y)(x) = p(x, y) is linear and the map s,p(x) defined by s,p(x)(y) _p(x, y) is linear for each fixed x in V1. Here s (x) : V2 - ) W.

We also note that the zero map O(x, y) = is a bilinear map and any linearcombination of bilinear maps is again bilinear. This says that L2(V,, V2; W) isa vector space over F in its own right. Now let's look at some examples.

431

Page 453: Matrix Theory

432 Multilinear Mutters

Example 12.1

1. Let VI and V2 be any vector spaces over IF and let f : VI - -> F andg : V2 - F he linear maps (i.e., linear functionals). Then p(x, y)f (x)g(y) is a bilinear form on VI and V2.

2. Fix a matrix A in 1F"' `n. Define YA(X, Y) = XT AY, where X E IF"'Iyand Y E P't'. Then SPA E L2(IF'n"y, IF' ; IFy"t'). We can make PA intoa bilinear form by modifying the definition to YA(X, Y) = tr(XT A Y).

3. If we take q = p = I in (2) above, we get

PA

all a12 al,,,

a21 a22 ... a2m_ [xl...xml

ami anus an,nn, if

= XT AY = E > ail xi yi Ei=li=l

In gory detail,xl yl

YA

xm yn

F.

allxlyl + al2xlx2y2+ ... +aI,xlyn

+a2lx2yl +a22x2y2+ ... +a2,x2yn

+a,nlxn,yl +am2xmy2+

To be even more concrete,

r xl l ylSPA

L X2 J, y2

Y3

_ [xIx21l alla12 a13

a21 a22 a23 1

Y1

Y2

y3

+amn x,,,yn

1

Yl

= [xlaII + x2a21 x1a12 + x2a22 x1a13 + x2a23] Y2

Y3

= (xlalI + x2a21)y1 + (x1a12 + x2a22)Y2 + (xja13 + x2a23)y3

= allxlyl +a2lx2yl + a12x1y2 + a22x2y2 + a13xly3 +a23x2y3

= (allxlyl + al2xIy2 + a13xly3) + (a21x2y1 + a22x2y2 + a23x2y3).

Page 454: Matrix Theory

12.1 Bilinear Forms 433

We spent extra time on this example because it arises often in practice. Asan extreme case, cpz : F x F -* F by (pz(x, y) = xzy is a bilinear form for eachfixed scalar z.

Next we collect some elementary computational facts about bilinear forms.

THEOREM 12.1Let cp be a bilinear form on V. Then

/. cp(', y) = 0 = cp(x, -6) = cp(', 1) for all x E V

2. cp(ax + (3y, z) = acp(x, z) + (3cp(y, z) for all a, (3 E F, all x, y, z E V

n

3. cp(.Eca;x,, y) = iE1a1cp(xt, y) for all ai E F. all x;, y E V

4. cp(x, ay + (3z) = p(x, y)a + cp(x, z)(3 = acp(x, y) + (3cp(x, z) for alla,(3EFallx,y,zE V

m m5. p(x, E (3i yi) = E p(x, Yi )Ri for all 13j E F, all y1, x E V

j=1 j=1

6. cp(ax+ 3y, yw+8z) = ay(x, w)y+ay(x, z)8+ 3cp(y, w)y+(3cp(y, z)8for all a, 3,y, in lFandallx,y,z, win V

7.n m if ni

E RjYj) = E E a;cp(x1,Yi)13jJ=J 1=1 1=I

8. cp(-x, y) = -cp(x, y) = cp(x, -y) for all x, y in V

9. p(-x, -y) = cp(x, y) for all x, y in V

10. cp(x + y, x + y) = cp(x, x) + cp(x, y) + cp(y, x) + cp(y, y) for all x, y E V

11. cp(x - Y, X - Y) = cp(x, x) - cp(x, y) - cp(Y, x) + p(Y, y) for all x, y E V

12. cp(x + y, x + y) + cp(x - y, x - y) = 2cp(x, x) + 29(y, y) for all x, y E V

13. cp(x + Y, x + Y) - cp(x - Y, x - y) = 2pp(x, y) + 2cp(y, x) for all x, y E V

14. cp(x + y, x - Y) = cp(x, x) - p(x, y) + cp(y, x) - cp(Y, y)

15. cp(x - y, x + Y) = cp(x, x) - cp(y, x) + cp(x, y) - cp(y, y)

16. cp(x + Y, x - Y) + 9(x - Y, x + y) = 2p(x, x) - 2pp(Y, y)

17. cp(x - y, x + Y) - pp(x + y, x - y) = 2pp(x, y) - 2p(Y, x)

18. cp(x, y) + cp(Y, x) = c0(x + y, x + Y) - p(x, x) - p(Y, y)

Page 455: Matrix Theory

434 Multilinear Matters

PROOF The proofs are routine computations and best left to the reader. 0

It turns out arbitrary bilinear forms can be studied in terms of two specialkinds of forms. We have the following trivial decomposition of an arbitrary(P

(P(x, y) = 2 [,P(x, y) + pP(y, x)] + 2 y) - P(y, x)]

= IPVVHI(x, y) + pskew(x, Y).

The reader will be asked to verify that y,,,,(x, y) = yvv(y, x) for all x, yand y ,is bilinear and epskew(y, x) _ -(P,'keu,(X, y) where again y.t., is bi-linear. The first property of is called symmetry and the second is skew-symmetry.

DEFINITION 12.2 (symmetric, skew-symmetric)Let ip be a bilinear form on V. Then we say

1. y is symmetric iff ep(x, y) = y(y, x) for all x, y E V.

2. y is skew-symmetric iff ep(x, y) = -cp(y, x) for all x, y E V.

The fancy talk says that L2(V ; F) is the direct sum of the subspace consistingof all symmetric bilinear forms and the subspace of all skew-symmetric bilin-ear forms. So, in a sense, if we know everything about symmetric forms andeverything about skew-symmetric forms, we should know everything about allbilinear forms. (Right!)

DEFINITION 12.3 (nondegenerate)A bilinear form y : V x W --+ F is called nondegenerate on the left if

y(v, w) = 0 for all w E W means v = 6. Also, y is nondegenerate on theright iff y(v, w) = O for all v E V means w = 6. We call y nondegenerateiff y is nondegenerate on the left and on the right.

We have seen that dp(y) : V -* IF and s,(x) : W -* F are linear. That is, foreach y E W, d,,(y) E L(V;F) = V* and for each x E V, sp(x) E L(W;F) _W*. We can lift the level of abstraction by considering dp : W -* V* andsY : V -> W* by y r--> d,p(y) and x F-- s,p(x). These maps are themselveslinear.

A bilinear map cp is a function of two variables. However, there is a naturalway to associate a function of one variable by "restricting ep to the diagonal."That is, look at cp(x, x) only.

Page 456: Matrix Theory

12.1 Bilinear Forms 435

DEFINITION 12.4 (quadratic map)Let V and W be vector spaces over IF. A quadratic map from V to W is a

map Q : V ----+ W such that

1. Q(ax) = a2 Q(x) for all a E IF, all x E V.

2. Q(x + y) + Q(x - y) = 2Q(x) + 2Q(y), all x, y E V.

if W = F, we, again traditionally, speak of Q as a quadratic form.

THEOREM 12.2Let Q be a quadratic map. Then

1. Q(()=0.2. Q(-x) = Q(x) for all x E V.

3. Q(x - y) = Q(y - x) for all x, y E V.

4. Q(ax + (3y) = Q(ax) + 2a[3Q(x) + Q((3x).

5. Q(ax + 3y) + Q(ax - 3y) = 2a2 Q(x) + 2R2 Q(y)

6. Q(Z (x + y)) + Q(Z (x - y)) = 12 2Q(x) + 1 Q(y)

7. Q(x + y) - Q(x - y) = z [Q(x + 2y) + Q(x - 2y)]

8. 2Q(x + z + y) + 2Q(y) = Q(x+z+2y)+ Q(x+z).

9. 2Q(x+y)+2Q(z+y)= Q(x+z+2y)+Q(x-z).

PROOF Again the proofs are computations and referred to the exercises. 0

Now beginning with p : V x V ----) W bilinear, we can associate a map0 : V --> W by 6(x) = p(x, x). Then cD is a quadratic map called thequadratic map associated with p.

Exercise Set 51

1. LetA = f 1 2 3

L 4 5 6by this matrix.

I . Write out explicitly the bilinear form determined

2. Work out alI the computational formulas of Theorem 12.1.

Page 457: Matrix Theory

436 Multilinear Matters

3. Verify all the claims made in the previous examples.

4. Let cp be a bilinear form on the IF vector space V. Then p is calledalternating iff p(x, x) = 0 for all x E V. Argue that p is alternating ifand only if p is skew-symmetric.

5. Let cp be a bilinear form on V, W and S : V ---> V and T : V ---> V belinear maps. Define ps.T(v, w) = p(Sv, Tw) for V E V, W E W. Arguethat Ps.T is also a bilinear form on V, W.

6. Let pp be a bilinear form on V. Then p,,,,,(x, y) = z [p(x, y) + p(y, x)]is a symmetric bilinear form on V. Show this. Also show p,tew(x, y) =z [p(x, y) - p(y, x)] is a skew symmetric bilinear form on V. Argue thatcp(x, y) = epv,,,,(x, y) + pPv4eu,(x, y) and this way of representing p as asymmetric plus a skew-symmetric bilinear form is unique.

7. Argue that L2(V, W; F) is a vector space over F. Also argue that thesymmetric forms are a subspace of L2(V; F), as are the skew-symmetricforms, and that L2(V; F) is the direct sum of these to subspaces.

8. Let V be the vector space C ([-'rr, 'rr]) of F valued functions definedon [-zr, w], which are continuous. Define (p(f, g) = f ,, f (x)g(x)dx.Argue that p is a symmetric bilinear form on V.

9. You have seen how to define a bilinear map. How would you define atrilinear map p : Vi x V2 x V3 -* W. How about a p-linear map?

10. Let p be bilinear on V. Show that p,,.,,,(x, y) = a (cp (x + y, x + y)-cp(x-y,x-y)}.

11. Let y be a bilinear form on V (i.e., cp E L2(V; IF)). Then s,p(x) : V - Fand d,p(y) : V - IF are linear functionals. Show s,p : V - L(V; F) =V* and d, : V -) L(V;IF) = V* are linear maps in their ownright and p is nondegenerate on the right (left) if d,p(sp) is injectiveiff ker(d,o)(ker(s,,)) is trivial. Thus p is nondegenerate iff both s,p and d"are injective. In other words, p is degenerate iff at least one of ker(s,p) orker(d,p) is not trivial.

12. Suppose a bilinear form is both symmetric and skew-symmetric. Whatcan you say about it.

13. Verify the computational rules for quadratic maps given in Theorem 12.2.

14. Let p : V x V -* W be bilinear. Verify that the associated quadraticmap 0: V -> W given by (D(x) = ep(x, x) is indeed a quadratic map.

Page 458: Matrix Theory

12.2 Matrices Associated to Bilinear Forms 437

15. A quadratic equation in two variables x and y is an equation of the formaxe+2bxy+cy2+dx+ey+ f = 0.1 Write this equation in matrix form

where x = y 1 and A = I b bJ.Note xr Ax is the quadratic form

associated to this equation. Generalize this to three variables; generalizeit to n variables.

16. Suppose A E R""" and A= AT. Call A(or Q) definite iff Q(x) = xTAxtakes only one sign as x varies over all nonzero vectors x in R", Call Apositive definite iff Q(x) > 0 for all nonzero x in R", negative definiteif Q(x) < 0 for all nonzero x in R", and indefinite iff Q takes onboth positive and negative values. Q is called positive semidefinite ifQ(x) > 0 and negative semidefinite iff Q(x) < 0 for all x. Thesewords apply to either A or Q. Argue that A is positive definite iff all itseigenvalues are positive.

17. Prove that if A is positive definite real symmetric, then so is A-'.

18. Argue that if A is positive definite real symmetric, then det(A) > 0. Isthe converse true?

19. Prove that if A is singular, ATA is positive semidefinite but not positivedefinite.

Further Reading

[Lam, 1973] T. Y. Lam, The Algebraic Theory of Quadratic Forms, TheBenjamin/Cummings Publishing Company, Reading, MA, (1973).

congruence, discriminant

12.2 Matrices Associated to Bilinear Forms

Let cp be a bilinear form on V and 5 = {b1, b2, ..., be an ordered basisof V. We now describe how to associate a matrix to cp relative to this basis. Letxand ybe in V. Then x=xibi andy= yibi +..+y"b,,,so

Page 459: Matrix Theory

438 Multilinear Mutters

n n n n n n

pp(x, y) = pp(r_xibi, y) = Exip(bi, y) = Exiy(bi, >yjbi) = r_ r_xiyiYi=I i=I i=I j=1 i=lj=I

(bi, bj).Let aid = cp(bi, bj). These n2 scalars completely determine the action of

y on any pair of vectors. Thus, we naturally associate the n-by-n matrix A =[aid] = [,.p(bi, b;)] := Mat (,p; B). Moreover, y(x, y) = Mat(x; C3)T Mat(,p; B)Mat(y; B).

Conversely, given any n-by-n matrix A = laij] and an ordered basis 8 ofV, then p(x, y) = Mat(x; C3)T AMat(y; B) defines a bilinear form on V whosematrix is A relative to 8. It is straightforward to verify that, given an orderedbasis B of V, there is a one-to-one correspondence between bilinear forms onV and n-by-n matrices given by p r---* Mat(y,13). Moreover, Mat(acpi +b(p2; B) = aMar((pi; 13) + bMat(cp2; BY

The crucial question now is what happens if we change the basis. How doesthe matrix representing p change and how is this new matrix related to theold one. Let C be another ordered basis of V. How is Mat(y; B) related toMat(cp, C)? Naturally, the key is the change of basis matrix. Let P be the (in-vertible) change of basis matrix P = PB+c so that Mat(x; B) = PMat(x; C).Then for any x, y E V, p(x, y) = Mat(x; B)T Mat(cp; C3)Mat(y; B) _

(PMat(x;C))T Mat((p; B)PMat(y;C) =Mat(x;C)T(PT Mat(cp;B)P)Mat(y;C). But also p(x, y) _Mat(x; C)T (Mat((p;C)Mat(y;C), so we conclude

Mat((p;C) = PT Mat(,p;B)P.

This leads us to yet another equivalence relation on n-by-n matrices.

DEFINITION 12.5 (congruence)A matrix A E F" "" is congruent to a matrix B E IF"" iff there exists an

invertible matrix P such that B = PT AP. We write A B to symbolizecongruence.

THEOREM 12.3is an equivalence relation on F" x 11, That is,

1. A^- Aforall A.

2.IfA^-B,then B--A.

3. IfA - BandB^- C, thenA - C.

PROOF The proof is left as an exercise.

Page 460: Matrix Theory

12.2 Matrices Associated to Bilinear Forms 439

We note that congruence is a special case of matrix equivalence so, for exam-ple, congruent matrices must have the same rank. However, we did not demandthat PT equal P-' , so congruence is not as strong as similarity. Congruent ma-trices need not have the same eigenvalues or even the same determinant. Anyproperty that is invariant for congruence, such as rank, can be ascribed to theassociated bilinear form. Thus, we define the rank of cp to he the rank of anymatrix that represents cp. Moreover, we have the following theorem.

THEOREM 12.4Let cp be a bilinear form on V. Then cp is nondegenerate iff rank(cp) = dim V.

PROOF The proof is left as an exercise. 0

Suppose A and B are congruent. Then B = P7 A P for some nonsingular ma-trix P. Then det(B) = det(PT AP) = det(P)2det(A). Thus, the determinantsof A and B may differ, but in a precise way. Namely, one is a nonzero squarescalar times the other. We define the discriminant of a bilinear form cp to be{a2det(A) I a i4 0, A represents cp in some ordered basis} This set of scalarsis an invariant under congruence. We summarize this section with a theorem.

THEOREM 12.5Let cp be a bilinear form on V and B = {b1, b2, , an ordered basisof V. Then cp(x, y) = Mat(x,13)T Mat(cp; 8)Mat(y; B) where Mat(p, B) =[p(bt, b;)]. Moreover, if C is another ordered basis of V, then Mat(p;C) =(Pc_a)T Mat(cp; l3)Pc,a. Moreover, if two matrices are congruent, then theyrepresent the same bilinear form on V. Also, p is symmetric if Mat(cp; B) is asymmetric matrix and p is skew-symmetric iff Mat(p;13) is a skew-symmetricmatrix.

Exercise Set 52

1. Prove Theorem 12.3.

2. Find two congruent matrices A and B that have different determinants.

3. Prove Theorem 12.4.

4. Prove Theorem 12.5.

Page 461: Matrix Theory

440 Multilinear Mutters

orthogonal, isotropic, orthosymmetric, radical, orthogonal direct sum

12.3 OrthogonalityOne of the most useful geometric concepts that we associate with inner

products is to identify when two vectors are orthogonal (i.e., perpendicular).We can do this with bilinear forms as well. Given a bilinear form cp, we candefine the related notion of orthogonality in the natural way. Namely, we say xis p-orthogonal to y iff p(x, y) = 0. We can symbolize this by x ..L y. If pis understood, we will simplify things by just using the word "orthogonal" andthe symbol 1. Unlike with inner products, some strange things can happen. It ispossible for a nonzero vector to be orthogonal to itself! Such a vector is calledisotropic, and these vectors actually occur meaningfully in relativity theory inphysics. It is also possible that a vector x is orthogonal to a vector y but y is notorthogonal to x. However, we have the following nice result.

THEOREM 12.6Let cp be a bilinear form on V. Then the orthogonality relation is symmetric(i.e., x 1, y implies y 1, x) iff cp is either symmetric or skew-symmetric.

PROOF If y is symmetric or skew-symmetric, then clearly the orthogonalityrelation is symmetric. So, assume the relation is symmetric. Let x, y, z c V.Let w = p(x, y)z - p(x, z)y. Then we compute that x 1 w. By symmetryw 1 x, which is equivalent to p(x, y)p(z, x) - p(x, z)ep(y, x) = 0. Set x = yand conclude cp(x, x) [p(z, x) - p(x, z)] = 0 for all x, z E V. Swapping x andz, we can also conclude p(z, z) [p(z, x) - p(x, z)] = 0. This seems to be goodnews since it seems to say p(z, z) = 0 (i.e., z is isotropic or p(z, x) = cp(x, z)).The problem is we might have a mixture of these cases. The rest of the proofsays its all one (all isotropic) or all the other (i.e., p is symmetric).

So suppose cp is not symmetric (if it is, we are finished). Then there mustexist vectors u and v such that p(u, v) i4 p(v, u). Then, by what we showedabove, p(u, u) = p(v, v) = 0. We claim p(w, w) = 0 for all w in V. Sinceu is isotropic cp(u + w, u + w) = cp(u, w) + cp(w, u) + cp(w, w). If w is notisotropic, then cp(w, x) = p(x, w) for all x E V. In particular, p(w, u) = p(u, w)and cp(w, v) = p(v, w). By above, p(u, w)p(v, u) - p(u, v)cp(w, u) = 0, socp(u, w) [cp(v, u) - p(u, v)] = 0; but p(u, v) 54 p(v, u), so we must conclude

Page 462: Matrix Theory

12.3 Orthogonality 441

Ip(w, u) = p(u, w) = 0. Similarly, p(w, v) = p(v, w) = 0. Thus y(u + w, u +w) = cp(w, w). But y(u + w, v) = p(u, v) + p(u, v) = cp(u, v) # p(v, u) =p(v, u + w). Thus u + w is also isotropic. Therefore, p(w, w) = 0. Therefore,all vectors in V are isotropic and cp is alternating, hence skew-symmetric. 0

To cover both of these cases, we simply call p orthosymmetric whenever theassociated orthogonality relation is symmetric.

Now let S be any subset of a vector space V with p an orthosymmetricbilinear form on V. We define Sl = (v E V I (p(v, s) = 0 for all s E S). Wedefine the radical of S by Rad(S) = S fl Sl so that Rad(V) = V'. We seethat p is nondegenerate iff Rad(V)

THEOREM 12.7Let S be any subset of the vector space V.

1. S1 is always a subspace of V.

2. SC(S1)'.

3. If S, a S2, then Sz a Si .

4. If S is a finite dimensional subspace of V, then S = S.

PROOF The proof is relegated to the exercises. 0

If V is the direct sum of two subspaces Wn and W2 (i.e., V = W, ®W2), we callthe direct sum an orthogonal direct sum iff W, c WZ and write V = W, ®1 W2.Of course, this idea extends to any finite number of subspaces.

If cp is a bilinear form on V then, by restriction, p is a bilinear form on anysubspace of V. The next theorem says we can restrict our study to nondegenerateorthosymmetric forms.

THEOREM 12.8Let S be the complement of Rad (V ). Then V = Rad (V) ®1 S and p restrictedto S is nondegenerate.

PROOF Since Rad(V) is a subspace of V, it has a complement. Choose one,say M; then V = Rad(V) ® M. But all vectors are orthogonal to Rad(V), soV = Rad(V)®1M.LetvE Mf1Ml.Thenv E Mlsov E Rad(V).Buty E Malso, so v E Rad(V) fl M = ( ). Hence v = . Thus M fl Ml = (-6). 0

Page 463: Matrix Theory

442 Multilinear Matters

Exercise Set 53

1. Prove Theorem 12.7.

2. Consider the bilinear form associated with the matrix A =L

1 00 -1

Identify all the isotropic vectors.

3. A quadratic function f of n variables looks like f (x1, x2, ..., xn) _n n

>gijxixj + Ecixi + d. What does this formula reduce to if n = 1?i=I j=1 i=IArgue that f can be written in matrix form f (v) = v7 Qv + cT v + d,where Q is a real n-by-n symmetric nonzero matrix. If you have hadsome calculus, compute the gradient V f (v) in matrix form.

orthogonal basis, orthonormal basis, Sym(n;C),Sylvester's law of inertia, signature, inertia

12.4 Symmetric Bilinear FormsIn this section, we focus on bilinear forms that are symmetric. They have a

beautiful characterization.

THEOREM 12.9Suppose cp is a symmetric bilinear form on the finite dimensional space V. Thenthere exists an ordered basis B of V such that Mat(p; 5) is diagonal. Such abasis is called an orthogonal basis.

PROOF The proof is by induction We seek a basis B = {bI, b2, ... ,

such that p(bi, bj) = 0 if i # j. If 9 = 0 or n = 1, the theorem holdstrivially, so suppose 9 # 0 and n > I . We claim p(x, x) 0 for some x E V. Ifcp(x, x) = 0 for all x E V, then p(x, y) = 1 {ep(x+y, x+y) - cp(x -y, x - y)} _0 for all x, y, making cp = 0 against our assumption. Let W = sp(x). Weclaim V = W ®1 W1. First, let z E W n W -L. Then z = ax for some aand z E W1 so z I. z. Thus, cp(z, z) = 0 = cp(ax, ax) = a2p(x, x). Sincep(x, x) 54 0, a2 = 0, so a = 0 so z = '. We conclude W n W1 = (V).

cp(v, x) (cp(v, x)Now let v E V and set b = v - p(x x) x. Then V =±P(x x)

x + b and

Page 464: Matrix Theory

/2.4 Symmetric Bilinear Forms 443

cp(x, b) = p(x, v) - p(v' x) cp(x, x) = 0 since p is symmetric. Thus, b E W1cp(x, x)

and so V = W ® W1. Now restrict p to W1, which is n - l dimensional. Byinduction there is a basis (b,, b2, ... , b"_, } with p(x;, bj) = 0 if i # j. Addx to this basis. 0

This theorem has significant consequences for symmetric matrices undercongruence.

COROLLARY 12.1Any symmetric matrix over F is congruent to a diagonal matrix.

PROOF Let A be an n-by-n symmetric matrix over F. Let PA be the sym-metric bilinear form determined by A on F", (i.e., cpA(x, y) = XT Ay). ThenMat(p;std) = A, where std denotes the standard basis of F". Now, by theprevious theorem, there is a basis 5 of 1F" in which Mat((pA; B) is diagonal.But Mat(epA; B) and Mat(ePA; std) = A are congruent and so A is congruentto a diagonal matrix. 0

Unfortunately, two distinct diagonal matrices can be congruent, so the diag-onal matrices do not form a set of canonical forms for the equivalence relation

d, 0

d2

of congruence. For example, if Mat(p, CB) and we

0 d"select nonzero scalars a,, ... , of,,, we can "scale" each basis vector to- get a new

faid, 0

basis C = {a,b,,... , a"b"). Then Mat(cp, C) =

0 a?

and these two matrices are congruent.Now we restrict our attention to the complex field C. Here we see congruence

reduces to ordinary matrix equivalence.

THEOREM 12.10Suppose cp is a symmetric bilinear form of rank r over the n-dimensional spaceV. Then there exists an ordered basis B = (b,, b2, ... , b" } of V such that

1. Mat(p; B) is diagonal

I forj=l,2,...,r2. 9(bj, bi) =0 for j > r

Page 465: Matrix Theory

444 Multilinear Matters

PROOF Let (a,, ... , he an orthogonal basis as provided by Theorem12.9. Then there exist r values of j such that cp(a,, a1) # 0, else the rank of cpwould not be r. Reorder this basis if necessary so these basis vectors become the

II

first r. Then define bj = cp(a ,, aj )ai for j = 1, r Then we have

I aiforj>rour basis so that Mat(p; Ci) -10 ®] . 0

COROLLARY 12.2If p is a nondegenerate symmetric bilinear form over the n-dimensional spaceV, then V has an orthonormal basis for p.

Let Sym(n; C) denote the set of n-by-n symmetric matrices over C. Let Ik.,,,be the matrix with k ones and in zeros on the main diagonal and zeros elsewhere.Then the rank of a matrix is a complete invariant for congruence, and the setof all matrices of the form Ik.,,, for k + m = n is a set of canonical forms forcongruence. That is, every matrix in Sym(n; C) is congruent to a unique matrixof the form for some k = 0, 1, ... , n and m = n - k.

The story over the real field is more interesting. The main result is named forJames Joseph Sylvester (3 September 1814-15 March 1897).

THEOREM 12.11 (Sylvester's law of inertia)Suppose ep is a symmetric bilinear form of rank r on the real n-dimensionalvector space V . Then there is an ordered basis 1 3 = (b, , b2, ... , such that

1k

Mat(p, B) _ I. where k is the number of ones,

k + m = r, and k is an invariant for p under congruence.

PROOF Begin with the basis (a,, a,,,... , a } given by Theorem 12.9. Re-order the basis if necessary so that p(a1, at) = 0 for j > r and cp(aj, aj) # 0 for

I < j < r. Then the basis B = {b,, b2, ... , where bj = aj,cp(aj, aj)

1 < j < r, and bj = ai for j > r yields a matrix as above. The hard partis to prove that k, m, l do not depend on the basis chosen. Let V+ be the sub-space of V spanned by the basis vectors for which cp(bj, bj) = I and V-the subspace spanned by the basis vectors for which p(bi, bj) _ -l. Nowk = dim V+. If -6 # x E V+, p(x, x) > 0 on V+ and if ( x E V-,then p(x, x) < 0. Let Vl be the subspace spanned by the remaining basis

Page 466: Matrix Theory

/2.4 Symmetric Bilinear Forms 445

vectors. Note if x E V1, cp(x, y) = 0 for all y E V. Since B is a basis,we have V = V+ e V-- ® V 1-. Let W be any subspace of V such thatcp(w, w) > 0 for all nonzero w E W. Then W fl span {V -, VI) = (6 )for suppose w E W, b E V -, c E V 1 and w + b + c=-6. Then 0 =cp(w, w + b + c) = cp(w, w) + p(w, b) + cp(w, c) = cp(w, w) + p(w, b) and0 = p(b, w + b + c) = cp(b, b) + cp(b, w) + -6 = cp(b, b) + cp(w, b). Butthen p(w, w) = cp(b, b), but cp(w, w) > 0 and cp(b, b) < 0, so the only way outis cp(w, w) = cp(b, b) = 0. Therefore w = b = 6 so c = V as well. NowV = V+ ® V- ® V -L and W, V-, V' are independent so dim W < dim V+. IfW is any subspace of V on which cp takes positive values on nonzero vectors,dim W < dim V+. So if 5' is another ordered basis that gives a matrix /k'.,,,i.ti,then V+ has dimension k1 < dim V+ = k. But our argument is symmetricin these bases so k < k'. Therefore, k = dim V+ = dim Vi+ for any suchbasis. Since k + I = r and 0 + 11 = r, it follows l = 11 as well, and sincek + l + in = n, we must have in I = in also. 0

Note that V1 = Rad(V) above and dim(V1) = dim V - rank(p). Thenumber dim V+ - dim V- is sometimes called the signature of cp.

THEOREM 12.12Congruence is an equivalence relation on Sym(n;R), the set of n-by-n sym-metric matrices over the reals. The set of matrices Ik,t,,,, where k + I + in = nis a set of canonical forms for congruence and the pair of numbers (k, in) or(k + in, k - in) is a complete invariant for congruence.

In other words, two real symmetric matrices are congruent iff they have thesame rank and the same signature.

Exercise Set 54

1. Suppose A is real symmetric positive definite. Argue that A must benonsingular.

2. Suppose A is real symmetric positive definite. Argue that the leadingprincipal submatrices of A are all positive definite.

3. Suppose A is real symmetric positive definite. Argue that A can be re-duced to upper triangular form using only transvections and all the pivotswill be positive.

Page 467: Matrix Theory

446 Multilineur Mutters

4. Suppose A is real symmetric positive definite. Argue that A can he fac-tored as A = LDLT, where L is lower triangular with ones along itsdiagonal, and D is a diagonal matrix with all diagonal entries positive.

5. Suppose A is real symmetric positive definite. Argue that A can he fac-tored as A = LLT, where L is lower triangular with positive diagonalelements.

6. Suppose A is real and symmetric. Argue that the following statementsare all equivalent:

(a) A is positive definite.(h) The leading principal submatrices of A are all positive definite.(c) A can be reduced to upper triangular form using only transvections

and all the pivots will be positive.(d) A can he factored as A = LLT, where L is lower triangular with

positive diagonal elements.(e) A can be factored as A = BT B for some nonsingular matrix B.(f) All the eigenvalues of A are positive.

7. Suppose A is a real symmetric definite matrix. Do similar results toexercise 6 hold?

8. Suppose A is real symmetric nonsingular. Argue that A2 is positive defi-nite.

9. Argue that a Hermitian matrix is positive definite iff it is congruent to theidentity matrix.

10. Suppose H is a Hermitian matrix of rank r. Argue that H is positive

semidefinite iff it is congruent to ® ®] . Conclude H > 0 iff

H = P*P for some matrix P.

11. Argue that two Hermitian matrices A and B are congruent iff they havethe same rank and the same number of positive eigenvalues (countingmultiplicities).

12. Suppose A and B are Hermitian. Suppose A is positive definite. Arguethat there exists an invertible matrix P with P*AP = I and P*BP = D,where D is a diagonal matrix with the eigenvalues of A` B down thediagonal.

13. Suppose A is a square n-by-n complex matrix. Define the inertia of Ato he ('rr(A), v(A), B(A)), where rr(A) is the number of eigenvalues of A

Page 468: Matrix Theory

12.5 Congruence and Symmetric Matrices 447

(counting algebraic multiplicities) in the open right-half plane, v(A) is thenumber of eigenvalues in the open left-half plane, and S(A) is the numberof eigenvalues on the imaginary axis. Note that a(A)+v(A)+8(A) = n.Argue that A is nonsingular iff S(A) = 0. If H is Hermitian, argue that,rr(H) is the number of positive eigenvalues of H, v(H) is the number ofnegative eigenvalues of H, and b(H) is the number of times zero occurs asan eigenvalue. Argue that rank(H) = 7r(H)+ v(H) and signature(H) _1T(H) - v(H).

14. Suppose A and B are n-by-n Hermitian matrices of rank r and supposeA = MBM* for some M. Argue that A and B have the same inertia.

Further Reading

[Roman, 1992] Steven Roman, Advanced Linear Algebra, Springer-Verlag, New York, (1992).

[C&R, 19641 J. S. Chipman and M. M. Rao, Projections, GeneralizedInverses and Quadratic Forms, J. Math. Anal. Appl., IX, (1964), 1-11.

12.5 Congruence and Symmetric MatricesIn the previous sections, we have motivated the equivalence relation of con-

gruence on n-by-n matrices over F. Namely, A, B E F"', A -c B iff thereexists an invertible matrix P such that B = PT A P. Evidently congruence is aspecial case of matrix equivalence, so any conclusions that obtain for equivalentmatrices hold for congruent matrices as well. For example, congruent matricesmust have the same rank.

Now P being invertible means that P can he expressed as a product ofelemen-tary matrices. The same is so for Pr. Indeed if PT = En, E,,,_1 ... El, then P =E

1Ez ... ET . Therefore, B = PT A P = E,,, En, _, ... E I A E i EZ En . Thus,

B is obtained from A by performing pairs of elementary operations, each pairconsisting of an elementary row operation and the same elementary column

Page 469: Matrix Theory

448 Multilinear Matters

operation. Indeed1 0 0 a b c 1 k 0it 1 0 d e f 0 1 0 =0 0 1 g h k 0 0 1

a aK+b cXa+d k2a+2Xb+e Xc+f

g kg+h k

Thus, we see if A is symmetric, (i.e., A = AT), then E7 'A E is still symmetricand, if ET zeros out the (i, j) entry of A where i # j, then E zeros out the(j, i) entry simultaneously. We are not surprised now by the next theorem.

THEOREM 12.13Let A be a symmetric matrix in Fr""'. Then A is congruent to a diagonal matrixwhose first r diagonal entries are nonzero while the remaining n - r diagonalentries are zero.

The proof indicates an algorithm that will effectively reduce a symmetricmatrix to diagonal form. It is just Gauss elimination again: E,,, E, A = PT Aproduces an upper triangular matrix while Ei E; . . . ET produces a diagonalmatrix. These elementary operations have no effect on the rank at each stage.

ALGORITHM 12.1

[AII [EiAEfIEf] -+ ... _

[EII...EIAET...EET...E] = I DP]For example, f L

1 2 2- 1 I 2 2:A= I 2 3 5, then 2 3 5

2 5 52 5 5 :

T, (-2)A T21(-2)T . 1 T21(-2)T

1 0 2

0 -1 1

2 1 5

I -2 0

0 I 0 I -'0 0 1

Page 470: Matrix Theory

12.5 Congruence and Symmetric Matrices 449

T31(- 2)A" )T31( -2)TT21(-2)T T31 (-2)T

1 0 0 1 -2 -20 -1 1 0 1 0 -*

0 1 5 0 0 1

T23(-1)A(2) T23(-1)T T21(-2)T T31(-2)T T23(-1)T

1 0 0 1 0 -2= 0 -2 0 0 1 0

0 0 1 0 -1 1

Then

1 0 -2 T 1 0 -2 1 0 00 1 0 A 0 1 0 = 0 -2 00 -1 1 0 -l 1 0 0 1

A couple of important facts need to be noted here. Since P need not be anorthogonal matrix (P-1 = pT), the diagonal entries of PT AP need not beeigenvalues of A. Indeed, for an elementary matrix E, EAET need not be thesame as EAE-I. Also, the diagonal matrix to which A has been reduced aboveis not unique.

Over the complex numbers, congruence reduces to equivalence.

THEOREM 12.141

Let A E C"". Then A is congruent toL

® ® ] . Therefore, two matrices A,

B E C""" are congruent iff they have the same rank.

Page 471: Matrix Theory

450 Multilinear Matters

Exercise Set 551 2 3

I. Find a diagonal matrix congruent to 2 4 5

3 5 6

2. Argue that if A is congruent to B and A is symmetric, then B must hesymmetric.

3. Prove directly that the matrix [0 J

is not congruent to I0 0 ,

over R, but they are congruent over C. L

12.6 Skew-Symmetric Bilinear FormsRecall that a bilinear form p is called skew-symmetric if cp(x, y) = -ep(y, x)

for all x, y E V. Any matrix representation of p gives a skew-symmetric matrixA7.= -A and skew-symmetric matrices must have zero diagonals. Suppose

u, v are vectors in V with cp(u, v) = 1. Then p(v, u) 1= -1. If we restrict ep to

H = span{u, v}, its matrix has the formL

01J .

Such a pair of vectors

is called a hyperbolic pair and H is called a hyperbolic plane.

THEOREM 12.15Let p be a skew symmetric bilinearform on the F vector space V of n dimensions.Then there is a basis B = { a , , b , , a2, b,, ... , ak, bk, where

Mat(cp, B) =

[ -1 0

[-11 0

where rank((p) = 2k.0

PROOF Suppose cp is nonzero and skew-symmetric on V. Then there existsa pair of vectors a, b with cp(a, b) 4 0 say ep(a, b) = a. Replacing a by(I /a)a, we may assume cp(a, b) = 1. Let c = as + Pb. Then ep(c, a) =

Page 472: Matrix Theory

12.6 Skew-Symmetric Bilinear Forms 451

ip(aa+(3b, a) = (3p(b, a) = -(3 and cp(c, b) = p(aa+(3b, b) = acp(a, b) = aso c = cp(c, b)a - ep(c, a)b. Note a and b are independent. Let W = span {a, b}We claim V = W ® W j-. Let x be any vector in V and e = p(x, b)a - cp(x, a)band d = x - c. Then C E W and d E W1 since cp(d, a) = p(x - ep(x, b)a +p(x, a)b, a) = p(x, a) + p(x, a)p(b, a) = 0 and, similarly, cp(d, b) = 0. Thus,V = W+ W -L. Moreover, W fl W -L = (-0 ), so V = W ®W 1. Now (p restrictedto W -L is a skew-symmetric bilinear form. If this restriction is zero, we are done.If not, there exist a2, b2 in W' with cp(a2, b2) = 1. Let W2 = span {a,, b2}.Then V = W ® W2 ® Wo. This process must eventually cease since we onlyhave finitely many dimensions. So we get p(aj, bj) = 1 for j = 1, ... , k andp(aj, aj) = cp(b;, bj) = cp(a;, bi) = 0 if i i4 j and if Wj is the hyperbolicplane spanned by {ai, bj }, V = W, ® ... (D Wk ® Wo, where every vector inWO is orthogonal to all aj and bj and cp restricted to WO is zero. It is clear thematrix of y relative to jai, b, , a2, b2, ... , at , bk, c, , ... , ct,} has the advertised

form. 0

COROLLARY 12.3If cp is a nondegenerate skew symmetric bilinear form on V, then dim(V)must be even and V is a direct sum of hyperbolic planes, and Mat ((p; B) is

[ -0 1]for some basis B.

Exercise Set 56

1. Prove that every matrix congruent to a skew-symmetric matrix is alsoskew-symmetric.

2. Argue that skew-symmetric matrices over C are congruent iff they havethe same rank.

3. Prove that a skew-symmetric matrix must have zero trace.

4. Suppose A is an invertible symmetric matrix and K is skew-symmetricwith (A + K)(A - K) invertible. Prove that ST AS = A, where S =(A + K)-'(A - K).

Page 473: Matrix Theory

452 Multilinear Matters

5. Suppose A is an invertible symmetric matrix and K is such that (A +K)(A - K) is invertible. Suppose ST AS = A, where S = (A + K)-1(A -K) and I + S is invertible. Argue that K is skew-symmetric.

12.7 Tensor Products of Matrices

In this section, we introduce another way to multiply two matrices togetherto get another matrix.

DEFINITION 12.6 (tensor or Kronecker product)Let A E C" and B E Cp,q. Then A ® B is defined to be the mp-by-nq

C11 ... C1n

matrix A ® B = , where Ci1 is a block matrix of'size

Con I . . . Cwnpq-by-pq defined by C,1 _ (ent,1(A))B = aid B. In other words,

a11B a1 2B ...

A®B=a21 B a2 2 B ... a2n B

an,1 B a,, 2B amnB

For example,

all a12r bit b12 bl3

a21 a22

1

0L b21 b22 b73

a31 a32

ali bi, al lbl2 allb13 al2bll al2bl2 a12b13

all b2l al lb22 allb23 a12b21 a12b22 a12b23

a2l bll a2 1b12 a2lb13 a22bll a22b12 a22bl3all b2l al l b22 all b23 a22b21 a22b22 a22b23

a31 bll a3 1b12 a31b13 a32b11 a32b12 a32b13

a31 b21 a3 1 b22 a31 b23 a32b21 a32b22 a32b23

We can view A ® B as being formed by replacing each element all of Aby the p-by-q matrix all B. So the tensor product of A with B is a partitionedmatrix consisting of m rows and l columns of p-by-q blocks. The ij'h blockis a;1 B. The element of A ® B that appears in the [p(i - 1) + r]'h row and[q(j - 1) + s]'t' column is the rs'h element ae1b of a;j B.

Page 474: Matrix Theory

12.7 Tensor Products of Matrices 453

Note that A ® B can always be formed regardless of the size of the matrices,unlike ordinary matrix multiplication. Also note that A ® B and B ® A have thesame size but rarely are equal. In other words, we do not have a commutativemultiplication here. Let's now collect some of the basic results about computingwith tensor products.

THEOREM 12.16 (basic facts about tensor products)

1. For any matrices A, B, C, (A ®B) ®C = A ®(B ®C).

2. For A, B m-by-n and C p-by-q,(A+B)®C=A®C+B®CandC®(A+B)=C®A+C®B.

3. If a is a scalar (considered as a 1-by-1 matrix),then a ® A = aA = A ® a for any matrix A.

4. O®A = O = A ®O for any matrix A.

5. If a is a scalar, (aA) ® B = a(A ®B) = A ®(aB)for any matrices A and B.

6. IfD=diag(di,d2,... ,d),then D®A=diag(d, A, d2A,... , d A).

7.

8. For any matrices A, B, (A ® B)T = AT ® BT .

9. For any matrices A, B, (A ® B)* = A* ® B*.

10. If A is m-by-n, B p-by-q, C n-by-s, D q-by-t, then (A ® B)(C ® D) _(AC) ® (BD).

11. If A and B are square, not necessarily the samesize, then tr(A ® B) = tr(A)tr(B).

12. If A and B are invertible, not necessarily the same size, then (A 0 B)-' _A-' 0 B-1.

13. If A is m-by-m and B n-by-n, then det(A (& B) = (det(A))"(det(B))".

14. If A is m-by-n and B p-by-q and if A91, BA' are one-inverses of A andB, respectively, then As' 0 is a one-inverse of A 0 B.

Page 475: Matrix Theory

454 Multilinear Matters

15. For any, matrices A and B, rank(A ® B) = rank(A)rank(B).Consequently A ® B has full row rank iff A and B have full row rank. A

similar statement holds for column rank. In particular A ® B is invertibleiff both A and B are.

16. A®B=A®B.

PROOF The proofs will he left as exercises.

All right, so what are tensor products good for? As usual, everything in linearalgebra boils down to solving systems of linear equations. Suppose we are tryingto solve AX = B where A is n-by-n, B is m-by-p, and X is n-by-p; say

[uzz

I [ X3 XX4b-

I = [ b2,:b;; If we are clever, we can write

this as an ordinary system of linear equations:all ail 0 0 xi bpi

a21 a22 0 0 X3 b21

0 0 all a12 X2 bi2.This is just (/2(& A)x=b

o o a21 a22 x4 J [ b22where x and b are big tall vectors made by stacking the columns of X and B ontop of each other. This leads us to introduce the vec operator.

We can take an m-by-n matrix A and form an mn-by- I column vector bystacking the columns of A over each other.

DEFINITION 12.7 (vec)Coll (A)Co12(A)

Let A be an to-by-n matrix. Then vec(A) = E C""'+i. Now if A

Col (A)is m-by-n, B n-by-p, and X n-by-p, we can reformulate the matrix equationAX = B as (1t, ® A)x = b, where x = vec(X) and b = vec(B). If we cansolve one system, we can solve the other. Let's look at the basic properties ofvec. Note the (i, j) entry of A is the [(j - 1)m + i J`t' element of vec(A).

THEOREM 12.17 (basic properties of vec)

1. If x and y are columns vectors, not necessarily the same size, thenvec(xyT) = y ® X.

2. For any scalar c and matrix A, vec(cA) = c vec(A).

3. For matrices A and B of the same size, vec(A + B) = vec(A) + vec(B).

Page 476: Matrix Theory

12.7 Tensor Products of Matrices 455

4. Let A and B have the same size, then tr(AT B) = [vec(A)]T vec(B).

5. If A is m-by-n and B n-by-p, thenAcoli B

Acol2(B)vec(AB) diag(A, A_. , A)vec(B) = (IN 0

Acol,,(B)A)vec(B)

= (BT 0 I,,,)vec(A).

6. Let A be m-by-n, B n-by-p, C p-by-q; then vec(ABC) = (CT ®A)vec(B).

PROOF The proofs are left as exercises.

We have seen that AX = B can be rewritten as (Ip (& A)vec(X) = vec(B).More generally, if A is m-by-n, B m-by-q, C p-by-q, and X n-by-p, thenAXC = B can be reformulated as (CT ® A)vec (X) = vec(B). Anotherinteresting matrix equation is AX + X B = C, where A is m-by-m, B isn-by-n, C is m-by-n, and X, m-by-n. This equation can be reformulated as(A 0 1 + 1,,, (9 BT)vec(X) = vec(C). Thus, the matrix equation admits aunique solution iff A ®1 + Im ® BT is invertible of size mn-by-mn.

Exercise Set 57

1. Prove the various claims of Theorem 12.16.

2. Prove the various claims of Theorem 12.17.

3. Is it true that (A 0 B)+ = A+ ® B+?

a b c

4. Work out vec( d e f ) explicitly.g h i

a b c

5. Define rvec(A) = [vec(AT )]T . Work out rvec( d e f ) explic-g h i

itly. Explain in words what rvec does to a matrix.

6. Prove that vec(AB) = (1t, (& A)vec(B) = (BT(& I)vec(A) = (BT ®A)uec(1 ).

Page 477: Matrix Theory

456 Multilinear Matters

7. Show that vec(ABC) = (CT ® A)vec(B).

8. Argue that tr(AB) = vec(A )T vec(B) = vec(BT)T vec(A).

9. Prove that tr(ABCD) = vec(DT )T (CT ® A)vec(B) = vec(D)t (A ®CT)vec(BT ).

10. Show that aA ®(3B = a(3(A (9 B) = (a(3A) ® B = A ®(a(3B).

11. What is (A®B®C)(D®E(9 F)='?

12. If X E X(A) and µ E X(B) with eigenvectors v and w, respectively, arguethat Aµ is an eigenvalue of A ® B with cigenvector v ® w.

13. Suppose v and w are m-by- I . Show that vT ®w = wv7 = w ®vT andvec(vwT) = w ® v.

14. Prove that vec(A) = vec(A).

15. Prove that vec(A*) = vec(AT).

16. Suppose G E A( l} and HE B(I). Prove that G(9 HE A® B(1}.

17. Prove that the tensor product of diagonal matrices is a diagonal matrix.

18. Prove that the tensor product of idempotent matrices is an idempotentmatrix. Is the same true for projections'?

19. Is the tensor product of nilpotent matrices also nilpotent?

12.7.1 MATLAB Moment

12.7.1.1 Tensor Product of Matrices

MATLAB has a built-in command to compute the tensor (or Kronecker)product of two matrices. If you tensor an n-by-n matrix with a p-hy-q matrix,you get an mp-by-nq matrix. You can appreciate the computer doing this kindof operation for you. The command is

kron(A,B).

Just for fun, let's illustrate with matrices filled with prime numbers.

>>A = zeros(2); A(:) = primes(2)

2 5A=3 7

Page 478: Matrix Theory

12.7 Tensor Products of Matrices 457

>>B = zeros(4); B(:) = primes(53)

2 II 23 41

3 13 29 43B=5 17 31 47

7 19 37 53

>>kron(A, B) ans =

4 22 46 82 10 55 115 205

6 26 58 86 15 65 145 215

10 34 62 94 25 85 155 235

14 38 74 106 35 95 185 265

6 33 69 123 14 77 161 287

9 39 87 129 21 91 203 301

15 51 93 141 35 119 217 329

21 57 III 159 49 133 259 371

Try kron(eye(2),A) and kron(A,eye(2)) and explain what is going on.This is a good opportunity to create a function of our own that MATLAB

does not have built in. We do it with an M-file. To create an.m file in a Windowsenvironment, go to File, choose New, and then choose M-File. In simplest form,all you need is to type

B = A(:) f unction B = vec(A)

Then do a Save.Try out your new function on the matrix B created above.

>>vec(B)

You should get a long column consisting of the first 16 primes.

Page 479: Matrix Theory
Page 480: Matrix Theory

Appendix A

Complex Numbers

A.1 What is a Scalar?

A scalar is just a number. In fact, when you ask the famous "person on thestreet" what mathematics is all about, he or she will probably say that it is aboutnumbers. All right then, but what is a number? If we write "5" on a piece ofpaper, it might be tempting to say that it is the number five, but that is not quiteright. In fact, this scratching of ink "5" is a symbol or name for the number andis not the number five itself. We got this symbol from the Hindus and Arabs.In ancient times, the Romans would have written "V" for this number. In fact,there are many numeral systems that represent numbers. Okay, so where doesthe number five, independent of how you symbolize it, exist? Well, this is gettingpretty heavy, so before you think you are in a philosophy course, let's proceedas if we know what we are talking about. Do you begin to see why mathematicsis called "abstract"?

We learned to count at our mother's knee: 1, 2, 3, 4, .... (Remember that` ... " means "keep on going in this manner," or "etc" to a mathematician.)]These numbers are the counting numbers, or whole numbers, and we officiallyname them the "natural numbers." They are symbolized by I`l = 11 9 2, 3 ... } .

After learning to count, we learned that there are some other things you cando with natural numbers. We learned the operations of addition, multiplication,subtraction, and division. (Remember "gozinta" ? 3 gozinta 45 fifteen times.)It turns out that the first two operations can always be done, but that is not trueof the last two. (What natural number is 3 minus 5 or 3 divided by 5?) We alsolearned that addition and multiplication satisfy certain rules of computation. Forexample, m+n = n+m for all natural numbers m and n. (Do you know the nameof this rule?) Can you name some other rules? Even though the natural numbersseem to have some failures, they turn out to be quite fundamental to all ofmathematics. Indeed, a famous German mathematician, Leopold Kronecker(7 December 1823-29 December 1891), is supposed to have said that God

Do you know the official name of the symbol " . . "?

459

Page 481: Matrix Theory

460 Complex Numbers

created the natural numbers and that all the rest is the work of humans. It is notclear how serious he was when he said that.

Anyway, we can identify a clear weakness in the system of natural numbersby looking at the equation:

m + x = n. (Al).

This equation cannot always be solved for x in N (sometimes yes, but notalways). Therefore, mathematicians invented more numbers. They constructedthe system of integers, Z, consisting of all the natural numbers together withtheir opposites (the opposite of 3 is -3) and the wonderful and often underratednumber zero. Thus, we get the enlarged number system:

Z = {. .. ,-2,-1,0, 1,2,...I.

We have made progress. In the system, Z, equation (1) can always be solvedfor x. Note that N is contained in Z, in symbols, N e Z. The construction of Zis made so that all the rules of addition and multiplication of N still hold in Z.

We are still not out of the woods! What integer is 2 divided by 3? In termsof an equation,

ax = b (A.2)

cannot always he solved in Z, so Z also is lacking in some sense.Well all right, let's build some more numbers, the ratios of integers. We call

these numbers fractions, or rational numbers, and they are symbolized by:

lb

(Remember, we cannot divide by zero. Can you think of a good reason why

not'?) Since a fraction is as good as a itself, we may view the integers as

contained in Q, so that our ever expanding universe of numbers can be viewedas an inclusion chain:

NcZcQ.Once again, the familiar operations are defined, so nothing will he lost from theearlier number systems, making Q a rather impressive number system.

In some sense, all scientific measurements give an answer in Q. You canalways add, multiply, subtract, and divide (not by zero), and solve equations

Page 482: Matrix Theory

A. I What is a Scalar? 461

(A. 1) and (A.2) in Q. Remember the rules you learned about operating withfractions?

a c ad + be(A 2)'

b d bd.

(A 4)\b/ ..dl bd.

a c _ ad - bcA 5

b d bd

\b/ \d/ad

( . )

(A.6)

( "Invert and multiply rule" )

It seems we have reached the end of the road. What more can we do that Qdoes not provide? Oh, if life were only so simple! Several thousand years ago,Greek mathematicians knew there was trouble. I think they tried to hush it upat the time, but truth has a way of eventually working its way out. Consider asquare with unit side lengths (1 yard, l meter-it does not matter).

1

1

Figure Al.!: Existence of nonrational numbers.

Would you believe we have seen some college students who thought b + _ °+e !? What were

they thinking? It makes you shiver all over!

Page 483: Matrix Theory

462 Complex Numbers

The diagonal of that square has a length whose square is two. (What famoustheorem of geometry led to that conclusion?) The trouble is there are no ratios

of integersb

whose square is two; that is, x2 = 2 has no solution in Q. Even

is lacking in some sense, namely the equation:

x`-=2 (A.7)

cannot he solved forx in Q, although we can get really close to a solution in Q. Ittook mathematicians quite awhile to get the next system built. This is the systemR of real numbers that is so crucial to calculus. This system consists of all ratio-nal numbers as well as all irrational numbers, such as f, /.3, V(5, ii, e, ... .

Our chain of number systems continues to grow:

NcZcQcR.Once again, we extend our operations to R so we do not lose all the wonderful

features of Q. But now, we can solve x2 = 2 in R. There are exactly twosolutions, which we can write as f and -.. Indeed we can solve x2 = pfor any nonnegative real number p. Aye, there's the rub. You can prove squarescannot he negative in R. Did you think we were done constructing numbersystems? Wrong, square root breath! How about the equation

2x = -3? (A.8)

No, the real numbers, a really big number system, is still lacking. You guessedit, though, we can build an even larger number system called the system ofcomplex numbers C in which (8) can be solved.

Around 2000 B.C., the Babylonians were aware of how to solve certain casesof the quadratic equation (famous from our high school days):

axe+bx+c=0.

In the sixteenth century, Girolamo Cardano (24 September 1501-21 Septem-ber 1576) knew that there were quadratic equations that could not be solvedover the reals. If we simply "push symbols" using rules we trust, we get theroots of ax 2 + bx + c = 0 to be

r-bf r b2 b22-4ac2a 2a 2a

But what if b2 - 4ac is negative'? The square root does not make any sense--4then in R. For example, consider x2 + x + 1 = 0. Then r =

I f 1

=2

- 2± 2 . If you plug these "imaginary" numbers into the equation, you do

Page 484: Matrix Theory

A.1 Whut is a Scular? 463

get zero if you agree that = -1. What can I say? It works! You cantell by the name "imaginary" that these "numbers" were held in some mistrust.It took time but, by the eighteenth century, the remarkable work of LeonardEuler (15 April 1707-18 September 1783) brought the complex number systeminto good repute. Indeed, it was Euler who suggested using the letter i for.(Yes, I can hear all you electrical engineers saying "i is current; we use j for." I know, but this is a mathematics book so we will use i for .) Ourroots above can be written as

1 /3-,r=-2f 2 e

We are now in a position to give a complete classification of the roots of anyquadratic equation ax 2 + bx + c = 0 with real number coefficients.

Case 1: b2 - 4ac is positive.-b b -4 a c

Then r =2a ± 2a

gives two distinct real roots of the equa-

tion, one for the + and one for the -.

Case 2 : b2 - 4ac is zero.b

Then r = -2a

is a repeated real root of the equation.

Case 3 : b2 - 4ac is negative.

-b l)(4ac - b2) -6 4ac - b2Then r =

=2a f 2a 2a 2a W- if) - (2a)

4ac --b 2

C

i = a ± (3i gives the so-called "complex conjugate pair" of2a

roots of the quadratic. This is how complex numbers came to be viewed asnumbers of the form a + (3i or a + i 13.

It took much more work after Euler to make it plain that complex numberswere just as meaningful as any other numbers. We mention the work of CasperWessel (8 June 1745-25 March 1818), Jean Robert Argand (18 July 1768-13 August 1822), Karl Friedrich Gauss (30 April 1777-23 February 1855) (hispicture was on the German 10 Mark bill) and Sir William Rowen Hamilton (4August 1805-2 September 1865) among others. We are about to have a goodlook at complex numbers, since they are the crucial scalars used in this book.One of our main themes is how the system of complex numbers and the systemof (square) complex matrices share many features in common. We will use theapproach of Hamilton to work out the main properties of the complex numbersystem.

Page 485: Matrix Theory

464

Further Reading

Complex Numbers

[Dehacne, 1997] S. Dehaene, The Number Sense, Oxford UniversityPress, New York, (1997).

[Keedy, 1965] M. L. Keedy, Number Systems: A Modern Introduction,Addison-Wesley, Reading, MA, (1965).

[Nahin, 1998] P. J. Nahin, An Imaginary Tale: The Story of ,f 1,

Princeton University Press, Princeton, NJ, (1998).

[Roberts, 1962] J. B. Roberts, The Real Number System in an AlgebraicSetting, W. H. Freeman and Co., San Francisco, (1962).

A.2 The System of Complex Numbers

We will follow Hamilton's idea of defining a complex number z to be anordered pair of real numbers. So z = (a, b), where a and b are real, is acomplex number; the first coordinate of z is called the real part of z, Re(z),and the second is called the imaginary part, lm(z). (There is really nothingimaginary about it, but that is what it is called.) Thus, complex numbers looksuspiciously like points in the Cartesian coordinate plane R2. In fact, they canbe viewed as vectors in the plane. This representation will help us draw picturesof complex numbers. It was Argand who suggested this and so this is sometimescalled an Argand diagram.

The operations on complex numbers are defined as follows: (Let z i = (xi, yi )and z2 = (x2, y2) be complex numbers):

1.1 addition: zi e z2 = (xi + x2, y, + y2)

1.2 subtraction: zi e Z2 = (x1 - X2, y, - y2)1.3 multiplication: zi G Z2 = (x,x2 - yIy2, xi y2 + yI x2)There are two special complex numbers 0 (zero) and 1 (one). We define:

1.4 the zero 0 = (0, 0)1.5 the multiplicative identity 1 = (I , 0)

zi (x2xi + y2yi xi y2 - x2yi 11.6 division: - = 2 2 2 J provided z2 # 0-Z2 xZ + y2 x2 + y2

Page 486: Matrix Theory

A.2 The System of Complex Numbers 465

Some remarks are in order. First, there are two kinds of operations going onhere: those in the complex numbers ®, O, etc. and those in R, +, etc. We aregoing to play as if we know all about the operations in R. Then, we will deriveall the properties of the operations for complex numbers. Next, the definitionsof multiplication and division may look a little weird, but they are exactly whatthey have to be to make things work.

Let's agree to write C = ]R2 for the set of complex numbers equipped withthe operations defined above. We can take a hint from viewing C as R 2 andidentify the real number a with the complex number (a, 0). This allows us toview R as contained within C. So our inclusion chain now reads

NcZcQcRcc.Also, we can recapture the usual way of writing complex numbers by

(a, b) = (a,0)e(0, b) = a ®(b,0)0(0, 1) = a ®bai

Since the operations in R and C agree for real numbers, there is no longer aneed to be so formal with the circles around plus and times, so we now dropthat. Let's just agree

(a, b) = a + bi.

APA Exercise Set 1

1. Use the definitions of addition, subtraction, multiplication, and divisionto compute zi + zz, zi - zz. z, z2, zi /zz, and zZ, z, , where zi = 3 + 4i,and Z2 = 2 - i. Express you answers in standard form a + bi.

2. If i = (0, 1), use the definition of multiplication to compute i2.

3. Find the real and imaginary parts of:

(a) 3 - 4i(b) 6i(c) (2 + 3i)2(d) ( _/ 3 --i ) 3

(e) (I + i)10.

4. Find all complex numbers with z2 = i.

5. Consider complex numbers of the form (a, 0), (b, 0). Form their sum andproduct. What conclusion can you draw'?

Page 487: Matrix Theory

466 Complex Numbers

6. Solve z + (4 - 3i) = 6 + 5i for z.

7. Solve z(4 - 3i) = 6 + 5i for z.

8. Let z 0 0, z = a +bi. Compute1

in terms of a and b. What is I , I '?

z i 2+i9. Prove Re(iz) = -Im(z) for any complex number z.

10. Prove Im(iz) = Re(z) for any complex number z.

11. Solve (1 + i )z2 + (3i )z + (5 - 4i) = 0.

12. What is 09991

13. Draw a vector picture of z = 3 + 4i in 1It22. Also draw 3 - 4i and -4i.

A.3 The Rules of Arithmetic in CNext, we will list the basic rules of computing with complex numbers. Once

you have these, all other rules can be derived.

A.3.1 Basic Rules of Arithmetic in C

A.3.1.1 Associative Law of Addition

zi + (Z2 + Z3) = (zi + Z2) + Z3 for alI zi, Z2, Z3 in C.(This is the famous "move the parentheses" law.)

A.3.1.2 Existence of a Zero

The element 0 = (0, 0) is neutral for addition; that is, 0 + z = z + 0 = z for allcomplex numbers z.

A.3.1.3 Existence of Opposites

Each complex number z has an opposite -z such that z+(-z) = 0 =(-z)+z.More specifically, if z = (x, y), then -z = (-x, -y).

A.3.1.4 Commutative Law of Addition

zI+ Z2 =Z2+zi for all zi, Z2 in C.

Page 488: Matrix Theory

A.3 The Rules of Arithmetic in C 467

A.3.1.5 Associative Law of Multiplication

ZI (Z2Z3) = (ZI Z2)Z3 for all Z I, Z2, Z3 in C.

A.3.1.6 Distributive Laws

Multiplication distributes over addition; that is,ZI(Z2 + Z3) = ZIZ2 + ZIZ3 for all zl, Z2, Z3 in C and also (zl + Z2)Z3 =

ZIZ3 + Z2Z3

A.3.1.7 Commutative Law for Multiplication

ZIZ2 = Z2ZI for all zl, Z2 in C.

A.3.1.8 Existence of Identity

One is neutral for multiplication; that is, 1 z = z 1 = z for all complexnumbers z in C.

A.3.1.9 Existence of Inverses

For each nonzero complex number z, there is a complex number z-I so thatzz-I = z-Iz = 1. In fact, ifz = (x, y) # 0, then z-I =

x -yCxz + y2' x2 + y2)l̀eave most of them toWe will illustrate how these rules are established and

you. Let us set a style and approach to proving these rules. It amounts to "stepsand reasons"

PROOF of 2.1Let ZI = (al, bl), Z2 = (a2, b2), and Z3 = (a3.b3)Then zl + (z2 + Z3) = (al , bl) + ((a2, b2) + (a3,b3) by substitution,= (albs) + (a2 + a3, b2 + b3) by definition of addition in C,= (al + (a2 + a3), bl + (b2 + b3)) by definition of addition in C,= ((al + a2) + a3, (bl + b2) + b3) by associative law of addition in R,= (al + a2, bl + b2) + (a3b3) by definition of addition in C,= ((al , bl) + (a2, b2)) + (a3, b3) by definition of addition in C,= (zI + Z2) + Z3 by substitution. a

You see how this game is played'? The idea is to push everything (usingdefinitions and basic logic) back to the crucial step where you use the givenproperties of the reals R. Now see if you can do the rest. We can summarizeour discussion so far. Mathematicians say that we have established that C isafield.

Page 489: Matrix Theory

468 Complex Numbers

APA Exercise Set 2

1. Prove all the other basic rules of arithmetic in C in a manner similar tothe one illustrated above.

2. Provezi z2=0iffzi =0orz2=0.

3. Prove the cancellation law: z,z = z2z and z $ 0 implies z, = Z2-

4. Note subtraction was not defined in the basic rules. That is because wecan define subtraction by z, - Z2 = z, +(-z2). Does the associative lawwork for subtraction?

5. Note division was not defined in the basic rules. We can define z, - Z2 =z,

Z2, if z2 0 0. Can you discover some rules that workby z, - Z2 = z, z;

for division?

_ I az + b2 a6. Suppose z = a + bi 0. Argue that f _ V 2 ( 1 + az + bz

a 1Va2+b2(a2)ifb>_Oand- F7 z zb+a

+ i 1 - ) if b < 0. Use this information to find I + i.a2 + b2

C

7. Argue that axz+bx+c = O has solutions z = - (_)+J(_)2 a

b cand z = - (2a) - V

(2ab )L -a

numbers.

even if a, b, and c are complex

A.4 Complex Conjugation, Modulus, and DistanceThere is something new in C that really is not present in R. We can form

the complex conjugate of a complex number. The official definition of complexconjugate is: if z = (a, b) = a + bi, then z := (a, -b) = a - bi. Next wecollect what is important to know about complex conjugation.

Page 490: Matrix Theory

A.4 Complex Conjugation, Modulus, and Distance 469

A.4.1 Basic Facts About Complex Conjugation

(3.1)Z=zforallzinC.(3.2)zi+z2=Zi+Z2 forallz1,z2inC.(3.3) zi -z2 =Zi - Z2 for all Zi, Z2 in C.(3.4) TI-z2 = Z,Z2 for all zi, z2 in C.

(3.5) (-ZI _? forallzi,z2inCwithZ2 Z2

C1\ - 1

/I for all nonzero z in C.(3.6)z

iAs illustration again, we will prove one of these and leave the other proofs

where they belong, with the reader.

PROOF of 3.4 Let z, = (a I, b,) and z2 = (a2, b2). Then,ziz2 = (a,, b,) (a2, b2) by substitution,

= (a,a2 - b1b2, a,b2 + b,a2) by definition of multiplication in C,= (a,a2-b,b2, -(a,b2+b,a2)) by definition of complex conjugate,= (a,a2 - b1b2, -a,b2 - b,a2) by properties of additive inverses

in R,= (a, a2 - (-b,) (-b2), a, (-b2) + (-b, )a2) by properties of additive

inverses in R,_ (a,, -b,)(a2, -b2) bydefinitionofmultiplication

in C,= Z, Z2 by substitution. 0

Again, see how everything hinges on definitions and pushing back to whereyou can use properties of the reals.

Next, we extend the idea of absolute value from R. Given a complex numberz = (a, b) = a + bi, we define the magnitude or modulus of z by IzI :=

a2 +b2. We note that if z is real, z = (a, 0); then IzI = a2 = Ial theabsolute value of a. Did you notice we are using I I in two ways here? Anyway,taking the modulus of a complex number produces a real number. We collectthe basics on magnitude.

A.4.2 Basic Facts About Magnitude

(3.7) zz = Iz12 for all complex numbers z,(3.8) IzI ? 0 for all complex numbers z,(3.9) IzI =0iffz=0,(3.10) IZIZ2l = Izi I IZ2l for all ZI, Z2 in C,

(3.11) I z' I = Izj I for all ZI, Z2 in C with z2 A 0,z2 IZ2I

Page 491: Matrix Theory

470 Complex Numbers

(3.12) Izi + z21 _< Izi I + Iz21 for all zI, z2 in C (the triangle inequality),

(3.13)z =

-for all z # 0 in C,

izil(3.14)11z,I-Iz211 _< Izi -Zz1forallZl,Z2inC,(3.15) IzI = I -zI for all z in C,(3.16) IzI = IzI.Again, we leave the verifications to the reader. We point out that you do not

always have to go hack to definitions. We have developed a body of theory(facts) that can now be used to derive new facts. Take the following proof forexample.

PROOF of 3.10IZIZ212 = (z1z2)(z,z2)

= (z,z2)(z,z2)= ((zIZ2)zI)z2= (z, (z2z, W Z2= ((zizi)z2)z2= (z,z,)(z2z2)= 1Z1 j2 Iz212

= (IzI I Iz21)2

by 3.7,by 3.4,by 2.5,by 2.5,by 2.5 and 2.7,by 2.5,by 3.7,by the commutative law of multiplication in R.

Now, by taking square roots, 3.10 is proved. 0

One beautiful fact is that when you have a notion of absolute value (mag-nitude), you automatically have a way to measure distance, in this case, thedistance between two complex numbers. If zI and z2 are complex numbers, wedefine the distance between them by d(z1, z2) IzI - Z21 . Note that if zI =(a,, b,) and z2 = (a2, b2), then d(zl, z2) = IzI - z21 = I(ai - a2, b, - b2)I =

(a, - a2)22 + (b, - b2)2. This is just the usual distance formula betweenpoints in R2! Let's now collect the basics on distance. These follow easilyfrom the properties of magnitude.

A.4.3 Basic Properties of Distance

(3.17)d(zI,Z2)>0 for all Zi, z2 in C.(3.18) d(z,,Z2)=0iffz, =Z2forz,, Z2 in C.(3.19) d(z1, z2) = d(z2, z,) for all z,. Z2 in C.(3.20) d(z,, z3) -< d(z,, z2) + d(z2, z3) for all z,, z2, z3 in C

(triangle inequality for distance).

Page 492: Matrix Theory

A.4 Complex Conjugation, Modulus, and Distance 471

3.20d(zi+w,z2+w)=d(z1,Z2)for all zi,Z2,win C(translation invariance of distance).

We leave all these verifications as exercises. Note that once we have distancewe can introduce calculus since we can say how close two complex numbersare. We can talk of convergence of complex sequences and define continuousfunctions. But don't panic, we won't.

We end this section with a little geometry to help out our intuition.

T

Imaginary Axisi 1R /,'

z = (a,b) = a + bi

Real Axis

z = (a,-b) = a - bi

Figure A1.2: Magnitude and complex conjugate of z = a + bi.

Note that Izl is just the distance from z to the origin and z is just the reflectionof z about the real axis.

The addition of complex numbers has the interesting interpretation of additionas vectors according to the parallelogram law. (Remember from physics?)

Page 493: Matrix Theory

472 Complex Numbers

(a+c, b+d)

Figure A1.3: Vector view of addition of complex numbers.

(You can check the slopes of the sides to verify we have a parallelogram.) Themultiplication of complex numbers also has some interesting geometry behindit. But, for this, we must develop another representation.

APA Exercise Set 3

1. Prove the basic facts about complex conjugation left unproved in the text.

2. Prove the basic facts about modulus left unproved in the text.

3. Prove the basic facts about distance left unproved in the text.

4. Find the modulus of 7 - 3i, 4i, 10 - i.

Page 494: Matrix Theory

A. 5 The Polar Form of Complex Numbers

5. Find the distance between 3 - 2i and 4 + 7i.

6. Prove z2 = i 2 , if z2 A 0.

7. Calculate6 - 3i2 - i

using the formula in problem 6 above.

8. Find z if (5 + 7i )Z = 2 - i.

i+9. Compute .

i-z

473

10. Prove Re(z) = Z(z + z), /m(z) = Z; (z - z) for any z in C.

11. Prove IRe(z) ICI zl for any z in C.

12. Prove lRe(z)l + l!m(z)l vf2 Izl .

13. Prove z is real iff z = z; z is pure imaginary iff -z.

14. Prove that forallz,winC, lz+w12+Iz-w12=2Iz12+21w12.

15. Prove that forallz,winC, Izwl <z

(Iz12+Iw12).

16. For complex numbers a,b,c,d, prove that la - bI I c - d l + la - d Ilb - cl > Ia - cl lb - dl . Is there any geometry going on here? Canyou characterize when equality occurs?

A.5 The Polar Form of Complex NumbersYou may recall from calculus that points in the plane have polar coordinates

as well as rectangular ones. Thus, we can represent complex numbers in twoways. First, let's have a quick review. An angle in the plane is in standardposition if it is measured counterclockwise from the positive real axis. Do notforget, these angles are measured in radians. Any angle 0 (in effect, any realnumber) determines a unique point on the unit circle, so we take cos 0 to bethe first coordinate of that point and sin 0 to be the second. Now, any complexnumber z = (a, b) = a + bi # 0 determines a magnitude r = IzI = a2 -+b2and an angle arg(z) in standard position. This angle 0 is not unique since, if0 = arg(z), then 0 + 2Trk, k E 7G works just as well. We take the principalargument Arg(z) to be the 0 such that -Tr < 0 < Tr.

Page 495: Matrix Theory

474 Complex Numbers

z = (a,b)

Figure A1.4: Polar form of a complex number.

From basic trigonometry we see cos 0 =a

=a

and sin 0 =b = b. Thus

z = a + bi = r(cos 0 + i sin 0).IzI r Izi r

There is an interesting connection here with the exponential function base e.Recall from calculus that

x2 x3 x4 xnex = l +x+-+-+-+...+-+... .2! 3! 4! n!

Let's be bold and plug in a pure imaginary number i 0 for x.

e'0 = l +(i0)+

i0 02 03 04

=1+4 6 8 3 S 7 9

Z + 4 - 6i + i(0- ei + si -

f + 9i...

=cos0+isin0.

Page 496: Matrix Theory

A.5 The Polar Form of Complex Numbers

This is the famous Euler formula

eme = cos 0 + i sin 0.

We can now write

   z = a + bi = re^(iθ),

where r = |z| and θ = arg(z) for any nonzero complex number z. This leads to a very nice way to multiply complex numbers.


THEOREM A.1
Let z1 = r1(cos θ1 + i sin θ1) = r1 e^(iθ1) and z2 = r2(cos θ2 + i sin θ2) = r2 e^(iθ2). Then z1 z2 = r1 r2 [cos(θ1 + θ2) + i sin(θ1 + θ2)] = r1 r2 e^(i(θ1 + θ2)).

PROOF We compute z1 z2 = r1(cos θ1 + i sin θ1) r2(cos θ2 + i sin θ2) = r1 r2 [(cos θ1 cos θ2 − sin θ1 sin θ2) + i(sin θ1 cos θ2 + cos θ1 sin θ2)] = r1 r2 [cos(θ1 + θ2) + i sin(θ1 + θ2)] using some very famous trigonometric identities.

In words, our theorem says that, to multiply two complex numbers, you just multiply their magnitudes and add their arguments (any arguments will do). Many nice results fall out of this theorem. Notice that we can square a complex number z = re^(iθ) and get z² = zz = r²e^(2iθ). Who can stop us now? z³ = r³e^(3iθ), z⁴ = r⁴e^(4iθ), .... Thus, we see that the theorem above leads to the famous theorem of Abraham DeMoivre (26 May 1667–27 November 1754).

THEOREM A.2 (DeMoivre's theorem)
If θ is any angle, n is any integer, and z = re^(iθ), then zⁿ = rⁿe^(inθ) = rⁿ(cos(nθ) + i sin(nθ)).

PROOF An inductive argument can be given for positive integers. The case n = 0 is easy to see. The case for negative integers has a little sticky detail.

First, if z = re^(iθ) ≠ 0, then (re^(iθ))((1/r)e^(−iθ)) = e⁰ = 1. This says z⁻¹ = (1/r)e^(−iθ).

Now suppose n = −m, where m is a positive integer. Then zⁿ = (re^(iθ))⁻ᵐ = ((re^(iθ))⁻¹)ᵐ = ((1/r)e^(−iθ))ᵐ = r⁻ᵐ e^(i(−m)θ) = rⁿ e^(inθ).
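
Readers who keep MATLAB nearby can spot-check Theorem A.1 and DeMoivre's theorem numerically; the following sketch uses sample values we chose arbitrarily for illustration (they are not from the text).

   % multiply magnitudes and add arguments
   z1 = 2*exp(1i*pi/6);  z2 = 3*exp(1i*pi/4);
   w  = (abs(z1)*abs(z2)) * exp(1i*(angle(z1) + angle(z2)));
   disp(abs(w - z1*z2))                           % essentially zero
   % DeMoivre: z^n = r^n e^(i n theta)
   n = 7;  z = 1.5*exp(1i*2*pi/5);
   disp(abs(z^n - abs(z)^n*exp(1i*n*angle(z))))   % essentially zero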

One clever use of DeMoivre's theorem is to turn the equation around and find the nth roots of complex numbers. Consider zⁿ = c. If c = 0, this equation has only z = 0 as a solution. Suppose c ≠ 0 and c = ρ(cos φ + i sin φ) = zⁿ = rⁿ(cos(nθ) + i sin(nθ)). Then we see ρ = rⁿ and φ = nθ or, the other way around, r = ρ^(1/n) and φ = φ₀ + 2kπ, where φ₀ is the smallest nonnegative angle for c. That is, θ = (φ₀ + 2kπ)/n, where k is any integer. To get n distinct roots, we simply need k = 0, 1, 2, ..., n − 1.

THEOREM A.3 (on nth roots)
For c = ρ(cos φ₀ + i sin φ₀), ρ ≠ 0, the equation zⁿ = c, n a positive integer, has exactly n distinct roots, namely

   z = ρ^(1/n) ( cos((φ₀ + 2kπ)/n) + i sin((φ₀ + 2kπ)/n) ),  where k = 0, 1, 2, ..., n − 1.

For example, let's find the fifth roots of c = −2 + 2i. First, r = |−2 + 2i| = √8 = 2^(3/2), so φ₀ = 135° = 3π/4. Thus, c = 2^(3/2)(cos(3π/4) + i sin(3π/4)) = 2^(3/2) e^(i3π/4). So the fifth roots are

   2^(3/10) e^(i3π/20)  = 2^(3/10)(cos(3π/20) + i sin(3π/20))   = 2^(3/10)(cos 27° + i sin 27°),
   2^(3/10) e^(i11π/20) = 2^(3/10)(cos(11π/20) + i sin(11π/20)) = 2^(3/10)(cos 99° + i sin 99°),
   2^(3/10) e^(i19π/20) = 2^(3/10)(cos(19π/20) + i sin(19π/20)) = 2^(3/10)(cos 171° + i sin 171°),
   2^(3/10) e^(i27π/20) = 2^(3/10)(cos(27π/20) + i sin(27π/20)) = 2^(3/10)(cos 243° + i sin 243°),
   2^(3/10) e^(i35π/20) = 2^(3/10)(cos(35π/20) + i sin(35π/20)) = 2^(3/10)(cos 315° + i sin 315°).
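
Theorem A.3 translates directly into a few lines of MATLAB; this is a sketch applied to the same c = −2 + 2i used above.

   c   = -2 + 2i;
   n   = 5;
   rho = abs(c);  phi0 = angle(c);               % modulus and principal argument
   k   = 0:n-1;
   z   = rho^(1/n) * exp(1i*(phi0 + 2*pi*k)/n);  % the n distinct roots
   disp(z.^n - c)                                % every entry is essentially zero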

A special case occurs when c = 1. We then get what are called the roots of unity. In other words, they are the solutions to zⁿ = 1.

THEOREM A.4 (on roots of unity)
The n distinct nth roots of unity are

   e^(2πik/n) = cos(2πk/n) + i sin(2πk/n),  where k = 0, 1, 2, ..., n − 1.

Moreover, if we write ω = cos(2π/n) + i sin(2π/n), then ω is the nth root of unity having the smallest positive angle and the nth roots of unity are 1, ω, ω², ω³, ..., ω^(n−1). Note that all roots of unity are magnitude one complex numbers, so they all lie on the unit circle in R². There is some nice geometry here. The roots of unity lie at the vertices of a regular polygon of n sides


inscribed in the unit circle with one vertex at the real number 1. In other words,the unit circle is cut into n equal sectors.

Example A1.
Find the four fourth roots of 1 and plot them. Here ω = cos(π/2) + i sin(π/2) = i, so the roots are

   ω⁰ = 1,
   ω  = cos(π/2) + i sin(π/2) = i,
   ω² = cos π + i sin π = −1,
   ω³ = cos(3π/2) + i sin(3π/2) = −i.

Figure A1.5: The four fourth roots of unity on the unit circle.


APA Exercise Set 4

1. Express 10(cos 225° + i sin 225°) in rectangular form. Do the same for
   a) 6(cos 60° + i sin 60°)          h) 18(cos 135° + i sin 135°)
   b) 17(cos 270° + i sin 270°)       i) 41(cos 90° + i sin 90°)
   c) 2√3(cos 150° + i sin 150°)      j) 5√2(cos 225° + i sin 225°)
   d) 6√2(cos 45° + i sin 45°)        k) 10(cos 300° + i sin 300°)
   e) 12(cos 330° + i sin 330°)       l) 22(cos 240° + i sin 240°)
   f) 40(cos 70° + i sin 70°)         m) 8(cos 140° + i sin 140°)
   g) 6(cos 345° + i sin 345°)        n) 15(cos 235° + i sin 235°)

2. Write the following complex numbers in polar form.
   a) 8              j) 14                s) 18i
   b) −19i           k) 3√3 − 3i          t) −11 + 11i
   c) 15 − 15i       l) −4 − 4i           u) 10 − 10i
   d) −13 − 13i      m) 15 + 15i          v) 2√2 − 2√2 i
   e) 4 − 3i         n) 12 + 5i           w) −6 + 12i
   f) 6 + 8i         o) −13 − 10i         x) 7 − 11i
   g) 5√3 + 5i       p) −7 − 7i           y) −23
   h) 31i            q) −8 + 8i           z) 3 − 3√3 i
   i) −4√3 − 4i      r) 15√2 − 15√2 i     aa) 10 + 20i

3. Perform the indicated operations and express the results in rectangular form.
   a) [2(cos 18° + i sin 18°)][4(cos 12° + i sin 12°)]
   b) [10(cos 34° + i sin 34°)][3(cos 26° + i sin 26°)]
   c) [7(cos 112° + i sin 112°)][2(cos 68° + i sin 68°)]
   d) [6(cos 223° + i sin 223°)][cos 227° + i sin 227°]
   e) 12(cos 72° + i sin 72°) / [3(cos 42° + i sin 42°)]
   f) 24(cos 154° + i sin 154°) / [6(cos 64° + i sin 64°)]
   g) 42(cos 8° + i sin 8°) / [7(cos 68° + i sin 68°)]
   h) 6√2(cos 171° + i sin 171°) / [2(cos 216° + i sin 216°)]

4. Express in rectangular form.
   a) [2(cos 30° + i sin 30°)]⁴          g) [cos 15° + i sin 15°]¹⁰⁰
   b) [4(cos 10° + i sin 10°)]⁶          h) (cos 60° + i sin 60°)⁵⁰
   c) [3(cos 144° + i sin 144°)]⁵        i) (2 − 2√3 i)³⁰
   d) [2(cos 210° + i sin 210°)]⁵        j) (−1/2 + (√3/2) i)⁴⁰
   e) [2(cos 18° + i sin 18°)]⁵          k) (√3 + i)⁵
   f) [√3(cos 30° + i sin 30°)]⁻⁶        l) (1 − √3 i)⁹

5. Find the fourth roots of −8 + 8√3 i.

6. Find in polar form the roots of the following. Draw the roots graphically.
   a) x² = 36(cos 80° + i sin 80°)       i) x² = 1 + √3 i
   b) x² = 4(cos 140° + i sin 140°)      j) x² = 8 − 8√3 i
   c) x³ = 27(cos 72° + i sin 72°)       k) x³ + 4 + 4√3 i = 0
   d) x³ = 8(cos 105° + i sin 105°)      l) x³ − 8√2 − 8√2 i = 0
   e) x⁴ = 81(cos 64° + i sin 64°)       m) x⁴ = 2 − 2√3 i
   f) x⁴ = 16(cos 200° + i sin 200°)     n) x⁴ = −8 + 8√3 i
   g) x⁵ = cos 150° + i sin 150°         o) x⁵ − 32 = 0
   h) x⁶ = 27(cos 120° + i sin 120°)     p) x⁵ + 16 + 16√3 i = 0

7. Find all roots and express the roots in both polar and rectangular form.
   a) x² + 36i = 0          h) x⁴ + 4 = 0
   b) x² = 32 + 32√3 i      i) x⁴ + 81 = 0
   c) x³ − 27 = 0           j) x⁴ = −32 − 32√3 i
   d) x³ − 8i = 0           k) x⁶ − 64 = 0
   e) x³ + 216 = 0          l) x⁶ + 8 = 0
   f) x³ + 27i = 0          m) x⁵ − 1 = 0
   g) x³ − 64i = 0          n) x⁵ − 243i = 0

8. Find necessary and sufficient conditions on the complex numbers z and w so that z z̄ = w w̄.

A.6 Polynomials over C

It turns out that properties of polynomials are closely linked to properties of matrices. We will take a few moments to look at some basics about polynomials. But the polynomials we have in mind here have complex numbers as coefficients, such as (4 + 2i)x² + 3ix + (√7 π + e²i). We will use C[x] to denote the collection of all polynomials in the indeterminate x with coefficients from C. You can add and multiply polynomials in C[x] just like you learned in high school. First, we focus on quadratic equations, but with complex coefficients. This, of course, includes the real coefficient case. Let ax² + bx + c = 0, where a, b, and c are in C. We will show that the famous quadratic formula for the roots of the quadratic still works. The proof uses a very old idea, completing the square. This goes back at least to the Arab mathematician Al-Khowarizmi (circa 780–850) in the ninth century A.D. Now a is assumed nonzero, so we can write

   x² + (b/a)x = −c/a.

Do you remember what to do? Sure, add (b/2a)² to each side (half the coefficient of the first degree term, squared) and get

   x² + (b/a)x + (b/2a)² = −c/a + (b/2a)².

The left-hand side is a perfect square!

   (x + b/2a)² = x² + (b/a)x + (b/2a)² = −c/a + (b/2a)².

Now form a common denominator on the right:

   (x + b/2a)² = (b² − 4ac)/(4a²).


We have seen how to take square roots in C, so we can do it here.

   x + b/2a = ± √(b² − 4ac)/(2a),

   x = (−b ± √(b² − 4ac))/(2a).
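
The formula really does work with complex coefficients, and MATLAB is happy to confirm it; the coefficients in this sketch are arbitrary sample values of our own choosing.

   a = 1 + 1i;  b = 2 - 3i;  c = 4i;
   d = sqrt(b^2 - 4*a*c);                  % one of the two square roots in C
   x1 = (-b + d)/(2*a);  x2 = (-b - d)/(2*a);
   disp([a*x1^2 + b*x1 + c,  a*x2^2 + b*x2 + c])   % both essentially zero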

In 1797, Gauss, when he was only 20 years old, proved probably the most crucial result about C[x]. You can tell it must be good by what we call the theorem today; it is called the Fundamental Theorem of Algebra.

THEOREM A.5 (the fundamental theorem of algebra)
Every polynomial of degree n ≥ 1 in C[x] has a root in C.

We are in no position to prove this theorem right here. But that will not stop us from using it. The first consequence is the factor theorem: every polynomial in C[x] factors completely as a product of linear factors.

THEOREM A.6 (the factor theorem)
Let p(x) be in C[x] and have degree n ≥ 1. Then p(x) = a(x − r1)(x − r2)···(x − rn), where a, r1, r2, ..., rn are complex numbers with a ≠ 0. The numbers r1, r2, ..., rn (possibly not distinct) are the roots of p(x) and a is the coefficient of the highest power of x in p(x).

PROOF By the fundamental theorem, p(x) has a root, r1. But then we can factor p(x) = (x − r1)q(x), where q(x) ∈ C[x] and the degree of q(x) is one less than the degree of p(x). An inductive argument then gets us down to a last factor of the form a(x − rn).

In the case of real polynomials (i.e., R[x]), the story is not quite as clean, but we still know the answer.

THEOREM A.7
Every polynomial p(x) in R[x] of degree at least one can be factored as a product of linear factors and irreducible quadratic factors.

PROOF Let p(x) ∈ R[x] with p(x) = an xⁿ + ··· + a1 x + a0. Here the coefficients ai are all real numbers. If r is a root of p(x), possibly complex, then so is r̄ (verify that p(r̄) is the conjugate of p(r)). Thus, nonreal roots occur in complex conjugate pairs. Therefore, if s1, ..., sk are the real roots and r1, r̄1, r2, r̄2, ... are the complex ones, we get, in C[x], p(x) = a(x − s1)···(x − sk)(x − r1)(x − r̄1)···.

But (x − rj)(x − r̄j) = x² − (rj + r̄j)x + rj r̄j is an irreducible quadratic in R[x], so the theorem is proved.
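
For a concrete illustration of the conjugate-pair phenomenon, one can ask MATLAB for the roots of a real polynomial; p(x) = x⁴ − 1 is a simple sample of our own choosing, which factors over R as (x − 1)(x + 1)(x² + 1).

   p = [1 0 0 0 -1];     % coefficients of x^4 - 1
   disp(roots(p))        % 1, -1, i, -i: the nonreal roots i and -i pair up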

APA Exercise Set 4

1. Wouldn't life be simpler if fractions were easier to add? Wouldn't it be great if 1/a + 1/b = 1/(a + b)? Are there any real numbers a and b that actually work in this formula? Are there any complex numbers that actually work in this formula?

Further Reading

[B&L, 1981] J. L. Brenner and R. C. Lyndon, Proof of the Fundamental Theorem of Algebra, The American Mathematical Monthly, Vol. 88, (1981), 253–256.

[Derksen, 2003] Harm Derksen, The Fundamental Theorem of Algebra and Linear Algebra, The American Mathematical Monthly, Vol. 110, No. 7, Aug–Sept (2003), 620–623.

[Fine, 1997] B. Fine and G. Rosenberger, The Fundamental Theorem of Algebra, Springer-Verlag, New York, (1997).

[H&H, 2004] Anthony A. Harkin and Joseph B. Harkin, Geometry of Generalized Complex Numbers, Mathematics Magazine, Vol. 77, No. 2, April, (2004), 118–129.

[Ngo, 1998] Viet Ngo, Who Cares if x² + 1 = 0 Has a Solution?, The College Mathematics Journal, Vol. 29, No. 2, March, (1998), 141–144.

A.7 Postscript

Surely by now we have come to the end of our construction of number systems. Surely C is the ultimate number system and you can do just about anything you want in C. Well, mathematicians are never satisfied. Actually, the story goes on. If you can make ordered pairs in R² into a nice number system,


why not ordered triples in R³? Well, it turns out, as William Rowan Hamilton discovered about 150 years ago, you cannot. But this Irish mathematician discovered a number system you can make with four-tuples in R⁴. Today we call this system H, the quaternions. If q1 = (a, b, c, d) and q2 = (x, y, u, v), then q1 ⊕ q2 = (a + x, b + y, c + u, d + v) defines an addition that you probably would have guessed. But take a look at the multiplication! q1 ⊙ q2 = (ax − by − cu − dv, ay + bx + cv − du, au + cx + dy − bv, av + dx + bu − cy). How many of the properties of C also work for H? There is a big one that does not. Can you find it? Now C can be viewed as being contained in H by taking four-tuples with the last two entries zero. Our chain now reads

   N ⊂ Z ⊂ Q ⊂ R ⊂ C ⊂ H.

Mathematicians have invented even weirder number systems beyond the quaternions (see Baez [2002]). To close this section, let me tempt you to do some arithmetic in the finite system {0, 1, 2, 3}, where the operations are defined by the following tables:

   + | 0 1 2 3          · | 0 1 2 3
   --+---------        ---+---------
   0 | 0 1 2 3          0 | 0 0 0 0
   1 | 1 0 3 2          1 | 0 1 2 3
   2 | 2 3 0 1          2 | 0 2 3 1
   3 | 3 2 1 0          3 | 0 3 1 2

Can you solve 2x = 1 in this system? Can you figure out which of the rules of C work in this system? Have fun!

Further Reading

[Baez, 2002] J. C. Baez, The Octonions, Bulletin of the American Mathematical Society, Vol. 39, No. 2, April (2002), 145–205. Available at <http://math.ucr.edu/home/baez/octonions.html>.

[Kleiner, 1988] I. Kleiner, Thinking the Unthinkable: The Story of Complex Numbers (With a Moral), Mathematics Teacher, Vol. 81, (1988), 583–592.

[Niev, 1997] Yves Nievergelt, History and Uses of Complex Numbers, UMAP Module 743, UMAP Modules 1997, 1–66.


Appendix B

Basic Matrix Operations

B.1 Introduction

In Appendix A, we looked at scalars and the operations you can apply to them. In this appendix, we take the same approach to matrices. A matrix is just a rectangular array of scalars. Apparently, it was James Joseph Sylvester (3 September 1814–15 March 1897) who first coined the term "matrix" in 1850. The scalars in the array are called the entries of the matrix. We speak of a matrix having rows and columns (like double-entry bookkeeping). Remember, columns hold up buildings, so they go up and down. Rows run from left to right. So, an m-by-n matrix A (we usually use uppercase Roman letters to denote matrices so as not to confuse them with scalars) is a rectangular array of scalars arranged in m horizontal rows and n vertical columns. In general then, an m-by-n matrix A looks like

         | a11  a12  ...  a1n |
         | a21  a22  ...  a2n |
   A  =  |  .    .          . |  =  [aij]m×n  =  [aij],
         |  .    .          . |
         | am1  am2  ...  amn |

where each aij is in C for 1 ≤ i ≤ m and 1 ≤ j ≤ n. In a more formal treatment, we would define a matrix as a function A : [m] × [n] → C, where [m] = {1, 2, ..., m} and [n] = {1, 2, ..., n} and A(i, j) = aij, but we see no advantage in doing that here.

Note that we can locate any entry in the matrix by two subscripts; that is, the entry aij lives at the intersection of the ith row and jth column. For example, a34 denotes the scalar in the third row and fourth column of A. When convenient, we use the notation ent_ij(A) = aij; this is read "the (i, j) entry of A." The index i is called the row index and j the column index. By the ith row of A we mean the 1-by-n matrix

   row_i(A) = [ai1  ai2  ...  ain]

and by the jth column of A we mean the m-by-1 matrix

               | a1j |
   col_j(A) =  | a2j |
               |  .  |
               | amj |.

We say two matrices are the same size if they have the same number of rows and the same number of columns. A matrix is called square if the number of rows equals the number of columns. Let C^(m×n) denote the collection of all possible m-by-n matrices with entries from C. Then, of course, C^(n×n) would represent the collection of all square matrices of size n-by-n with entries from C.

Let's have a little more language before we finish this section. Any entry in a matrix A with equal row and column indices is called a diagonal element of A; diag(A) = [a11 a22 a33 ...]. The entry aij is called off-diagonal if i ≠ j, subdiagonal if i > j, and superdiagonal if i < j. Look at an example and see if these names make any sense.

We now finish this section with the somewhat straightforward idea of matrix equality. We say two matrices are equal iff they have the same size and equal entries; that is, if A, B ∈ C^(m×n), we say A = B iff ent_ij(A) = ent_ij(B) for all i, j with 1 ≤ i ≤ m and 1 ≤ j ≤ n. This definition may seem pretty obvious, but it is crucial in arguments where we want to "prove" that two matrices that may not look the same on the surface really are. Notice we only speak of matrices being equal when they are the same size.

APB Exercise Set 1

1. Let A = [1 2 3; 4 5 6]. What is ent_13(A)? How about ent_32(A)? What are the diagonal elements of A? The subdiagonal elements? The superdiagonal elements? How many columns does A have? How many rows?

2. Suppose [x − y  y + z; 2x − 3w  3w + 2z] = [9 2; 6 5]. What are x, y, z, and w?


B.2 Matrix Addition

We now begin the process of introducing the basic operations on matrices. We can add any two matrices of the same size. We do this in what seems a perfectly reasonable way, namely entrywise. Let A and B be in C^(m×n) with A = [aij], B = [bij]. The matrix sum A ⊕ B is the m-by-n matrix whose (i, j) entry is aij + bij. Note that we have two notions of addition running around here, ⊕ used for matrix addition and + used for adding scalars in C. We have defined one addition in terms of the other; that is, A ⊕ B = [aij + bij]. In other words,

   ent_ij(A ⊕ B) = ent_ij(A) + ent_ij(B).

For example, in C^(2×3),

   | 1 2 3 |  ⊕  | 2 4 6 |  =  | 3 6 9  |
   | 4 5 6 |     | 1 3 5 |     | 5 8 11 |.

Now the question is: what rules of addition does this definition enjoy? Are they the same as those for the addition of scalars in C? Let's see!

THEOREM B.1 (basic rules of addition in C^(m×n))
Suppose A, B, and C are in C^(m×n). Then

1. Associative law of matrix addition
   A ⊕ (B ⊕ C) = (A ⊕ B) ⊕ C.

2. Existence of a zero
   Let 0 = [0]m×n (i.e., ent_ij(0) = 0). Then for any matrix A at all, 0 ⊕ A = A = A ⊕ 0.

3. Existence of opposites
   Given any matrix A in C^(m×n), we can find another matrix B with A ⊕ B = 0 = B ⊕ A. In fact, take B with ent_ij(B) = −ent_ij(A). Denote B = −A. Then A ⊕ (−A) = 0 = (−A) ⊕ A.

4. Commutative law of matrix addition
   For any two matrices A, B in C^(m×n), A ⊕ B = B ⊕ A.

4. Commutative law of matrix additionFor any two matrices A, B in Cm x", A ® B = B ® A.

PROOF We will illustrate the proofs here and leave most as exercises. We appeal to the definition of matrix equality and show that the (i, j) entry of the


matrices on both sides of the equation are the same. The trick is to push the argument back to something you know is true about C. Let's compute:

   ent_ij(A ⊕ (B ⊕ C))
      = ent_ij(A) + ent_ij(B ⊕ C)               definition of matrix addition
      = ent_ij(A) + (ent_ij(B) + ent_ij(C))     definition of matrix addition
      = (ent_ij(A) + ent_ij(B)) + ent_ij(C)     associative law of addition in C
      = ent_ij(A ⊕ B) + ent_ij(C)               definition of matrix addition
      = ent_ij((A ⊕ B) ⊕ C)                     definition of matrix addition.

That was easy, wasn't it? All we did was use the definition of matrix addition and the fact that addition in C is associative to justify each of our steps. The big blow of course was the associative law of addition in C. Now you prove the rest!

So far, things are going great. The algebraic systems (C, +) and (C^(m×n), ⊕) have the same additive arithmetic. The basic laws are exactly the same. We can even introduce matrix subtraction, just like we did in C. Namely, for A, B in C^(m×n), define A ⊖ B = A ⊕ (−B). In other words,

   ent_ij(A ⊖ B) = ent_ij(A) − ent_ij(B) = aij − bij.

Now that the basic points have been made, we will drop the special notation of circles around addition and subtraction of matrices. The context should make clear whether we are manipulating scalars or matrices.

APB Exercise Set 2

1. Let A = [2 0; 1 1], B = [6 1; 1 4]. Find A + B. Now find B + A. What do you notice? Find A − B. Find B − A. What do you notice now?

2. Suppose A = [1 2; 3 4], X = [x y; u v], and B = [5 14; 10 20]. Solve for X in A + X = B.

3. What matrix would you have to add to [5i  3 + 2i; ·  ·] in C^(2×2) to get the zero matrix?

4. Prove the remaining basic laws of addition in C^(m×n) using the format previously illustrated.


B.3 Scalar Multiplication

The first kind of multiplication we consider is multiplying a matrix by a scalar. We can do this on the left or on the right. So let A be a matrix in C^(m×n), and let a ∈ C be a scalar. We define a new matrix aA as the matrix obtained by multiplying all the entries of A on the left by a. In other words, ent_ij(aA) = a ent_ij(A); that is, aA = a[aij] = [a aij]. For example, in C^(2×2), we have

   3 | 1 2 |  =  |  3  6 |
     | 4 6 |     | 12 18 |.

Let's look at basic properties.

THEOREM B.2 (basic properties of scalar multiplication)

Let A, B be in C^(m×n) and a, b be in C. Then

1. a(A + B) = aA + aB.

2. (a + b)A = aA + bA.

3. (ab)A = a(bA).

4. 1A = A.

5. a(Ab) = (aA)b.

PROOF As usual, we illustrate only one of the arguments and leave the rest as exercises. The procedure is to compare (i, j) entries. Let's prove (1) together:

   ent_ij(a(A + B))
      = a ent_ij(A + B)                      definition of scalar multiplication
      = a [ent_ij(A) + ent_ij(B)]            definition of matrix addition
      = a ent_ij(A) + a ent_ij(B)            left distributive law in C
      = ent_ij(aA) + ent_ij(aB)              definition of scalar multiplication
      = ent_ij(aA + aB)                      definition of matrix addition.

Once again, our attack on this proof has been to appeal to definitions and then use a crucial property in C.

Our game plan is still going well, although it is a little unsettling that we are mixing apples and oranges, scalars and matrices. Here we multiply a matrix by a scalar and get another matrix. Can we multiply a matrix by a matrix and get another matrix? Yes, we can, and that is what we address next.


APB Exercise Set 3

1. Let A = [2 1; 1 1], B = [5 2; 2 4]. Compute 5A − 3B and −4(A − B).

2. Prove the remaining basic properties in the manner illustrated. Formulateand prove similar properties for right scalar multiplication.

3. Let C ∈ C^(m×n). Argue C = A + iB where the entries of A and B are real. Argue A and B are unique.

B.4 Matrix Multiplication

The issue of matrix multiplication is not quite so straightforward. Suppose we have two matrices A and B of the same size. Since we defined addition entrywise, it might seem natural to define multiplication of matrices the same way. Indeed you can do that, and we will assign exercises to investigate properties of this definition of multiplication. However, it turns out this is not the "right" definition. There is a sophisticated way to motivate the upcoming definition, but we will attempt one using systems of linear equations. Consider the problem of substituting one system of linear equations into another. Begin with

   a11 x1 + a12 x2 = c1
   a21 x1 + a22 x2 = c2.

The coefficient matrix is A = | a11 a12 |
                              | a21 a22 |.

Now consider a linear substitution for the xs:

   x1 = b11 y1 + b12 y2
   x2 = b21 y1 + b22 y2.

The coefficient matrix here is B = | b11 b12 |
                                   | b21 b22 |.

Now, putting the xs back into the original system, we get

   a11 x1 + a12 x2 = a11(b11 y1 + b12 y2) + a12(b21 y1 + b22 y2)
                   = a11 b11 y1 + a11 b12 y2 + a12 b21 y1 + a12 b22 y2
                   = (a11 b11 + a12 b21) y1 + (a11 b12 + a12 b22) y2,  and

   a21 x1 + a22 x2 = a21(b11 y1 + b12 y2) + a22(b21 y1 + b22 y2)
                   = a21 b11 y1 + a21 b12 y2 + a22 b21 y1 + a22 b22 y2
                   = (a21 b11 + a22 b21) y1 + (a21 b12 + a22 b22) y2.


If we use matrix notation for the original system, we get AX = C, where X = [x1; x2] (a column) and C = [c1; c2]. We write the substitution as X = BY, where Y = [y1; y2]. Then the new system is AX = A(BY) = (AB)Y = C if you cherish the associative law. Thus, the coefficient matrix is

   AB = | a11 b11 + a12 b21    a11 b12 + a12 b22 |
        | a21 b11 + a22 b21    a21 b12 + a22 b22 |.

So, if we make this our definition of matrix multiplication, we see the connection between the rows of A and the columns of B. Using this row-by-column multiplication produces the resulting matrix AB. This is what leads us to make the general definition of how to multiply two matrices together.

Suppose A is in C^(m×n) and B is in C^(n×p). Then the product matrix AB is the matrix whose (i, j) entry is obtained by multiplying each entry from the ith row of A by the corresponding entry from the jth column of B and adding the results. Notice that the product matrix has size m-by-p. Let's spell this out a bit more. The product matrix AB is the m-by-p matrix whose (i, j) entry is

   ent_ij(AB) = Σ (k = 1 to n) aik bkj.

Okay, still clear as mud? Let A = [aij] and B = [bij]. Then

   ent_ij(AB) = ai1 b1j + ai2 b2j + ··· + ain bnj.

Now, let's look at an example. Let

   A = | 1 2 4 |        B = | 4  1  4  3 |
       | 2 6 0 |,           | 0 −1  3  1 |
                            | 2  7  5  2 |.

Let's compute the (2,3) entry of AB. We use the second row of A and third column of B: 2·4 + 6·3 + 0·5 = 26. So

   | 1 2 4 |  | 4  1  4  3 |     | *  *   *  * |
   | 2 6 0 |  | 0 −1  3  1 |  =  | *  *  26  * |.
              | 2  7  5  2 |

Does that help? See if you can fill out the rest of AB. As you can see, multiplying matrices requires quite a bit of work. But that is why we have calculators and computers. Notice that to multiply matrix A by matrix B in the order AB, the matrices do not have to be the same size, but it is absolutely necessary that the number of columns of A is the same as the number


of rows of B. If you remember the concept of dot product from your earlier studies, it might be helpful to view matrix multiplication in the following way:

   ent_ij(AB) = row_i(A) · col_j(B).
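
The worked example above takes one line in MATLAB; this small sketch simply redoes the (2,3) entry with the same A and B.

   A = [1 2 4; 2 6 0];
   B = [4 1 4 3; 0 -1 3 1; 2 7 5 2];
   C = A*B;
   disp(C(2,3))            % 26, as computed by hand
   disp(A(2,:)*B(:,3))     % the same entry as row 2 of A times column 3 of B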

Before we move to basic properties, let's have a quick review of the useful sigma notation we have already used above. You may recall this notation from calculus when you were studying Riemann sums and integrals. Recall that the capital Greek letter Σ stands for summation. It tells us to add something up. It is a great shorthand for lengthy sums. So a1 + a2 + ··· + an = Σ (i = 1 to n) ai. Now the (i, j) entry of AB can be expressed as

   [ai1  ai2  ...  ain] | b1j |
                        | b2j |   =  Σ (k = 1 to n) aik bkj
                        | ... |
                        | bnj |

using Σ-notation. Note the "outside" indices i and j are fixed and the "running index" is k, which matches on the "inside."

There are three basic rules when we do computations using the Σ-notation.

THEOREM B.3 (basic rules of sigma)

1. Σ (j = 1 to n) (aj + bj) = Σ (j = 1 to n) aj + Σ (j = 1 to n) bj.

2. Σ (j = 1 to n) a bj = a Σ (j = 1 to n) bj.

3. Σ (j = 1 to m) Σ (k = 1 to n) ajk = Σ (k = 1 to n) Σ (j = 1 to m) ajk.

The proofs here are unenlightening computations and we will not even ask you to do the arguments as exercises. However, we would like to consider rule (3) about double summations. This may remind you of the Fubini theorem in several-variable calculus about interchanging the order of integration (then again, maybe not). Anyway, we can understand (3) by using matrices. Consider the matrix

   | a11 a12 |
   | a21 a22 |
   | a31 a32 |.

Let T stand for the summation total of all the matrix entries. Well, there are (at least) two different ways we could go about computing T in steps. First, we could compute row sums and add these, or we could first compute column sums and then add these. Evidently, we would get the same


answer either way, namely T.

   | a11 a12 |   row sums:     Σ (k = 1 to 2) a1k,  Σ (k = 1 to 2) a2k,  Σ (k = 1 to 2) a3k
   | a21 a22 |
   | a31 a32 |   column sums:  Σ (j = 1 to 3) aj1,  Σ (j = 1 to 3) aj2

Thus, T = Σ (k=1 to 2) a1k + Σ (k=1 to 2) a2k + Σ (k=1 to 2) a3k = Σ (j=1 to 3) Σ (k=1 to 2) ajk, or T = Σ (j=1 to 3) aj1 + Σ (j=1 to 3) aj2 = Σ (k=1 to 2) Σ (j=1 to 3) ajk.

So you see, matrices are good for something already! Now let's look at the basic properties of matrix multiplication.

THEOREM B.4 (basic properties of matrix multiplication)

Let A, B, C be matrices and a ∈ C. Then

1. If A is m-by-n, B n-by-p, and C p-by-q, then A(BC) = (AB)C in C^(m×q).

2. If A is m-by-n, B n-by-p, and C n-by-p, then A(B + C) = AB + AC in C^(m×p).

3. If A is m-by-n, B is m-by-n, and C is n-by-p, then (A + B)C = AC + BC in C^(m×p).

4. If A is m-by-n, B n-by-p, and a ∈ C, then a(AB) = (aA)B = A(aB).

5. row_i(AB) = (row_i(A))B.

6. col_j(AB) = A(col_j(B)).

7. Let I_n denote the n-by-n matrix whose diagonal elements all equal one and all off-diagonal entries are zero. Let A be m-by-n. Then I_m A = A = A I_n.

PROOF The proofs are left as exercises in entry verification in the manner already illustrated.


What are some of the consequences of these basic properties? The structures (C, +, ·, 0, 1) and (C^(n×n), +, ·, 0, I_n) share many common algebraic features. Thus, the basic arithmetic of both structures is the same. However, now the cookies start to crumble. Let's look at some examples. Let

   A = |  6   9 |    and    B = | 2 3 |    in C^(2×2).
       | −4  −6 |               | 1 0 |

Note

   AB = |  21  18 |   ≠   | 0 0 |  =  BA.
        | −14 −12 |       | 6 9 |

Thus, the commutative law of multiplication breaks down for matrices. Therefore, great care must be exercised when doing computations with matrices. Changing the order of a product could really mess things up. Even more can go wrong; A ≠ 0₂ but

   A² = | 0 0 |  =  0₂,
        | 0 0 |

so C^(2×2) has "zero divisors" even though C does not. These two facts bring in some big differences in the arithmetic of scalars versus matrices.

Understanding the nature of matrix multiplication is crucial to understanding matrix theory, so we will dwell on the concept a bit more. There are four ways to look at multiplying matrices and each gives insights in the appropriate context. We will illustrate with small matrices to bring out ideas.

View 1 (the row-column view [or dot product view])

Let A = | a b |   and   B = | x y |.
        | c d |             | u v |

Recall the dot product notation [a  b][x; u] = [ax + bu] = [(a, b) · (x, u)]. Then

   AB = | (a, b) · (x, u)   (a, b) · (y, v) |
        | (c, d) · (x, u)   (c, d) · (y, v) |.

This is the row-column view.

View 2 (the column view)

Here we leave A alone but think of B as a collection of columns. Then

   AB = A [ col1(B) | col2(B) ] = [ A col1(B) | A col2(B) ] = | ax + bu   ay + bv |
                                                              | cx + du   cy + dv |.

We find it very handy in matrix theory to do things like AX = A[x1 | x2 | x3] = [Ax1 | Ax2 | Ax3], which moves the matrix on the left in to operate on the columns of the matrix on the right one by one.


View 3 (the row view)

Here we leave B alone and partition A into rows. Then

   AB = | row1(A) B |  =  | [a  b] B |  =  | ax + bu   ay + bv |
        | row2(A) B |     | [c  d] B |     | cx + du   cy + dv |.

View 4 (the column-row view [or outer product view])

Here we partition A into columns and B into rows. Then

   AB = | a | [x  y]  +  | b | [u  v]  =  | ax  ay |  +  | bu  bv |  =  | ax + bu   ay + bv |
        | c |            | d |            | cx  cy |     | du  dv |     | cx + du   cy + dv |.

Also notice that

   | a b | | x |  =  | ax + bu |  =  | a | x  +  | b | u.
   | c d | | u |     | cx + du |     | c |       | d |

We see that the matrix product AB is a right linear combination of the columns of A, with the coefficients coming from B. Indeed,

   [a  b] | x  y |  =  a [x  y] + b [u  v],
          | u  v |

so the product can also be viewed as a left linear combination of the rows of B, with coefficients coming from the matrix on the left.
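
All four views of course produce the same matrix; the following MATLAB sketch (with a small sample pair of our own choosing) checks the column view and the outer-product view against the built-in product.

   A = [1 2; 3 4];  B = [5 6; 7 8];
   colview = [A*B(:,1), A*B(:,2)];            % View 2: A hits each column of B
   outview = A(:,1)*B(1,:) + A(:,2)*B(2,:);   % View 4: a sum of outer products
   disp(colview - A*B)                        % zero matrix
   disp(outview - A*B)                        % zero matrix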

B.5 Transpose

Given a matrix A in C^(m×n), we can always form another matrix A^T, A-transpose, in C^(n×m). In other words,

   ent_ij(A^T) = ent_ji(A).

Thus, the transpose of A is the matrix obtained by simply interchanging the rows and columns of A. If

   A = | 1 2 3 |,        then        A^T = | 1 4 |
       | 4 5 6 |                           | 2 5 |
                                           | 3 6 |.


THEOREM B.5 (basic facts about transpose)

1. If A ∈ C^(m×n), then (A^T)^T = A.

2. If A, B ∈ C^(m×n), then (A + B)^T = A^T + B^T.

3. If A ∈ C^(m×n) and B ∈ C^(n×p), then (AB)^T = B^T A^T.

4. If A is invertible in C^(n×n), then (A^T)^(−1) = (A^(−1))^T.

5. (aA)^T = a A^T.

PROOF We illustrate with (3). Compute

   ent_ij((AB)^T)
      = ent_ji(AB)                                 definition of transpose
      = Σ (k = 1 to n) ajk bki                     definition of matrix multiplication
      = Σ (k = 1 to n) bki ajk                     commutativity of C
      = Σ (k = 1 to n) ent_ik(B^T) ent_kj(A^T)     definition of transpose
      = ent_ij((B^T)(A^T))                         definition of matrix multiplication.

Since the (i, j) entries agree, the matrices are equal.

We can use the notion of transpose to distinguish important families of matrices. For example, a matrix A is called symmetric iff A = A^T. A matrix A is called skew-symmetric iff A^T = −A.

The reader may notice that there seems to be a pretty strong analogy between complex conjugation z ↦ z̄ and matrix transposition A ↦ A^T. Indeed, the analogy is quite strong. However, there is a problem. For complex numbers, z z̄ = |z|² = 0 implies z = 0. If we let

   A = | 1   i |  ∈ C^(2×2),     then     A A^T = | 0 0 |
       | i  −1 |                                  | 0 0 |

but A ≠ 0. Note that if we had restricted A to real entries, then A A^T = 0 implies A = 0 for A ∈ R^(n×n). It boils down to the fact that, in R, a sum of squares equaling zero implies each entry in the sum is zero. We can fix this problem for complex matrices. If A ∈ C^(m×n), define the conjugate transpose A* by

   ent_ij(A*) = the complex conjugate of ent_ji(A);

that is,

   A* = (Ā)^T, the transpose of the entrywise conjugate (equivalently, the conjugate of A^T).


But first we define the conjugate of a matrix and list the basic properties. Define Ā by

   ent_ij(Ā) = the complex conjugate of ent_ij(A).

THEOREM B.6 (basic facts about the conjugate matrix)
Let A, B ∈ C^(m×n), conformable where products appear, and a ∈ C. Then

1. The conjugate of Ā is A.

2. The conjugate of A + B is Ā + B̄.

3. The conjugate of AB is Ā B̄.

4. Ā = A iff all entries of A are real.

5. The conjugate of aA is ā Ā.

6. (Ā)^T is the conjugate of A^T.

THEOREM B.7 (basic facts about conjugate transpose)
Let A, B ∈ C^(m×n), conformable where needed, and a ∈ C. Then

1. (A*)* = A.

2. (A + B)* = A* + B*.

3. (AB)* = B* A*.

4. (aA)* = ā A*.

5. (AA*)* = AA* and (A*A)* = A*A.

6. AA* = 0 implies A = 0.

7. A* = (Ā)^T, the conjugate of A^T.

8. (A ⊕ B)* = A* ⊕ B*.

Now the real numbers sit inside of the complex numbers, R ⊂ C. We can identify complex numbers as being real exactly when they are equal to their complex conjugates. This suggests that matrices of the form A = A* should play the role of real numbers in C^(n×n). Such matrices are called self-adjoint, or Hermitian, in honor of the French mathematician Charles Hermite (24 December 1822–14 January 1901). Constructing Hermitian matrices is easy. Start with any matrix A and form AA* or A*A. Of course, Hermitian matrices


are always square. So, in C^(n×n) we can view the Hermitian matrices as playing the role of real numbers in C. This suggests an important representation of matrices in C^(n×n). Recall that if z is a complex number, then z = a + bi, where a and b are real numbers. In fact, we defined Re(z) = a and Im(z) = b. We saw Re(z) = (z + z̄)/2 and Im(z) = (z − z̄)/2i, so the real and imaginary parts of a complex number are uniquely determined. We can reason by analogy with matrices in C^(n×n). Let A ∈ C^(n×n). Define Re(A) = (A + A*)/2 and Im(A) = (A − A*)/2i. This leads to the so-called Cartesian decomposition. Let A ∈ C^(n×n) be arbitrary. Then Re(A) and Im(A) are self-adjoint and A = Re(A) + Im(A)i. This has important philosophical impact. This representation suggests that if you know everything there is to know about Hermitian matrices, then you know everything there is to know about all (square) matrices.
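
In MATLAB the Cartesian decomposition is two lines; this sketch builds a random complex matrix (our own choice of test data) and checks that the two pieces are Hermitian and reassemble A. Recall that A' is the conjugate transpose in MATLAB.

   A   = randn(3) + 1i*randn(3);
   ReA = (A + A')/2;
   ImA = (A - A')/(2i);
   disp(norm(ReA - ReA'))             % zero: Re(A) is Hermitian
   disp(norm(ImA - ImA'))             % zero: Im(A) is Hermitian
   disp(norm(A - (ReA + 1i*ImA)))     % zero: the pieces reassemble A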

Let's push this analogy one step forward. A positive real number is always a square; that is, a > 0 implies a = b² for some real number b. Thus, if we view a as a complex number, a = b b̄ = b̄ b. This suggests defining a Hermitian matrix H to be positive if H = SS* for some matrix S ≠ 0. We shall have more to say about this elsewhere.

APB Exercise Set 4

1. Establish all the unproved claims of this section.

2. Say everything you can that is interesting about the matrix [1/3 2/3 2/3; 2/3 1/3 −2/3; 2/3 −2/3 1/3].

3. Let A = [1 1; 1 1]. Find A², A³, ..., Aⁿ. What if A = [1 1 1; 1 1 1; 1 1 1]? Find A², A³, ..., Aⁿ. Can you generalize?

4. Argue that (A + B)² = A² + 2AB + B² iff AB = BA.

5. Argue that (A + B)² = A² + B² iff AB = −BA.

6. Suppose AB = aBA. Can you find necessary and sufficient conditions so that (A + B)³ = A³ + B³?

7. Suppose AB = C. Argue that the columns of C are linear combinations of the columns of A.


8. Let a ∈ C. Argue that a(ABC) = (aA)(BC) = A(aB)C = (AB)(aC) for conformable matrices A, B, C.

9. For symmetric matrices A and B, argue that AB is a symmetric matrix iff AB = BA.

10. Suppose A and B are 1-by-n matrices. Is it true that AB^T = BA^T?

11. Give several nontrivial examples of 3-by-3 symmetric matrices. Do the same for skew-symmetric matrices.

12. Find several nontrivial pairs of matrices A and B such that AB ≠ BA. Do the same with AB = 0, where neither A nor B is zero. Also find examples of AC = BC, C ≠ 0, but A ≠ B.

13. Define [A, B] = AB − BA. Let A = (√2/2)[0 1 0; 1 0 1; 0 1 0], B = (√2/2)[0 i 0; −i 0 i; 0 −i 0], and C = [−1 0 0; 0 0 0; 0 0 1]. Prove that [A, B] = iC, [B, C] = iA, [C, A] = iB, and A² + B² + C² = 2I₃.

14. Determine all matrices that commute with [0 1; 0 0]. Do the same for [0 1 0; 0 0 1; 0 0 0]. Can you generalize?

15. Find all matrices that commute with [λ 1 0; 0 λ 1; 0 0 λ]. Now find all matrices that commute with [0 0 1; 0 1 0; 1 0 0].

16. Suppose A = [0 0 1; 0 1 0; 1 0 1] and p(x) = x² − 2x + 3. What is p(A)?

17. Suppose A is a square matrix and p(x) is a polynomial. Argue that A commutes with p(A).


18. Suppose A is a square matrix over the complex numbers that commutes with all matrices of the same size. Argue that there must exist a scalar λ such that A = λI.

19. Argue that if A commutes with a diagonal matrix with distinct diagonal elements, then A itself must be diagonal.

20. Suppose P is an idempotent matrix (P = P²). Argue that P = P^k for all k in N.

21. Suppose U is an upper triangular matrix with zeros down the main diagonal. Argue that U is nilpotent.

22. Determine all 2-by-2 matrices A such that A² = I.

23. Compute [1 1; −1 −1]². Determine all 2-by-2 nilpotent matrices.

24. Argue that any square matrix can be written uniquely as the sum of a symmetric matrix plus a skew-symmetric matrix.

25. Argue that for any matrix A, AA^T, A^T A, and A + A^T are always symmetric while A − A^T is skew-symmetric.

26. Suppose A² = I. Argue that P = ½(I + A) is idempotent.

27. Argue that for any complex matrix A, AA*, A*A, and i(A − A*) are always Hermitian.

28. Argue that every complex matrix A can be written as A = B + iC, where B and C are Hermitian.

29. Suppose A and B are square matrices. Argue that

   A^k − B^k = Σ (j = 0 to k−1) B^j (A − B) A^(k−1−j).

30. Suppose P is idempotent. Argue that tr(PA) = tr(PAP).

31. Suppose A = B ⊕ C, a block diagonal matrix with blocks B and C. What is A²? A³? Can you guess A⁻¹?

32. For complex matrices A and B, argue that the conjugate of AB equals Ā B̄, assuming A and B can be multiplied.

33. Argue that (I − A)(I + A + A² + ··· + A^m) = I − A^(m+1) for any positive integer m.


34. Argue that 2A*A − 2B*B = (A + B)*(A − B) + (A − B)*(A + B) for complex matrices A and B.

35. This exercise involves the commutator, or "Lie bracket," of exercise 13. Recall [A, B] = AB − BA. First, argue that the trace of any commutator is zero. Next, work out the 2-by-2 case explicitly. That is, calculate [[a b; c d], [e f; g h]] explicitly as a 2-by-2 matrix.

36. There is an interesting mapping associated with the commutator. For a fixed matrix A, define Ad_A by Ad_A(X) = [A, X] = AX − XA. Verify the following:

   (a) Ad_A(A) = 0.
   (b) Ad_A(BC) = (Ad_A(B))C + B(Ad_A(C)). (Does this remind you of anything from calculus?)
   (c) Ad_[A,B](X) = ([Ad_A, Ad_B])(X).
   (d) Ad_A(aX) = a Ad_A(X), where a is a scalar.
   (e) Ad_A(X + Y) = Ad_A(X) + Ad_A(Y).

37. Since we add matrices componentwise, it is tempting to multiply them componentwise as well. Of course, this is not the way we are taught to do it. However, there is no reason we cannot define a slotwise multiplication of matrices and investigate its properties. Take two m-by-n matrices, A = [aij] and B = [bij], and define A ⊙ B = [aij bij], which will also be m-by-n. Investigate what rules hold for this kind of multiplication. For example, is this product commutative? Does it satisfy the associative law? Does it distribute over addition? What can you say about the rank of A ⊙ B? How about vec(A ⊙ B)?

38. We could define the absolute value or magnitude of a matrix A = [aij] ∈ C^(m×n) by taking the magnitude of each entry: |A| = [|aij|]. (Do not confuse this notation with the determinant of A.) What can you say about |A + B|? How about |aA| where a is any scalar? What about |AB|, |A*|, and |A ⊙ B|?

39. This exercise is about the conjugate of a matrix. Define ent_ij(Ā) = the conjugate of ent_ij(A); that is, Ā = [āij]. Prove the following:

   (a) The conjugate of A + B is Ā + B̄.
   (b) The conjugate of AB is Ā B̄.
   (c) The conjugate of aA is ā Ā.
   (d) (Ā)^k is the conjugate of A^k.


40. Prove that BAA* = CAA* implies BA = CA.

41. Suppose A² = I. Prove that (I + A)ⁿ = 2^(n−1)(I + A). What can you say about (I − A)ⁿ?

42. Prove that if H = H*, then B*HB is Hermitian for any conformable B.

43. Suppose A and B are n-by-n. Argue that A and B commute iff A − λI and B − λI commute for every scalar λ.

44. Suppose A and B commute. Prove that A^p B^q = B^q A^p for all positive integers p and q.

45. Consider the Pauli spin matrices σ₁ = [0 1; 1 0], σ₂ = [0 −i; i 0], and σ₃ = [1 0; 0 −1]. Argue that these matrices anticommute (AB = −BA) pairwise. Also, compute all possible commutators.

B.5.1 MATLAB Moment

B.5.1.1 Matrix Manipulations

The usual matrix operations are built in to MATLAB. If A and B are compatible matrices,

A + B matrix addition

A * B matrix multiplication

A - B matrix subtraction

A^n matrix to the nth power

ctranspose(A) conjugate transpose

transpose(A) transpose of A

A' conjugate transpose

A.' transpose without conjugation

A.^n raise individual entries in A to the nth power

Note that the dot returns entrywise operations. So, for example, A*B is ordinary matrix multiplication but A.*B returns entrywise multiplication.


B.6 Submatrices

In this section, we describe what a submatrix of a matrix is and establish some notation that will be useful in discussing determinants of certain square submatrices. If A is an m-by-n matrix, then a submatrix of A is obtained by deleting rows and/or columns from this matrix. For example, if

   A = | a11 a12 a13 a14 |
       | a21 a22 a23 a24 |
       | a31 a32 a33 a34 |,

we could delete the first row and second column to obtain the submatrix

   | a21 a23 a24 |
   | a31 a33 a34 |.

From the three rows of A, there are C(3, r) ways to choose r rows to delete, where r = 0, 1, 2, 3, and from the four columns, there are C(4, c) ways to choose c columns to delete, where c = 0, 1, 2, 3. Thus, there are C(3, r) C(4, c) submatrices of A of size (3 − r)-by-(4 − c).

A submatrix of an n-by-n matrix is called a principal submatrix if it is obtained by deleting the same rows and columns. That is, if the ith row is deleted, the ith column is deleted as well. The result is, of course, a square matrix. For example, if

   A = | a11 a12 a13 |
       | a21 a22 a23 |
       | a31 a32 a33 |

and we delete the second row and second column, we get the principal submatrix

   | a11 a13 |
   | a31 a33 |.

The r-by-r principal submatrix of an n-by-n matrix obtained by striking out its last n − r rows and columns is called a leading principal submatrix. For the matrix A above, the leading principal submatrices are

   [a11],     | a11 a12 |,     | a11 a12 a13 |
              | a21 a22 |      | a21 a22 a23 |
                               | a31 a32 a33 |.


It is handy to have a notation to be able to specify submatrices more precisely. Suppose k and m are positive integers with k ≤ m. Define Q_{k,m} to be the set of all increasing sequences of integers (i1, i2, ..., ik) of length k chosen from the first m positive integers [m] = {1, 2, ..., m}:

   Q_{k,m} = {(i1, ..., ik) | 1 ≤ i1 < i2 < ··· < ik ≤ m}.

Note that Q_{k,m} can be linearly ordered lexicographically. That is, if α and β are in Q_{k,m}, we define α < β iff α = (i1, i2, ..., ik), β = (j1, j2, ..., jk) and i1 < j1, or i1 = j1 and i2 < j2, or i1 = j1 and i2 = j2 and i3 < j3, ..., or i1 = j1, ..., i_{k−1} = j_{k−1} but ik < jk. This is just the way words are ordered in the dictionary. For example, in lexicographic order:

   Q_{2,3} = {(1, 2), (1, 3), (2, 3)},
   Q_{2,4} = {(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)},
   Q_{3,4} = {(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)},
   Q_{1,4} = {(1), (2), (3), (4)}.

It is an exercise that the cardinality of the set Q_{k,m} is the binomial coefficient m choose k:

   |Q_{k,m}| = C(m, k).

There is a simple function on Q_{k,m} that will prove useful. It is the sum function s : Q_{k,m} → N, defined by s(i1, i2, ..., ik) = i1 + i2 + ··· + ik. Now if A is m-by-n, α ∈ Q_{k,m}, and β ∈ Q_{j,n}, then the notation A[α | β] stands for the k-by-j submatrix of A consisting of elements whose row index comes from α and whose column index comes from β. For example, if

   A = | a11 a12 a13 a14 a15 |
       | a21 a22 a23 a24 a25 |
       | a31 a32 a33 a34 a35 |
       | a41 a42 a43 a44 a45 |

and α ∈ Q_{2,4}, α = (1, 3), β ∈ Q_{3,5}, β = (2, 4, 5), then

   A[α | β] = | a12 a14 a15 |,     A[α | α] = | a11 a13 |,
              | a32 a34 a35 |                 | a31 a33 |

and so on. We adopt the shorthand A[α | α] = A[α] for square matrices A. Note that A[α] is a principal submatrix of A. Also, we abbreviate A[(i1, i2, ..., ik)] by A[i1, i2, ..., ik]. Thus, A[1, ..., r] is the r-by-r leading principal submatrix of A. Using A above,

   A[1, 2] = | a11 a12 |.
             | a21 a22 |

Each α ∈ Q_{k,m} has a complementary sequence α̂ in Q_{m−k,m} consisting of the integers in {1, ..., m} not included in α, but listed in increasing order. This allows us to define several more submatrices:

   A[α | β) = A[α | β̂],
   A(α | β] = A[α̂ | β],
   A(α | β) = A[α̂ | β̂].

For square matrices, A(α) = A(α | α). Note if A[α] is a nonsingular principal submatrix of A, we can define its Schur complement as

   A/A[α] = A(α) − A(α | α] (A[α])⁻¹ A[α | α).

We sometimes abuse notation and think of a sequence as a set with a preferred ordering. For example, we write A[α ∪ {i0} | α ∪ {j0}], where i0 and j0 are not in α but they are put in the appropriate order. This is called "bordering." Using A and α = (1, 3) from above, we see

   A[α ∪ {4} | α ∪ {5}] = | a11 a13 a15 |
                          | a31 a33 a35 |
                          | a41 a43 a45 |.

In this notation, we can generalize the notion of a Schur complement even further:

   A/A[α, β] = A(α | β) − A(α | β] (A[α | β])⁻¹ A[α | β).
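
The index notation translates directly into MATLAB subscripting (see the MATLAB Moment below); the following sketch, with a 5-by-5 test matrix of our own choosing and α = (1, 3) as above, extracts A[α | β] and forms the Schur complement A/A[α].

   A     = magic(5);
   alpha = [1 3];  beta = [2 4 5];
   disp(A(alpha, beta))               % this is A[alpha | beta]
   ahat  = setdiff(1:5, alpha);       % the complementary sequence
   S     = A(ahat, ahat) - A(ahat, alpha)*inv(A(alpha, alpha))*A(alpha, ahat);
   disp(S)                            % the Schur complement A/A[alpha]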

APB Exercise Set 5

1. What can you say about the principal submatrices of a symmetric matrix? A diagonal matrix? An upper triangular matrix? A lower triangular matrix? A Hermitian matrix?

2. How do the submatrices of A compare with the submatrices of A^T? Specifically, if Â is the r-by-s submatrix of the m-by-n matrix A obtained by deleting rows i1, i2, ..., i_{m−r} and columns j1, j2, ..., j_{n−s}, and B is the s-by-r submatrix obtained from A^T by deleting rows j1, j2, ..., j_{n−s} and columns i1, i2, ..., i_{m−r}, how does B compare to Â?

3. Prove that |Q_{k,m}| = C(m, k).


Further Reading

[B&R, 1986(2)] T. S. Blyth and E. F. Robertson, Matrices and Vector Spaces, Vol. 2, Chapman & Hall, New York, (1986).

[Butson, 1962] A. T. Butson, Generalized Hadamard Matrices, Proceedings of the American Mathematical Society, Vol. 13, (1962), 894–898.

[Davis, 1965] Philip J. Davis, The Mathematics of Matrices: A First Book of Matrix Theory and Linear Algebra, Blaisdell Publishing Company, New York, (1965).

[G&B, 1963] S. W. Golomb and L. D. Baumert, The Search for Hadamard Matrices, The American Mathematical Monthly, Vol. 70, (1963), 12–17.

[J&S, 1996] C. R. Johnson and E. Schreiner, The Relationship Between AB and BA, The American Mathematical Monthly, Vol. 103, (1996), 578–582.

[Kolo, 1964] Ignace I. Kolodner, A Note on Matrix Notation, The American Mathematical Monthly, Vol. 71, No. 9, November, (1964), 1031–1032.

[Leslie, 1945] P. H. Leslie, On the Use of Matrices in Certain Population Mathematics, Biometrika, Vol. 33, (1945).

[Lütkepohl, 1996] Helmut Lütkepohl, Handbook of Matrices, John Wiley & Sons, New York, (1996).

B.6.1 MATLAB Moment

B.6.1.1 Getting at Pieces of Matrices

Suppose A is a matrix. It is easy to extract portions of A using MATLAB.

A(i, j) returns the i, j-entry of A.
A(:, j) returns the jth column of A.

A(i, :) returns the ith row of A.

A(end, :) returns the last row of A.

A(:, end) returns the last column of A.
A(:) returns a long column obtained by stacking the columns of A.

[] is the empty 0-by-0 matrix.

Submatrices can be specified in various ways. For example,

A([i j], [p q r])

returns the submatrix of A consisting of the intersection of rows i and j andcolumns p, q, and r. More generally,

A(i : j, k : m)

returns the submatrix that is the intersection of rows i to j and columns k to m.

The size command can be useful; size(A), for an m-by-n matrix A, returns the two-element row vector [m, n] containing the number of rows and columns in the matrix. [m, n] = size(A) for matrix A returns the number of rows and columns in A as separate output variables.


Appendix C

Determinants

C.1 Motivation

Today, determinants are out of favor with some people (see Axler [1995]). Even so, they can be quite useful in certain theoretical situations. The concept goes back to Seki Kowa (March 1642–24 October 1708), a famous Japanese mathematician. In the west, Gottfried Leibniz (1 July 1646–14 November 1716) developed the idea of what we call Cramer's rule today. However, the idea lay dormant until 1750, when determinants became a major tool in the theory of systems of linear equations. According to Aitken [1962], Vandermonde may be regarded the founder of a notation and of rules for computing with determinants. These days, matrix theory and linear algebra play the dominant role, with some texts barely mentioning determinants. We gather the salient facts about determinants in this appendix; but first we consider some geometric motivation.

Consider the problem of finding the area of a parallelogram formed by two independent vectors A = (a, b) and B = (c, d) in R².

The area of the parallelogram is base times height, that is, ‖A‖ ‖B‖ |sin(θ)|. Thus, the squared area is

   ‖A‖² ‖B‖² sin²(θ) = ‖A‖² ‖B‖² (1 − cos²(θ)) = ‖A‖² ‖B‖² − ‖A‖² ‖B‖² cos²(θ)
      = ‖A‖² ‖B‖² − (A · B)²
      = (a² + b²)(c² + d²) − (ac + bd)²
      = a²c² + a²d² + b²c² + b²d² − a²c² − 2acbd − b²d²
      = a²d² − 2adbc + b²c²
      = (ad − bc)².

So the area of the parallelogram determined by independent vectors A = (a, b) and B = (c, d) is the absolute value of ad − bc. If we make a matrix

   | a c |,
   | b d |

then we can designate this important number as

   det | a c |  =  ad − bc.
       | b d |

More generally,

   det | a11 a12 |  =  a11 a22 − a12 a21.
       | a21 a22 |

Notice the "crisscross" way we compute the determinant of a 2-by-2 matrix? Notice the minus sign? In two dimensions, then, the determinant is very closely related to the geometric idea of area.

Figure C1.1: Area of a parallelogram.

Consider a parallelepiped in R³ determined by three independent vectors A = (a1, a2, a3), B = (b1, b2, b3), and C = (c1, c2, c3). The volume of this "box" is the area of its base times its altitude; that is,

   volume = (area of base)(altitude)
          = (‖B‖ ‖C‖ |sin(θ)|)(‖A‖ |cos(φ)|)
          = |(B × C) · A|
          = |((b2 c3 − c2 b3), (b3 c1 − b1 c3), (b1 c2 − c1 b2)) · (a1, a2, a3)|
          = |a1(b2 c3 − c2 b3) + a2(b3 c1 − b1 c3) + a3(b1 c2 − c1 b2)|
          = |a1 b2 c3 − a1 c2 b3 + a2 b3 c1 − a2 b1 c3 + a3 b1 c2 − a3 c1 b2|.

This is getting more complicated. Again we create a matrix whose columns are the vectors and define a 3-by-3 determinant:

   det | a1 b1 c1 |  =  a1 b2 c3 − a1 c2 b3 + a2 b3 c1 − a2 b1 c3 + a3 b1 c2 − a3 c1 b2.
       | a2 b2 c2 |
       | a3 b3 c3 |

Figure C1.2: Volume of a parallelepiped.

Actually, there is a little trick to help us compute a 3-by-3 determinant by hand. Write the first two columns over again. Then go down diagonals with plus products and up diagonals with minus products:

   det | a11 a12 a13 |  a11 a12
       | a21 a22 a23 |  a21 a22
       | a31 a32 a33 |  a31 a32

      = a11 a22 a33 + a12 a23 a31 + a13 a21 a32 − a31 a22 a13 − a32 a23 a11 − a33 a21 a12.

Thus, in three dimensions, the idea of determinant is closely associated with the geometric idea of volume.
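
The "down minus up" rule is easy to check against MATLAB's built-in det on a random 3-by-3; the test matrix here is arbitrary.

   A = randn(3);
   sarrus = A(1,1)*A(2,2)*A(3,3) + A(1,2)*A(2,3)*A(3,1) + A(1,3)*A(2,1)*A(3,2) ...
          - A(3,1)*A(2,2)*A(1,3) - A(3,2)*A(2,3)*A(1,1) - A(3,3)*A(2,1)*A(1,2);
   disp(sarrus - det(A))           % essentially zero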


Though we cannot draw the pictures, we can continue this idea into Rⁿ. A "solid" in Rⁿ whose adjacent sides are defined by a linearly independent set of vectors x1, x2, ..., xn is called an n-dimensional parallelepiped. Can we get hold of its volume? We can motivate our generalization by looking at the two-dimensional example again.

Form the matrix A whose columns are the vectors (a, b) and (c, d):

   A = | a c |.
       | b d |

We note that the height of the parallelogram is the length of the part of the second column orthogonal to the first, which is (I − P)(col2(A)), where we recall P is the orthogonal projection onto the span of col1(A). Thus, the area of the parallelogram is ‖col1(A)‖ ‖(I − P)(col2(A))‖. Now here is something you might not think of: Write the matrix A in its QR factorization

   A = QR = | q11 q12 | | r11 r12 |.
            | q21 q22 | |  0  r22 |

Then, by direct computation using the orthonormality of the columns of Q, we find r11 = ‖col1(A)‖ and r22 = ‖(I − P)(col2(A))‖, and so the area of the parallelogram is just the determinant of R. This idea generalizes.

Suppose we make a matrix A whose columns are these independent vectors, A = [x1 | x2 | ... | xn]. Then the volume is ‖x1‖ ‖(I − P2)x2‖ ··· ‖(I − Pn)xn‖, where Pk is the orthogonal projection onto span{x1, x2, ..., x_{k−1}}. Write A in its QR factorization. Then the volume is the product of the diagonal elements of R. Now (det(A))² = det(A^T A) = det(R^T Q^T Q R) = det(R^T R) = (det(R))² = (volume)². But then, volume equals the absolute value of det(A). Even in higher dimensions, the determinant is closely associated with the geometric idea of volume. Well, all this presumes we know some facts about determinants. It is time to go back to the drawing board and start from scratch.

C.2 Defining Determinants

We could just give a formula to define a determinant, but we prefer to beginmore abstractly. This approach goes back to Kronecker. We begin by defining afunction of the columns of a square matrix. (Rows could be used just as well.)The first step is to dismantle the matrix into its columns. Define

col : Cnxn Cnx1 x C"x1 x .. X Cnx1 (n copies)


by

   col(A) = (col1(A), col2(A), ..., coln(A)).

For example,

   col( | 1 2 | ) = ( | 1 |, | 2 | ) ∈ C^(2×1) × C^(2×1).
        | 3 4 |       | 3 |  | 4 |

Clearly, col is a one-to-one and onto function. Then we introduce a "determinant function" on the n-tuple space with certain crucial properties. Define D : C^(n×1) × C^(n×1) × ··· × C^(n×1) → C to be n-linear if

1. D(..., a[xi], ...) = a D(..., [xi], ...) for i = 1, 2, ..., n, and

2. D(..., [xi] + [yi], ...) = D(..., [xi], ...) + D(..., [yi], ...), i = 1, 2, ..., n.

We say D is alternating iff D = 0 whenever two adjacent columns are equal. Finally, D : C^(n×1) × C^(n×1) × ··· × C^(n×1) → C is a determinant function iff it is n-linear and alternating. We are saying that, as a function of each column, D is linear, hence the name n-linear. Then we have the associated determinant

   det(A) = D(col(A)).

Note that (2) above does not say det(A + B) = det(A) + det(B), which is false as soon as n > 1. Also note

   det : C^(n×n) → C.

To see some examples, fix a number b and consider

   D_b( | a11 |, | a12 | ) = b(a11 a22 − a12 a21).
        | a21 |  | a22 |

D_b is a determinant function. Define

   D( | a11 |, | a12 |, ..., | a1n | ) = a11 a22 ··· ann.
      | ... |  | ... |       | ... |
      | an1 |  | an2 |       | ann |

This is an n-linear function. Is it a determinant function? Let σ ∈ Sn and define

   D_σ( | a11 |, ..., | a1n | ) = a_{1σ(1)} a_{2σ(2)} ··· a_{nσ(n)}.
        | ... |       | ... |
        | an1 |       | ann |

Is D_σ a determinant function? Let f : Sn → C be any function. Define D_f by

   D_f( | a11 |, ..., | a1n | ) = Σ (σ ∈ Sn) f(σ) a_{1σ(1)} a_{2σ(2)} ··· a_{nσ(n)}.
        | ... |       | ... |
        | an1 |       | ann |

D_f is an n-linear function. Is it a determinant function?

Our goal now is to show that there is a determinant function for every n ∈ N and this determinant function is uniquely determined by its value on the identity matrix In. If n = 1 this is easy: D([a]) = D(a[1]) = a D([1]) = a D(I1). First, we assemble some facts.

THEOREM C.1
Let D : C^(n×1) × C^(n×1) × ··· × C^(n×1) → C be a determinant function. Then

1. D = 0 if there is a zero column.

2. If two adjacent columns are interchanged, D changes sign.

3. If any two columns are interchanged, then D changes sign.

4. If two columns are equal, D = 0.

5. If the jth column is multiplied by the scalar a and the result is added to the ith column (i ≠ j), the value of D does not change.

6. D( | a11 |, ..., | a1n | ) = ( Σ (σ ∈ Sn) sgn(σ) a_{1σ(1)} ··· a_{nσ(n)} ) D(In).
      | ... |       | ... |
      | an1 |       | ann |


PROOF In view of (1) in the definition of n-linear, (1) of the theorem is clear. To prove (2), let [x] and [y] be the ith and (i + 1)st columns. Then

   0 = D(..., [x] + [y], [x] + [y], ...)
     = D(..., [x], [x], ...) + D(..., [x], [y], ...) + D(..., [y], [x], ...) + D(..., [y], [y], ...)
     = 0 + D(..., [x], [y], ...) + D(..., [y], [x], ...) + 0.

Therefore,

   D(..., [x], [y], ...) = −D(..., [y], [x], ...).

The proof of (3) uses (2). Suppose the ith and jth columns are swapped. Say i < j. Then by j − i swaps of adjacent columns, the jth column can be put into the ith position. Next, the ith column is put in the jth place by j − i − 1 swaps of adjacent columns. The sign change is (−1)^(2(j−i)−1) = −1. Now (4) is clear. For (5), we use linearity in the ith column. We compute

   D([x1], ..., [xi] + a[xj], ..., [xj], ..., [xn])
      = D([x1], ..., [xi], ..., [xj], ..., [xn]) + a D([x1], ..., [xj], ..., [xj], ..., [xn])
      = D([x1], ..., [xi], ..., [xj], ..., [xn]) + 0
      = D([x1], ..., [xi], ..., [xj], ..., [xn]).

The proof of (6) is somewhat lengthy and the reader is referred to A&W [1992] for the details.

In view of (6), we see that a determinant function is completely determined by its action on the identity matrix. Let D_a be such that D_a(In) = a. Indeed, D_a = a D_1, so we take det as the determinant associated to D_1, our uniquely defined determinant function that takes the value 1 on the identity matrix. Note, we could have just started with the formula

   det(A) = Σ (σ ∈ Sn) sgn(σ) a_{1σ(1)} ··· a_{nσ(n)}

and proved properties of this function, but we like this abstract approach. You may already be familiar with the following results, which follow readily from this formula for det(A).


COROLLARY C.1

1. If A ∈ C^(n×n) is upper or lower triangular, then det(A) = a11 a22 ··· ann.

2. det(A) = det(A^T).

3. det(AB) = det(A) det(B).

4. If S is invertible, det(S⁻¹AS) = det(A).

5. If i ≠ j, then det(P_ij A) = −det(A).

6. If i ≠ j, then det(T_ij(a) A) = det(A).

7. If σ ∈ Sn, then det(P(σ)) = sgn(σ).

8. det(cA) = cⁿ det(A) if A is n-by-n.

PROOF We will offer only two proofs as illustration and leave the rest to the reader. First, we illustrate the use of the formula in proving (2).

   det(A^T) = Σ (σ ∈ Sn) sgn(σ) a_{σ(1)1} ··· a_{σ(n)n}
            = Σ (τ ∈ Sn) sgn(τ⁻¹) a_{τ⁻¹(1)1} ··· a_{τ⁻¹(n)n}
            = Σ (τ ∈ Sn) sgn(τ) a_{τ⁻¹(1)1} ··· a_{τ⁻¹(n)n}
            = Σ (τ ∈ Sn) sgn(τ) a_{ττ⁻¹(1)τ(1)} ··· a_{ττ⁻¹(n)τ(n)}
            = Σ (τ ∈ Sn) sgn(τ) a_{1τ(1)} ··· a_{nτ(n)}
            = det(A).

Next, we illustrate a proof based on the abstract characterization of determinant. Look at (3). The proof from the formula is rather messy, but consider the function D_B(A) = det(AB). One easily checks that D_B is n-linear and alternating on the columns of A. Thus, D_B(A) = b det(A), but b = D_B(I) = det(B), so the theorem follows.

Page 538: Matrix Theory

C. 3 Some Theorems about Determinants 517

C.3 Some Theorems about Determinants

In this section, we present some of the more important facts about determi-nants. We need some additional concepts. Suppose A = [a;j] E C"x". We shalluse the notation developed for submatrices in the previous appendix. The maintheorems we develop are the Laplace expansion theorem and the Cauchy-Binettheorem. These are very useful theorems.

C.3.1 Minors

If A E C"', the determinant of any submatrix A[a I [3] where a E Qk,mand [3 E Qk,,, is called a k-by-k minor of A or the (a, [3)-minor of A. Here,0 < k < min{m, n}. The complementary minor is the determinant of A[a 101if this makes sense. If k = m = n, then Q,,,,", has only one element so there isonly one minor of order m, namely det(A). If k = I, there are m2 minors oforder one, which we identify with the elements of the matrix A. There are (111)

elements in "'2

Qk,," and so there are (k) minors of order k. The determinant ofa principal submatrix A [a] is called a principal minor and the determinant ofa leading principal submatrix is called a leading principal minor. For example,if A = [a, ] is 5-by-5, a = (2, 3, 5), and [i = (1, 2, 4), then the (a, [3)-minor is

a21 a22 a24

det a31 a32 a34

a51 a52 a54

and the complementary minor is

deta13 a15

a43 a45 1

There are (3) = 10 minors of A of order three altogether.

C.3.2 The Cauchy-Binet Theorem

You probably recall the well-known theorem that "the determinant of aproduct of two square matrices is the product of their determinants." The gen-eralization of this result has a history that can be read in Muir, [ 1906 pages123-130]. We present a vast generalization of this theorem next. We follow thetreatment in A&W [ 19921.

Page 539: Matrix Theory

518 Determinants

THEOREM C.2 (Cauchy-Binet, 1812)Suppose A E C""", B E C""' and C = AB E C""" . Suppose 1< tmin fin, n, r} and suppose a E Q,.,,,, 0 E Q,.r. Then

det(C[a R]) = E det(A[a I y])det(BIy I PD.yEQ,n

PROOF Suppose a = (a,_. . (X,) E Q,,,,, and [3 = ([31, ... , {3,) E Q, rWe compute the (i, j)-entry of C[a I [3] :

11

ent;j(AB[a I R]) = rowa,(A) colp,(B) = Eaa,kbkp,k=1

so

Eaaikbkp, ... Eaa,kbkp,k=n

1

k=n

1rAB[aI13]=

1, 1,

Eaa,kbkp, i:aa,kbkp,k=1 k=1

Now, using the n-linearity of the determinant function, we get

n n I bk,p,

det(AB[a I [3]) _ E ... Eaa,k, ...aa,k, detk,=1 k,=1 L bk, P. I-

Ifkj = kj for i # j, then the ith and jth rows of the matrix on the right are equal.Thus, the determinant is zero in this case. The only nonzero determinants thatappear on the right occur when the sequence (k 1 , k2, ... , k,) is a permutation ofa sequence -y = (y1, ... , E Q, ,,. Let cr he a permutation in the symmetricgroup S, such that y; = kf(;) for I < i < t. Then

bk,p,

det

bk, p,

= sgn(o) det B[y 10].

Given y E Q,.n, all possible permutations of y are included in the summationabove. Therefore,

det (AB[a 10]) = E ( aa,,,,,,,) det B[y I [3]yEQ,,, UEs,

det(A[a I yl)det(B[y I (3])yEQ,,, 0

Page 540: Matrix Theory

C.3 Some Theorems about Determinants 519

Numerous corollaries follow from this result. First, as notation, let I : n =(1,2,3,... ,n).

COROLLARY C.2Suppose A is nt-by-n and B is n-by-m where m < n. Then

det(AB) = E det(A[1 : in I y])det(B[y I 1 : m]).

In other words, the determinant of the product A B is the sum of the productsof all possible minors of the maximal order m of A with the correspondingminors of B of the same order. Also note that if m > n, det(AB) = 0.

COROLLARY C.3For conformable square matrices A, B, det(AB) = det(A)det(B).

PROOF If y E Q,,, then y = I : n, so A[y I y] = A, B[y I y] andAB[y I y] = AB. Thus det(AB) = det(AB[y I y]) _ E det(A[y I

yE Qnn

y])det(B[y I y]) = det(A)det(B).

COROLLARY C.4If A is k-by-n where k < n, then det(AA*) _

IdetA[l:kIy]I2>0.yEQ4

COROLLARY C.S (the Cauchy identity)

n ` n

F-aici >aidij=1 i=1 det

ajak det cj ck

n it = bj bk dj dk

>bici >bidi I<j<k<ni=1 i=1

In other words,

U

(b1d1) ( aidi) (rubic) (ab- akbj)(Eaici)i=" I i - =1 I<'<k<n

x (cjdk - ckdj) .

As a special case, we get the following corollary.

Page 541: Matrix Theory

520 Determinants

COROLLARY C.6 (Cauchy inequality)Over R,

('a?)n

bi2E

2

(>aibi)i=1

I detI <i <k <n

a; ak

b; bk

so over R,

2

n n n

a; b; < (a)? (b).;!=1 i-I !=1

C.3.3 The Laplace Expansion Theorem

Another classical theorem deals with expressing the determinant of a squarematrix in terms of rows or columns and smaller order determinants. Recall thesum map on Q,.,,; s : Qr.n -* Ndefinedbys(ii,i2,... ,ir)=it+i2+. { i,.

THEOREM C.3 (Laplace expansion theorem)Suppose A E C"' ' and a e Q,,,, for I < t < n. Then

1. (fix a)det(A)= (-1)'I°'+`(a)det(A[a I R])det(A[a 10])

(expansion of det(A) by the rows in a)

2. (fix [3) det(A) = > (-1)'(a)+'(f3)det(A[a I R]) det(A[a I R])aEQ,n

(expansion of det(A) by the columns in [3)

PROOF Fix a in Q,,,, and define Da(A) = (-I) `la>+'t1 det(A[a I

PE Q, n

[3])det(A[a I [3]). Then D. : tC" n -+ C is n-linear as a function of thecolumns of A. We need Da to be alternating and Da(l) = I to prove the result.Then, by uniqueness, Da = det. Suppose two columns of A are equal; saycol,,(A) = colq(A) with p < q. If p and q are both in [3 E Q,,,,, then A[a 10]will have two columns equal so det(A[a I [3]) = 0. Similarly, if both p andq are in [3 E then det(A[a I [3]) = 0. Thus, in evaluating Da(A), itis only necessary to consider those [3 E Qr.,, such that p E [3 and q E [3 orvice versa. Suppose then that p E [3 and q E [3. Define a new sequence inQr.n by replacing p in [3 by q. Then [3' agrees with [3 except that q has beenreplaced by p. Thus, s([3') - s([3) = q - p. Now consider (-I)''(a)det(A[a I[3]) det(A[a I [3]) + (- 1)'(P') det(A [a I [3']) det(A [a I [3']). We claim this sum

Page 542: Matrix Theory

C.3 Some Theorems about Determinants 521

is zero. If we can show this, then D0(A) = 0 since (3 and (3' occur in pairs inQ,.,,. It will follow that DQ(A) = 0 whenever two columns of A agree, makingD alternating. Suppose p = (3k and q = (31. Then (3 and (3' agree except in therange from p to q, as do (3 and (3'. This includes a total of q - p + 1 entries.We have (31 < ... < Rk = P < Rk+1 < ... < Rk+r-1 < q < [3k+r < ... < (3rand Ala

I(3] = A[a

I (3]P(w-1) where w is the r-cycle (k + r - 1, k +r - 2, .., k). Similarly, A[& I (i'])P(w) where w' is a (q - p + I - r)-cycle.Thus(-1)''(P')det(A[a I (3'])det(A[& I R']) = (-1)'(P)+(r-1)+(q-v)-rdet(A[a

I

(3]) det(A [& I R]). Since s(13') + (r - 1) + (q - p) - r - s((3) = 2(q - p) - 1

is odd, we conclude the sum above is zero. We leave as an exercise the factthat Da(t) = 1. So, Da(A) = det(A) by the uniqueness characterization ofdeterminants, and we are done. 0

(2) Apply the result above to AT.We get the classical Laplace expansion theorem as a special case.

THEOREM C.4 (Laplace cofactor expansion theorem)Let A E C""" Then

I. (-1)k+iakidet(A[k I j]) = biidet(A).k=1

n

2. j(-1)'+iaiidet(A[i I j]) = det(A), (expansion by the jth column).i=)

n

3. y(-I)k+iaikdet(A[j I k]) = biidet(A).k=1

n

4. F_(-1)'+iaiidet(A[i I J]) = det(A), (expansion by the ith row).i=1

So, in theory, the determinant of an n-by-n matrix is reduced by the theoremabove to the computation of it, (n - 1)-by-(n - 1) determinants. This can becontinued down to 3-by-3 or even 2-by-2 matrices but is not practical for largen. There is an interesting connection with inverses. _

Let A E Cnxn We note that A[i I j] = aii = entii(A) and A[iI J]

is the matrix obtained from A by deleting the ith row and_ j th column. The(ij)-cofactor of A is defined by cofii(A) = (-1)'+idet(A[i I j]). Define thecofactor matrix cof(A) by entii(cof (A)) = cofii(A) = (-1)'+idet(A[i I J]).The adjugate of A is then adj(A) = cof(A)T (i.e., the transpose of the matrixobtained from A by replacing each element of A by its cofactor). For example,

3 -2 1 -18 -6 -10let A = 5 6 2 . Then adj(A) = 17 -10 -1

0 -3 -6 -2 28

Page 543: Matrix Theory

522 Determinants

In this notation, the Laplace expansions become11

1. >ak,cofki(A) = Fiijdet(A).k=I

n

2. >aijcofij(A) = det(A).i=

3. >aikcofjk(A) = bijdet(A).k=1

n

4. >aijcofij(A) = det(A).j=1

The main reason for taking the transpose of the cofactor matrix above todefine the adjugate matrix is to get the following theorem to be true.

THEOREM CSLetAEC' .

1. Aadj(A) = adj(A)A = det(A)1,,.

2. A is invertible iffdet(A) # 0.

In this case, A-' = det(A)-1 adj(A).

COROLLARY C.7A E Cn"n is invertible iff there exists B E C0)"' with AB = 1,,.

There is a connection with "square" systems of linear equations. Let Ax = b,where A is n-by-n. Then adj(A)Ax = adj(A)b whence det(A)x = adj(A)b.

n n _Thus, at the element level, det(A)xi = J:(adj(A))jibi =j:(-1)i+jbidet(A[i I_ i=I i=Ij]). This last expression is just the determinant of the matrix A with the jth col-umn replaced by b. Call this matrix B. What we have is the familiar Cramer'srule.

THEOREM C.6 (Cramer's rule)Let Ax = b, where A is n-by-n. Then this system has a solution iff det(A) # 0,

det(Bi )in which case xi = for i = l , ... , n.

det(A)

Page 544: Matrix Theory

C.3 Some Theorems about Determinants 523

In the body of the text, we have used some results about the determinants ofpartitioned matrices. We fill in some details here.

THEOREM C.7

det I A C ] = det(A) det(C) for matrices of appropriate size.

PROOF Define a function D(A, B, C) = det I A C ], where A and B

are fixed. Suppose C is n-by-n. Then D is n-linear as a function of the columns ofC. Thus, by the uniqueness of determinants, D(A, B, C) = (det(C))D(A, B,Using transvections, we can zero out B and not change the value of D. Thus,D(A, B, D(A, O, Now suppose A is m-by-m. Then D(A, O, ism-linear and alternating as a function of the columns of A. Thus D(A, 0,(det(A))D(Im, 0, 1,). But note that D(1,,,, 0, 1. D(A, B, C) = (det(C))D(A, B, D(A, B, C) = (det(C))D(A, 0, (det(C))(det(A)). 0

This theorem has many nice consequences.

COROLLARY C.8

1. detL

] = det(A) det(C) for matrices of appropriate size.

2. det L AC ]

= det(A) det(C) for matrices of appropriate size.

3. Let M = L CD

] where A is nonsingular. Then

det(M) = det(A)det(D - CA -I B).

BDJ

BI

= det(D - CB) for matrices of appropriate size.

= det(l - CB) fur matrices of appropriate size.

PROOF For the most part, these are easy. For three, note ICA B ] =D

]

CA-1 1

[0 D- CA-' B]

L] U

Page 545: Matrix Theory

524 Determinants

APC Exercise Set 1

1. Prove that if A E BE C" ", then det( A® B) =det(A)"det(B)"'.

2. Fill in the proofs for Theorem C. I and Corollary C.I I.

3. Fill in the proofs for Corollary C.2, Theorem C.4, and Corollary C.7.

4. Argue that A isinvertibleiffadj(A)isinvertible,inwhich case adj(A)-' _det(A)-I A.

5. Prove that det(adj(A)) = det(A)"-'.

6. Prove that if det(A) = 1, then adj(adj(A)) = A.

7. Argue that adj(AB) = adj(B)adj(A).

8. Prove that B invertible implies adj(B-'AB) = B-'adj(A)B.

9. Prove that the determinant of a triangular matrix is the product of thediagonal elements.

10. Prove that the inverse of a lower (upper) triangular matrix is lower (upper)triangular when there are no zero elements on the diagonal.

11. Argue that ad j (AT) = (ad j (A))T .

12. Argue that det(adj(A)) = (det(A))"-1 for it > 2 and A n-by-n.

13. Give an example to show det(A+ B) = det(A)+det(B) does not alwayshold.

14. If A is n-by-n, argue that det(aA) = a"det(A). In particular, this showsthat det(det(B)A) = det(B)" det(A).

15. It is important to know how the elementary matrices affect determinants.Argue the following:

(a) det(D;(a)A) = adet(A).(b) det(T;j(a)A) = det(A).(c) det(P;jA) = -det(A).(d) det(AD;(a)) = adet(A).(e) det(AT;j(a)) = det(A).(t) det(AP;j) = -det(A).

Page 546: Matrix Theory

C.3 Some Theorems about Determinants 525

16. Argue that over C, det(AA*) > 0.

17. Argue that over C, det(A) = det(A). Conclude that if A = A*, det(A)is real.

18. If one row of a matrix is a multiple of another row of a matrix, what canyou say about its determinant?

1 10 0 a

19. What is det [b

0J? How about det 0 b 0 '? Can you gener-

c 0 0alize?

20. Suppose A is nonsingular. Argue that (adj(A))-' = det(A-')A =ad j (A-' ).

21. Find a method for constructing integer matrices that are invertible andthe inverse matrices also have integer entries.

22. If A is skew-symmetric, what can you say about det(A)?

23. What is det(C(h)), where C(h) is the companion matrix of the polynomialh(x).

24. Prove det I I B J = det I Al®

1 1 01 1 0 0

0l 1

25. Suppose A = [ 111

-11 1

O 1 1

0 -1 l0 0 -1 1

Calculate det(A) for each. Do you recognize these famous numbers'?

26. Prove that d et

1 1 ... 1 ,

X1 x2 ... Xn2 2 2

x1 x2 xn = 11 (xi - xj) . This1<j<i<n

Ln-I n-I n-1

x1 x2xn

is a famous determinant known as the Vandermonde determinant.

27. There is a famous sequence of numbers called the Fibonacci sequence.There is an enormous literature out there on this sequence. It starts out{ 1, 1, 2, 3, 5, ... }. Do you see the pattern'?

Page 547: Matrix Theory

526 Determinants

I i 0 0 ... 0i l i 0 .. 0

O i l i ... 0

Anyway, let F he the n-by-n matrix Fn =0 0 i I

0 0 0 i 1

Compute the determinants of F1, F2, F3, F4, and F5, and decide if thereis a connection with the Fibonacci sequence.

b+, a

28. Find det LI

+a

b+,c+ab

a

b+c

Notice anything interesting'?u+h

a+b a+b c

29. (F. Zhang) For any n-by-n matrices A and B, argue that det I AL -B>0.

BA

30. (R. Bacher) What is the determinant of a matrix of size n-by-n if its (i, j)entry is a°'->>2 for I < i, j < n?

l+a 1 1

31. Find det 1 I +b I . Can you generalize your findings?1 1 l+c

32. Here is a slick proof of Cramer's rule. Consider the linear system Ax = b,

where A is n-by-n, where x =

XI

X2

X3 . Replace the first column of the

L -Cnidentity matrix by b and consider A[x I e2

Ie3

I Ien] = [Ax

I

Ae2 I Ae3 I I [b I col2(A) I col3(A) II

Takingdeterminants and using that the determinant of a product is the productof the determinants, we find

det(A) det([x I e2 I e3 I ... I det([b I col2(A) I col3(A) I

I coln(A)])

But det([x I e2 I e3 I I e]) = xi, as we see by a Laplace expansion,so

xi = det([b I col2(A) I COMA) I I colt, (A)])det(A)

Page 548: Matrix Theory

C.3 Some Theorems about Determinants 527

The same argument applies if x is placed in any column of the identitymatrix. Make the general argument. Illustrate the argument in the 2-by-2and 3-by-3 case.

33. Suppose all r-by-r submatrices of a matrix have determinant zero. Arguethat all submatrices (r + I)-by-(r + 1) have determinant zero.

34. Prove that any minor of order r in the product matrix AB is a sum ofproducts of minors of order r in A with minors of order r in B.

Further Reading

[A&W, 1992] William A. Adkins and Steven H. Weintraub, Algebra: AnApproach via Module Theory, Springer-Verlag, New York, (1992).

[Aitken, 1962] A. C. Aitken, Determinants and Matrices, Oliver andBoyd, New York: Interscience Publishers, Inc., (1962).

[Axler, 1995] Sheldon Axler, Down with Determinants, The AmericanMathematical Monthly, Vol. 102, No. 2, February, (1995), 139-154.

[Axler, 19961 Sheldon Axler, Linear Algebra Done Right, Springer,New York, (1996).

[B&R, 1986] T. S. Blyth and E. F. Robertson, Matrices and Vector Spaces,Vol. 2, Chapman & Hall, New York, (1986).

[Bress, 1999] David M. Bressoud, Proofs and Confirmations: The Storyof the Alternating Sign Matrix Conjecture, Cambridge University Press,Cambridge, (1999).

[B&S, 1983/84] R. A. Brualdi and H. Schneider, Determinantal Iden-tities: Gauss, Schur, Cauchy, Sylvester, Krone, Linear Algebra and ItsApplications, 52/53, (1983), 769-791, and 59, (1984), 203-207.

[C,D'E et al., 2002] Nathan D. Cahill, John R. D'Errico, Darren A.Narayan and Jack Y. Harayan, Fibonacci Determinants, The CollegeMathematics Journal, Vol. 33, No. 3, May, (2002), 221-225.

Page 549: Matrix Theory

528 Determinants

[Des, 18191 P. Desnanot, Complement de la theorie des equations dupremier degrd, Paris, (1819).

[Dodg, 1866] Charles L. Dodgson, Condensation of Determinants, Pro-ceedings of the Royal Society, London, 15, (1866), 150-155.

[Garibaldi, 20041 Skip Garibaldi, The Characteristic Polynomial and De-terminant are Not Ad Hoc Constructions, The American MathematicalMonthly, Vol. I1 1 , No. 9, November, (2004), 761-778.

[Muir, 1882] Thomas Muir, A Treatise on the Theory of Determinants,Macmillan and Co., London, (1882).

[Muir, 1906-1923] Thomas Muir, The Theory of Determinants in the His-torical Order of Development, 4 volumes, Macmillan and Co., London,(1906-1923).

[Muir, 1930] Thomas Muir, Contributions to the History of Determinants,1900-1920, Blackie & Sons, London, (1930).

[Rob&Rum, 1986] David P. Robbins and Howard Rumsey, Determinantsand Alternating Sign Matrices, Advances in Mathematics, Vol. 62, (1986),169-184.

[Skala, 1971], Helen Skala, An Application of Determinants, TheAmerical Mathematical Monthly, Vol. 78, (1971), 889-990.

C.4 The Trace of a Square Matrix

There is another scalar that can be assigned to a square matrix that is veryuseful. The trace of a square matrix is just the sum of the diagonal elements.

DEFINITION C.1 (trace)Let A be in en xn. We define the trace of A as the sum of the diagonal elements

of A. In symbols, tr(A) _ >ente1(A) = ai i + a22 + + ann. We view tr as a

function from Cnxn to C.

Next, we develop the important properties of the trace of a matrix. The firstis that it is a linear map.

Page 550: Matrix Theory

C.4 The Trace of a Square Matrix 529

THEOREM C.8Let A, B be matrices in C', In. Then

1. tr(A + B) = tr(A) + tr(B).

2. tr(AA) = atr(A).

3. tr(AB) = tr(BA).

4. If S is invertible, then tr(S-' AS) = tr(A).

5. tr(ain) = na.

6. tr(ABC) = tr(BCA) = tr(CAB).

7. tr(AT B) = tr(ABT).

8. tr(AT) = tr(A).

9. tr(A) = tr(A).

10. tr(A*) = tr(A).

The trace can be used to define an inner product on the space of matricesCnxn

THEOREM C.9The function (A I B) = tr(B*A) defines an inner product on C"x". In parti-cular,

1. tr(A*A)=tr(AA*)>0and =0iffA=®.

2. tr(AX) = O for all X implies A = 0.

3. Itr(AB)I < tr(A*A)tr(B*B) < Z(tr(A*A)+tr(B*B)).

4. tr(A2) + tr(B2) = tr((A + B)2) - 2tr(AB).

Can you see how to generalize (3) above?There is an interesting connection with the trace and orthonormal bases of

Cn.

THEOREM C.10Suppose ei,e2, , e is an orthonormal basis of C" with respect to the usual

n

inner product W. Then tr(A) _ (Ae; I ei ).

Page 551: Matrix Theory

530 Determinants

There is also an interesting connection to the eigenvalues of a complex matrix.

THEOREM C.11The trace of A E C"" is the suin of the eigenvalues of A.

APC Exercise Set 2

1. Fill in the proofs of the theorems in this section.

2. (G. Trenkler) Argue that A2 = -A iff rank(A) = -tr(A) and rank(A +1) = n + tr(A), where A is n-by-n.

Page 552: Matrix Theory

Appendix D

A Review of Basics

D.1 Spanning

Suppose v 1 , v2, ... , VP are vectors in C". Suppose a 1, a2, ... , aP are scalarsin C. Then the vector v = a1 v 1 +a2v2+ +aPVP is called a linear combination

of v1, v2, ... , VP. For example, I 2 I is a linear combination of

and I

1

since 2L

iJ

+ 3i [ 0] = [

9i ]. However, there is

01 10Icould be a linear combination of and

3

(why not?).2 j L

1

J L JRecall that the system of linear equations Ax = b has a solution iff b canbe expressed as a linear combination of the columns of A. Indeed, if b =

Cl

C2

c,coli(A)+c2col2(A)+ +c, then x = solves the system.

C"

Consider a subset S of vectors from C". Define the span of S (in symbols,sp(S)) as the set of all possible (finite) linear combinations that can be formedusing the vectors in S. Let's agree the Tan of the empty set is the set having onlythe zero vector in it (i.e., sp(o) = ()). For example, if S = ((1, 0, 0)), thensp(S) = ((a, 0, 0) 1 a E Q. Note how spanning tends to make sets bigger.In this example, we went from one vector to an infinite number. As anotherexample, note sp((1, 0), (0, 1)} = C2. We now summarize the basic facts aboutspanning.

THEOREM D.1 (basic facts about span)Let S, S1, S2 be subsets of vectors in C".

1. For any subset S, S e sp(S). (increasing)

2. For any subset S, sp(S) is a subspace of C. in fact, sp(S) is the smallest

531

Page 553: Matrix Theory

532 A Review of Basics

subspace of C" containing S.

3. If S, C S2, then .sp(Si) c sp(S2). (monotone)

4. For any subset S, sp(sp(S)) = sp(S). (idempotent)

5. M is a subspace of C" iff M = sp(M).

6. sp(S, fl S2) c sp(S,)fl sp(S2).

7. sp(Si U S2) = sp(S, )+ sp(S2).

8. sp(Si) = sp(S2) iff each vector in S, is a linear combination of vectorsin S2 and conversely.

PROOF The proofs are left as exercises. U

We can view spas a function with certain properties from the set of all subsetsof C" to the set of all subspaces of C" , sp : P(C") -* La t (C" ). The fixed pointsof sp are exactly the subspaces of C".

If a subspace M of C" is such that M = sp(S), we call the vectors in Sgenerators of M. If S is a finite set, we say M is finitely generated. It would benice to have a very efficient spanning set in the sense that none of the vectorsin the set are redundant (i.e., can be generated as a linear combination of othervectors in the set).

THEOREM D.2Let S = (VI, V2, ... , V,,) where p > 2. Then the following are equivalent.

I. sp({v,, V2, ... , v,,}) = sP({v,, V2, ... , Vk-1, Vk+i, ... , v,,}).

2. vk is a linear combination o f v1 , V2, ... , Vk_,, V 4 + 1 ,--, V,,.

3. There exist scalars a,, a2, ... , a,,, not all zero, such that a, v, + a2v2 +

... + a,,vv = V.

PROOF The proof is left as an exercise. U

The last condition of the theorem above leads us to the concept developed inthe next section.

Page 554: Matrix Theory

D.2 Linear Independence 533

D.2 Linear Independence

A set of vectors {v1, v2, ... , is called linearly dependent iff there existscalars o ti, a2, ... , a not all zero, such that a, vi + a2v2 + + -6.Such an equation is referred to as a dependency relation. These are, evidently,not unique when they exist. If a set of vectors { v i , v2, ... , v,,) is not linearlydependent, it is called linearly independent. Thus, the set { v 1 , v2, ... , v,,} is

-iTlinearly independent iff the equation aivi + a2v2 + + at,vt, = impliesa1 = a2 = . = aP = 0. For example, any set of vectors that has the zerovector in it must he linearly dependent (why?). A set with just two distinctvectors is dependent iff one of the vectors is a scalar multiple of the other. Theset {(1,0), (0,1)} is independent in C2.

THEOREM D.3Let S be a set of two or more vectors.

1. S is linearly dependent iff at least one vector in S is a linear combinationof other vectors in S.

2. S is linearly independent iff no vector in S is expressible as a linearcombination of vectors in S.

3. Any subset of a linearly independent set is linearly independent.

4. Any set containing a linearly dependent set is linearly dependent.

5. (Extension theorem) Let S = {v1, V2, ... , vN} be a linearly independentset and v ¢ sp(S). Then S _ (V I, v2, ... , vv, v} is also an independentset.

6. Let S = { v i , v2, ... , be a set of two or more nonzero vectors. ThenS is dependent iff at least one vector in S is a linear combination of thevectors preceding it in S.

7. If S is a linearly independent set and v E sp(S), then v is uniquelyexpressible as a linear combination of vectors from S.

PROOF As usual, the proofs are left as exercises. U

Page 555: Matrix Theory

534 A Review of Basics

D.3 Basis and Dimension

There is a very important result about the size of linearly independent sets infinitely generated subspaces that allows us to introduce the idea of dimension.We begin by developing some language. Let B be a set of vectors in a subspaceM of C. We say B is a basis of M iff (1) B is an independent set and (2)sp(B) = M. For example, let 8 = ((1, 0), (0, 1)) in C22. Then clearly, 5 is abasis of C2. A subspace M of C" is called finitely generated iff it contains afinite subset S with sp(S) = M. We see from above, C2 is finitely generated.Next is a fundamental result about finitely generated subspaces. It is such acrucial fact, we offer a proof.

THEOREM D.4 (Steinitz exchange theorem)Let M be a finitely generated subspace of C". Specifically, let Al = sp({v,,v2, ... , v,, } ). Let T be an independent set of vectors in M, say T = {w, ,W2, ... , w,,, }. Then m < p. In other words, in a finitely generated subspace,you cannot have more independent vectors than you have generators.

PROOF First, note w, a M, so w, is a linear combination of the vs.Consider the set T, = {w, , v, , ... , v,,). Clearly sp(T,) = M. Now T, is adependent set since at least one vector in it, namely w,, is a linear combi-nation of the others. By Theorem D.3, some vector in T, is a linear combi-nation of vectors preceding it in the list, say vj. Throw it out and considerS, = (w,, v, , ... , vi-1, vj+,, ... , v,,). Note sp(Sj) = M since vj was re-dundant. Now we go again. Note W2 E M so w2 E sp(S,) = M. ConsiderT 2 = (w2, w,, v,, ... , vj_,, vj+,, ... , v,,). Clearly T2 is a linearly dependentset since W2 is a linear combination of the elements in S1. Again by TheoremD.3, some vector in T2 is a linear combination of vectors previous to it. Couldthis vector he W2? No, since w, and w2 are independent, so it must be one ofthe vs. Throw it out and call the resulting set S2. Note sp(S2) = M since thevector we threw out is redundant. Continue in this manner exchanging vs forws. If we eliminate all the vs and still have some ws left over, then some w willbe in the span of the other ws preceding it, contradicting the independence ofthe ws. So there must be more vs than ws or perhaps the same number. That ism<p. 0

With such a beautiful theorem, we can reap a harvest of corollaries.

COROLLARY D.1Any n + I vectors in a subspace generated by n vectors must be dependent.

Page 556: Matrix Theory

D.3 Basis and Dimension 535

COROLLARY D.2Any n + I vectors in C" are necessarily dependent.

COROLLARY D.3Any two bases of a finitely generated subspace have the same number of vectorsin them.

PROOF (Hint: View one basis as an independent set and the other as a setof generators. Then change these roles.) 0

This last result allows us to define the concept of dimension. A subspace Mof C" is m-dimensional if M has a basis of m vectors. In view of the previouscorollary, this is a uniquely defined number. The notation is dim(M) = m. Notedim(C") = n.

COROLLARY D.4If M is generated by n vectors and S = {v I, v2, ... , v" } is an independent setof vectors in M, then S must be a basis for M.

COROLLARY D.5If M has dimension n and S = IV 1, v2, ... , v") spans M, then S must be a basisfor M.

COROLLARY D.6Suppose M # 16) is a finitely generated subspace of C"'. Then

1. M has a finite basis.

2. Any set of generators of M contains a basis.

3. Any independent subset of M can be extended to a basis of M.

COROLLARY D. 7Suppose M and N are subspaces of C"' and dim(N) = n and M e_ N. Thendim(M)< dim(N). Moreover, if in addition, dim(M) = dim(N), then M = N.

These are wonderful and useful corollaries to the Steinitz theorem. Someresults we want to review depend on making new subspaces from old ones.Recall that when M, and M2 are subspaces of C", we can form their intersectionM, (M2 and their sum M, + M2 = {u + v I U E M, and V E M2}. It is easyto show that these constructions lead to subspaces. A sum is called a direct

Page 557: Matrix Theory

536 A Review of Busies

sum when M, fl M2 The notation is M, ® M, for a direct sum.If C" = M, ® M2, we say M2 is a complement of M, or that M, and M2 arecomplementary subspaces. Typically, a given subspace has many complements.

THEOREM D.5Suppose M, and M2 are subspaces of C" with bases 5 and E2 respectively.Then TA.E.:

I. C" = M, ® M2.

2. For each vector w in C", there exist unique vector v in M, and u in M2with w = v + u.

3. Ci, fl E2 = 0 and B, U E2 is a basis for C".

PROOF The proof is left as an exercise. 0

If M C N and there exists a subspace K with M ® K = N, then K is calleda relative complement of M in N.

THEOREM D.6Suppose N is finitely generated and M C_ N. Then M has a relative complementin N.

PROOF As usual, the proof is left as an exercise. 0

COROLLARY D.8Any subspace of C" has a complement.

We end with a famous formula relating dimensions of two subspaces.

THEOREM D. 7 (the dimension formula)Suppose M, and M2 are subspaces of a finitely generated subspace M. ThenM1, M2, M, +M2, and M, fl M2 are all finite dimensional and dim(M, + M2) _dim(MI) + dim(M2) - dim(M, fl M2).

PROOF Start with a basis of M, fl M2, say u,, u2, ... , u,,. Then extendthis basis to one of M, and one of M2, say u,, ... , ua,v,, ... , vh and u,,u2, ... , u,,, w,, ... , w, respectively. Now argue that u,, ... , u,,,v,, ... , vh,WI.... , w, is a basis for M, + M2. 0

Page 558: Matrix Theory

D.3 Basis and Dimension 537

Does the dimension formula remind you of anything from probabilitytheory?

APD Exercise Set 1

1. How would the dimension formula read if three subspaces were involved?Can you generalize to a finite number of subspaces?

2. Fill in the arguments that have been omitted in the discussion above.

3. Suppose {u1, u2, ... , u,,} and {v1, v2, ... , vq} are two sets of vectorswith p > q. Suppose each u; lies in the span of {v1, v2, ... , vq }. Arguethat {u1, u2, ... , u1,} is necessarily a linearly dependent set.

4. Suppose M1 and M2 are subspaces of C" with dim(M,+M2) = dim(M,flM2) + 1. Prove that either M, c M2 or M2 a Ml .

5. Suppose M1, M2, and M3 are subspaces of C". Argue thatdim(M1 fl M2 fl M3) + 2n > dim(M1) + dim(M2) + dim(M3).

6. Suppose {u1, u2, ... , and {v1, v2, ... , are two bases of V.Form the matrix U where the columns are the u; s and the matrix Vwhose columns are the vis. Argue that there is an invertible matrix Ssuch that SU = V.

7. Suppose {u,, u2, ... , u,,} spanasubspaceM.Arguethat{Au1, Au2, ... ,Aun} spans A(M).

8. Suppose Ml,... , Mk are subspaces of C". Argue that dim(M1 fl ... flk k-1

Mk) = n - E(n - dim(Mi))+>{n -dim((M1 fl ... fl Mi)+ Mj+1)}.i=1 j=1

k

Deduce that dim(M1 fl... fl Mk) > n - E(n -dim(M1)) and dim(M1 fl=1

k

...flMk) = n->.(n-dim(Mj))iffforalli = 1,... k, Mi+( fl Mi) _i=1 i0i

Cn

9. Select some columns from a matrix B and suppose they are dependent.Prove that the same columns in AB are also dependent for any matrix Athat can multiply B.

Page 559: Matrix Theory

538 A Review of Basics

D.4 Change of Basis

You have no doubt seen in your linear algebra class how to associate a matrixto a linear transformation between vector spaces. In this section, we review howthat went. Of course, we will need to have bases to effect this connection. First,we consider the change of basis problem.

Let V be a complex vector space and let B = (b,, b2, ... , bn } he a basis forV. Let x be a vector in V. Then x can he written uniquely as x = b, R i +b2 [32 +

+ That is, the scalars [3i, [32, ... , (3 are uniquely determined by xand the basis B. We call these scalars in C the "coordinates" of x relative to thebasis B. Thus, we have the correspondence

131

12X H = Mat(x; B) - [x]B.

Rn

Now suppose C = {ci , c2, ... , cn } is another basis of V. The same vector xcan be expressed (again uniquely) relative to this basis as x = c,y, + c2'y2 +

+ Thus, we have the correspondence

1

) I

Y2x ( = Mat(x;C) - [x]c.

L yn JThe question we seek to resolve is, what is the connection between these setsof coordinates determined by x? First we set an exercise:

1. Show Mat(x + y; B) = Mat(x; B) + Mat(y; B).

2. Show Mat(xa; B) = Mat(x; B)a.

For simplicity to fix ideas, suppose B = {b,, b2 } and C = {C1, c2}. Let x bea vector in V. Now c, and c2 are vectors in V and so are uniquely expressible

in the basis B. Say c, = b, a + b2(3 so that [c, ]B =L R J

and [c2]B = [ y l ,

1

SJJ

reflecting that e2 = b, y + b2S. Let [x]c =L

J . What is [x]B? Well, x =

c1 µ+c2o =(b,a+b2(3)µ+(b,y+b2s)a = bi (aµ+ya)+b20µ+8(F).Thus,

[x]B = (Pµ +-Yo-

1=1 R 8 11 'L [ 0s

I[x]c .

Page 560: Matrix Theory

D.4 Change of Basis 539

The matrixL R

b 1 gives us a computational way to go from the coordinates

of x in the C-basis to the coordinates of x in the 8-basis. Thus, we call the matrixOL -Y

LR sJthe transition matrix or change of basis matrix from the C to the B

basis. We write

R13.-c = r R s J= [[el]B I [C2]E1

and note

[x]B = RB.-c [x]c

The general case is the same. Only the notation becomes more obscure. Let 13= {b1, b2, ... , and C = { c i , c2, ... , be bases of V. Define R13-c =[[ci]B I [c2]13 I ... I [c]B]. Then for any x in V, [x],6 = RB.-c[x]c. There isa clever trick that can be used to compute transition matrices given two bases.We illustrate with an example. Let B = {(2, 0, 1), (1, 2, 0), (1, 1, ]) } and C= {(6, 3, 3), (4, -1, 3), (5, 5, 2)}. Form the augmented matrix [B 1 C] and useelementary matrices on the left to produce the identity matrix:

2 1 1 6 4 5 1 0 0 2 2 1

0 2 1 1 3 -1 5 - 0 1 0 1 1 -1 2 .

1 0 1 3 3 2 0 0 1 1 1 1

2 2 1

Then RB,c = 1 -1 2 . We finally make t he co nnec tion w ith invert-

ible matrices.I I I

THEOREM D.8With notation as above

1. RA.-BRB.-c = RA-

C-2. RB,B = 1.

3. (RB,c)-' exists and equals RC-B. Moreover, [x]B(RB.-c)-' [x]ti = [x]c

= Rti.-c [xlc iff

PROOF The proof is left to the reader. 0

Next, we tackle the problem of attaching a matrix to any linear transforma-tion between two vector spaces that have finite bases. First, recall that a lineartransformation is a function that preserves all the vector space structure, namely,

Page 561: Matrix Theory

540 A Review of Basics

addition and scalar multiplication. So let T : V -+ W. T is a linear transfor-mation iff (1) T(x + y) = T(x) + T(y) for all x, y in V and (2) T(xr) = T(x)rfor all x in V and r in C. We shall now see how to assign a matrix to T relativeto a pair of bases.

As above, we start with a simple illustration and then generalize. Let B ={bi, bz} be a basis of V and C = {ci, cz, c3} be a basis for W. Now T(b,) isa vector in W and so must be uniquely expressible in the C basis, say T(b, )= CI (X + c20 + c3y. The same is true for T(bz), say T(b2) = c, s + cze + cap.Define the matrix of T relative to the bases 8 and C by

sMat(T; 5, C) = R e = [[T(bi)]c I [T (bz)]c]

'Y P

Then a remarkable thing happens. Let x be a vector in V. Then say x =

bi ri + bzrz. Then [x]ri

J .Also, T(x) = T (b, r, + b2r2) = T(bi )ri +

r2T(b2)r2 = (Cl a + CI -P + c3y)ri + (Cl 8 + cze + c3p) rz = c, («r, + 8r,)) +

«r, + 8rzcz (Rri + er2)+c3 (yri + prz). Thus Mat(T (x); C) = (3r, + erz = (3 e

11

s

r 1 yr, + Prz 'Y prr2

Mat(T; 8, C)Mat(x; B). In other words,

[T(x)]c = Mat(T;B,C)[x]8.

The reader may verify that this same formula persists no matter what the finitecardinalities of B and C are.

We end with a result that justifies the weird way we multiply matrices. Thecomposition of linear transformations is again a linear transformation. It turnsout that the matrix of a composition is the product of the individual matrices.More precisely, we have the following theorem.

THEOREM D.9Suppose V W, and P are complex vector spaces with bases 8, C, A, respectively.Suppose T : V -* W and S : W -+ P are linear transformations. ThenMat(S o T; B, A) = Mat(S;C,A)Mat(T;B,C).

PROOF Take any x in V. Then, using our formulas above,Mat(SoT;f3, A)[x],j = [(SoT)(x)].4 = [S(T(x))]A = Mat(S;C,A)[T(x)]c

= Mat(S;C,A)(Mat(T; B, C)[x]8). Now this holds for all vectors x so it holds

0when we choose basis vectors for x. But [b,]13 = and generally,

0

Page 562: Matrix Theory

D.4 Change of Basis 541

[bi]n = e;, the standard basis vector. By arguments we have seen before,Mat(S o T;5, A) must equal the product Mat(S;C,A)Mat(T;8,C) columnby column. Thus, the theorem follows. 0

If we had not yet defined matrix multiplication, this theorem would he ourguide since we surely want this wonderful formula relating composition tomatrix multiplication to be true.

APD Exercise Set 2

1. Find the coordinate matrix Mat(x; B) of x = (3, -1, 4) relative to thebasis B = {(1,0,0), (1, 1,0), (1, 1, 1)). Now find the coordinate matrixof x relative to C = ((2,0,0), (3,3,0), (4,4,4)}. Suppose Mat(v;B) _

4 4

7 . Find v. Suppose Mat(v;C) = 7 . Find v.10 10

2. Let B = {(I, 0, 1), (6, -4, 2), (-1, -6, I )j and C =--1, - 1, 0), (-1, -3, 2), (2, 3, -7)}. Verify that these are bases and

2find the transition matrix RCt-a and Ra. -c. If Mat(v; 13) = 4 , use

6the transition matrix to find Mat(v;C).

3. Argue that (Re-13)-l = RB-c, Rj3_8 = 1, and RA+ Rt3_c = RA C.

4. Argue that if C is any basis of V and S is any invertible matrix over C.then there is a basis B of V so that S = Rc.-13.

5. Investigate the map Mat(_;8,C) : Hom(V, W) -> tC' ' that assignsC-linear map between V and W and the matrix in (C-11 relative to thebases 5 of V and C of W.

6. Suppose T : V --+ V is invertible. Is Mat(T; 8,13) an invertible matri;for any basis B of V?

7. How would you formulate the idea of the kernel and image of a C-Iineamap?

8. Is T, : V -> V by T,(x) = xr a C-linear map on the V? If so, what is itimage and what is its kernel'?

Page 563: Matrix Theory

542 A Review of Basics

9. Let T be any C-linear map between two vector spaces. Argue that

(a)(b) T(-x) _ -T(x) for all x.(c) T(x - y) = T(x) - T(y) for all x and y.

10. Argue that a linear map is completely determined by its action on a basis.

Further Reading

[B&R, 1986(2)1 T. S. Blyth and E. F. Robertson, Matrices and VectorSpaces, Vol. 2, Chapman & Hall, New York, (1986).

[B&B, 2003] Karim Boulabiar and Gerard Buskes, After the DeterminantsAre Down: A Criterion for Invertibility, The American MathematicalMonthly, Vol. 110, No. 8, October, (2003).

[Brown, 1988] William C. Brown, A Second Course in Linear Algebra,John Wiley & Sons, New York, (1988).

[Max&Mey, 20011 C. J. Maxon and J. H. Meyer, How Many SubspacesForce Linearity?, The American Mathematical Monthly, Vol. 108, No. 6,June-July, (2001), 531-536.

Page 564: Matrix Theory

Index

I -inverse, 199-2092-inverses, 208-209, 217-222

A

Absolute error, I IAdjoint matrix, 76Adjugate, 76-81Adjugate matrix, 76-81Affine projection, 315-323Affine subspace, 8, 315-317Algebra of projections, 299-308Algebraic multiplicity, 333Alternating, 431, 436Angle between vectors, 257Appoloniuis identity, 260Approximation, 291, 294-297Argand diagram, 464

B

Back substitution, 41-49Bailey theorem, 218-219Banach norm, 245, 248Banachiewicz inversion formula, 26Base, 10-11Basic column, 74, 165Basic facts about all norms, 235-236Basic facts about any distance function,

237-238Basic facts about inner product, 259Basic facts about inverses, 18-19Basic facts about matnx norms, 246Basic facts about span, 531-532Basic facts about tensor products, 453-454Basic facts about the conjugate matrix, 497Basic facts about the inner product, 259Basic facts about transpose, 496Basic properties of induced norms,

248-250Basic properties of vec, 454-455Basic rules of matrix addition, 487-488

Basic rules of matrix multiplication,490-495

Basic rules of scalar multiplication,489-490

Basic rules of sigma, 492Basic variables, 74Basis, 534-542Bessel's inequality, 264Bilinear form, 431, 437-440,450-452Bilinear map, 431Blocks, 15-17Bose theorem, 202-203Built-in matrices, 14-15

C

Cancellation error, 12Canonical form, 340Canonical representative, 163Cartesian decomposition, 448, 498Cauchy-Binet Theorem, 517-520Cauchy inequality, 233, 240Cauchy-Schwarz inequality, 233, 240Cauchy sequence. 242Cayley-Hamilton theorem, 81-98Change of basis, 538-542Characteristic matrix, 81Characteristic polynomial, 81, 90-95Chopping, 1 ICirculant matrix, 187-188Cline theorem, 231Column equivalence, 163Column index, 485Column rank, 99, 103Column space, 56, 99Column view, 2Columns, 165Commutative law of multiplication, 2Companion matrix, 335Complementary minor, 517Complementary subspace, 118Complete pivoting, 170

543

Page 565: Matrix Theory

544

Complex conjugate, 468-473Complex numbers, 459-483Condition number, 249, 254-255Congruence, 438, 447-450Conjugate partition, 135-136Conjugate transpose, 7, 497Consistency condition, 190Convergent sequence of vectors, 233, 238Convex, 235-236Core, 131-132, 225Core-nilpotent decomposition, 225Core-nilpotent factorization, 131-132Cosets, 324

D

De Moivre's theorem. 475Delinite, 437Determinant function, 513Determinants, 509-530Diagonal element, 486Diagonal matrices, 15Diagonalizable with respect to a unitary,

372

Diagonalizable with respect to equivalence,351-357

Diagonalizable with respect to similarity,357-371

Dilation, 52Dimension formula, 534-537Direct sum, 117-128Direct sum decomposition, 118Disjoint. 118, 120, 127Distance, 468-473Distance function, 237-238Drazin inverse, 223-229

E

Eigenprojection, 329Eigenspace, 329, 404Eigenstuff, 329-338Eigenvalue, 82, 329, 337-338, 347Eigenvector, 329, 332, 337-338, 389-422Elementary column operations, 55Elementary divisors, 426Elementary matrices, 49-63Elementary row operations, 54Elimination matrices, 66Entries, 13-14, 64Equivalence, 351-357Equivalence relation, 144

Index

Equivalent, 43. 372Equivalents to being invertible, 19Error vector, 249Euclidean norm, 235Even function, 126Explicit entry, 13-14Exponent, 10

F

Ferrer's diagram, 136, 138, 412Finitely generated, 532Floating point arithmetic, 10-I IFloating point number, 10Floating point representation, 10Fourier coefficients, 263-264Fourier expansion, 263-265Frame algorithm, 81-98Free variables, 42Frohenius inequality, 114Frobenius norm, 114Full rank factorization, 176-179, 275,

313-315Function view. 5Fundamental projections, 309-310, 313Fundamental subspaces, 99-111, 278, 380Fundamental theorem of algebra,

278-282

G

Gauss elimination, 41-49Generalized eigenspace, 404Generalized eigenvector, 389-422Generalized elementary matrices, 62Generalized inverses, 199-228Generators, 532Geometric law. 3Geometric multiplicity, 329Geometric view, 2Grain matrix, 268Grain-Schmidt orthogonalization process.

268

Grant-Schmidt process, 269Grammian. 268Group inverse, 230-231Gutttnan rank additivity formula, 108

H

Hartwig theorem, 204Henderson-Searle formulas, 21-24

Page 566: Matrix Theory

Hermite echelon form, 171-175Hermitian, 257-258Hermitian inner product, 257-258Holder's inequality, 238-240Hyperbolic pair, 450Hyperbolic plane, 450Hyperplane, 2

0

Idempotent, 117-128, 291, 360Identity matrix, 14Inconsistent system, 44Indefinite, 437Independent, 533Index,128-148Induced matrix norm, 248Induced norm, 247-250Inertia, 444Inner product space, 257-262Inner products, 257-290Intersection, 535Invariant factors, 425Invariant subspace, 125, 131, 147,

326

Inverse, 1-39Invertible, 17,41-98Isometry, 243Isotropic, 440

J

JCF, 413JNF, 416Jordan basis, 405

Jordan block, 389-392Jordan block of order k, 389Jordan canonical form, 389-430Jordan decomposition, 366Jordan form, 389-422Jordan matrix, 396-398Jordan segment, 392-395Jordan string, 403Jordan's theorem, 398-422, 428

K

Kronecker product, 452Krylov matrix, 61Krylov subspace, 106Kung's algorithm, 274-276

Index 545

L

Laplace cofactor expansion theorem,520-527

LDU factorization, 63-76Leading coefficient, 155Leading entry, 64Leading principal minor, 77Leading principal submatrices, 67, 77Least squares solutions, 285-288Left inverse, 148-154Leibnitz rule, 87Length, 233, 283, 392Line segment, 236Linear combination, 531Linear independence, 533Linear transformation, 540Linearly dependent, 533Linearly independent, 533Loss of significance, 12-13Lower triangular, 63, 69LU-factorization, 63-76

M

M-dimensional, 535M-perp, 262Magnitude, 233Mahalanobis norm, 251Mantissa, 10MATLAB, 13-17,37-39,47-49,75-76.

95-98, 109-111, 147-148, 167-168,179,190,252-255,269,276-278,313,337-338,376-377,385-387,392,395,397-398,456-457,502,506-507

Matrix addition, 487-488Matrix algebra, 5Matrix diagonalization, 351-387Matrix equality, 486Matrix equivalence, 155-171Matrix inversion. 39Matrix multiplication, 490-495Matrix norm, 244-255Matrix of a linear transformation, 538Matrix operations, 485-507Matrix units, 50-51, 54, 60Matrix view, 3Minimal polynomial, 57-63, 90-95Minimal polynomial of A relative to v, 361Minimum norm, 282-285Minkowski inequality, 241Minor, 77, 517

Page 567: Matrix Theory

546

Modified Gram-Schmidt process, 269Modified RREF, 155Modulus, 468-473Moore-Penrose inverse, 155-198, 292Motiviation, 509-512MP-Schur complement, 194

N

Natural map, 325Negative definite, 437Negative semidefinite, 437Newton's identities, 85-90Nilpotent, 131-132, 147-148, 225,

339Nilpotent on V, 367Nonbasic columns, 1b5Nondegenerate, 434Nondegenerate on the left, 434Nondegenetate on the right, 434Nonnegative real numbers, 347Nonsingular, 17Norm, 233-256

Norm equivalence theorem, 236-237Norm of vector, 234Normal, 339Normal equations, 274Normalized, 263Normalized floating point number,

10

Normalized orthonormal set, 263Normed linear space, 233-244Null space, 99Nullity, 99, 102, 104-105, 109, I I I

0Odd function, 126Off-diagonal, 486Operation counts, 39, 170-171Orthogonal, 262-263, 275, 299-300Orthogonal basis, 442Orthogonal full rank factonzation, 275Orthogonal projection, 291-298Orthogonal projection matrix, 299-300Orthogonal set, 262-269Orthogonal vectors, 262-263Orthogonality, 440-442Orthonormal set, 263Orthosymmetnc, 441Othogonal sets of vectors, 262-269

Index

Overdetermined, 46, 48-49Overflow, 10

P

P-norm, 235, 245Parallel projection, 121Parallel sum, 197Parallelogram law, 257Parseval's identity, 265Partial order, 300-307Partial pivoting, 169Particular solution, 8Partition of n, 135-136Pauli spin matrices, 502Penrose theorem, 287-288Permutation matrix, 53Pivot, 41, 74Pivot column, 155Pivot rank, 74Pivoting strategies, 169-170Polar decomposition theorem,

347-350Polar form of complex numbers,

473-480Polynomial, 90-95Polynomials, 90-95, 480-482Positive definite, 437Positive semidefinite, 437Precision, 10Primary decomposition theorem, 364-365Prime mapping, 310-312Principal idempotents, 360Principal minor, 77Principal submatrix, 67-68, 77, 503Pseudoinverse, 182

Pythagorean theorem, 257

Q

Q-eigenvector,403QR factonzation, 269-278Quadratic form, 435Quadratic map, 431, 435Quotient formula, 195Quotient space, 324-327

R

Radical, 441Randomly generated matrices, I5

Page 568: Matrix Theory

Rank, 111-117, 194Rank normal form, 162Rank one update, 21, 31Rank plus nullity theorem, 102Real valued function, 245Reflexive generalized inverses, 208Reidel's formula, 196-197Relative complement, 536Relative error, I I

Relative residual, 249-250Representable number, 10Residual vector, 249Reversal law, 21Right inverse, 148-154Rounding, IIRoundoff error, I IRow echelon form, 64Row echelon matrix, 64Row equivalence, 163Row index, 504Row rank, 100, 103Row rank equals column rank

theorem, 103Row reduced echelon form,

155-171Row reduction echelon form, 155Row space, 100Row view, 2Rows, 485

S

Scalar multiplication, 489-490Schur complement, 24-36, 194-198Schur decomposition, 373Schur determinant formulas, 35Schur triangularization, 376-377Schur's lemma, 372Segre characteristic, 396Segre sequence, 135-136Self-adjoint, 293-294Sherman-Mornson-Woodbury

formula, 24-36Signature, 445Similar, 372Simultaneously diagonable, 165Singular value decomposition,

377-387Skew-symmetric, 434Skew-symmetric bilinear forms,

450-452

hidex 547

Skew-symmetric form, 450-452Smith normal form, 422Solving a square system of full rank,

20-21Span,531-532Spectral theorem, 338-347Spectral theory, 329-350Spectrum, 329Square matrix, 128-148, 528-530Square root theorem, 347-350Square systems, 17-39Square theorem, 348Standard nilpotent matrix, 147-148Strictly upper triangular, 368Submatrices, 67, 77, 503-507Subspace, 99-154, 299-308Sum, 117-128Sum function, 504Sylvester's determinant formula, 195Sylvester's law of inertia, 444Sylvester's law of nullity, 113Sylvester's rank formula, I I ISymmetric bilinear forms, 442-447Symmetric form, 442-447, 450-452Symmetric matrices, 447-450System of complex numbers,

464-466

T

Tensor product, 452-457Theorem on elementary column

operations, 55Theorem on elementary row operations,

54Trace, 528-530Transition matrix, 541Transpose, 495-502Transvection, 51Triangular matrices, 63, 69Triangularization theorem, 369Tnlinear, 436Truncating, I IType 1,41,62Type 11, 41,62Type 111, 41,62

U

Underdetermined, 46, 48Underflow, 10

Page 569: Matrix Theory

548

Unique polynomial, 57Uniqueness of inverse, 18Uniqueness of the group inverse,

231

Uniqueness theorem, 180-182Unit ball, 235Unit roundoff error, I IUnit sphere, 235-236Unit vector, 263Unitarily equivalent, 372Unitarily similar, 372Unitary, 270-272, 371-377Upper triangular, 63, 69

Index

V

Vandermonde determinant, 525Vector view, 2, 4VmodM, 324

W

Weyr sequence, 132-133

Z

Zero matrix, 14Zero row, 64