
XI. LINEAR ALGEBRA: MATRICES FOR DERIVATIVES

I always think of matrices as compact ways of expressing systems of equations, derivatives, observations, and such. Matrix algebra provides the tools for handling these large systems very efficiently, including matrices of derivatives. In the simplest case, think of a real-valued function of n variables, $f: \mathbb{R}^n \to \mathbb{R}^1$. It has n partial derivatives. The column vector consisting of all these partial derivatives is the gradient vector,

$$\nabla_x f(x) = \begin{pmatrix} \partial f / \partial x_1 \\ \partial f / \partial x_2 \\ \vdots \\ \partial f / \partial x_n \end{pmatrix}$$

We talk about gradients only for functions into $\mathbb{R}^1$. More generally, the matrix of first derivatives of a function $f: \mathbb{R}^n \to \mathbb{R}^m$, also called the Jacobian matrix, is:

$$f(x) = \begin{pmatrix} f_1(x) \\ f_2(x) \\ \vdots \\ f_m(x) \end{pmatrix} \quad\Rightarrow\quad D_x f(x) = \begin{pmatrix} \partial f_1/\partial x_1 & \partial f_1/\partial x_2 & \cdots & \partial f_1/\partial x_n \\ \partial f_2/\partial x_1 & \partial f_2/\partial x_2 & \cdots & \partial f_2/\partial x_n \\ \vdots & & \ddots & \vdots \\ \partial f_m/\partial x_1 & \partial f_m/\partial x_2 & \cdots & \partial f_m/\partial x_n \end{pmatrix} = \begin{pmatrix} (\nabla_x f_1(x))^T \\ (\nabla_x f_2(x))^T \\ \vdots \\ (\nabla_x f_m(x))^T \end{pmatrix}$$

The matrix of first derivatives is $m \times n$. Going back to the case of $\mathbb{R}^1$-valued functions, you will often see, instead of the gradient, the "row vector" (and to those who claim there is no such thing, I say "$1 \times n$ matrix") of partial derivatives:

$$D_x f(x) = \left( \partial f/\partial x_1 \;\; \partial f/\partial x_2 \;\; \cdots \;\; \partial f/\partial x_n \right) = \left( \nabla_x f(x) \right)^T$$

The only difference is geometric orientation.

Example: Let $x(p,w) = \big( x_1(p_1, p_2, w),\; x_2(p_1, p_2, w) \big) = \left( \dfrac{\alpha w}{p_1},\; \dfrac{(1-\alpha)w}{p_2} \right)$. What is $D_p x(p,w)$?

Example: Now let $x(p,w) = \left( \dfrac{w}{p_1} - 1,\; \dfrac{p_1}{p_2} \right)$. What is $D_p x(p,w)$?

The second important matrix in theory classes is the matrix of second derivatives and cross-partials of a real-valued function of n variables, $f: \mathbb{R}^n \to \mathbb{R}^1$. The Hessian matrix is the $n \times n$ matrix whose ij-th entry is $\partial^2 f / \partial x_i \partial x_j$. On the diagonal of this matrix,


we have second derivatives; off the diagonal, we have cross-partials. Since $\partial^2 f / \partial x_i \partial x_j = \partial^2 f / \partial x_j \partial x_i$, the ij-th entry of the matrix is the same as the ji-th entry, so the Hessian matrix is always symmetric.

$$f(x) \;\longrightarrow\; D_x f(x) = \left( \frac{\partial f}{\partial x_1} \;\; \frac{\partial f}{\partial x_2} \;\; \cdots \;\; \frac{\partial f}{\partial x_n} \right) = \left( \nabla_x f(x) \right)^T \;\longrightarrow\; D^2_{xx} f(x) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_1 \partial x_2} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_1 \partial x_n} & \frac{\partial^2 f}{\partial x_2 \partial x_n} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{pmatrix} = D_x \nabla_x f(x)$$

Hessian matrices are comparable to the second derivative of an $\mathbb{R}^1 \to \mathbb{R}^1$ function, and they will always be used to test the concavity of a function of more than one variable. You should get very accustomed to finding Hessians.

Example: $F(K,L) = AK^{\alpha}L^{1-\alpha}$. Find $D^2 F(K,L)$ with respect to all the inputs.

Example: $U(x_1, x_2) = \left( x_1^{\rho} + x_2^{\rho} \right)^{1/\rho}$. Find $D^2_{xx} U(x_1, x_2)$.

Example: $U(x_1, x_2, x_3) = x_1^{\alpha} x_2^{\beta} x_3^{\gamma}$ and $x(p,w) = (\alpha + \beta + \gamma)^{-1} \left( \alpha w / p_1,\; \beta w / p_2,\; \gamma w / p_3 \right)$. Find $V(p,w) = U(x(p,w))$ and then $D^2_{pp} V(p,w)$.

Example: $U(x_1, x_2) = x_1 + \ln x_2$ and $x(p,w) = \left( w/p_1 - 1,\; p_1/p_2 \right)$. Find $V(p,w)$ and then $D^2_{pp} V(p,w)$.

The test for concavity or convexity of a function of one variable was whether the second derivative was negative or positive. We see that for functions of more than one variable, there is a matrix of second derivatives. How do we identify whether a matrix is positive or negative? The answer is not simply to look at each of the $\partial^2 f / \partial x_i^2$. It is also not sufficient to ensure that each individual element is negative.

Example: Is the function $f(x,y) = x^2 y^2$ convex? On first inspection, it looks like the product of two convex functions. It's convex in each variable individually; that is, both (own) second derivatives are positive. The Hessian matrix of this function is:

$$D^2 f(x,y) = \begin{pmatrix} 2y^2 & 4xy \\ 4xy & 2x^2 \end{pmatrix}$$

All the elements of this matrix are positive. And yet, consider this: at two points on the axes, $f(0,1) = f(1,0) = 0$. At a convex combination of those two points, we find that $\tfrac{1}{2} f(0,1) + \tfrac{1}{2} f(1,0) = 0 < \tfrac{1}{16} = f\left( \tfrac{1}{2}, \tfrac{1}{2} \right)$. This is not a convex function. (It looks like a giant scalloped bowl.)
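A quick numerical way to see the failure (my own check, not part of the notes): evaluate the Hessian at a point and look at the signs of its eigenvalues; one of each sign means the matrix is neither positive nor negative, in the sense defined below.

```python
import numpy as np

# Hessian of f(x, y) = x**2 * y**2 evaluated at (x, y) = (1, 1):
# [[2*y**2, 4*x*y], [4*x*y, 2*x**2]] -> [[2, 4], [4, 2]]
H = np.array([[2.0, 4.0],
              [4.0, 2.0]])

print(np.linalg.eigvalsh(H))  # [-2.  6.]: one negative, one positive
```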

Page 3: XI. LINEAR ALGEBRA: M( ) ATRICES FOR DERIVATIVESswlt/mathecon11.pdf · strictly concave if and only if its Hessian matrix is negative definite. ... check that the determinant of the

Fall 2007 math class notes, page 82

This demonstrates that we need a new way to define positive and negative for matrices, at least as far as second-derivative tests are concerned.

Definition: An $n \times n$ matrix A is positive semidefinite if for all vectors $x \in \mathbb{R}^n$, the number $x'Ax \ge 0$.

Definition: An $n \times n$ matrix A is negative semidefinite if for all vectors $x \in \mathbb{R}^n$, the number $x'Ax \le 0$.

Definition: An $n \times n$ matrix A is positive definite if for all vectors $x \in \mathbb{R}^n$, $x \ne 0$, the number $x'Ax > 0$.

Definition: An $n \times n$ matrix A is negative definite if for all vectors $x \in \mathbb{R}^n$, $x \ne 0$, the number $x'Ax < 0$.

Any matrix that is not one of these is called indefinite. Note that if A is a scalar (which is really just a $1 \times 1$ matrix), these definitions correspond to the usual definitions of weakly or strictly positive or negative, since the quadratic form $x'Ax = Ax^2$ has the same sign as A whenever $x \ne 0$.
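The definitions can be probed directly, though only in one direction: finding quadratic forms $x'Ax$ of both signs disproves semidefiniteness, while no amount of sampling proves it. A small sketch (mine) using the Hessian of $x^2 y^2$ at the point (1, 1):

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[2.0, 4.0], [4.0, 2.0]])  # Hessian of x**2 * y**2 at (1, 1)

# Collect the signs of the quadratic form x'Ax at random nonzero vectors x
signs = {np.sign(x @ A @ x) for x in rng.standard_normal((1000, 2))}
print(signs)  # {-1.0, 1.0}: both signs occur, so A is indefinite
```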

A function is concave if and only if its Hessian matrix is negative semidefinite everywhere; and if its Hessian matrix is negative definite everywhere, the function is strictly concave (the converse isn't quite true: think of $f(x) = -x^4$, which is strictly concave even though its second derivative vanishes at zero). Whenever we are maximizing a function of more than one variable, we must find the Hessian matrix of the function and confirm that it is negative semidefinite in order to ensure that we have found a maximum.

The definitions themselves are generally hard to work with when doing these tests. Fortunately, there is a set of rules for determining the sign and definiteness of a matrix.

If we have an $n \times n$ matrix A, a k-th order principal submatrix of A is a matrix that results from deleting $n - k$ rows and the same $n - k$ columns from A. If we have a $3 \times 3$ matrix,

$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}$$

then we can form three second order principal submatrices: by deleting the first row and first column, by deleting the second row and second column, and by deleting the third row and third column:


$$\begin{pmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{pmatrix}, \qquad \begin{pmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{pmatrix}, \qquad \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$$

The leading principal submatrices of A are only those principal submatrices formed by deleting the last $n - k$ rows and columns of the matrix. For the $3 \times 3$ matrix described above, the first, second, and third order leading principal submatrices are:

$$\begin{pmatrix} a_{11} \end{pmatrix}, \qquad \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \qquad\text{and}\qquad \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}$$

The determinant of a k-th order principal submatrix is called a k-th order principal minor. Here is the relationship between the principal minors and the sign and definiteness of a matrix:

Theorem: An $n \times n$ matrix A is positive definite if and only if all its n leading principal minors are strictly positive.

Theorem: An $n \times n$ matrix A is positive semidefinite if and only if all its principal minors (not just leading!) are nonnegative.

Theorem: An $n \times n$ matrix A is negative definite if and only if its n leading principal minors alternate in sign, with the sign of the k-th order leading principal minor equal to $(-1)^k$.

Theorem: An $n \times n$ matrix A is negative semidefinite if and only if all its principal minors (not just leading!) of order k equal zero or have the sign of $(-1)^k$.

This is what you have to do to test the concavity or convexity of a function of several variables. Finding principal submatrices of two-by-two and three-by-three matrices isn't terribly difficult. Once it gets to four and five dimensions and more, it's a real pain. Unfortunately, there's not really a better way to determine concavity.

Only a particularly sadistic professor would ask you to test the concavity of a function of more than three variables. Here are the simple rules for two and three variable cases (just for concavity).


Suppose your Hessian matrix is $2 \times 2$:

$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$$

To show that the function is strictly concave, you need to show that:

$$a_{11} < 0 \quad\text{and}\quad \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} > 0$$

For just plain concavity, you need to show that both of those hold with weak inequality, and additionally that $a_{22} \le 0$. For a function of three variables, you have a $3 \times 3$ Hessian matrix. For strict concavity, you need to confirm that the leading principal minors have the right signs:

$$a_{11} < 0, \qquad \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} > 0, \qquad\text{and}\qquad \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} < 0$$

For plain concavity, you need to show that all the second derivatives are nonpositive:

$$a_{11} \le 0, \qquad a_{22} \le 0, \qquad a_{33} \le 0$$

And that all the second-order principal minors are nonnegative:

$$\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} \ge 0, \qquad \begin{vmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{vmatrix} \ge 0, \qquad \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} \ge 0$$

Finally, check that the determinant of the matrix itself is nonpositive.
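For instance (my own check, anticipating the third example below), the Hessian of $f(x,y) = x^2 + y^2 + xy$ is constant, and its leading principal minors are both positive, so that function is strictly convex rather than concave:

```python
import numpy as np

# Hessian of f(x, y) = x**2 + y**2 + x*y, constant in (x, y)
H = np.array([[2.0, 1.0],
              [1.0, 2.0]])

minors = [np.linalg.det(H[:k, :k]) for k in (1, 2)]
print(minors)  # [2.0, 3.0]: both positive -> positive definite -> strictly convex
```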

Example: Let $f(x,y) = x^2 y^2$. Check for concavity or convexity (strict or otherwise).

Example: Let $f(x,y) = xy$. Check for concavity or convexity (strict or otherwise).

Example: Let $f(x,y) = x^2 + y^2 + xy$. Check for concavity or convexity (strict or otherwise).

Example: Let $F(K,L) = AK^{\alpha}L^{1-\alpha}$. Find the matrix $D^2 F(K,L)$ and check for concavity or strict concavity.

Example: Let $U(x_1, x_2) = \left( x_1^{\rho} + x_2^{\rho} \right)^{1/\rho}$. Find the matrix $D^2_{xx} U(x_1, x_2)$ and check for concavity or strict concavity.


Example: Let $U(x_1, x_2, x_3) = x_1^{\alpha} x_2^{\beta} x_3^{\gamma}$ and $x(p,w) = \left( \dfrac{\alpha w}{p_1(\alpha+\beta+\gamma)},\; \dfrac{\beta w}{p_2(\alpha+\beta+\gamma)},\; \dfrac{\gamma w}{p_3(\alpha+\beta+\gamma)} \right)$. Find the matrix $D^2_{pp} V(p,w)$ and check for concavity or strict concavity.

Example: Let $U(x_1, x_2) = x_1 + \ln x_2$ and $x(p,w) = \left( \dfrac{w}{p_1} - 1,\; \dfrac{p_1}{p_2} \right)$. Find the matrix $D^2_{pp} V(p,w)$ and check for concavity or strict concavity.

Okay, enough about concavity. Let’s talk briefly about eigenvalues and eigenvectors, which will be useful for checking the stability of systems of differential (or difference) equations.

In macro, you might have a matrix that describes how several variables in the economy evolve, for instance:

$$\begin{pmatrix} k_{t+1} \\ m_{t+1} \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} k_t \\ m_t \end{pmatrix}$$

where $k_t$ and $k_{t+1}$ describe the capital stock of the country at time t and at t+1; $m_t$ and $m_{t+1}$ are real money balances. The matrix A consists of constants, or likely as not, linear first-order approximations of some functions (remember the Taylor series?). If we want to find the values of these two variables two years in the future, we would use the formula iteratively:

$$\begin{pmatrix} k_{t+2} \\ m_{t+2} \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} k_{t+1} \\ m_{t+1} \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} k_t \\ m_t \end{pmatrix}$$

And then to find the value of the variables n years into the future we just repeat this multiplication n times:

$$\begin{pmatrix} k_{t+n} \\ m_{t+n} \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}^n \begin{pmatrix} k_t \\ m_t \end{pmatrix} = A^n \begin{pmatrix} k_t \\ m_t \end{pmatrix}$$

Does this settle down at some point, or do the variables keep growing forever? Remember from earlier that if we multiply any $x \in (-1, 1)$ by itself a number of times, it gets really small, and:

$$x^n \to 0 \quad\text{as}\quad n \to \infty$$

In macroeconomics we might be wondering something very similar, except that the number x has been replaced with a matrix A. In order to see whether this matrix converges or not, we have to look at its eigenvalues.
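A quick numerical illustration (mine, not the notes'): pick an A whose eigenvalues both have modulus below one and watch $A^n$ die out.

```python
import numpy as np

A = np.array([[0.5, 0.2],
              [0.1, 0.4]])             # eigenvalues 0.6 and 0.3, both inside (-1, 1)

print(np.abs(np.linalg.eigvals(A)))    # moduli of the eigenvalues
print(np.linalg.matrix_power(A, 50))   # essentially the zero matrix
```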


Given an $n \times n$ matrix A, a scalar $\lambda$ is called an eigenvalue or characteristic value of A if there exists a nonzero vector $x \in \mathbb{R}^n$ (called the eigenvector or characteristic vector) such that:

$$Ax = \lambda x$$

Here are some alternative characterizations of an eigenvalue.

Theorem: The following statements are equivalent:

1. $\lambda$ is an eigenvalue of A.
2. $(A - \lambda I)x = \vec{0}$ has a solution other than $x = \vec{0}$.
3. $A - \lambda I$ is singular.
4. $|A - \lambda I| = 0$.

Matrices often have multiple eigenvalues. In fact, almost all $n \times n$ matrices have n distinct eigenvalues. Given that a matrix A has m eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$, the following are true:

$$\sum_{i=1}^{m} \lambda_i = \operatorname{tr}(A) \qquad\text{and}\qquad \prod_{i=1}^{m} \lambda_i = |A|$$

The large capital pi indicates the product over a bunch of variables, just as a large capital sigma indicates the sum. We can make a few observations (indirectly) from these properties. First, a square matrix A is invertible if and only if zero is not an eigenvalue of A. Second, if $\lambda$ is an eigenvalue of A and A is invertible, then $\lambda^{-1}$ is an eigenvalue of $A^{-1}$. Third, if A and B are both $n \times n$ matrices, then the eigenvalues of AB are the same as those of BA.

The fourth characterization of an eigenvalue is usually the easiest to work with in order to solve for them. In the $2 \times 2$ case, we have that:

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \quad\Rightarrow\quad |A - \lambda I| = 0 \quad\Leftrightarrow\quad \begin{vmatrix} a - \lambda & b \\ c & d - \lambda \end{vmatrix} = 0$$

This means that the eigenvalues are roots of the quadratic equation:

$$(a - \lambda)(d - \lambda) - cb = 0 \quad\Rightarrow\quad \lambda^2 - (a + d)\lambda + (ad - cb) = 0$$

Consulting Sydsæter, Strøm, and Berck (page seven!) for the quadratic formula, we can write the solutions to the eigenvalue problem as:

$$\lambda = \tfrac{1}{2} \left( \operatorname{tr}(A) \pm \sqrt{ \left( \operatorname{tr}(A) \right)^2 - 4 \det(A) } \right)$$


This leads to a problem: eigenvalues are not necessarily real numbers, since the term under the radical is not necessarily positive. What we are interested in is the modulus of any complex eigenvalues.

If we have a complex number $z = x + yi$, where x and y are real scalars, the modulus or magnitude of z is defined as $|z| = \sqrt{x^2 + y^2}$. This is like the length of the vector z in the complex plane. If a number is strictly real, then its modulus is its absolute value. All of this is necessary for the following result, which is important for testing stability.

Theorem: All eigenvalues of a square matrix A have moduli strictly less than one if and only if $A^n \to 0$ as $n \to \infty$.

Corollary: If $|A| \ge 1$ or $|A| \le -1$, then $A^n$ does not converge to zero.

This second result matches the scalar case: when $|x| \ge 1$, $x^n$ does not converge to zero. Again, we see some relationship between the determinant of a matrix and the absolute value of a scalar.

The last thing on the agenda for this lecture is to do a problem working with matrices. You will have to do this frequently in econometrics—in fact, this example is the fundamental principle behind linear regressions.

Somewhere out in the world, there is a relationship between some dependent variable $y_i$ and some other variables. We would like to describe this relationship as an affine function of m variables and a constant, more or less:

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_m x_{im} + e_i$$

Because we have $n > m$ observations, we can't exactly solve this system of equations (we have more equations than unknowns). Therefore, we'll have to say that some of each value of $y_i$ is explained by some "outside, unobservable things" captured in the term $e_i$ that have absolutely no bearing on anything we're actually interested in. We observe the values of each of the $x_{ij}$ and the $y_i$. The question is to find values of $\beta_k$, so that we can blame as little as possible of the outcome of "unobserved stuff" on the $e_i$. First, though, let's write this problem in matrix form:

$$\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}}_{n \times 1} = \underbrace{\begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1m} \\ 1 & x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & & & \ddots & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nm} \end{pmatrix}}_{n \times (m+1)} \underbrace{\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_m \end{pmatrix}}_{(m+1) \times 1} + \underbrace{\begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}}_{n \times 1}$$

Then I rewrite this as:

$$Y_{n \times 1} = X_{n \times (m+1)} \, \beta_{(m+1) \times 1} + e_{n \times 1}$$


The problem will be that we want to find the value of beta that minimizes the size of the vector e. Recall that for a vector $z \in \mathbb{R}^k$, its size or length is defined as:

$$\|z\| = \sqrt{ z_1^2 + z_2^2 + \ldots + z_k^2 } = \sqrt{z'z}$$

Essentially, the problem is to solve:

$$\min_{\beta \in \mathbb{R}^{m+1}} \|e\|$$

And thus begins our first exercise in working with matrices.

First of all, I remember that minimizing a function is the same thing as minimizing a strictly monotonic transformation of that function, so I square the objective. Also, I make a substitution.

$$\min_{\beta \in \mathbb{R}^{m+1}} (e'e) = \min_{\beta \in \mathbb{R}^{m+1}} \left[ (Y - X\beta)'(Y - X\beta) \right]$$

Then I multiply out the term in parentheses, keeping in mind that the usual formula for squaring a term doesn’t work for matrices (since matrix multiplication is not commutative, right?).

$$\min_{\beta \in \mathbb{R}^{m+1}} \left( Y'Y - Y'X\beta - \beta'X'Y + \beta'X'X\beta \right)$$

Since I have an objective function that I want to minimize with respect to the vector beta, I am going to take a derivative and set it equal to zero.

Here is a rule for taking the derivative of a linear function of a vector, when the vector is transposed: for a constant vector M,

$$\frac{\partial}{\partial z}(z'M) = \frac{\partial}{\partial z}(M'z) = M'$$

The derivative comes out as a row vector, matching the row-vector convention for $D_x f$ above.

Then the first order condition for this problem is to find $\beta$ to solve:

$$\frac{\partial (e'e)}{\partial \beta} = -Y'X - Y'X + \beta'X'X + \beta'X'X = 0$$

Collecting terms and transposing, we want to find:

$$-2X'Y + 2X'X\beta = 0$$
$$X'X\beta = X'Y$$
$$(X'X)^{-1}(X'X)\beta = (X'X)^{-1}(X'Y)$$
$$\beta = (X'X)^{-1}(X'Y)$$


With a little matrix algebraic manipulation, we have derived the expression for the estimator of the coefficients in the least-squares linear regression model. At least, we've found a critical point; the proof that this is indeed a minimum is left to the reader.
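A numeric check of the derived estimator (entirely my own made-up data): build X and y from known coefficients plus noise and recover them from the normal equations. Solving $X'X\beta = X'Y$ directly is numerically safer than forming the inverse.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 2
X = np.column_stack([np.ones(n), rng.standard_normal((n, m))])  # constant plus m regressors
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# The estimator we just derived: beta = (X'X)^{-1} (X'Y)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to [1.0, 2.0, -0.5]
```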

In some ways, working with systems of equations written in matrix form is only a bit more complicated than working with a single variable.