
Source: Sensitivity Analysis - Stanford University, aero-comlab.stanford.edu/jmartins/aa222/aa222sa.pdf

Sensitivity Analysis
AA222 - Multidisciplinary Design Optimization
Joaquim R. R. A. Martins • Durand 165 • email: [email protected]

1 Introduction

Sensitivity analysis consists of computing derivatives of one or more quantities (outputs) with respect to one or several independent variables (inputs). Although there are various uses for sensitivity information, our main motivation is its use in gradient-based optimization. Since the calculation of gradients is often the most costly step in the optimization cycle, using efficient methods that accurately calculate sensitivities is extremely important.

There are several different methods for sensitivity analysis, but since none of them is the clear choice for all cases, it is important to understand their relative merits. When choosing a method for computing sensitivities, one is mainly concerned with its accuracy and computational expense. In certain cases it is also important that the method be easy to implement. A method that is efficient but difficult to implement may never be finalized, while an easier, though computationally more costly, method would actually give some result. Factors that affect the choice of method include: the ratio of the number of outputs to the number of inputs, the importance of computational efficiency, and the degree of laziness of the programmer.

Consider a general constrained optimization problem of the form:

minimize   f(xi)   w.r.t. xi,  i = 1, 2, . . . , n

subject to gj(xi) ≥ 0,  j = 1, 2, . . . , m

where f is a non-linear function of n design variables xi and gj are the m non-linear inequality constraints we have to satisfy. In order to solve this problem, a gradient-based optimization algorithm usually requires:

• The sensitivities of the objective function, ∂f/∂xi (n × 1).

• The sensitivities of all the active constraints at the current design point, ∂gj/∂xi (m × n).

2 Finite-Differences

Finite-difference formulae are very commonly used to estimate sensitivities. Although these approximations are neither particularly accurate nor efficient, this method's biggest advantage resides in the fact that it is extremely easy to implement.


All the finite-differencing formulae can be derived by truncating a Taylor series expanded about a given point x. A common estimate for the first derivative is the forward difference, which can be derived from the expansion of f(x + h),

f(x + h) = f(x) + h f′(x) + (h²/2!) f″(x) + (h³/3!) f‴(x) + . . .   (1)

Solving for f′ we get the finite-difference formula,

f′(x) = [f(x + h) − f(x)] / h + O(h),   (2)

where h is called the finite-difference interval. The truncation error is O(h), and hence this is a first-order approximation.

For a second-order estimate we can use the expansion of f(x − h),

f(x − h) = f(x) − h f′(x) + (h²/2!) f″(x) − (h³/3!) f‴(x) + . . . ,   (3)

and subtract it from the expansion given in Equation (1). The resulting equation can then be solved for the derivative of f to obtain the central-difference formula,

f′(x) = [f(x + h) − f(x − h)] / (2h) + O(h²).   (4)

When estimating sensitivities using finite-difference formulae we are faced with the “step-size dilemma”, i.e. the desire to choose a small step size to minimize truncation error while avoiding the use of a step so small that errors due to subtractive cancellation become dominant.

The cost of calculating sensitivities with finite differences is proportional to the number of design variables, since f must be calculated for each perturbation of xi. This means that if we use forward differences, for example, the cost would be n + 1 times the cost of calculating f.
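As a concrete illustration, the two formulae above can be written in a few lines of Python; the test function (sin) and the step sizes are illustrative choices, not prescribed by the notes.

```python
import math

def forward_diff(f, x, h=1e-8):
    """First-order forward-difference estimate of f'(x), Equation (2)."""
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h=1e-5):
    """Second-order central-difference estimate of f'(x), Equation (4)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Example: the derivative of sin at x = 1 should be cos(1)
print(forward_diff(math.sin, 1.0))
print(central_diff(math.sin, 1.0))
```

Estimating all n components of a gradient this way requires n extra evaluations of f with forward differences (or 2n with central differences), which is exactly the cost argument made above.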

3 The Complex-Step Derivative Approximation

3.1 Background

The use of complex variables to develop estimates of derivatives originated with the work of Lyness and Moler [1] and Lyness [2]. Their work produced several methods that made use of complex variables, including a reliable method for calculating the nth derivative of an analytic function. However, only recently has some of this theory been rediscovered by Squire and Trapp [3] and used to obtain a very simple expression for estimating the first derivative. This estimate is suitable for use in modern numerical computing and has been shown to be very accurate, extremely robust and surprisingly easy to implement, while retaining a reasonable computational cost.


3.2 Basic Theory

We will now see that a very simple formula for the first derivative of real functions can be obtained using complex calculus. Consider a function, f = u + iv, of the complex variable, z = x + iy. If f is analytic, the Cauchy-Riemann equations apply, i.e.,

∂u/∂x = ∂v/∂y   (5)

∂u/∂y = −∂v/∂x.   (6)

These equations establish the exact relationship between the real and imaginary parts of the function. We can use the definition of a derivative in the right-hand side of the first Cauchy-Riemann equation (5) to obtain,

∂u/∂x = lim(h→0) [v(x + i(y + h)) − v(x + iy)] / h,   (7)

where h is a small real number. Since the functions that we are interested in are real functions of a real variable, we restrict ourselves to the real axis, in which case y = 0, u(x) = f(x) and v(x) = 0. Equation (7) can then be re-written as,

∂f/∂x = lim(h→0) Im[f(x + ih)] / h.   (8)

For a small discrete h, this can be approximated by,

∂f/∂x ≈ Im[f(x + ih)] / h.   (9)

We will call this the complex-step derivative approximation. This estimate is not subject to subtractive cancellation error, since it does not involve a difference operation. This constitutes a tremendous advantage over the finite-difference approaches expressed in Equations (2) and (4).
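A minimal sketch of Equation (9) in Python; the test function and step size are illustrative, and cmath is used so that the function accepts complex arguments.

```python
import cmath, math

def complex_step(f, x, h=1e-20):
    """Complex-step derivative approximation, Equation (9).
    f must be 'complexified', i.e. accept complex arguments."""
    return f(x + 1j * h).imag / h

# Because no difference is taken, h can be made absurdly small with no
# subtractive cancellation; the estimate matches cos(1) to machine precision.
print(complex_step(cmath.sin, 1.0), math.cos(1.0))
```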

In order to determine the error involved in this approximation, we will show an alternative derivation based on a Taylor series expansion. Rather than using a real step h, we now use a pure imaginary step, ih. If f is a real function in real variables and it is also analytic, we can expand it in a Taylor series about a real point x as follows,

f(x + ih) = f(x) + ih f′(x) − (h²/2!) f″(x) − (ih³/3!) f‴(x) + . . .   (10)

Taking the imaginary parts of both sides of Equation (10) and dividing the equation by h yields

f′(x) = Im[f(x + ih)] / h + (h²/3!) f‴(x) + . . .   (11)

Hence the approximation is an O(h²) estimate of the derivative of f.


3.3 A Simple Numerical Example

Because the complex-step approximation does not involve a difference operation, we can choose extremely small step sizes with no loss of accuracy due to subtractive cancellation.

To illustrate this, consider the following analytic function:

f(x) = e^x / √(sin³x + cos³x)   (12)

The exact derivative at x = 1.5 was computed analytically to 16 digits and then compared to the results given by the complex-step (9) and the forward and central finite-difference approximations.

Figure 1: Relative error in the sensitivity estimates given by the finite-difference and complex-step methods, with the analytic result as the reference; ε = |f′ − f′ref| / |f′ref|.

The forward-difference estimate initially converges to the exact result at a linear rate, since its truncation error is O(h), while the central difference converges quadratically, as expected. However, as the step is reduced below a value of about 10⁻⁸ for the forward difference and 10⁻⁵ for the central difference, subtractive cancellation errors become significant and the estimates are unreliable. When the interval h is so small that no difference exists in the output (for steps smaller than 10⁻¹⁶), the finite-difference estimates eventually yield zero and then ε = 1.
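The experiment behind Figure 1 is easy to repeat. The sketch below compares the forward-difference and complex-step estimates for the function of Equation (12) at x = 1.5, using a tiny-step complex-step value as the reference (an assumption standing in for the 16-digit analytic value used in the notes).

```python
import cmath

def f(z):
    """The example function of Equation (12), written with cmath so that it
    accepts complex arguments."""
    return cmath.exp(z) / cmath.sqrt(cmath.sin(z)**3 + cmath.cos(z)**3)

x = 1.5
ref = f(x + 1e-100j).imag / 1e-100  # reference via a tiny complex step

for h in (1e-2, 1e-6, 1e-10, 1e-14):
    fd = (f(x + h) - f(x)).real / h  # forward difference, Equation (2)
    cs = f(x + 1j * h).imag / h      # complex step, Equation (9)
    print(f"h = {h:.0e}: fd error = {abs(fd - ref)/abs(ref):.1e}, "
          f"cs error = {abs(cs - ref)/abs(ref):.1e}")
```

The forward-difference error first shrinks linearly with h and then grows again once cancellation dominates, while the complex-step error stays at the level of machine precision.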

The complex-step estimate converges quadratically with decreasing step size, as predicted by the truncation error estimate. The estimate is practically insensitive to small step sizes and below an h of the order of 10⁻⁸ it achieves the accuracy of the function evaluation. Comparing the best accuracy of each of these approaches, we can see that by using finite differences we only achieve a fraction of the accuracy that is obtained by using the complex-step approximation.

As we can see, the complex-step size can be made extremely small. However, there is a lower limit on the step size when using finite-precision arithmetic. The range of real numbers that can be handled in numerical computing is dependent on the particular compiler that is used. In this case, the smallest non-zero number that can be represented is 10⁻³⁰⁸. If a number falls below this value, underflow occurs and the number drops to zero. Note that the estimate is still accurate down to a step of the order of 10⁻³⁰⁷. Below this, underflow occurs and the estimate results in NaN. In general, the smallest possible h is the one below which underflow occurs somewhere in the algorithm.

When it comes to comparing the relative accuracy of complex and real computations, there is an increased error in basic arithmetic operations when using complex numbers, more specifically when dividing and multiplying.

3.4 Complex Function Definitions

In the derivation of the complex-step derivative approximation (9) for a function f, we have assumed that f was an analytic function, i.e. that the Cauchy-Riemann equations apply. It is therefore important to examine to what extent this assumption holds when the value of the function is calculated by a numerical algorithm. In addition, it is also useful to explain how we can convert real functions and operators so that they can take complex numbers as arguments. Fortunately, in the case of Fortran, complex numbers are a standard data type and many intrinsic functions are already defined for them.

Any algorithm can be broken down into a sequence of basic operations. Two main types of operations are relevant when converting a real algorithm to a complex one:

• Relational operators

• Arithmetic functions and operators.

Relational logic operators such as “greater than” and “less than” are not defined for complex numbers in Fortran. These operators are usually used in conjunction with if statements in order to redirect the execution thread. The original algorithm and its “complexified” version must obviously follow the same execution thread. Therefore, defining these operators to compare only the real parts of the arguments is the correct approach.

Functions that choose one argument, such as max and min, are based on relational operators. Therefore, according to our previous discussion, we should once more choose a number based on its real part alone and let the imaginary part “tag along”.

Any algorithm that uses conditional statements is likely to be a discontinuous function of its inputs. Either the function value itself is discontinuous or the discontinuity is in the first or higher derivatives. When using a finite-difference method, the derivative estimate will be incorrect if the two function evaluations are within h of the discontinuity location. However, if the complex step is used, the resulting derivative estimate will be correct right up to the discontinuity. At the discontinuity, a derivative does not exist by definition, but if the function is defined at that point, the approximation will still return a value that will depend on how the function is defined at that point.

Arithmetic functions and operators include addition, multiplication, and trigonometric functions, to name only a few, and most of these have a standard complex definition that is analytic almost everywhere. Many of these definitions are implemented in Fortran. Whether they are or not depends on the compiler and libraries that are used. The user should check the documentation of the particular Fortran compiler being used in order to determine which intrinsic functions need to be redefined.

Functions of the complex variable are merely extensions of their real counterparts. By requiring that the extended function satisfy the Cauchy-Riemann equations, i.e. analyticity, and that its properties be the same as those of the real function, we can obtain a unique complex function definition. Since these complex functions are analytic, the complex-step approximation is valid and will yield the correct result.

Some of the functions, however, have singularities or branch cuts on which they are not analytic. This does not pose a problem since, as previously observed, the complex-step approximation will return a correct one-sided derivative. As for the case of a function that is not defined at a given point, the algorithm will not return a function value, so a derivative cannot be obtained. However, the derivative estimate will be correct in the neighborhood of the discontinuity.

The only standard complex function definition that is non-analytic is the absolute value function, or modulus. When the argument of this function is a complex value, the function returns a positive real number, |z| = √(x² + y²). This function's definition was not derived by imposing analyticity and therefore it will not yield the correct derivative when using the complex-step estimate. In order to derive an analytic definition of abs we start by satisfying the Cauchy-Riemann equations. From Equation (5), since we know what the value of the derivative must be, we can write,

∂u/∂x = ∂v/∂y = −1 if x < 0, +1 if x > 0.   (13)

From Equation (6), since ∂v/∂x = 0 on the real axis, we get that ∂u/∂y = 0 on the axis, so the real part of the result must be independent of the imaginary part of the variable. Therefore, the new sign of the imaginary part depends only on the sign of the real part of the complex number, and an analytic “absolute value” function can be defined as:

abs(x + iy) = −x − iy if x < 0, +x + iy if x > 0.   (14)

Note that this is not analytic at x = 0, since a derivative does not exist for the real absolute value. Once again, the complex-step approximation will give the correct value of the first derivative right up to the discontinuity. Later, the x > 0 condition will be substituted by x ≥ 0 so that we not only obtain a function value for x = 0, but are also able to calculate the correct right-hand-side derivative at that point.
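A sketch of Equation (14) in Python; the function name cabs is our own, and we use the x ≥ 0 form of the positive branch, as suggested above.

```python
def cabs(z):
    """Analytic 'absolute value' of Equation (14): negate both real and
    imaginary parts when the real part is negative, pass through otherwise
    (x >= 0 branch, so that x = 0 returns a value)."""
    z = complex(z)
    return -z if z.real < 0 else z

h = 1e-20
# Complex-step derivative of |x| at x = -3 and x = 2: the slopes -1 and +1
print(cabs(-3 + 1j * h).imag / h, cabs(2 + 1j * h).imag / h)
```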

3.5 Implementation Procedure

The complex-step method can be implemented in many different programming languages. The following is a general procedure that applies to any language which supports complex arithmetic:

1. Substitute all real type variable declarations with complex declarations. It is not strictly necessary to declare all variables complex, but it is much easier to do so.

2. Define all functions and operators that are not defined for complex arguments and re-define abs.

3. Change input and output statements if necessary.

4. A complex step can then be added to the desired x and ∂f/∂x can be estimated using Equation (9).

The complex-step method can be implemented with or without operator overloading; using overloading results in a more elegant implementation.

Fortran: Fortunately, in Fortran 90, intrinsic functions and operators (including comparison operators) can be overloaded, which makes it possible to use the operator-overloading type of implementation. This means that if a particular function or operator does not take complex arguments, one can extend it by writing another definition that takes this type of argument. This feature makes it much easier to implement the complex-step method since, once we overload the functions and operators, there is no need to change the function calls or conditional statements. The compiler will automatically determine the argument type and choose the correct function or operation. A module with the necessary definitions and a script that converts the original source code automatically are available on the web [9].

C/C++: Since C++ also supports overloading, the implementation is analogous to the Fortran one. An include file contains the definition of a new variable type called cmplx as well as all the functions that are necessary for the complex-step method. The inclusion of this file and the replacement of double or float declarations with cmplx is nearly all that is required.

Matlab: As in the case of Fortran, one must redefine functions such as abs, max and min. All differentiable functions are defined for complex variables. Results for the simple example in the previous section were computed using Matlab. The standard transpose operation, represented by an apostrophe (’), poses a problem, as it takes the complex conjugate of the elements of the matrix, so one should use the non-conjugate transpose, represented by “dot apostrophe” (.’), instead.

Java: Complex arithmetic is not standardized at the moment, but there are plans for its implementation. Although function overloading is possible, operator overloading is currently not supported.

Python: When using the Numerical Python module (NumPy), we have access to complex number arithmetic and implementation is as straightforward as in Matlab.
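As an illustrative sketch (plain cmath is used here, though the same idea works with NumPy arrays), the full gradient ∂f/∂xi can be assembled with one complex-step evaluation per design variable; the helper name gradient_cs and the test function are our own choices.

```python
import cmath

def gradient_cs(f, x, h=1e-20):
    """Gradient of a scalar function f at the point x (a list of reals),
    Equation (9) applied once per design variable."""
    grad = []
    for i in range(len(x)):
        z = [complex(v) for v in x]
        z[i] += 1j * h               # perturb only the i-th variable
        grad.append(f(z).imag / h)
    return grad

# f(x1, x2) = x1*x2 + sin(x1), whose gradient is (x2 + cos(x1), x1)
f = lambda x: x[0] * x[1] + cmath.sin(x[0])
print(gradient_cs(f, [1.0, 2.0]))
```

Note that, like forward differences, this costs one (complex) function evaluation per input, so it is attractive when the number of inputs is modest.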

4 Algorithmic Differentiation

Algorithmic differentiation — also known as computational differentiation or automatic differentiation — is a well-known method based on the systematic application of the differentiation chain rule to computer programs. Although this approach is as accurate as an analytic method, it is potentially much easier to implement, since this can be done automatically.

4.1 How it Works

The method is based on the application of the chain rule of differentiation to each operation in the program flow. The derivatives given by the chain rule can be propagated forward (forward mode) or backwards (reverse mode).

When using the forward mode, for each intermediate variable in the algorithm a variation due to one input variable is carried through. This is very similar to the way the complex-step method works. To illustrate this, suppose we want to differentiate the multiplication operation, f = x1x2, with respect to x1. Table 1 compares how the differentiation would be performed using either algorithmic differentiation or the complex-step method. As we can see, algorithmic differentiation stores the derivative value in a separate set of variables while the complex step carries the derivative information in the imaginary part of the variables. In this case, the complex-step method performs one additional operation — the calculation of the term h1h2 — which, for the purposes of calculating the derivative, is superfluous. The complex-step method will nearly always include these superfluous computations, which correspond to the higher-order terms in the Taylor series expansion of Equation (9). For very small h, when using finite-precision arithmetic, these terms have no effect on the real part of the result.

Algorithmic                    Complex-Step
∆x1 = 1                        h1 = 10⁻²⁰
∆x2 = 0                        h2 = 0
f = x1x2                       f = (x1 + ih1)(x2 + ih2)
∆f = x1∆x2 + x2∆x1             f = x1x2 − h1h2 + i(x1h2 + x2h1)
df/dx1 = ∆f                    df/dx1 = Im f / h1

Table 1: The differentiation of the multiplication operation f = x1x2 with respect to x1 using algorithmic differentiation and the complex-step derivative approximation.

Although this example involves only one operation, both methods work for an algorithm involving an arbitrary sequence of operations by propagating the variation of one input forward throughout the code. This means that in order to calculate n derivatives, the differentiated code must be executed n times.

The other mode — the reverse mode — has no equivalent in the complex-step method. When using the reverse mode, the code is executed forwards and then backwards to calculate derivatives of one output with respect to n inputs. The total number of operations is independent of n, but the memory requirements may be prohibitive, especially for the case of large iterative algorithms.

There is nothing like an example, so we will now use both the forward and reverse modes to compute the derivatives of the function,

f(x1, x2) = x1x2 + sin(x1). (15)

The algorithm that would calculate this function is shown below, together with the derivative calculation using the forward mode.

t1 = x1         ∆t1 = 1
t2 = x2         ∆t2 = 0
t3 = t1t2       ∆t3 = ∆t1t2 + t1∆t2
t4 = sin(t1)    ∆t4 = ∆t1 cos(t1)
t5 = t3 + t4    ∆t5 = ∆t3 + ∆t4
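The forward-mode propagation above can be implemented with a small “dual number” class that carries a (value, derivative) pair through each operation. This is a minimal sketch of the idea, not a full AD tool: only +, * and sin are defined, and the class name is our own.

```python
import math

class Dual:
    """Forward-mode AD value: v holds the value, d the derivative (∆)."""
    def __init__(self, v, d=0.0):
        self.v, self.d = v, d
    def __add__(self, other):        # ∆(a + b) = ∆a + ∆b
        return Dual(self.v + other.v, self.d + other.d)
    def __mul__(self, other):        # product rule: ∆(ab) = ∆a·b + a·∆b
        return Dual(self.v * other.v, self.d * other.v + self.v * other.d)

def sin(t):                          # ∆ sin(a) = ∆a·cos(a)
    return Dual(math.sin(t.v), t.d * math.cos(t.v))

# Seed ∆x1 = 1, ∆x2 = 0 to obtain df/dx1 of f = x1*x2 + sin(x1)
x1, x2 = Dual(1.0, 1.0), Dual(2.0, 0.0)
f = x1 * x2 + sin(x1)
print(f.v, f.d)   # f.d equals x2 + cos(x1)
```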

The reverse mode is also based on the chain rule. Let tj denote all the intermediate variables in an algorithm that calculates f(xi). We set t1, . . . , tn to x1, . . . , xn and the last intermediate variable, tm, to f. Then the chain rule can be written as,

∂tj/∂ti = Σ(k∈Kj) (∂tj/∂tk)(∂tk/∂ti),  i = 1, 2, . . . , n,   (16)


Figure 2: Graph of the algorithm that calculates f(x1, x2) = x1x2 + sin(x1).


for j = n + 1, . . . , m, to obtain the gradients of the intermediate and output variables. Kj denotes the set of indices k < j such that the variable tj in the code depends explicitly on tk. In order to know in advance what these indices are, we have to form the graph of the algorithm when it is first executed. This provides information on the interdependence of all the intermediate variables. A graph for our sample algorithm is shown in Figure 2.

The sequence of calculations shown below corresponds to the application of the reverse mode to our simple function.

∂t5/∂t5 = 1
∂t5/∂t4 = 1
∂t5/∂t3 = 1
∂t5/∂t2 = (∂t5/∂t3)(∂t3/∂t2) + (∂t5/∂t4)(∂t4/∂t2) = 1 · t1 + 1 · 0 = t1
∂t5/∂t1 = (∂t5/∂t2)(∂t2/∂t1) + (∂t5/∂t3)(∂t3/∂t1) + (∂t5/∂t4)(∂t4/∂t1) = t1 · 0 + 1 · t2 + 1 · cos(t1) = t2 + cos(t1)

The following matrix helps to visualize the sensitivities of all the variables with respect to each other.

⎡ 1         0         0         0         0 ⎤
⎢ 0         1         0         0         0 ⎥
⎢ ∂t3/∂t1   ∂t3/∂t2   1         0         0 ⎥   (17)
⎢ ∂t4/∂t1   ∂t4/∂t2   ∂t4/∂t3   1         0 ⎥
⎣ ∂t5/∂t1   ∂t5/∂t2   ∂t5/∂t3   ∂t5/∂t4   1 ⎦

In the case of the example we are considering we have:

⎡ 1              0    0   0   0 ⎤
⎢ 0              1    0   0   0 ⎥
⎢ t2             t1   1   0   0 ⎥   (18)
⎢ cos(t1)        0    0   1   0 ⎥
⎣ t2 + cos(t1)   t1   1   1   1 ⎦

The cost of calculating the derivatives of one output with respect to many inputs is thus not proportional to the number of inputs but to the number of outputs. Since, when using the reverse mode, we need to store all the intermediate variables as well as the complete graph of the algorithm, the amount of memory that is necessary increases dramatically. In the case of a three-dimensional iterative solver, the cost of using this mode can be prohibitive.
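A minimal reverse-mode sketch in Python (our own toy implementation; it is valid for this example because each intermediate variable feeds only one later operation, whereas real tools sweep the recorded graph in reverse topological order):

```python
import math

class Node:
    """Reverse-mode AD value: records its parents and the local partials."""
    def __init__(self, v, parents=()):
        self.v, self.parents, self.grad = v, list(parents), 0.0
    def __add__(self, other):
        return Node(self.v + other.v, [(self, 1.0), (other, 1.0)])
    def __mul__(self, other):
        return Node(self.v * other.v, [(self, other.v), (other, self.v)])

def sin(t):
    return Node(math.sin(t.v), [(t, math.cos(t.v))])

def backprop(out):
    """Accumulate d(out)/d(node) backwards through the graph, Equation (16)."""
    out.grad = 1.0
    stack = [out]
    while stack:
        node = stack.pop()
        for parent, partial in node.parents:
            parent.grad += node.grad * partial
            stack.append(parent)

x1, x2 = Node(1.0), Node(2.0)
f = x1 * x2 + sin(x1)
backprop(f)
print(x1.grad, x2.grad)   # t2 + cos(t1) and t1, matching the reverse sweep
```

A single backward pass yields the derivatives with respect to both inputs, which is exactly the cost advantage of the reverse mode when there are many inputs and few outputs.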

4.2 Existing Tools

There are two main methods for implementing algorithmic differentiation: by source code transformation or by using derived datatypes and operator overloading.


To implement algorithmic differentiation by source transformation, the whole source code must be processed with a parser and all the derivative calculations are introduced as additional lines of code. The resulting source code is greatly enlarged and becomes practically unreadable. This constitutes an implementation disadvantage, as it becomes impractical to debug this new extended code. One has to work with the original source, and every time it is changed (or if different derivatives are desired) one must rerun the parser before compiling a new version.

In order to use derived types, we need languages that support this feature, such as Fortran 90 or C++. To implement algorithmic differentiation using this feature, a new type of structure is created that contains both the value and its derivative. All the existing operators are then re-defined (overloaded) for the new type. The new operator has exactly the same behavior as before for the value part of the new type, but uses the definition of the derivative of the operator to calculate the derivative portion. This results in a very elegant implementation, since very few changes are required in the original code.

Many tools for automatic algorithmic differentiation of programs in different languages exist. They have been extensively developed and provide the user with great functionality, including the calculation of higher-order derivatives and reverse-mode options.

Fortran: Tools that use the source transformation approach include ADIFOR [11], TAMC, DAFOR, GRESS, Odyssée and PADRE2. The necessary changes to the source code are made automatically. The derived-datatype approach is used in the following tools: AD01, ADOL-F, IMAS and OPTIMA90. Although it is in theory possible to have a script make the necessary changes in the source code automatically, none of these tools has this facility and the changes must be made manually.

C/C++: Established tools for automatic algorithmic differentiation also exist for C/C++ [10]. These include ADIC, an implementation mirroring ADIFOR, and ADOL-C, a free package that uses operator overloading and can operate in the forward or reverse modes and compute higher-order derivatives.

References

[1] Lyness, J. N., and C. B. Moler, “Numerical differentiation of analytic functions”, SIAM J. Numer. Anal., Vol. 4, 1967, pp. 202-210.

[2] Lyness, J. N., “Numerical algorithms based on the theory of complex variables”, Proc. ACM 22nd Nat. Conf., Thompson Book Co., Washington DC, 1967, pp. 124-134.

[3] Squire, W., and G. Trapp, “Using Complex Variables to Estimate Derivatives of Real Functions”, SIAM Review, Vol. 40, No. 1, March 1998, pp. 110-112.


[4] Martins, J. R. R. A., I. M. Kroo, and J. J. Alonso, “An Automated Method for Sensitivity Analysis Using Complex Variables”, Proceedings of the 38th Aerospace Sciences Meeting, Reno, NV, January 2000. AIAA Paper 2000-0689.

[5] Martins, J. R. R. A., and P. Sturdza, “The Connection Between the Complex-Step Derivative Approximation and Algorithmic Differentiation”, Proceedings of the 39th Aerospace Sciences Meeting, Reno, NV, January 2001. AIAA Paper 2001-0921.

[6] Anderson, W. K., J. C. Newman, D. L. Whitfield, and E. J. Nielsen, “Sensitivity Analysis for the Navier-Stokes Equations on Unstructured Meshes Using Complex Variables”, AIAA Paper 99-3294, Proceedings of the 17th Applied Aerodynamics Conference, 28 Jun. 1999.

[7] Newman, J. C., W. K. Anderson, and D. L. Whitfield, “Multidisciplinary Sensitivity Derivatives Using Complex Variables”, MSSU-COE-ERC-98-08, Jul. 1998.

[8] http://www.python.org

[9] http://aero-comlab.stanford.edu/jmartins

[10] http://www.sc.rwth-aachen.de/Research/AD/subject.html

[11] Bischof, C., A. Carle, G. Corliss, A. Griewank, and P. Hovland, “ADIFOR: Generating Derivative Codes from Fortran Programs”, Scientific Programming, Vol. 1, No. 1, 1992, pp. 11-29.


5 Analytic Sensitivity Analysis

Analytic methods are the most accurate and efficient methods available for sensitivity analysis. They are, however, more involved than the other methods we have seen so far, since they require knowledge of the governing equations and of the algorithm that is used to solve those equations. In this section we will learn how to compute analytic sensitivities with direct and adjoint methods. We will start with single-discipline systems and then generalize to the case of multiple systems, such as we would encounter in MDO.

5.1 Single Systems

5.1.1 Notation

fi    function of interest/output, i = 1, . . . , nf
Rk′   residuals of the governing equations, k′ = 1, . . . , nR
xj    design/independent/input variables, j = 1, . . . , nx
yk    state variables, k = 1, . . . , nR
ψk    adjoint vector, k = 1, . . . , nR

5.1.2 Basic Equations

Consider the residuals of the governing equations of a given system,

Rk′ (xj , yk (xj)) = 0 (19)

where xj are the independent variables (the design variables) and yk are the state variables, which depend on the independent ones through the solution of the governing equations. Note that the number of equations must equal the number of unknowns (the state variables).

Any perturbation of the variables of this system of equations must result in no variation of the residuals if the governing equations are to remain satisfied. Therefore, we can write,

δRk′ = 0 ⇒ (∂Rk′/∂xj) δxj + (∂Rk′/∂yk) δyk = 0, (20)

since there is a variation due to the change in the design variables as well as a variation due to the change in the state vector. This equation applies to all k′ = 1, . . . , nR and j = 1, . . . , nx. Dividing the equation by δxj, we can get it in another form, which involves the total derivative dyk/dxj,

∂Rk′/∂xj + (∂Rk′/∂yk)(dyk/dxj) = 0. (21)

Our final objective is to obtain the sensitivity of the “function of interest”, fi, which can be the objective function or a set of constraints. The function fi


also depends on both xj and yk and hence the total variation of fi is,

δfi = (∂fi/∂xj) δxj + (∂fi/∂yk) δyk. (22)

Note that δyk cannot be found explicitly, since yk varies implicitly with respect to xj through the solution of the governing equations. We can also divide this equation by δxj to get the alternate form,

dfi/dxj = ∂fi/∂xj + (∂fi/∂yk)(dyk/dxj), (23)

where i = 1, . . . , nf and j = 1, . . . , nx, with summation over the repeated index k. The first term on the right-hand side represents the explicit variation of the function of interest with respect to the design variables, due to the presence of these variables in the expression for fi. The second term represents the variation of the function due to the change in the state variables when the governing equations are solved.

Figure 3: Schematic representation of the governing equations (R = 0), design variables or inputs (xj), state variables (yk), and the function of interest or output (fi). [Diagram: x feeds into the solution of R = 0, which yields y; both x and y feed into f.]

A graphical representation of the system of governing equations is shown in Figure 3, with xj as the input and fi as the output. The two arrows leading to fi illustrate the fact that fi depends on xj not only explicitly but also through the state variables.
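To make the decomposition of the total derivative concrete, here is a minimal numerical sketch with a single state and a single design variable. The governing equation R(x, y) = y^3 + y − x = 0 and the output f(x, y) = x + y^2 are made up for this illustration; the total derivative from equation (23) is checked against a finite difference through the full nonlinear solve.

```python
# Scalar illustration of eq. (23): one state y defined implicitly by
# R(x, y) = y^3 + y - x = 0, and one output f(x, y) = x + y^2.
# (This particular R and f are invented for the sketch.)

def solve_state(x, y0=0.0, tol=1e-12):
    """Newton's method on R(x, y) = 0 for the state y."""
    y = y0
    for _ in range(50):
        r = y**3 + y - x
        if abs(r) < tol:
            break
        y -= r / (3 * y**2 + 1)      # dR/dy = 3 y^2 + 1 > 0, so always safe
    return y

x = 2.0
y = solve_state(x)                   # here y = 1 exactly, since 1 + 1 = 2

# Eq. (21): dR/dx + (dR/dy) dy/dx = 0, with dR/dx = -1 and dR/dy = 3 y^2 + 1.
dy_dx = 1.0 / (3 * y**2 + 1)
# Eq. (23): df/dx = df/dx|_explicit + (df/dy) dy/dx = 1 + 2 y dy/dx.
df_dx = 1.0 + 2 * y * dy_dx

# Finite-difference check through the full nonlinear solve:
h = 1e-6
f = lambda xv: xv + solve_state(xv)**2
assert abs((f(x + h) - f(x - h)) / (2 * h) - df_dx) < 1e-6
```

At x = 2 the state is y = 1, so dy/dx = 1/4 and df/dx = 3/2; the finite difference agrees to the accuracy allowed by the solver tolerance.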

The following two sections describe two different ways of calculating dfi/dxj, which is what we need to perform gradient-based optimization.


5.1.3 Direct Sensitivity Equations

The direct approach first calculates the total variation of the state variables, yk, by solving the differentiated governing equations (21) for dyk/dxj, the total derivative of the state variables with respect to a given design variable. This means solving the linear system of equations,

(∂Rk′/∂yk)(dyk/dxj) = −∂Rk′/∂xj. (24)

The solution procedure usually involves factorizing the square matrix ∂Rk′/∂yk and then back-solving to obtain the solution. Note that we have to choose one xj for each back-solve, since the right-hand-side vector is different for each j. We can then substitute the result for dyk/dxj into equation (23) to get dfi/dxj for all i = 1, . . . , nf.
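The direct method can be sketched on a toy linear model problem R(x, y) = A y − B x = 0 with one output f = c · y. The matrices A, B and the vector c are made up for this example; A plays the role of ∂Rk′/∂yk and −B that of ∂Rk′/∂xj.

```python
import numpy as np

# Direct method on a toy linear system: R(x, y) = A y - B x = 0, f = c . y.
rng = np.random.default_rng(0)
n_y, n_x = 4, 3
A = rng.normal(size=(n_y, n_y)) + 4.0 * np.eye(n_y)   # square, invertible dR/dy
B = rng.normal(size=(n_y, n_x))                        # so dR/dx = -B
c = rng.normal(size=n_y)                               # df/dy; df/dx|_explicit = 0

# Eq. (24): (dR/dy)(dy/dx) = -dR/dx, one right-hand side per design variable.
dy_dx = np.linalg.solve(A, B)          # columns are dy/dx_j
# Eq. (23): df/dx = df/dx|_explicit + (df/dy)(dy/dx).
df_dx = c @ dy_dx

# Finite-difference check through the full solve y(x) = A^{-1} B x:
x = rng.normal(size=n_x)
h = 1e-6
for j in range(n_x):
    xp = x.copy()
    xp[j] += h
    fd = (c @ np.linalg.solve(A, B @ xp) - c @ np.linalg.solve(A, B @ x)) / h
    assert abs(fd - df_dx[j]) < 1e-6 * max(1.0, abs(df_dx[j]))
```

Because the toy problem is linear, the finite differences agree with the direct-method sensitivities to round-off; note that `np.linalg.solve` is given all n_x right-hand sides at once, which mimics factorizing ∂Rk′/∂yk once and back-solving per design variable.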

5.1.4 Adjoint Sensitivity Equations

The adjoint approach adjoins the variations of the governing equations (20) and the function of interest (22),

δfi = (∂fi/∂xj) δxj + (∂fi/∂yk) δyk + ψk^T [ (∂Rk′/∂xj) δxj + (∂Rk′/∂yk) δyk ], (25)

where the term in square brackets is δRk′ = 0.

Here ψk is the adjoint vector. The values of the components of this vector are arbitrary because we only consider variations for which the governing equations are satisfied, i.e., δRk′ = 0. If we collect the terms multiplying each of the variations, we obtain,

δfi = ( ∂fi/∂xj + ψk^T ∂Rk′/∂xj ) δxj + ( ∂fi/∂yk + ψk^T ∂Rk′/∂yk ) δyk. (26)

Since ψk is arbitrary, we can choose its values to be those for which the term multiplying δyk is zero, i.e., we solve,

ψk^T (∂Rk′/∂yk) = −∂fi/∂yk ⇒ [∂Rk′/∂yk]^T ψk = −[∂fi/∂yk]^T (27)

for the adjoint vector ψk. The adjoint vector is the same for any xj, but it is different for each fi.

The term in equation (26) that multiplies δxj corresponds to the total derivative of fi with respect to xj, i.e.,

dfi/dxj = ∂fi/∂xj + ψk^T (∂Rk′/∂xj), (28)

which are the sensitivities we want to calculate.
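The same toy linear system used above for the direct method, R(x, y) = A y − B x = 0 with f = c · y (matrices made up), also illustrates the adjoint route: a single adjoint solve recovers the sensitivities with respect to all design variables at once.

```python
import numpy as np

# Adjoint method on a toy linear system: R(x, y) = A y - B x = 0, f = c . y.
# dR/dy = A, dR/dx = -B, df/dy = c, and the explicit term df/dx is zero.
rng = np.random.default_rng(0)
n_y, n_x = 4, 3
A = rng.normal(size=(n_y, n_y)) + 4.0 * np.eye(n_y)
B = rng.normal(size=(n_y, n_x))
c = rng.normal(size=n_y)

# Eq. (27): [dR/dy]^T psi = -[df/dy]^T -- one back-solve, independent of n_x.
psi = np.linalg.solve(A.T, -c)
# Eq. (28): df/dx = df/dx|_explicit + psi^T (dR/dx) = psi^T (-B).
df_dx = psi @ (-B)

# The direct method (one back-solve per design variable) gives the same result:
df_dx_direct = c @ np.linalg.solve(A, B)
assert np.allclose(df_dx, df_dx_direct)
```

Note the contrast in cost: the adjoint route performs one back-solve (with the transposed matrix) per output, while the direct route performs one per design variable.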


5.1.5 Direct vs. Adjoint

In the previous two sections, the direct and adjoint sensitivity equations were both derived independently from the same two equations (20, 22). We will now unify the derivation of these two methods by expressing them in the same equation. This will help us gain a better understanding of how the two approaches are related.

If we want to solve for the total sensitivity of the state variables with respect to the design variables, we have to solve equation (24). Assuming that we have the sensitivity matrix of the residuals with respect to the state variables, ∂Rk′/∂yk, and that it is invertible, the solution is,

dyk/dxj = −[∂Rk′/∂yk]^{-1} (∂Rk′/∂xj). (29)

Note that the matrix of partial derivatives of the residuals with respect to the state variables, ∂Rk′/∂yk, is square, since the number of governing equations must equal the number of state variables.

Substituting equation (29) into the expression for the total derivative of the function of interest (23), we get,

dfi/dxj = ∂fi/∂xj − (∂fi/∂yk) [∂Rk′/∂yk]^{-1} (∂Rk′/∂xj), (30)

where the product of the last two factors is dyk/dxj.

Both the direct and adjoint methods can be seen in this equation. Using the direct method, we would start by solving for dyk/dxj in equation (30), i.e., the solution of,

(∂Rk′/∂yk)(dyk/dxj) = −∂Rk′/∂xj, (31)

which gives the total sensitivities of the state variables. Note that each set of these total sensitivities is valid for only one design variable, xj. Once we have these sensitivities, we can use this result in equation (30), i.e.,

dfi/dxj = ∂fi/∂xj + (∂fi/∂yk)(dyk/dxj), (32)

to get the desired sensitivities.

To use the adjoint method, we would define the adjoint vector as shown below,

dfi/dxj = ∂fi/∂xj − (∂fi/∂yk)[∂Rk′/∂yk]^{-1} (∂Rk′/∂xj), (33)

where we identify ψk^T ≡ (∂fi/∂yk)[∂Rk′/∂yk]^{-1}.


Step             Direct      Adjoint
Factorization    same        same
Back-solve       nx times    nf times
Multiplication   same        same

The adjoint vector is then the solution of the system,

[∂Rk′/∂yk]^T ψk = [∂fi/∂yk]^T, (34)

which we have to solve for each fi. The adjoint vector is the same for any chosen design variable, since j does not appear in the equation. We can substitute the resulting adjoint vector into equation (33) to get,

dfi/dxj = ∂fi/∂xj − ψk^T (∂Rk′/∂xj). (35)

Unlike the direct method, where each dyk/dxj can be used for any function fi, here we must compute a different adjoint vector ψk for each function of interest.

A comparison of the cost of computing sensitivities with the direct versus the adjoint method is shown in the table above. With either method, we must factorize the same matrix, ∂Rk′/∂yk. The difference in cost comes from the back-solve step used to solve equations (31) and (34), respectively. The direct method requires that we perform this step for each design variable (i.e., for each j), while the adjoint method requires it for each function of interest (i.e., for each i). The multiplication step is simply the calculation of the final sensitivity expressed in equations (32) and (35), respectively; when computing the same set of sensitivities, its cost is the same for both methods. The final conclusion is the established rule: if the number of design variables (inputs) is greater than the number of functions of interest (outputs), the adjoint method is more efficient than the direct method, and vice-versa. If the number of outputs is similar to the number of inputs, either method will be costly.
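The back-solve accounting can be demonstrated on a toy linear problem (made-up matrices, with nx = 20 inputs and nf = 2 outputs): both methods share one factorization of ∂Rk′/∂yk, the direct route needs nx back-solves of equation (31), and the adjoint route needs only nf back-solves of equation (34).

```python
import numpy as np

# Cost comparison sketch: nx = 20 design variables, nf = 2 outputs.
# In a real code dR/dy would be factorized once (e.g., by LU) and reused;
# np.linalg.solve with a matrix right-hand side plays that role here.
rng = np.random.default_rng(1)
n_y, n_x, n_f = 50, 20, 2
dR_dy = rng.normal(size=(n_y, n_y)) + 10.0 * np.eye(n_y)
dR_dx = rng.normal(size=(n_y, n_x))
df_dy = rng.normal(size=(n_f, n_y))   # one row per function of interest

# Direct, eq. (31): n_x back-solves (one column of dy/dx per design variable),
# followed by the multiplication step of eq. (32).
dy_dx = np.linalg.solve(dR_dy, -dR_dx)
sens_direct = df_dy @ dy_dx           # shape (nf, nx)

# Adjoint, eq. (34): n_f back-solves with the transposed matrix, then eq. (35).
psi = np.linalg.solve(dR_dy.T, df_dy.T)
sens_adjoint = -psi.T @ dR_dx         # shape (nf, nx)

assert np.allclose(sens_direct, sens_adjoint)
# With nf = 2 < nx = 20 here, the adjoint route needs 10x fewer back-solves.
```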

In this discussion, we have assumed that the governing equations have been discretized. The same kind of procedure can be applied to continuous governing equations. The principle is the same, but the notation would have to be more general. The equations, in the end, have to be discretized in order to be solved numerically. Figure 4 shows the two ways of arriving at the discrete sensitivity equations. We can either differentiate the continuous governing equations first and then discretize them, or discretize the governing equations first and then differentiate them. The resulting sensitivity equations should be equivalent, but are not necessarily the same. Differentiating the continuous governing equations first is usually more involved. In addition, applying boundary conditions to the differentiated equations can be non-intuitive, as some of these boundary conditions are non-physical.



Figure 4: The two ways of obtaining the discretized sensitivity equations. [Diagram: continuous governing equations → continuous sensitivity equations → discrete sensitivity equations (1); continuous governing equations → discrete governing equations → discrete sensitivity equations (2).]

5.2 Example: Structural Sensitivity Analysis

The discretized governing equations for a finite-element structural model are,

Rk′ = Kk′k uk − fk = 0, (36)

where Kk′k is the stiffness matrix, uk is the vector of displacements (the state), and fk is the vector of applied forces (not to be confused with the function of interest from the previous section!).

We are interested in finding the sensitivities of the stress, which is related to the displacements by the equation,

σi = Sik uk. (37)

We will consider the design variables to be the cross-sectional areas of the elements, Aj. We will now look at the terms that we need in order to use the generalized total sensitivity equation (30).

For the matrix of sensitivities of the governing equations with respect to the state variables, we find that it is simply the stiffness matrix, i.e.,

∂Rk′/∂yk = ∂(Kk′k uk − fk)/∂uk = Kk′k. (38)

Let’s now consider the sensitivity of the residuals with respect to the design variables (the cross-sectional areas in our case). Neither the displacements nor the applied forces vary explicitly with the element sizes. The only term that depends on Aj directly is the stiffness matrix, so we get,

∂Rk′/∂xj = ∂(Kk′k uk − fk)/∂Aj = (∂Kk′k/∂Aj) uk. (39)

The partial derivative of the stress with respect to the displacements is simply given by the matrix in equation (37), i.e.,

∂fi/∂yk = ∂σi/∂uk = Sik. (40)


Finally, the explicit variation of the stress with respect to the cross-sectional areas is zero, since the stress depends only on the displacement field,

∂fi/∂xj = ∂σi/∂Aj = 0. (41)

Substituting these into the generalized total sensitivity equation (30) we get:

dσi/dAj = −(∂σi/∂uk) [Kk′k]^{-1} (∂Kk′k/∂Aj) uk. (42)

Referring to the theory presented previously, if we were to use the direct method, we would solve,

Kk′k (duk/dAj) = −(∂Kk′k/∂Aj) uk, (43)

and then substitute the result in,

dσi/dAj = ∂σi/∂Aj + (∂σi/∂uk)(duk/dAj), (44)

to calculate the desired sensitivities.

The adjoint method could also be used, in which case we would solve equation (34) for the structural case,

[Kk′k]^T ψk = [∂σi/∂uk]^T. (45)

Then we would substitute the adjoint vector into the equation,

dσi/dAj = ∂σi/∂Aj + ψk^T ( −(∂Kk′k/∂Aj) uk ), (46)

to calculate the desired sensitivities.
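This structural example can be sketched numerically for a chain of axial bar elements clamped at one end (the material constants, areas and load below are made-up numbers). Because this particular chain is statically determinate, the stress in element j is exactly P/Aj, so the computed sensitivity matrix can be checked in closed form.

```python
import numpy as np

# Stress sensitivities for a chain of n_e axial bar elements clamped at node 0,
# following the example of Section 5.2.  E, L, areas and load are made up.
E, L = 70e9, 1.0
n_e = 3
areas = np.array([4e-4, 3e-4, 2e-4])
load = np.array([0.0, 0.0, 1e4])     # axial force P applied at the free end

def stiffness(a):
    """Assemble K(A) on the free DOFs (nodes 1..n_e); K is linear in A."""
    K = np.zeros((n_e, n_e))
    for j in range(n_e):
        k = E * a[j] / L             # element j joins free DOFs j-1 and j
        K[j, j] += k
        if j > 0:
            K[j - 1, j - 1] += k
            K[j - 1, j] -= k
            K[j, j - 1] -= k
    return K

# Stress-displacement matrix S of eq. (37): sigma_j = (E/L)(u_j - u_{j-1}).
S = np.zeros((n_e, n_e))
for j in range(n_e):
    S[j, j] = E / L
    if j > 0:
        S[j, j - 1] = -E / L

K = stiffness(areas)
u = np.linalg.solve(K, load)

# Direct method, eqs. (43)-(44): since K is linear in the areas,
# dK/dA_j = stiffness(e_j), and dsigma/dA_j = -S K^{-1} (dK/dA_j) u.
dsig_dA = np.zeros((n_e, n_e))
for j in range(n_e):
    dK_dAj = stiffness(np.eye(n_e)[j])
    dsig_dA[:, j] = S @ np.linalg.solve(K, -dK_dAj @ u)

# Closed-form check: sigma_j = P/A_j, so dsigma_i/dA_j = -P delta_ij / A_j^2.
assert np.allclose(dsig_dA, np.diag(-load[-1] / areas**2), atol=1e3)
```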

5.3 Multidisciplinary Sensitivity Analysis

The analysis done in the previous sections for single-discipline systems can be generalized to multiple, coupled systems. The same total sensitivity equation (30) applies, but now the governing equations and state variables of all disciplines are included in R and y, respectively.

To illustrate this, consider for example a coupled aero-structural system where both aerodynamic (A) and structural (S) analyses are involved, and the state variables are the flow state, w, and the structural displacements, u. Figure 5 shows how such a system is coupled. Equation (31) can then be written for this coupled system as,

[ ∂RA/∂w  ∂RA/∂u ] [ dw/dxj ]     [ ∂RA/∂xj ]
[ ∂RS/∂w  ∂RS/∂u ] [ du/dxj ] = − [ ∂RS/∂xj ].  (47)


Figure 5: Schematic representation of the aero-structural governing equations. [Diagram: x feeds into the coupled system RA = 0 and RS = 0, which exchange w and u; x, w and u feed into f.]

In addition to the diagonal blocks of the matrix, which also appear when solving the single systems, we now have cross terms expressing the sensitivity of one system to the other's state variables. These equations are sometimes called the Global Sensitivity Equations (GSE) [4].

Similarly, we can write a coupled adjoint based on equation (34),

[ ∂RA/∂w  ∂RA/∂u ]^T [ ψA ]   [ ∂fi/∂w ]
[ ∂RS/∂w  ∂RS/∂u ]   [ ψS ] = [ ∂fi/∂u ].  (48)

In addition to the GSE expressed in equation (47), Sobieski [4] also introduced an alternative method, which he called GSE2, for calculating total sensitivities. Instead of working with the variation of the residuals, it considers the variation of the state variables, yk(x, yk′(x)), where k′ ≠ k,

δyk = (∂yk/∂xj) δxj + (∂yk/∂yk′)(dyk′/dxj) δxj. (49)

Dividing this by δxj, we get,

dyk/dxj = ∂yk/∂xj + (∂yk/∂yk′)(dyk′/dxj). (50)


Rearranging, for all k,

( ∂yk/∂yk′ − δkk′ ) (dyk′/dxj) = −∂yk/∂xj, (51)

where δkk′ is the Kronecker delta.

Writing this in matrix form for the two-discipline example, we get,

[ −I     ∂w/∂u ] [ dw/dxj ]     [ ∂w/∂xj ]
[ ∂u/∂w  −I    ] [ du/dxj ] = − [ ∂u/∂xj ].  (52)

One advantage of this formulation is that the size of the matrix to be factorized may be reduced. This is because the state variables of one system do not always depend on all the state variables of the other system. For example, in the case of the coupled aero-structural system, only the surface aerodynamic pressures affect the structural analysis, so we could substitute the full flow state, w, with a much smaller vector of surface pressures. Similarly, we could use only the surface structural displacements rather than all of them, since only these influence the aerodynamics.
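The GSE2 form (52) can be sketched with a toy pair of linear coupled "disciplines", w = P x + Q u and u = R x + T w, where P, Q, R and T are made-up matrices (with small coupling blocks so that the coupled problem is well posed).

```python
import numpy as np

# GSE2 sketch, eq. (52), for two linear coupled "disciplines":
#   w = P x + Q u   and   u = R x + T w
# so that dw/du = Q, du/dw = T, and the partial terms dw/dx = P, du/dx = R.
rng = np.random.default_rng(2)
nw, nu, nx = 3, 2, 4
P = rng.normal(size=(nw, nx))
R = rng.normal(size=(nu, nx))
Q = 0.1 * rng.normal(size=(nw, nu))   # weak coupling keeps I - Q T invertible
T = 0.1 * rng.normal(size=(nu, nw))

# Assemble and solve eq. (52):
M = np.block([[-np.eye(nw), Q], [T, -np.eye(nu)]])
sens = np.linalg.solve(M, -np.vstack([P, R]))
dw_dx, du_dx = sens[:nw], sens[nw:]

# Cross-check by eliminating u by hand: (I - Q T) dw/dx = P + Q R.
dw_dx_ref = np.linalg.solve(np.eye(nw) - Q @ T, P + Q @ R)
assert np.allclose(dw_dx, dw_dx_ref)
```

In this sketch w and u stand in for the full flow state and displacement vectors; as noted above, in practice they could be replaced by the much smaller surface pressure and surface displacement vectors.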

An adjoint version of this alternative formulation can also be derived; the system to be solved in this case is,

[ −I     ∂w/∂u ]^T [ ψA ]   [ ∂fi/∂w ]
[ ∂u/∂w  −I    ]   [ ψS ] = [ ∂fi/∂u ].  (53)

Since factorizing the full residual sensitivity matrix is in many cases impractical, the method can be slightly modified as follows. Equation (48) can be rewritten as,

[ (∂RA/∂w)^T  (∂RS/∂w)^T ] [ ψA ]   [ ∂fi/∂w ]
[ (∂RA/∂u)^T  (∂RS/∂u)^T ] [ ψS ] = [ ∂fi/∂u ].  (54)

Since the factorization of the matrix in equation (54) would be extremely costly, we decided to set up an iterative procedure, much like the one used for our aero-structural solution, in which the adjoint vectors are lagged and two different sets of equations are solved separately. For the calculation of the adjoint vector of one discipline, we use the adjoint vector of the other discipline from the previous iteration, i.e., we would solve,

[∂RA/∂w]^T ψA = ∂fi/∂w − [∂RS/∂w]^T ψ̃S, (55)

[∂RS/∂u]^T ψS = ∂fi/∂u − [∂RA/∂u]^T ψ̃A, (56)

whose final result, after convergence, is the same as that of equation (48). We will call this the lagged coupled adjoint method for computing sensitivities of coupled systems. Note that these equations look like the single-discipline aerodynamic and structural adjoint equations, except that a forcing term is subtracted from the right-hand side.

Once the solutions for both adjoint vectors have converged, we are able to compute the final sensitivities of a given cost function by using,

dfi/dxj = ∂fi/∂xj − ψA^T (∂RA/∂xj) − ψS^T (∂RS/∂xj). (57)

References

[1] Adelman, H. M. and R. T. Haftka, “Sensitivity Analysis of Discrete Structural Systems”, AIAA Journal, Vol. 24, No. 5, May 1986.

[2] Barthelemy, B., R. T. Haftka and G. A. Cohen, “Physically Based Sensitivity Derivatives for Structural Analysis Programs”, Computational Mechanics, pp. 465-476, Springer-Verlag, 1989.

[3] Belegundu, A. D. and J. S. Arora, “Sensitivity Interpretation of Adjoint Variables in Optimal Design”, Computer Methods in Applied Mechanics and Engineering, Vol. 48, pp. 81-90, 1985.

[4] Sobieszczanski-Sobieski, J., “Sensitivity of Complex, Internally Coupled Systems”, AIAA Journal, Vol. 28, No. 1, January 1990.

[5] Hajela, P., C. L. Bloebaum and J. Sobieszczanski-Sobieski, “Application of Global Sensitivity Equations in Multidisciplinary Aircraft Synthesis”, Journal of Aircraft, Vol. 27, No. 12, pp. 1002-1010, December 1990.
