Automatic Differentiation
Hamid Reza Ghaffari, Jonathan Li, Yang Li, Zhenghua Nie
Instructor: Prof. Tamás Terlaky, School of Computational Engineering and Science
McMaster University
March 23, 2007
Outline
1. Introductions
2. Forward and Reverse Mode: forward methods, reverse methods, comparison, extended knowledge, case study
3. Complexity Analysis: forward mode complexity, reverse mode complexity
4. AD Software: AD tools in MATLAB; AD in C/C++ (ADIC): developers, introduction, ADIC anatomy, ADIC process, example, handling side effects, references
Introductions
Why Do We Need Derivatives?

Optimization via gradient methods:
- Unconstrained optimization: minimizing y = f(x) requires the gradient or Hessian.
- Constrained optimization: minimizing y = f(x) such that c(x) = 0 also requires the Jacobian Jc(x) = [∂cj/∂xi].

Solving nonlinear equations f(x) = 0 by Newton's method,

    x_{n+1} = x_n − [∂f(x_n)/∂x]^(−1) f(x_n),

requires the Jacobian Jf = [∂f/∂x].

Other applications: parameter estimation, data assimilation, sensitivity analysis, inverse problems, ...
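To see why derivative code is needed in practice, here is a minimal sketch of a Newton iteration for a scalar equation (our own illustration, not from the original slides); every technique surveyed below is a different way of producing the derivative used in the update:

// Newton's method for a scalar equation f(x) = 0 (illustrative sketch).
#include <cmath>
#include <cstdio>

static double f(double x)      { return x * x - 2.0; }  // solve x^2 = 2
static double fprime(double x) { return 2.0 * x; }       // hand-coded derivative

int main() {
    double x = 1.0;                        // starting guess
    for (int k = 0; k < 8; ++k) {
        x = x - f(x) / fprime(x);          // x_{n+1} = x_n - f(x_n)/f'(x_n)
        std::printf("iteration %d: x = %.15f\n", k, x);
    }
    return 0;                              // converges to sqrt(2)
}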
How Do We Obtain Derivatives?
The main criteria for judging a differentiation approach:
- Reliability: the correctness and numerical accuracy of the derivative results.
- Computational cost: the amount of runtime and memory required for the derivative code.
- Development time: the time it takes to design, implement, and verify the derivative code, beyond the time to implement the code for the underlying function.
Main Approaches
- Hand coding
- Divided differences
- Symbolic differentiation
- Automatic differentiation
Hand Coding
An analytic expression for the derivative is identified first and then implemented by hand in any high-level programming language.

Advantages:
- Accuracy up to machine precision, if care is taken.
- Highly optimized implementation, depending on the skill of the implementer.

Disadvantages:
- Only applicable to "simple" functions, and error-prone.
- Requires considerable human effort.
Divided Differences
Approximate the derivative of a function f w.r.t. the i-th component of x at a particular point x0 by a difference quotient, e.g.

    ∂f(x)/∂xi |_{x0} ≈ ( f(x0 + h·ei) − f(x0) ) / h,

where ei is the i-th Cartesian unit vector.
Divided Differences (Ctd.)
Advantages:
- Only f is needed; easy to implement; f can be used as a "black box".
- Easy to parallelize.

Disadvantages:
- Accuracy is hard to assess and depends on the choice of h.
- Computational complexity is bounded below by (n + 1) × cost(f).
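A minimal sketch of the accuracy issue (our own illustration): the one-sided difference error first shrinks with h, then grows again because of round-off.

// Divided-difference accuracy depends strongly on the step size h.
#include <cmath>
#include <cstdio>

int main() {
    const double x0 = 1.0;           // differentiate sin at x0; exact: cos(x0)
    const double hs[] = {1e-1, 1e-4, 1e-8, 1e-12};
    for (double h : hs) {
        double approx = (std::sin(x0 + h) - std::sin(x0)) / h;
        std::printf("h = %-8g  error = %g\n", h, std::fabs(approx - std::cos(x0)));
    }
    return 0;   // truncation error shrinks with h; round-off error grows
}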
Symbolic Differentiation
Find an explicit derivative expression using computer algebra systems.

Disadvantages:
- The length of the resulting derivative expressions grows rapidly with the number n of independent variables.
- Inefficient in terms of computing time, due to the rapid growth of the underlying expressions.
- Unable to deal with constructs such as branches, loops, or subroutines that are inherent in computer codes.
Automatic Differentiation
What is automatic differentiation? Algorithmic, or automatic, differentiation (AD) is concerned with the accurate and efficient evaluation of derivatives for functions defined by computer programs. No truncation errors are incurred, and the resulting numerical derivative values can be used in all scientific computations that are based on linear, quadratic, or even higher-order approximations to nonlinear scalar or vector functions.
Automatic Differentiation (Cont.)
What is the idea behind automatic differentiation? AD techniques rely on the fact that every function, no matter how complicated, is executed on a computer as a (potentially very long) sequence of elementary operations, such as additions and multiplications, and elementary functions, such as sin and cos. By repeated application of the chain rule of derivative calculus to the composition of those elementary operations, derivatives of the overall function can be computed in a completely mechanical fashion.
How Good is AD?

- Reliability: accurate to machine precision; no truncation error.
- Computational cost: forward mode, 2 ∼ 3n × cost(f); reverse mode, 5 × cost(f).
- Human effort: less time is spent preparing a code for differentiation, in particular in situations where computer models are bound to change frequently.
How widely is AD used?
- Sensitivity analysis of a mesoscale weather model (application area: climate modeling)
- Data assimilation for ocean circulation (application area: oceanography)
- Intensity-modulated radiation therapy (application area: biomedicine)
- Multidisciplinary design of aircraft (application area: computational fluid dynamics)
- The NEOS server (application area: optimization)
- ...

Source: http://www.autodiff.org/?module=Applications&submenu=&category=all
Forward and Reverse Mode
AD Methods: A Simple Example
Unify all the variables (rename everything u1, ..., uN).
Forward Method

Differentiate the code:
    u_i = x_i,                  i = 1, ..., n
    u_i = Φ_i({u_j}_{j≺i}),     i = n+1, ..., N

Differentiate:

    ∇u_i = e_i,                        i = 1, ..., n
    ∇u_i = Σ_{j≺i} c_{i,j} · ∇u_j,     i = n+1, ..., N
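As a sketch of this recurrence (our own toy example, not from the slides), take n = 2 inputs and the code list u3 = u1·u2, u4 = sin(u3); each ∇u_i is a length-n vector and the c_{i,j} are the local partial derivatives:

// Forward propagation of gradients for u3 = u1*u2, u4 = sin(u3), n = 2.
#include <cmath>
#include <cstdio>

int main() {
    double u1 = 0.5, u2 = 2.0;                 // independent variables x1, x2
    double g1[2] = {1.0, 0.0};                 // grad u1 = e1
    double g2[2] = {0.0, 1.0};                 // grad u2 = e2

    double u3 = u1 * u2;                       // c_{3,1} = u2, c_{3,2} = u1
    double g3[2] = {u2 * g1[0] + u1 * g2[0],
                    u2 * g1[1] + u1 * g2[1]};

    double u4 = std::sin(u3);                  // c_{4,3} = cos(u3)
    double g4[2] = {std::cos(u3) * g3[0],
                    std::cos(u3) * g3[1]};

    std::printf("y = %f, dy/dx1 = %f, dy/dx2 = %f\n", u4, g4[0], g4[1]);
    return 0;   // dy/dx1 = x2*cos(x1*x2), dy/dx2 = x1*cos(x1*x2)
}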
Reverse Method

Compute the adjoint of the code:
    ū_j = ∂y/∂u_j = ∂(y1, y2, ..., ym)/∂u_j

Compute for the dependent variables:

    ū_{n+p+j} = ∂(y1, y2, ..., ym)/∂u_{n+p+j} = e_j,    j = 1, ..., m

Compute for the intermediates and independents, j = n+p, ..., 1:

    ū_j = ∂y/∂u_j = Σ_{i≻j} ū_i · c_{i,j}
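The same toy example in reverse (again our own sketch, not from the slides): a forward sweep stores the values, then a single reverse sweep propagates adjoints from the seed ū4 = 1 and yields the whole gradient:

// Reverse propagation of adjoints for y = sin(x1*x2), m = 1.
#include <cmath>
#include <cstdio>

int main() {
    // Forward sweep: compute and store all variable values.
    double u1 = 0.5, u2 = 2.0;
    double u3 = u1 * u2;
    double u4 = std::sin(u3);

    // Reverse sweep: ubar_j = sum over i > j of ubar_i * c_{i,j}.
    double ub4 = 1.0;                          // seed: dy/dy = 1
    double ub3 = ub4 * std::cos(u3);           // c_{4,3} = cos(u3)
    double ub2 = ub3 * u1;                     // c_{3,2} = u1
    double ub1 = ub3 * u2;                     // c_{3,1} = u2

    std::printf("y = %f, dy/dx1 = %f, dy/dx2 = %f\n", u4, ub1, ub2);
    return 0;   // full gradient from one forward and one reverse sweep
}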
Forward Methods
- Method: compute the gradient of each variable and use the chain rule to propagate gradients.
- Size of computed object: each computation handles vectors of the input size n.
- The gradient of each variable is computed alongside the variable itself.
- Easy to implement.
(Figure: variable values and gradient values are computed in tandem.)
Reverse Methods
- Method: compute the adjoint of each variable and propagate the adjoints.
- Size of computed object: each computation handles vectors of the output size m (note: in optimization applications the output size is usually 1).
- The adjoint of each variable can be computed only after the computation of all variable values is complete.
- Traverse the computational graph in reverse, obtaining the parents of each variable in order to compute its adjoint.
- Obtain the gradient by computing the partial derivatives one by one.
- Harder to implement.
(Figure: variable values are computed first; adjoint values follow in a reverse sweep.)
Implementation of Reverse Mode
As mentioned above, the implementation of forward mode is relatively straightforward. For reverse mode, we compare the key feature of the two implementation techniques:
- Source transformation: re-order the code upside down.
- Operator overloading: record the computation on a "tape".
Re-ordering the code upside down:
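The code shown on the original slide is not reproduced in this transcript; the following hand-written sketch (our own, for y = sin(x1·x2)) illustrates the idea, with the adjoint statements appearing in the reverse order of the original ones:

// Reverse mode by source transformation, written out by hand (illustrative).
#include <cmath>
#include <cstdio>

// Original code: u3 = x1*x2, y = sin(u3); u3 is saved for the adjoint pass.
double func(double x1, double x2, double& u3) {
    u3 = x1 * x2;                     // statement 1
    return std::sin(u3);              // statement 2
}

// Transformed code: the adjoint of statement 2 comes first, then statement 1.
void func_adjoint(double x1, double x2, double u3, double yb,
                  double& x1b, double& x2b) {
    double u3b = yb * std::cos(u3);   // adjoint of: y  = sin(u3)
    x1b = u3b * x2;                   // adjoint of: u3 = x1 * x2
    x2b = u3b * x1;
}

int main() {
    double u3, x1b, x2b;
    double y = func(0.5, 2.0, u3);
    func_adjoint(0.5, 2.0, u3, 1.0, x1b, x2b);   // seed ybar = 1
    std::printf("y = %f, dy/dx1 = %f, dy/dx2 = %f\n", y, x1b, x2b);
    return 0;
}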
Record the computation on a "tape":
- Record each operation and its operands.
- Related technique: checkpointing. If the number of operations grows large, checkpointing prevents the program from exhausting all the memory.
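A toy sketch of such a tape (our own illustration, far simpler than any production tool): helper functions record one entry per elementary operation during the forward pass, and the reverse pass interprets the tape backwards:

// A toy reverse-mode tape (illustrative only).
#include <cmath>
#include <cstdio>
#include <vector>

struct Entry { int lhs, a, b; double ca, cb; };  // u_lhs depends on u_a, u_b

static std::vector<double> val;   // values of u_1 .. u_N
static std::vector<Entry>  tape;  // one record per recorded operation

static int var(double v) { val.push_back(v); return (int)val.size() - 1; }

static int mul(int a, int b) {                   // record u = u_a * u_b
    int u = var(val[a] * val[b]);
    tape.push_back({u, a, b, val[b], val[a]});   // local partials c_{u,a}, c_{u,b}
    return u;
}
static int sin_(int a) {                         // record u = sin(u_a)
    int u = var(std::sin(val[a]));
    tape.push_back({u, a, -1, std::cos(val[a]), 0.0});
    return u;
}

int main() {
    int x1 = var(0.5), x2 = var(2.0);
    int y  = sin_(mul(x1, x2));                  // forward pass fills the tape

    std::vector<double> adj(val.size(), 0.0);
    adj[y] = 1.0;                                // seed the output adjoint
    for (int k = (int)tape.size() - 1; k >= 0; --k) {   // reverse sweep
        const Entry& e = tape[k];
        adj[e.a] += e.ca * adj[e.lhs];
        if (e.b >= 0) adj[e.b] += e.cb * adj[e.lhs];
    }
    std::printf("dy/dx1 = %f, dy/dx2 = %f\n", adj[x1], adj[x2]);
    return 0;
}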
Comparison
The comparison between forward mode and reverse mode covers:
- computational complexity,
- memory required,
- time to develop.
Cost of Forward Propagation of Derivatives

Define
    N_{|c|=1}: number of unit local derivatives c_{i,j} = ±1
    N_{|c|≠1}: number of non-unit local derivatives c_{i,j} ≠ 0, ±1

Solve for the derivatives in forward order ∇u_{n+1}, ∇u_{n+2}, ..., ∇u_N:

    ∇u_i = Σ_{j≺i} c_{i,j} · ∇u_j,    i = n+1, ..., N,

with each ∇u_i = (∂u_i/∂x1, ..., ∂u_i/∂xn) a vector of length n. The flop count flops(fwd) is given by

    flops(fwd) = n·N_{|c|≠1}                  (mults. c_{i,j}·∇u_j, c_{i,j} ≠ ±1, 0)
               + n·(N_{|c|≠1} + N_{|c|=1})    (adds./subs. ± c_{i,j}·∇u_j)
               − n·(p + m)                    (the first add./sub. of each statement is an assignment)

    flops(fwd) = n·(2N_{|c|≠1} + N_{|c|=1} − p − m)
Cost of Reverse Propagation of Adjoints
Solve for the adjoints in reverse order ū_{n+p}, ū_{n+p−1}, ..., ū_1:

    ū_j = Σ_{i≻j} ū_i · c_{i,j},

with ū_j = ∂(y1, y2, ..., ym)/∂u_j a vector of length m. The flop count flops(rev) is given by

    flops(rev) = m·N_{|c|≠1}                  (mults. ū_i · c_{i,j}, c_{i,j} ≠ ±1, 0)
               + m·(N_{|c|=1} + N_{|c|≠1})    (adds./subs. ± ū_i · c_{i,j})

    flops(rev) = m·(2N_{|c|≠1} + N_{|c|=1})
Memory Required
Storage used: it is not certain which mode takes more memory; usually, reverse mode takes more.
The memory cost of forward mode comes from:
- storing the value (size 1) of each variable;
- storing a gradient of input size n for each variable.
The memory cost of reverse mode comes from:
- storing the value (size 1) of each variable;
- storing an adjoint of output size m for each variable;
- storing the DAG (directed acyclic graph) that represents the function.
Forward mode is more likely to use less memory:
1. if variables are reused in the original function;
2. if n is so large that reverse mode requires a lot of memory to store the DAG.
Reverse mode is more likely to use less memory:
1. if n is relatively large, so storing gradients costs more than storing adjoints.
Time to Develop
Time to develop: it is usually harder to develop reverse-mode code than forward-mode code, especially when using the source transformation technique.
Conclusion:
- Use forward mode when n ≪ m (few inputs), e.g. sensitivity analysis with few parameters.
- Use reverse mode when m ≪ n (few outputs), e.g. gradients in optimization, where m = 1.
Extended Knowledge
Directional derivatives (forward mode):
- seed d = (d1, ..., dn)^T by setting ∇x_i = d_i;
- this calculates J_f · d.
Multi-directional derivatives: replace d by a matrix D = [d_ij], i = 1, ..., n, j = 1, ..., q.
Directional adjoints (reverse mode):
- seed v = (v1, ..., vm) by setting ȳ_j = v_j;
- this calculates v · J_f.
Multi-directional adjoints: replace v by a matrix V = [v_ij], i = 1, ..., q, j = 1, ..., m.
Case Study
Using FADBAD++:
- FADBAD++ was developed by Ole Stauning and Claus Bendtsen.
- Flexible automatic differentiation using templates and operator overloading in ANSI C++.
- Ships as source code only; no additional library required.
- Free to use.
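A small forward-mode usage sketch in the style of the FADBAD++ documentation (the header name fadiff.h and the F<double> interface with diff(), x(), and d() follow that documentation; check the exact details against the distribution you download):

// Forward-mode FADBAD++ sketch; fadiff.h comes with the FADBAD++ sources.
#include "fadiff.h"
#include <cstdio>

fadbad::F<double> func(const fadbad::F<double>& x,
                       const fadbad::F<double>& y) {
    return x * y + sin(x);        // ordinary-looking code, overloaded operators
}

int main() {
    fadbad::F<double> x = 1.0, y = 2.0;
    x.diff(0, 2);                 // mark x as independent variable 0 of 2
    y.diff(1, 2);                 // mark y as independent variable 1 of 2

    fadbad::F<double> f = func(x, y);
    std::printf("f = %g, df/dx = %g, df/dy = %g\n",
                f.x(), f.d(0), f.d(1));   // value and both partials
    return 0;
}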
Test function: f(x) = ∏ x_i.
Objective: test different codings of the function in forward mode, trying to reuse variables.
Result: basically, no matter how the function is coded, the memory cost is about n × n × 8 bytes; there is no difference between reusing variables or not.
Test function: f(x) = ∏ x_i.
Objective: test reverse mode.
Result: tested up to n = 6500. Forward mode ran out of memory, while reverse mode was 127 times faster and took only a few MB.
Remark: we could not see how much memory the DAG takes in reverse mode; this would be easier to observe with fewer independent variables but a more complicated function.
Complexity Analysis

Code List
The code list is obtained by re-writing the code into elemental binary and unary operations/functions, e.g.

    y1 = log²(x1·x2) + x2·x3² − a − x2
    y2 = √(b·log(x1·x2) + x2/x3) − x2·x3² + a

    v1 = x1          v7  = v6 · v2    v13 = v8 − v2
    v2 = x2          v8  = v7 − a     v14 = v5²
    v3 = x3          v9  = 1/v3       v15 = √v12
    v4 = v1 · v2     v10 = v2 · v9    v16 = v14 + v13   (= y1)
    v5 = log(v4)     v11 = b · v5     v17 = v15 − v8    (= y2)
    v6 = v3²         v12 = v11 + v10
Code List (Ctd.)
Assume the code list contains
- N± additions/subtractions, e.g. v14 + v13,
- N∗ multiplications, e.g. v1 · v2,
- Nf nonlinear functions/operations, e.g. log(v4), 1/v3,
for a total of p + m = N± + N∗ + Nf statements.

Then:
- each addition/subtraction generates two c_{i,j} = ±1;
- each multiplication generates two c_{i,j} ≠ ±1, 0;
- each nonlinear function generates one c_{i,j} ≠ ±1, 0, requiring one nonlinear function evaluation, e.g. v5 = log(v4) gives c_{5,4} = 1/v4.

So we have N_{|c|=1} = 2N± and N_{|c|≠1} = 2N∗ + Nf.
Complexity of Forward Mode
    flops(J_f) = flops(f) + flops(c_{i,j}) + flops(fwd)

Assume flops(nonlinear function) = w, with w > 1.

Cost of evaluating the function:

    flops(f) = N∗ + N± + w·Nf

Cost of evaluating the local derivatives c_{i,j}:

    flops(c_{i,j}) = w·Nf

Cost of forward propagation of derivatives:

    flops(fwd) = n·(2N_{|c|≠1} + N_{|c|=1} − p − m) = n·(3N∗ + N± + Nf)
Complexity of Forward Mode (Ctd.)
Then for forward mode,

    flops(J_f)/flops(f) = 1 + (w·Nf + n·(3N∗ + N± + Nf)) / (N∗ + N± + w·Nf)
                        = 1 + 3n·N̄∗ + n·N̄± + n·(1/w + 1/n)·w·N̄f

where

    (N̄∗, N̄±, w·N̄f) = (N∗, N±, w·Nf) / (N∗ + N± + w·Nf).

Since N̄∗ + N̄± + w·N̄f = 1 and all coefficients are positive,

    flops(J_f)/flops(f) ≤ 1 + n·max(3, 1, 1/w + 1/n) = 1 + 3n.

When n ≪ m, forward mode is preferred.
Complexity of Reverse Mode
    flops(rev) = m·(4N∗ + 2N± + 2Nf),

giving

    flops(J_f)/flops(f) = 1 + 4m·N̄∗ + 2m·N̄± + m·(2/w + 1/m)·w·N̄f

and

    flops(J_f)/flops(f) ≤ 1 + m·max(4, 2, 2/w + 1/m) = 1 + 4m.

For m = 1:

    flops(∇f) ≤ 5·flops(f)
AD Software

AD Tools in MATLAB

Differentiation Arithmetic
    →u = (u, u′),

where u denotes the value of the function u: R → R evaluated at the point x0, and u′ denotes the value u′(x0).

    →u + →v = (u + v, u′ + v′)
    →u − →v = (u − v, u′ − v′)
    →u × →v = (u·v, u·v′ + u′·v)
    →u ÷ →v = (u/v, (u′ − (u/v)·v′)/v)

    →x = (x, 1)
    →c = (c, 0)

Ref: http://www.math.uu.se/~warwick/vt07/FMB/avnm1.pdf
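These rules translate almost line for line into an operator-overloading implementation. Below is a minimal sketch of our own (not INTLAB's actual code); a usage example follows the rational-function slide below.

// A minimal dual number (u, u') implementing the arithmetic above.
struct Dual {
    double u;    // value u(x0)
    double du;   // derivative u'(x0)
};

Dual operator+(Dual a, Dual b) { return {a.u + b.u, a.du + b.du}; }
Dual operator-(Dual a, Dual b) { return {a.u - b.u, a.du - b.du}; }
Dual operator*(Dual a, Dual b) { return {a.u * b.u, a.u * b.du + a.du * b.u}; }
Dual operator/(Dual a, Dual b) {
    double q = a.u / b.u;                 // u/v
    return {q, (a.du - q * b.du) / b.u};  // (u' - (u/v) v') / v
}

Dual variable(double x) { return {x, 1.0}; }   // ->x = (x, 1)
Dual constant(double c) { return {c, 0.0}; }   // ->c = (c, 0)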
Example of a Rational Function
f(x) = (x + 1)(x − 2) / (x + 3);    f(3) = 2/3,  f′(3) = ?

    →f(→x) = (→x + →1)(→x − →2) / (→x + →3)
           = ((x,1) + (1,0)) × ((x,1) − (2,0)) / ((x,1) + (3,0))

Inserting the value →x = (3, 1) into →f produces

    →f(3,1) = ((3,1) + (1,0)) × ((3,1) − (2,0)) / ((3,1) + (3,0))
            = (4,1) × (1,1) / (6,1)
            = (4,5) / (6,1)
            = (2/3, 13/18)
Derivatives of Elementary Functions
Chain rule:

    (g ∘ u)′(x) = u′(x) · (g′ ∘ u)(x)

    →g(→u) = →g((u, u′)) = (g(u), u′·g′(u))

    sin →u = sin(u, u′) = (sin u, u′ cos u)
    cos →u = cos(u, u′) = (cos u, −u′ sin u)
    e^→u = e^(u,u′) = (e^u, u′·e^u)
    ...
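In the same Dual sketch, each elementary function becomes one overload carrying its derivative along (our illustration, following the (g(u), u′·g′(u)) rule):

// Elementary functions on dual numbers; assumes the Dual sketch above.
#include <cmath>

Dual sin(Dual a) { return {std::sin(a.u),  a.du * std::cos(a.u)}; }
Dual cos(Dual a) { return {std::cos(a.u), -a.du * std::sin(a.u)}; }
Dual exp(Dual a) { double e = std::exp(a.u); return {e, a.du * e}; }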
Example of Sin
From ../Intlab/gradient/@gradient/sin.m
Example for Elementary Functions
Evaluate the derivative of f(x) = (1 + x + e^x) sin x at x = 0.

    →f(→x) = (→1 + →x + e^→x) sin →x

    →f(0,1) = ((1,0) + (0,1) + e^(0,1)) sin(0,1)
            = ((1,1) + (e^0, e^0)) (sin 0, cos 0)
            = (2,2)(0,1) = (0, 2).
High-order Derivatives
    →u = (u, u′, u″)

    →u + →v = (u + v, u′ + v′, u″ + v″)
    →u − →v = (u − v, u′ − v′, u″ − v″)
    →u × →v = (u·v, u·v′ + u′·v, u·v″ + 2u′·v′ + u″·v)
    →u ÷ →v = (u/v, (u′ − (u/v)·v′)/v, (u″ − 2(u/v)′·v′ − (u/v)·v″)/v)
    ...
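Extending the pair to a triple (u, u′, u″) changes only the propagation rules; here is a sketch of the product rule as an example (ours, not from the slides):

// Second-order dual numbers: value, first and second derivative.
struct Dual2 {
    double u, du, ddu;
};

Dual2 operator*(Dual2 a, Dual2 b) {
    return {a.u * b.u,
            a.u * b.du + a.du * b.u,                        // (uv)'
            a.u * b.ddu + 2.0 * a.du * b.du + a.ddu * b.u}; // (uv)''
}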
INTLab
Developers: Institute for Reliable Computing, Hamburg University of Technology
Mode: forward
Method: operator overloading
Language: MATLAB
URL: http://www.ti3.tu-harburg.de/rump/intlab/
Licensing: open source
Rosenbrock Function
The gradient of the Rosenbrock function f(x) = 100(x2 − x1²)² + (x1 − 1)²:

    y1 = 400·x1·(x1² − x2) + 2·(x1 − 1)
    y2 = 200·(x2 − x1²)
One Step of Newton Method with INTLab
TOMLAB/MAD
Developers: Marcus M. Edvall and Kenneth Holmström, Tomlab Optimization Inc. (TOMLAB/MAD integration); Shaun A. Forth and Robert Ketzscher, Cranfield University (MAD)
Mode: forward
Method: operator overloading
Language: MATLAB
URL: http://tomlab.biz/products/mad/
Licensing: commercial license
One Step of Newton Method with MAD
ADiMat
Developers: André Vehreschild, Institute for Scientific Computing, RWTH Aachen University
Mode: forward
Method: source transformation and operator overloading
Language: MATLAB
URL: http://www.sc.rwth-aachen.de/vehreschild/adimat.html
Licensing: under discussion
ADiMat's Example
function [result1, result2] = f(x)
% Compute the sin and square-root of x*2.
% Very simple example for ADiMat website.
% Andre Vehreschild, Institute for Scientific Computing,
% RWTH Aachen University, D-52056 Aachen, Germany.
% [email protected]

result1 = sin(x);
result2 = sqrt(x*2);
Source: http://www.sc.rwth-aachen.de/vehreschild/adimat/example1.html
ADiMat's Example (Cont.)
>> addiff(@f, 'x', 'result1,result2');
>> p = magic(5);
>> g_p = createFullGradients(p);
>> [g_r1, r1, g_r2, r2] = g_f(g_p, p);
>> J1 = [g_r1{:}]; % and
>> J2 = [g_r2{:}];
Source: http://www.sc.rwth-aachen.de/vehreschild/adimat/example1.html
ADiMat's Example (Cont.)
function [g_result1, result1, g_result2, result2] = g_f(g_x, x)
% Compute the sin and square-root of x*2.
% Very simple example for ADiMat website.
% Andre Vehreschild, Institute for Scientific Computing,
% RWTH Aachen University, D-52056 Aachen, Germany.
% [email protected]

g_result1 = ((g_x).* cos(x));
result1 = sin(x);
g_tmp_f_00000 = g_x* 2;
tmp_f_00000 = x* 2;
g_result2 = ((g_tmp_f_00000)./ (2.*sqrt(tmp_f_00000)));
result2 = sqrt(tmp_f_00000);
Source: http://www.sc.rwth-aachen.de/vehreschild/adimat/example1.html
Matrix Calculus
Definition: if X is p × q and Y is m × n, then dY: = (dY/dX) dX:, where the derivative dY/dX is a large mn × pq matrix and the colon denotes vectorization.

    d(X²): = (X dX + dX X):
    d(det(X)) = d(det(Xᵀ)) = det(X) (X⁻ᵀ):ᵀ dX:
    d(ln(det(X))) = (X⁻ᵀ):ᵀ dX:
Ref: http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/calculus.html
Vandermonde Function
Source: Shaun A. Forth, An Efficient Overloaded Implementation of Forward Mode Automatic Differentiation in MATLAB, ACM Transactions on Mathematical Software, Vol. 32, No. 2, 2006, pp. 195-222.
Vandermonde Function (Cont.)
Experiment on a PIV 3.0 GHz PC (Windows XP), MATLAB version 6.5.
Source: Shaun A. Forth, An Efficient Overloaded Implementation of Forward Mode Automatic Differentiation in MATLAB, ACM Transactions on Mathematical Software, Vol. 32, No. 2, 2006, pp. 195-222.
Vandermonde Function (Cont.)
Method        10      20      40      80      160     320     640     1280
Function      0.000   0.000   0.000   0.000   0.000   0.010   0.000   0.000
MAD (full)    0.070   0.060   0.070   0.130   0.581   2.664   10.535  45.535
MAD (sparse)  0.071   0.050   0.060   0.060   0.060   0.070   0.100   0.881
INTLAB        0.050   0.040   0.040   0.090   0.040   0.050   0.071   0.120
ADiMat        0.231   0.140   0.271   0.601   1.362   3.044   7.340   21.611

CPU time in seconds for problem size n = 10, ..., 1280. Experiment on a PIII 1000 MHz PC (Windows 2000), MATLAB version 7.0.1.24704 (R14) Service Pack 1, TOMLAB v5.6, INTLAB version 5.3, ADiMat (beta) 0.4-r9.
Arrowhead Function
Source: Shaun A. Forth, An Efficient Overloaded Implementation of Forward Mode Automatic Differentiation in MATLAB, ACM Transactions on Mathematical Software, Vol. 32, No. 2, 2006, pp. 195-222.
Arrowhead Function (Cont.)
Experiment on a PIV 3.0 GHz PC (Windows XP), MATLAB version 6.5.
Source: Shaun A. Forth, An Efficient Overloaded Implementation of Forward Mode Automatic Differentiation in MATLAB, ACM Transactions on Mathematical Software, Vol. 32, No. 2, 2006, pp. 195-222.
Arrowhead Function (Cont.)
Method        20      40      80      160     320     640     1280
Function      0.010   0.000   0.000   0.000   0.000   0.000   0.000
MAD (full)    0.180   0.050   0.070   0.200   1.111   4.367   17.796
MAD (sparse)  0.060   0.060   0.060   0.070   0.080   0.100   0.160
INTLAB        0.090   0.051   0.050   0.050   0.081   0.140   0.340
ADiMat        0.911   0.311   0.651   1.262   2.704   6.028   14.581

CPU time in seconds for problem size n = 20, ..., 1280. Experiment on a PIII 1000 MHz PC (Windows 2000), MATLAB version 7.0.1.24704 (R14) Service Pack 1, TOMLAB v5.6, INTLAB version 5.3, ADiMat (beta) 0.4-r9.
BDQRTIC mod
BDQRTIC mod (Cont.)
Method        20      40      80      160     320     640     1280
Function      12.809  0.010   0.000   0.000   0.000   0.010   0.000
MAD (full)    2.604   0.121   0.150   0.490   2.513   10.926  43.162
MAD (sparse)  0.270   0.120   0.130   0.150   0.201   0.260   0.371
INTLAB        2.293   0.080   0.100   0.110   0.150   0.230   0.481
ADiMat        3.455   0.621   1.152   2.544   5.778   14.641  42.671

CPU time in seconds for problem size n = 20, ..., 1280. Experiment on a PIII 1000 MHz PC (Windows 2000), MATLAB version 7.0.1.24704 (R14) Service Pack 1, TOMLAB v5.6, INTLAB version 5.3, ADiMat (beta) 0.4-r9.
Summary of AD Software in MATLAB
- The operator overloading method for AD forward mode is easy to implement via differentiation arithmetic.
- All of the AD tools in MATLAB are easy to use.
- Sparse storage provides a good way to improve the performance of AD tools.
AD in C/C++ (ADIC)

The Computational Differentiation Group at Argonne National Laboratory
ADIC was introduced in 1997 by:
- Christian Bischof, Scientific Computing, RWTH Aachen University,
- Lucas Roh, founder, president, and CEO of Hostway Co.,
and the other team members.
State of ADIC
- ADIC is an automatic differentiation tool for ANSI C/C++.
- ADIC was introduced in 1997.
- Last updated: June 10, 2005.
- Official web site: www-new.mcs.anl.gov/adic/down-2.htm
- ADIC uses the forward method.
- Supported platforms: Unix/Linux.
- Selected application: NEOS.
- Related research group: Argonne National Laboratory, USA.
ADIC Anatomy
ADIC Process
func.c
#include "func.h"#include <math.h>
void func(data_t * pdata){ int i; double *x = pdata->x; double *y = pdata->y; double s, temp;
i=0; for (;i < pdata->len ;){ s = s + x[i]*y[i]; i++; }
temp = exp(s);
pdata->r = temp;}
Page 1
driver.c
Commands
- The first command generates the header file ad_deriv.h and the derivative function func.ad.c.
- The second command compiles and links all needed functions and generates ad_func.
Handling Side Effects
For Further Reading in ADIC
- Christian H. Bischof, Paul D. Hovland, Boyana Norris. Implementation of Automatic Differentiation Tools. PEPM '02, Jan. 14-15, 2002, Portland, OR, USA.
- Paul D. Hovland and Boyana Norris. Users' Guide to ADIC 1.1.
- C. H. Bischof, L. Roh, A. J. Mauer-Oats. ADIC: An Extensible Automatic Differentiation Tool for ANSI-C. Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA.
References
- C. H. Bischof and H. M. Bücker. Computing Derivatives of Computer Programs. In Modern Methods and Algorithms of Quantum Chemistry: Proceedings, Second Edition, edited by J. Grotendorst, NIC-Directors, 2000, pages 315-327.
- C. Bischof, A. Carle, P. Khademi, and G. Pusch. Automatic Differentiation: Obtaining Fast and Reliable Derivatives - Fast. In Control Problems in Industry, edited by I. Lasiecka and B. Morton, 1995, pages 1-16.
- Andreas Griewank. On Automatic Differentiation. In Mathematical Programming: Recent Developments and Applications, edited by M. Iri and K. Tanabe, Kluwer Academic Publishers, 1989.
- Andreas Griewank. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. Number 19 in Frontiers in Appl. Math., SIAM, Philadelphia, Penn., 2000.
- Shaun Forth. Introduction to Automatic Differentiation. Presentation slides for the 4th International Conference on Automatic Differentiation, July 19-23, 2004, University of Chicago, Gleacher Centre, Chicago, USA.
- G. F. Corliss. Automatic Differentiation.
- Warwick Tucker. http://www.math.uu.se/~warwick/vt07/FMB/avnm1.pdf
- http://www.autodiff.org/
- http://www.ti3.tu-harburg.de/rump/intlab/
- http://tomopt.com/tomlab/products/mad/
- http://www.sc.rwth-aachen.de/vehreschild/adimat/index.html
- Shaun A. Forth. An Efficient Overloaded Implementation of Forward Mode Automatic Differentiation in MATLAB. ACM Transactions on Mathematical Software, Vol. 32, No. 2, 2006, pp. 195-222.
- Siegfried M. Rump. INTLAB - INTerval LABoratory. In Developments in Reliable Computing, Kluwer Academic Publishers, 1999, pp. 77-104.
- Christian H. Bischof, H. Martin Bücker, Bruno Lang, A. Rasch, André Vehreschild. Combining Source Transformation and Operator Overloading Techniques to Compute Derivatives for MATLAB Programs. In Proceedings of the Second IEEE International Workshop on Source Code Analysis and Manipulation (SCAM 2002), IEEE Computer Society, 2002.
Thanks & Questions
Thanks!
Questions?