recent advancements in differential equation solver software€¦ · adjoint sensitivity analysis....
TRANSCRIPT
Recent Advancements in Differential Equation Solver SoftwareCHRIS RACKAUCKASMASSACHUSETTS INSTITUTE OF TECHNOLOGY
1
Jubilee Symposium 2019: Future Directions of System Modeling and Simulation
DifferentialEquations.jl: Research Platform for Production ~300 methods available, including wrappers to C/Fortran methods
Platform for reproducible research and benchmarking MPI+GPU Compatibility Implicit, IMEX, multirate, symplectic, exponential integrators, etc. Adaptive high order methods for stochastic differential equations Stiff state-dependent delay differential equation discontinuity tracking Mix in Gillespie simulation (Continuous-Time Markov Chains) Automatic sparsity detection and optimization Arbitrary code injection through callbacksNative Julia methods routinely benchmark as one of the fastest libraries in most categories (caveat category: large (>1000) ODE/DAE systems)
2
3 Major Areas
Neural Differential Equations
A Hackable Model Compiler (ModelingToolkit.jl)
Improvements to basic numerical methods
3
There is a new field that is merging AI and domain-specific modeling: Scientific ML/AI
SCIENTIFIC AI: DOMAIN MODELS WITH INTEGRATED MACHINE LEARNING HTTPS://WWW.YOUTUBE.COM/WATCH?V=FGFX8CQHDQATHE ESSENTIAL TOOLS OF SCIENTIFIC MACHINE LEARNINGHTTP://WWW.STOCHASTICLIFESTYLE.COM/THE-ESSENTIAL-TOOLS-OF-SCIENTIFIC-MACHINE-LEARNING-SCIENTIFIC-ML/
4
What is the mathematical structure of machine learning?
5
Neural Networks = Nonlinear Regression Polynomial: 𝑒𝑒𝑥𝑥 = 𝑎𝑎1 + 𝑎𝑎2𝑥𝑥 + 𝑎𝑎3𝑥𝑥2 + ⋯
Nonlinear: 𝑒𝑒𝑥𝑥 = 1 + 𝑎𝑎1 tanh 𝑎𝑎2𝑥𝑥𝑎𝑎3𝑥𝑥−tanh 𝑎𝑎4𝑥𝑥
Neural Network: 𝑒𝑒𝑥𝑥 ≈ 𝑊𝑊3𝜎𝜎 𝑊𝑊2𝜎𝜎 𝑊𝑊1𝑥𝑥 + 𝑏𝑏1 + 𝑏𝑏2 + 𝑏𝑏3. Train the weights (𝑊𝑊, 𝑏𝑏)
6
Universal Approximation
Theorem
NEURAL NETWORKS CAN GET 𝜖𝜖 CLOSE TO ANY 𝑅𝑅𝑛𝑛 → 𝑅𝑅𝑚𝑚 FUNCTION Neural Networks Overcome “the curse of dimensionality”
7
Not Quite a Black Box:Convolutional Neural Networks Encode (Spatial) Structure
8
A Comprehensive Guide to Convolutional Neural Networks — the ELI5 way, Sumit Saha
Now let’s generalize this idea to scientific structures
9
Latent (Neural) Differential Equations
NEURAL ORDINARY DIFFERENTIAL EQUATION: 𝑢𝑢′ = 𝑓𝑓(𝑢𝑢, 𝑝𝑝, 𝑡𝑡)
LET 𝑓𝑓 BE A NEURAL NETWORK
Training a neural differential equation:DiffEqFlux.jl
11
Solve the differential equation
Compute the gradient of the solution with respect to the parameters defining the neural network
Adjoint sensitivity analysisDifferentiable programming
Update the neural network and repeat
Automatically Learning the Model12
The real power comes from incorporating known structure into the ML framework(Mixed Neural Differential Equation)
13
Mix Neural Networks Into DiffEqs!
𝑑𝑑𝑥𝑥𝑑𝑑𝑡𝑡
=𝑑𝑑𝑑𝑑𝑑𝑑𝑡𝑡
= 𝑝𝑝1𝑑𝑑 + 𝑝𝑝2𝑥𝑥
Fit the “mixed neural differential equation” using the same method!
14
ML-Assisted Model Discovery
The chemical reactions imply an evolution of:
?
?
?
? ?
Data
?
15
Biologically-Informed Neural Network
NN(2)
NN(3)
NN(4)
?
DataFind neural networks so the model matches the data
16
Interpretability of Neural Differential Equations
Analyze the Jacobian/Hessian
17
Data-Efficient Physics-Embedded Machine Learning
Nonlinear Optimal Control as a Mixed Neural ODE
𝑥𝑥′ = 𝑓𝑓(𝑥𝑥 𝑡𝑡 ,𝑢𝑢 𝑡𝑡 , 𝑡𝑡)
Minimize 𝐽𝐽 = Φ 𝑥𝑥 𝑡𝑡0 , 𝑡𝑡0, 𝑥𝑥 𝑡𝑡𝑓𝑓 , 𝑡𝑡𝑓𝑓 + ∫𝑡𝑡0𝑡𝑡𝑓𝑓 𝐿𝐿 𝑥𝑥 𝑡𝑡 ,𝑢𝑢 𝑡𝑡 , 𝑡𝑡 𝑑𝑑𝑡𝑡
Example: 𝑥𝑥(𝑡𝑡) is the location of an automated drone, 𝑢𝑢 𝑡𝑡 is the controller, find what the controller should be such that the vehicle goes to the right place for the least energy.
Neural ODE Approach: Make 𝑢𝑢(𝑡𝑡) be a neural network. Find the neural network s.t. 𝑥𝑥 𝑡𝑡 correctly evolves
18
Neural PDEs for Acceleration of Navier-Stokes
Boussinesq Equations (Navier-Stokes) are used in climate models
People attempt to solve this by “parameterizing”, i.e. getting a 1-dimensional approximation through averaging:
where 𝑤𝑤′𝑐𝑐𝑐 is unknown.
Instead of picking a form for 𝑤𝑤′𝑐𝑐𝑐(the current method), replace it with a neural network and learn it from small scale simulations! Discretize. Result: Neural ODE.
19
Solving 1000 dimensional PDEs: Hamilton-Jacobi-Bellman, Nonlinear Black-Scholes
Semilinear Parabolic Form (Diffusion-Advection Equations, Hamilton-Jacobi-Bellman, Black-Scholes)
Make (𝜎𝜎𝑇𝑇∇u) 𝑡𝑡,𝑋𝑋 a neural network. Solve the resulting SDEs and learn 𝜎𝜎𝑇𝑇∇u
via:
Simplified: Transform it into a Backwards SDE. The unknown is a function! Learn the unknown function via neural
network. Once learned, the PDE solution is known.
Solving high-dimensional partial differential equations using deep learning, 2018, PNAS, Han, Jentzen, E
20
Represent 1000 dimensional PDEs as a 1000 dimensional neural SDE
Solving the PDE = training the neural network As a neural SDE, we can solve with higher order (less neural network
evaluations), adaptivity, etc.
21
Submitting to AISTATS 2020
DiffEqFlux.jl
UnifiedFramework
forScientific ML
The first (mixed) Neural Differential Equation solver. Supports: Neural ODEs Neural SDEs (SDDEs) Neural DAEs Neural DDEs Stiff Equations Hybrid Equations Adjoints via reverse-mode AD and adjoint
sensitivity analysis Data-Efficient and Physical Machine Learning
22
ModelingToolkit: An Open Source Compiler + IR for Models and Transformations
23
Defining Principles
A structured and documented IR for describing models
“Model transforms” are compiler passes IR -> IR
Extendable high level “context” system
Let others write domain-specific languages on top DSLs should not have to define
simplification, differentiation, etc!
24
Automatically convert numerical functions to symbolic
25
Current Research
Components and Pantelides for large DAE systems
An extendable framework for automated PDE discretizations
Methods for (nonlinear) model order reduction
Transformations for SDEs
Tearing and other structural optimizations
26
Now assume the form of the differential equation is “good”CAN WE IMPROVE THE SOLVERS?
27
Non-Stiff Methods are still being improved!
28
The Structure of a Runge-KuttaMethod
29
Ways to Judge an RK MethodOptimization of next order coefficients Stability
30
Dormand-Prince 5th Order (1980) 31
Advancements since 2010
Recent methods, Tsit5 and Vern#, reduce the number
of assumptions made in coefficient optimization, leading to more optimal
solutions (>2010)
Methods specialized for wave equations, low-
dispersion results, extended monotonicity equation for PDEs (SSPRK), etc. are hot topics in new high order
Runge-Kutta methods (2017)
32
100x100 Linear ODEs 33
3-Body Problem (CVODE_Adamsfails)
34
And parallelism is not well exploited!
35
Pervasive Allowance of Within-Method parallelism through Julia
Zero GPU/Distributed message passing done by the solver!
36
Multithreading Extrapolation
SimultaneousEuler steppingof differentstep sizes
37
Ketcheson, 2016
Parallel Runge-Kutta methods
5 stagesBut only 3 steps in parallel
38
DiffEqGPU.jl: Automatic GPU-based parameter parallelism of high level code
Demonstrates a 12x-90x speedup over multithreaded ODE solves by using just 1 GPU.
Handles stiff and non-stiff hybrid ODEs/SDEs/DDEs/DAEs with adaptive timestepping.
Support coming soon: Multiple GPUs Mixed with multithreading/distributed
Result: take existing hybrid ODE simulations written in Julia and automatically GPU+distributed+multithreaded parallelize it for the user!
Requires relatively small systems (<50 DEs?)
39
Stiff ODEs: Fall of the BDFWHAT’S COMING TO GET GEAR’S METHOD.
40
Evolution of Gear’s Method
GEAR: Original code. Adaptive order adaptive time via interpolation Lowers the stability!
LSODE series: update of GEAR Adds rootfinding, Krylov, etc
VODE: Variable-coefficient form No interpolation necessary.
CVODE: VODE rewritten in C++ Adds sensitivity analysis
41
42
Problems with BDF
BDF is a multistep method
Needs “Startup Steps”
Inefficient with events
It is only L-stable up to 2nd order
Has high truncation error coefficients
Implicit
Requires good step predictors
43
Orego Benchmarks 44
Rosenbrock Methods
Aren’t new! (ode23s)
Can fix a lot of problems:
Exploit sparse factorization
No step predictions required
Can optimize coefficients to high order
Con: Needs accurate Jacobians
Answer: AD (or symbolic)
45
ODE Problems can fall into different classes
Physical ModelingSecondOrderODEProblem(f,u0,tspan,p)
𝑢𝑢′′ = 𝑓𝑓(𝑢𝑢, 𝑝𝑝, 𝑡𝑡)
PartitionedODEProblem(f1,f2,v0,u0,tspan,p)
𝑣𝑣′ = 𝑓𝑓1(𝑡𝑡,𝑢𝑢)
𝑢𝑢𝑐 = 𝑓𝑓2(𝑣𝑣)
HamiltonianODEProblem(H,p0,q0,tspan,p)
𝐻𝐻(𝑝𝑝, 𝑞𝑞)
PDE Discretizations
SplitODEProblem(f1,f2,u0,tspan,p) (IMEX)
𝑢𝑢′ = 𝑓𝑓1 𝑢𝑢, 𝑝𝑝, 𝑡𝑡 + 𝑓𝑓2(𝑢𝑢, 𝑝𝑝, 𝑡𝑡)
SemilinearODEProblem(A,f2,u0,tspan,p)
𝑢𝑢′ = 𝐴𝐴𝑢𝑢 + 𝑓𝑓(𝑢𝑢, 𝑝𝑝, 𝑡𝑡)
LocalSemilinearODEProblem(A,f2,u0,tspan,p)𝑢𝑢′ = 𝐴𝐴𝑢𝑢 + 𝑓𝑓. (𝑢𝑢, 𝑝𝑝, 𝑡𝑡)
SDIRK Methods can treat half of the problem as explicit, decreasing the nonlinear solver cost
46
Exponential Runge-Kutta
Explicit methods for stiff equations
Small enough: Build matrix exponential
Large enough: Krylov exp(t*A)*v
Crossover Question: Can we automatically divide equations for IMEX/Multirate methods?
47
Current Results (DiffEqBenchmarks.jl)
<20-30 stiff ODEs = High order Rosenbrock <2000 ODEs = Optimized SDIRK Methods >2000 (general) ODEs = SUNDIALS BDF
Stabilized Explicit methods in PDE contexts
IMEX methods when a split is known
…
Ongoing research to crack the code for more types of systems
48
Putting it together for users: polyalgorithms
49
Conclusion
Today you can solve differential equations Tomorrow you will likely be able to solve them much faster
Neural-embedded methods for simplifying models
A compiler for researching transformations methods in context
Ongoing improvements to numerical methods
50
Students, want a paid summer position? Contact me for Google Summer of Code development. No Julia experience is required. https://julialang.org/soc/ideas-page
51
Industry interested in this research? Contact me to help fund development in JuliaDiffEq! We need industry sponsors/interest for CSSI grants, please let us know