recent advancements in differential equation solver software€¦ · adjoint sensitivity analysis....

Recent Advancements in Differential Equation Solver SoftwareCHRIS RACKAUCKASMASSACHUSETTS INSTITUTE OF TECHNOLOGY

1

Jubilee Symposium 2019: Future Directions of System Modeling and Simulation

DifferentialEquations.jl: Research Platform for Production ~300 methods available, including wrappers to C/Fortran methods

Platform for reproducible research and benchmarking MPI+GPU Compatibility Implicit, IMEX, multirate, symplectic, exponential integrators, etc. Adaptive high order methods for stochastic differential equations Stiff state-dependent delay differential equation discontinuity tracking Mix in Gillespie simulation (Continuous-Time Markov Chains) Automatic sparsity detection and optimization Arbitrary code injection through callbacksNative Julia methods routinely benchmark as one of the fastest libraries in most categories (caveat category: large (>1000) ODE/DAE systems)

2

3 Major Areas

Neural Differential Equations

A Hackable Model Compiler (ModelingToolkit.jl)

Improvements to basic numerical methods

3

There is a new field that is merging AI and domain-specific modeling: Scientific ML/AI

SCIENTIFIC AI: DOMAIN MODELS WITH INTEGRATED MACHINE LEARNING HTTPS://WWW.YOUTUBE.COM/WATCH?V=FGFX8CQHDQATHE ESSENTIAL TOOLS OF SCIENTIFIC MACHINE LEARNINGHTTP://WWW.STOCHASTICLIFESTYLE.COM/THE-ESSENTIAL-TOOLS-OF-SCIENTIFIC-MACHINE-LEARNING-SCIENTIFIC-ML/

4

https://www.youtube.com/watch?v=FGfx8CQHdQA

http://www.stochasticlifestyle.com/the-essential-tools-of-scientific-machine-learning-scientific-ml/

What is the mathematical structure of machine learning?

5

Neural Networks = Nonlinear Regression Polynomial: 𝑒𝑒𝑥𝑥 = 𝑎𝑎1 + 𝑎𝑎2𝑥𝑥 + 𝑎𝑎3𝑥𝑥2 + ⋯

Nonlinear: 𝑒𝑒𝑥𝑥 = 1 + 𝑎𝑎1 tanh 𝑎𝑎2𝑥𝑥𝑎𝑎3𝑥𝑥−tanh 𝑎𝑎4𝑥𝑥

Neural Network: 𝑒𝑒𝑥𝑥 ≈ 𝑊𝑊3𝜎𝜎 𝑊𝑊2𝜎𝜎 𝑊𝑊1𝑥𝑥 + 𝑏𝑏1 + 𝑏𝑏2 + 𝑏𝑏3. Train the weights (𝑊𝑊, 𝑏𝑏)

6

Universal Approximation

Theorem

NEURAL NETWORKS CAN GET 𝜖𝜖 CLOSE TO ANY 𝑅𝑅𝑛𝑛 → 𝑅𝑅𝑚𝑚 FUNCTION Neural Networks Overcome “the curse of dimensionality”

7

Not Quite a Black Box:Convolutional Neural Networks Encode (Spatial) Structure

8

A Comprehensive Guide to Convolutional Neural Networks — the ELI5 way, Sumit Saha

Now let’s generalize this idea to scientific structures

9

Latent (Neural) Differential Equations

NEURAL ORDINARY DIFFERENTIAL EQUATION: 𝑢𝑢′ = 𝑓𝑓(𝑢𝑢, 𝑝𝑝, 𝑡𝑡)

LET 𝑓𝑓 BE A NEURAL NETWORK

Training a neural differential equation:DiffEqFlux.jl

11

Solve the differential equation

Compute the gradient of the solution with respect to the parameters defining the neural network

Adjoint sensitivity analysisDifferentiable programming

Update the neural network and repeat

Automatically Learning the Model12

The real power comes from incorporating known structure into the ML framework(Mixed Neural Differential Equation)

13

Mix Neural Networks Into DiffEqs!

𝑑𝑑𝑥𝑥𝑑𝑑𝑡𝑡

=𝑑𝑑𝑑𝑑𝑑𝑑𝑡𝑡

= 𝑝𝑝1𝑑𝑑 + 𝑝𝑝2𝑥𝑥

Fit the “mixed neural differential equation” using the same method!

14

ML-Assisted Model Discovery

The chemical reactions imply an evolution of:

?

?

?

? ?

Data

?

15

Biologically-Informed Neural Network

NN(2)

NN(3)

NN(4)

?

DataFind neural networks so the model matches the data

16

Interpretability of Neural Differential Equations

Analyze the Jacobian/Hessian

17

Data-Efficient Physics-Embedded Machine Learning

Nonlinear Optimal Control as a Mixed Neural ODE

𝑥𝑥′ = 𝑓𝑓(𝑥𝑥 𝑡𝑡 ,𝑢𝑢 𝑡𝑡 , 𝑡𝑡)

Minimize 𝐽𝐽 = Φ 𝑥𝑥 𝑡𝑡0 , 𝑡𝑡0, 𝑥𝑥 𝑡𝑡𝑓𝑓 , 𝑡𝑡𝑓𝑓 + ∫𝑡𝑡0𝑡𝑡𝑓𝑓 𝐿𝐿 𝑥𝑥 𝑡𝑡 ,𝑢𝑢 𝑡𝑡 , 𝑡𝑡 𝑑𝑑𝑡𝑡

Example: 𝑥𝑥(𝑡𝑡) is the location of an automated drone, 𝑢𝑢 𝑡𝑡 is the controller, find what the controller should be such that the vehicle goes to the right place for the least energy.

Neural ODE Approach: Make 𝑢𝑢(𝑡𝑡) be a neural network. Find the neural network s.t. 𝑥𝑥 𝑡𝑡 correctly evolves

18

Neural PDEs for Acceleration of Navier-Stokes

Boussinesq Equations (Navier-Stokes) are used in climate models

People attempt to solve this by “parameterizing”, i.e. getting a 1-dimensional approximation through averaging:

where 𝑤𝑤′𝑐𝑐𝑐 is unknown.

Instead of picking a form for 𝑤𝑤′𝑐𝑐𝑐(the current method), replace it with a neural network and learn it from small scale simulations! Discretize. Result: Neural ODE.

19

Solving 1000 dimensional PDEs: Hamilton-Jacobi-Bellman, Nonlinear Black-Scholes

Semilinear Parabolic Form (Diffusion-Advection Equations, Hamilton-Jacobi-Bellman, Black-Scholes)

Make (𝜎𝜎𝑇𝑇∇u) 𝑡𝑡,𝑋𝑋 a neural network. Solve the resulting SDEs and learn 𝜎𝜎𝑇𝑇∇u

via:

Simplified: Transform it into a Backwards SDE. The unknown is a function! Learn the unknown function via neural

network. Once learned, the PDE solution is known.

Solving high-dimensional partial differential equations using deep learning, 2018, PNAS, Han, Jentzen, E

20

Represent 1000 dimensional PDEs as a 1000 dimensional neural SDE

Solving the PDE = training the neural network As a neural SDE, we can solve with higher order (less neural network

evaluations), adaptivity, etc.

21

Submitting to AISTATS 2020

DiffEqFlux.jl

UnifiedFramework

forScientific ML

The first (mixed) Neural Differential Equation solver. Supports: Neural ODEs Neural SDEs (SDDEs) Neural DAEs Neural DDEs Stiff Equations Hybrid Equations Adjoints via reverse-mode AD and adjoint

sensitivity analysis Data-Efficient and Physical Machine Learning

22

ModelingToolkit: An Open Source Compiler + IR for Models and Transformations

23

Defining Principles

A structured and documented IR for describing models

“Model transforms” are compiler passes IR -> IR

Extendable high level “context” system

Let others write domain-specific languages on top DSLs should not have to define

simplification, differentiation, etc!

24

Automatically convert numerical functions to symbolic

25

Current Research

Components and Pantelides for large DAE systems

An extendable framework for automated PDE discretizations

Methods for (nonlinear) model order reduction

Transformations for SDEs

Tearing and other structural optimizations

26

Now assume the form of the differential equation is “good”CAN WE IMPROVE THE SOLVERS?

27

Non-Stiff Methods are still being improved!

28

The Structure of a Runge-KuttaMethod

29

Ways to Judge an RK MethodOptimization of next order coefficients Stability

30

Dormand-Prince 5th Order (1980) 31

Advancements since 2010

Recent methods, Tsit5 and Vern#, reduce the number

of assumptions made in coefficient optimization, leading to more optimal

solutions (>2010)

Methods specialized for wave equations, low-

dispersion results, extended monotonicity equation for PDEs (SSPRK), etc. are hot topics in new high order

Runge-Kutta methods (2017)

32

100x100 Linear ODEs 33

3-Body Problem (CVODE_Adamsfails)

34

And parallelism is not well exploited!

35

Pervasive Allowance of Within-Method parallelism through Julia

Zero GPU/Distributed message passing done by the solver!

36

Multithreading Extrapolation

SimultaneousEuler steppingof differentstep sizes

37

Ketcheson, 2016

Parallel Runge-Kutta methods

5 stagesBut only 3 steps in parallel

38

DiffEqGPU.jl: Automatic GPU-based parameter parallelism of high level code

Demonstrates a 12x-90x speedup over multithreaded ODE solves by using just 1 GPU.

Handles stiff and non-stiff hybrid ODEs/SDEs/DDEs/DAEs with adaptive timestepping.

Support coming soon: Multiple GPUs Mixed with multithreading/distributed

Result: take existing hybrid ODE simulations written in Julia and automatically GPU+distributed+multithreaded parallelize it for the user!

Requires relatively small systems (<50 DEs?)

39

Stiff ODEs: Fall of the BDFWHAT’S COMING TO GET GEAR’S METHOD.

40

Evolution of Gear’s Method

GEAR: Original code. Adaptive order adaptive time via interpolation Lowers the stability!

LSODE series: update of GEAR Adds rootfinding, Krylov, etc

VODE: Variable-coefficient form No interpolation necessary.

CVODE: VODE rewritten in C++ Adds sensitivity analysis

41

Problems with BDF

BDF is a multistep method

Needs “Startup Steps”

Inefficient with events

It is only L-stable up to 2nd order

Has high truncation error coefficients

Implicit

Requires good step predictors

43

Orego Benchmarks 44

Rosenbrock Methods

Aren’t new! (ode23s)

Can fix a lot of problems:

Exploit sparse factorization

No step predictions required

Can optimize coefficients to high order

Con: Needs accurate Jacobians

Answer: AD (or symbolic)

45

ODE Problems can fall into different classes

Physical ModelingSecondOrderODEProblem(f,u0,tspan,p)

𝑢𝑢′′ = 𝑓𝑓(𝑢𝑢, 𝑝𝑝, 𝑡𝑡)

PartitionedODEProblem(f1,f2,v0,u0,tspan,p)

𝑣𝑣′ = 𝑓𝑓1(𝑡𝑡,𝑢𝑢)

𝑢𝑢𝑐 = 𝑓𝑓2(𝑣𝑣)

HamiltonianODEProblem(H,p0,q0,tspan,p)

𝐻𝐻(𝑝𝑝, 𝑞𝑞)

PDE Discretizations

SplitODEProblem(f1,f2,u0,tspan,p) (IMEX)

𝑢𝑢′ = 𝑓𝑓1 𝑢𝑢, 𝑝𝑝, 𝑡𝑡 + 𝑓𝑓2(𝑢𝑢, 𝑝𝑝, 𝑡𝑡)

SemilinearODEProblem(A,f2,u0,tspan,p)

𝑢𝑢′ = 𝐴𝐴𝑢𝑢 + 𝑓𝑓(𝑢𝑢, 𝑝𝑝, 𝑡𝑡)

LocalSemilinearODEProblem(A,f2,u0,tspan,p)𝑢𝑢′ = 𝐴𝐴𝑢𝑢 + 𝑓𝑓. (𝑢𝑢, 𝑝𝑝, 𝑡𝑡)

SDIRK Methods can treat half of the problem as explicit, decreasing the nonlinear solver cost

46

Exponential Runge-Kutta

Explicit methods for stiff equations

Small enough: Build matrix exponential

Large enough: Krylov exp(t*A)*v

Crossover Question: Can we automatically divide equations for IMEX/Multirate methods?

47

Current Results (DiffEqBenchmarks.jl)

<20-30 stiff ODEs = High order Rosenbrock <2000 ODEs = Optimized SDIRK Methods >2000 (general) ODEs = SUNDIALS BDF

Stabilized Explicit methods in PDE contexts

IMEX methods when a split is known

…

Ongoing research to crack the code for more types of systems

48

Putting it together for users: polyalgorithms

49

Conclusion

Today you can solve differential equations Tomorrow you will likely be able to solve them much faster

Neural-embedded methods for simplifying models

A compiler for researching transformations methods in context

Ongoing improvements to numerical methods

50

Students, want a paid summer position? Contact me for Google Summer of Code development. No Julia experience is required. https://julialang.org/soc/ideas-page

51

Industry interested in this research? Contact me to help fund development in JuliaDiffEq! We need industry sponsors/interest for CSSI grants, please let us know

recent advancements in differential equation solver software€¦ · adjoint sensitivity analysis....

Documents