TRANSCRIPT
Automatic differentiation
Matthew J Johnson (mattjj@google.com)
Deep Learning Summer School Montreal 2017
Google Brain
Dougal Maclaurin, David Duvenaud, Ryan P Adams
Our awesome new world
• TensorFlow, Stan, Theano, Edward, PyTorch, MinPy
• Only need to specify forward model
• Autodiff + optimization / inference done for you
• loops? branching? recursion? closures? data structures?
• debugger?
• a second compiler/interpreter to satisfy
• a new mini-language to learn
Autograd
• github.com/HIPS/autograd
• differentiates native Python code
• handles most of Numpy + Scipy
• loops, branching, recursion, closures
• arrays, tuples, lists, dicts, classes, …
• derivatives of derivatives
• a one-function API
• small and easy to extend
Dougal Maclaurin
Autograd examples

import autograd.numpy as np
import autograd.numpy.random as npr
from autograd import grad

def predict(weights, inputs):
    for W, b in weights:
        outputs = np.dot(inputs, W) + b
        inputs = np.tanh(outputs)
    return outputs

def init_params(scale, sizes):
    return [(npr.randn(m, n) * scale,
             npr.randn(n) * scale)
            for m, n in zip(sizes[:-1], sizes[1:])]

def logprob_fun(weights, inputs, targets):
    preds = predict(weights, inputs)
    return np.sum((preds - targets)**2)

gradient_fun = grad(logprob_fun)
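A minimal usage sketch (mine, not from the slides; shapes and the step size are illustrative): gradient_fun returns gradients with the same nested structure as the parameters, so it drops straight into a plain gradient-descent loop.

params = init_params(scale=0.1, sizes=[2, 16, 1])
inputs, targets = npr.randn(8, 2), npr.randn(8, 1)
for step in range(100):
    g = gradient_fun(params, inputs, targets)    # list of (dW, db) pairs
    params = [(W - 0.01 * dW, b - 0.01 * db)     # plain gradient step
              for (W, b), (dW, db) in zip(params, g)]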
Autograd examples
import autograd.numpy as np
from autograd import grad
import matplotlib.pyplot as plt

x = np.linspace(-7, 7, 200)
plt.plot(x, np.tanh(x),
         x, grad(np.tanh)(x),                                # first derivative
         x, grad(grad(np.tanh))(x),                          # second derivative
         x, grad(grad(grad(np.tanh)))(x),                    # third derivative
         x, grad(grad(grad(grad(np.tanh))))(x),              # fourth derivative
         x, grad(grad(grad(grad(grad(np.tanh)))))(x),        # fifth derivative
         x, grad(grad(grad(grad(grad(grad(np.tanh))))))(x))  # sixth derivative
Hessians and HVPs

$\nabla^2 f(x) \cdot v = \nabla_x \left( \nabla_x f(x) \cdot v \right)$
from autograd import grad, jacobian

def hessian(fun, argnum=0):
    return jacobian(jacobian(fun, argnum), argnum)

def hvp(fun):
    def grad_dot_vector(arg, vector):
        return np.dot(grad(fun)(arg), vector)
    return grad(grad_dot_vector)
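A quick numerical check of hvp (my example; assumes the definitions above): for f(x) = Σᵢ xᵢ⁴ the Hessian is diag(12 xᵢ²), so multiplying by a basis vector picks out one column.

import autograd.numpy as np

f = lambda x: np.sum(x**4)
x0 = np.ones(3)
v = np.array([1.0, 0.0, 0.0])
print(hvp(f)(x0, v))   # one Hessian column: [12., 0., 0.]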
Black-box inference in a tweet

But what about inference? Stan also provides inference routines...
...which are a tiny amount of code with autodiff!
Tutorial goals
1. Jacobians and the chain rule
• Forward and reverse accumulation
2. Autograd’s implementation
• Fully closed tracing autodiff in Python
3. Advanced autodiff techniques
• Checkpointing, forward from reverse, differentiating optima and fixed points
$F : \mathbb{R}^n \to \mathbb{R}$

$F : x \in \mathbb{R}^n \mapsto y \in \mathbb{R}$

$F = D \circ C \circ B \circ A$

$y = F(x) = D(C(B(A(x))))$

$y = D(c), \quad c = C(b), \quad b = B(a), \quad a = A(x)$
$F'(x) = \frac{\partial y}{\partial x} = \begin{bmatrix} \frac{\partial y}{\partial x_1} & \cdots & \frac{\partial y}{\partial x_n} \end{bmatrix}$

$F'(x) = \frac{\partial y}{\partial c} \, \frac{\partial c}{\partial b} \, \frac{\partial b}{\partial a} \, \frac{\partial a}{\partial x}$

with $\frac{\partial y}{\partial c} = D'(c), \quad \frac{\partial c}{\partial b} = C'(b), \quad \frac{\partial b}{\partial a} = B'(a), \quad \frac{\partial a}{\partial x} = A'(x)$
Forward accumulation groups the products right-to-left,

$F'(x) = \frac{\partial y}{\partial c} \left( \frac{\partial c}{\partial b} \left( \frac{\partial b}{\partial a} \, \frac{\partial a}{\partial x} \right) \right)$

so every intermediate is a full Jacobian matrix, e.g.

$\frac{\partial b}{\partial x} = \begin{bmatrix} \frac{\partial b_1}{\partial x_1} & \cdots & \frac{\partial b_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial b_m}{\partial x_1} & \cdots & \frac{\partial b_m}{\partial x_n} \end{bmatrix}$

Reverse accumulation groups them left-to-right,

$F'(x) = \left( \left( \frac{\partial y}{\partial c} \, \frac{\partial c}{\partial b} \right) \frac{\partial b}{\partial a} \right) \frac{\partial a}{\partial x}$

so every intermediate is a row vector, e.g.

$\frac{\partial y}{\partial b} = \begin{bmatrix} \frac{\partial y}{\partial b_1} & \cdots & \frac{\partial y}{\partial b_m} \end{bmatrix}$
Multiplying by a vector from the right,

$F'(x)\, v = \frac{\partial y}{\partial c} \, \frac{\partial c}{\partial b} \, \frac{\partial b}{\partial a} \, \frac{\partial a}{\partial x} \, v = \frac{\partial y}{\partial c} \left( \frac{\partial c}{\partial b} \left( \frac{\partial b}{\partial a} \left( \frac{\partial a}{\partial x} \, v \right) \right) \right)$

Forward accumulation ↔ Jacobian-vector products: build the Jacobian one column at a time (one JVP per basis vector of the input).
Multiplying by a vector from the left,

$v^\top F'(x) = v^\top \frac{\partial y}{\partial c} \, \frac{\partial c}{\partial b} \, \frac{\partial b}{\partial a} \, \frac{\partial a}{\partial x} = \left( \left( \left( v^\top \frac{\partial y}{\partial c} \right) \frac{\partial c}{\partial b} \right) \frac{\partial b}{\partial a} \right) \frac{\partial a}{\partial x}$

Reverse accumulation ↔ vector-Jacobian products: build the Jacobian one row at a time. Seeding with $v^\top = \frac{\partial y}{\partial y} = 1$ gives the full gradient:

$F'(x) = \left( \left( \left( \frac{\partial y}{\partial y} \, \frac{\partial y}{\partial c} \right) \frac{\partial c}{\partial b} \right) \frac{\partial b}{\partial a} \right) \frac{\partial a}{\partial x}$
Forward and reverse accumulation
• Forward accumulation
• Jacobian-vector products
• “push-forward”
• build Jacobian matrix one column at a time
• Reverse accumulation
• vector-Jacobian products
• “pull-back”
• build Jacobian matrix one row at a time
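As a concrete sketch of "one row at a time" (my example; assumes make_vjp is importable from autograd): one VJP call per output, each against a standard basis vector, assembles the Jacobian row by row.

import autograd.numpy as np
from autograd import make_vjp

M = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
F = lambda x: np.tanh(np.dot(M, x))          # F : R^2 -> R^3

x = np.array([0.5, -0.3])
vjp, y = make_vjp(F)(x)
J = np.stack([vjp(e) for e in np.eye(3)])    # row i is e_i^T F'(x)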
Non-chain composition

Fan-in: $y = F(x_1, x_2)$, with

$\frac{\partial y}{\partial x_1} = F_1'(x_1, x_2), \qquad \frac{\partial y}{\partial x_2} = F_2'(x_1, x_2)$

Fan-out: $G(x) = \begin{bmatrix} x \\ x \end{bmatrix} = \begin{bmatrix} I \\ I \end{bmatrix} x$, so

$G'(x) = \begin{bmatrix} I \\ I \end{bmatrix}, \qquad v^\top G'(x) = \begin{bmatrix} v_1^\top & v_2^\top \end{bmatrix} \begin{bmatrix} I \\ I \end{bmatrix} = v_1^\top + v_2^\top$
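A tiny check of the fan-out rule (my example): when x feeds two arguments, the backward pass sums the two contributions, matching $v_1^\top + v_2^\top$ above.

from autograd import grad

f = lambda x: x * x      # x fans out into both arguments of multiply
print(grad(f)(3.0))      # the two VJP contributions sum: x + x = 6.0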
Tutorial goals
1. Jacobians and the chain rule
• Forward and reverse accumulation
2. Autograd’s implementation
• Fully closed tracing autodiff in Python
3. Advanced autodiff techniques
• Checkpointing, forward from reverse, differentiating optima and fixed points
Autodiff implementations
1. Read and generate source code ahead-of-time
• source and target language could be Python
• or a “computation graph” language (TensorFlow)
2. Monitor function execution at runtime
Autograd’s ingredients
1. Tracing the composition of primitive functions
2. Defining a vector-Jacobian product (VJP) operator for each primitive
3. Composing VJPs backward
numpy.sum is wrapped as autograd.numpy.sum, a primitive. Calling the primitive on a traced value unboxes it, calls the raw numpy.sum, and boxes the result in a new node:

Node ã: value a, function F, parents [x]
Node b̃: value b, function anp.sum, parents [ã]
class Node(object):
    """Wraps a value produced during the forward pass, recording how it was made."""
    __slots__ = ['value', 'recipe', 'progenitors', 'vspace']
    def __init__(self, value, recipe, progenitors):
        self.value = value                # the raw (unboxed) value
        self.recipe = recipe              # (fun, args, kwargs, parents)
        self.progenitors = progenitors    # start nodes this value depends on
        self.vspace = vspace(value)

class primitive(object):
    def __call__(self, *args, **kwargs):
        argvals = list(args)
        progenitors = set()
        parents = []
        for argnum, arg in enumerate(args):
            if isnode(arg):
                argvals[argnum] = arg.value   # unbox
                if argnum in self.zero_vjps: continue
                parents.append((argnum, arg))
                progenitors.update(arg.progenitors & active_progenitors)
        result_value = self.fun(*argvals, **kwargs)   # call the raw function
        return new_node(result_value, (self, args, kwargs, parents), progenitors)  # box

def forward_pass(fun, args, kwargs, argnum=0):
    args = list(args)
    start_node = new_progenitor(args[argnum])
    args[argnum] = start_node
    active_progenitors.add(start_node)
    end_node = fun(*args, **kwargs)       # tracing happens as fun executes
    active_progenitors.remove(start_node)
    return start_node, end_node
Tracing builds the graph as the function executes:

start_node: x → a = A(x) → b = B(a) → c = C(b) → end_node: y = D(c)

No control flow!
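Because only the primitives that actually execute get recorded, ordinary Python control flow needs no special handling. A small sketch (mine, not from the slides):

import autograd.numpy as np
from autograd import grad

def f(x):
    if x > 0:            # plain Python branch; only the ops taken are traced
        return np.sin(x)
    return x**2

print(grad(f)(1.0))      # cos(1.0), from the sin branch
print(grad(f)(-2.0))     # 2*x = -4.0, from the square branch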
Autograd’s ingredients
1. Tracing the composition of primitive functions
2. Defining a vector-Jacobian product (VJP) operator for each primitive
3. Composing VJPs backward
Consider one node of the graph, $a = A(x)$, with the incoming gradient $\frac{\partial y}{\partial a}$ already known. Then

$\frac{\partial y}{\partial x} = \frac{\partial y}{\partial a} \cdot \frac{\partial a}{\partial x} = \frac{\partial y}{\partial a} \cdot A'(x)$

a vector-Jacobian product.
anp.sinh.defvjp(lambda g, ans, vs, gvs, x: g * anp.cosh(x))
anp.cosh.defvjp(lambda g, ans, vs, gvs, x: g * anp.sinh(x))
anp.tanh.defvjp(lambda g, ans, vs, gvs, x: g / anp.cosh(x)**2)

anp.cross.defvjp(lambda g, ans, vs, gvs, a, b, axisa=-1, axisb=-1, axisc=-1, axis=None:
                 anp.cross(b, g, axisb, axisc, axisa, axis), argnum=0)

def grad_sort(g, ans, vs, gvs, x, axis=-1, kind='quicksort', order=None):
    sort_perm = anp.argsort(x, axis, kind, order)
    return unpermuter(g, sort_perm)
anp.sort.defvjp(grad_sort)
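In the same style, a user-defined function can be wrapped as a primitive and given its own VJP. A minimal sketch (logsumexp is my example, not from the slides; it assumes the primitive wrapper and defvjp interface above):

logsumexp = primitive(lambda x: anp.log(anp.sum(anp.exp(x))))
# d/dx_i logsumexp(x) = exp(x_i - logsumexp(x)), so the VJP scales that by g:
logsumexp.defvjp(lambda g, ans, vs, gvs, x: g * anp.exp(x - ans))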
Autograd’s ingredients
1. Tracing the composition of primitive functions
2. Defining a vector-Jacobian product (VJP) operator for each primitive
3. Composing VJPs backward
The backward pass walks the graph from end_node to start_node, seeding with $\frac{\partial y}{\partial y} = 1$ and applying one VJP per node:

start_node: x → a = A(x) → b = B(a) → c = C(b) → end_node: y = D(c)

$\frac{\partial y}{\partial y} = 1 \;\to\; \frac{\partial y}{\partial c} \;\to\; \frac{\partial y}{\partial b} \;\to\; \frac{\partial y}{\partial a} \;\to\; \frac{\partial y}{\partial x}$
higher-order autodiff just works:
the backward pass can itself be traced
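A one-line check (my example): because the backward pass is traced like any other computation, grad composes with itself.

from autograd import grad

f = lambda x: x**3
print(grad(grad(f))(2.0))   # second derivative: 6*x = 12.0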
def backward_pass(g, end_node, start_node):
    outgrads = {end_node : (g, False)}
    assert_vspace_match(outgrads[end_node][0], end_node.vspace, None)
    for node in toposort(end_node, start_node):
        if node not in outgrads: continue
        cur_outgrad = outgrads.pop(node)
        function, args, kwargs, parents = node.recipe
        for argnum, parent in parents:
            outgrad = function.vjp(argnum, cur_outgrad[0], node,
                                   parent.vspace, node.vspace, args, kwargs)
            assert_vspace_match(outgrad, parent.vspace, function)
            outgrads[parent] = add_outgrads(parent.vspace, outgrads.get(parent),
                                            outgrad)
    return cur_outgrad[0]

def grad(fun, argnum=0):
    def gradfun(*args, **kwargs):
        args = list(args)
        args[argnum] = safe_type(args[argnum])
        vjp, ans = make_vjp(fun, argnum)(*args, **kwargs)
        return vjp(vspace(getval(ans)).ones())   # seed with dy/dy = 1
    return gradfun

def make_vjp(fun, argnum=0):
    def vjp_maker(*args, **kwargs):
        start_node, end_node = forward_pass(fun, args, kwargs, argnum)
        if not isnode(end_node) or start_node not in end_node.progenitors:
            warnings.warn("Output seems independent of input.")
            def vjp(g): return start_node.vspace.zeros()
        else:
            def vjp(g): return backward_pass(g, end_node, start_node)
        return vjp, end_node
    return vjp_maker
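A small usage sketch (my example; assumes make_vjp is importable from autograd as above):

import autograd.numpy as np
from autograd import make_vjp

vjp, y = make_vjp(lambda x: np.sum(np.sin(x)))(np.zeros(3))
print(vjp(1.0))   # seed g = 1, i.e. dy/dy: returns cos(0) = [1., 1., 1.]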
Autograd’s ingredients
1. Tracing the composition of primitive functions: Node, primitive, forward_pass
2. Defining a vector-Jacobian product (VJP) operator for each primitive: defvjp
3. Composing VJPs backward: backward_pass, make_vjp, grad
Tradeoffs in forward vs reverse
• Reverse-mode requires tracing a program's execution
• Memory cost scales like depth of program
• Checkpointing can trade off time and memory
• Forward-mode evaluates a JVP with constant memory overhead
• But requires n calls to form the Jacobian of F : ℝⁿ → ℝ
• Autograd forward-mode by @j-towns: github.com/BB-UCL/autograd-forward
• Can use both together (in autograd!) for mixed-mode
Tutorial goals
1. Jacobians and the chain rule
• Forward and reverse accumulation
2. Autograd’s implementation
• Fully closed tracing autodiff in Python
3. Advanced autodiff techniques
• Checkpointing, forward from reverse, differentiating optima and fixed points
![Page 100: Matthew J Johnson (mattjj@google.com) Deep Learning Summer ...duvenaud/talks/Johnson... · Matthew J Johnson (mattjj@google.com) Deep Learning Summer School Montreal 2017 brain Dougal](https://reader030.vdocuments.net/reader030/viewer/2022040204/5ec4716f7de7b60a1b6d78bf/html5/thumbnails/100.jpg)
Checkpointing
![Page 101: Matthew J Johnson (mattjj@google.com) Deep Learning Summer ...duvenaud/talks/Johnson... · Matthew J Johnson (mattjj@google.com) Deep Learning Summer School Montreal 2017 brain Dougal](https://reader030.vdocuments.net/reader030/viewer/2022040204/5ec4716f7de7b60a1b6d78bf/html5/thumbnails/101.jpg)
Checkpointing
![Page 102: Matthew J Johnson (mattjj@google.com) Deep Learning Summer ...duvenaud/talks/Johnson... · Matthew J Johnson (mattjj@google.com) Deep Learning Summer School Montreal 2017 brain Dougal](https://reader030.vdocuments.net/reader030/viewer/2022040204/5ec4716f7de7b60a1b6d78bf/html5/thumbnails/102.jpg)
Checkpointing
![Page 103: Matthew J Johnson (mattjj@google.com) Deep Learning Summer ...duvenaud/talks/Johnson... · Matthew J Johnson (mattjj@google.com) Deep Learning Summer School Montreal 2017 brain Dougal](https://reader030.vdocuments.net/reader030/viewer/2022040204/5ec4716f7de7b60a1b6d78bf/html5/thumbnails/103.jpg)
Checkpointing
![Page 104: Matthew J Johnson (mattjj@google.com) Deep Learning Summer ...duvenaud/talks/Johnson... · Matthew J Johnson (mattjj@google.com) Deep Learning Summer School Montreal 2017 brain Dougal](https://reader030.vdocuments.net/reader030/viewer/2022040204/5ec4716f7de7b60a1b6d78bf/html5/thumbnails/104.jpg)
Checkpointing
![Page 105: Matthew J Johnson (mattjj@google.com) Deep Learning Summer ...duvenaud/talks/Johnson... · Matthew J Johnson (mattjj@google.com) Deep Learning Summer School Montreal 2017 brain Dougal](https://reader030.vdocuments.net/reader030/viewer/2022040204/5ec4716f7de7b60a1b6d78bf/html5/thumbnails/105.jpg)
Checkpointing
![Page 106: Matthew J Johnson (mattjj@google.com) Deep Learning Summer ...duvenaud/talks/Johnson... · Matthew J Johnson (mattjj@google.com) Deep Learning Summer School Montreal 2017 brain Dougal](https://reader030.vdocuments.net/reader030/viewer/2022040204/5ec4716f7de7b60a1b6d78bf/html5/thumbnails/106.jpg)
Checkpointing
![Page 107: Matthew J Johnson (mattjj@google.com) Deep Learning Summer ...duvenaud/talks/Johnson... · Matthew J Johnson (mattjj@google.com) Deep Learning Summer School Montreal 2017 brain Dougal](https://reader030.vdocuments.net/reader030/viewer/2022040204/5ec4716f7de7b60a1b6d78bf/html5/thumbnails/107.jpg)
Checkpointing
Checkpointing

[Slide animation: stepping through the backward pass under checkpointing. The reverse sweep is initialized with ∂y/∂y = 1, then computes ∂y/∂c and ∂y/∂b in turn, recomputing the forward intermediates that were discarded rather than stored.]
The trick as an Autograd primitive (from the talk's code handout), with the imports it needs:

from autograd import primitive, make_vjp

def checkpoint(fun):
    """Returns a checkpointed version of `fun`, where intermediate values
    computed during the forward pass of `fun` are discarded and then recomputed
    for the backward pass. Useful to trade off time and memory."""
    def wrapped_grad(argnum, g, ans, vs, gvs, args, kwargs):
        return make_vjp(fun, argnum)(*args, **kwargs)[0](g)
    wrapped = primitive(fun)
    wrapped.vjp = wrapped_grad
    return wrapped
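A hypothetical usage sketch (the layer function and shapes are my own, not from the talk): wrap a sub-computation so its intermediates are recomputed on the backward pass rather than stored.

import autograd.numpy as np
from autograd import grad

def layer(W, x):
    return np.tanh(np.dot(x, W))

checkpointed_layer = checkpoint(layer)  # same values, smaller tape

def loss(W, x):
    return np.sum(checkpointed_layer(W, x) ** 2)

# Gradients come out as usual; layer's forward pass is recomputed
# during the backward sweep instead of being cached.
print(grad(loss)(np.ones((3, 3)), np.ones(3)))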
Getting forward from reverse

[Slide diagram: a function maps x to y; a reverse pass gives the pullback v ↦ Jᵀv, and since that map is linear, one more reverse pass through it recovers the forward map u ↦ Ju.]

From the same code handout, a hypergradient example (differentiating an optimizer with respect to its step sizes):

import autograd.numpy as np
from autograd import grad_named

def adam(gradient_fun, init_params, step_sizes,
         num_iters=100, b1=0.9, b2=0.999, eps=10**-8):
    x = init_params
    m = np.zeros(len(x))
    v = np.zeros(len(x))
    for i in range(num_iters):
        g = gradient_fun(x, i)
        m = (1 - b1) * g + b1 * m
        v = (1 - b2) * (g**2) + b2 * v
        mhat = m / (1 - b1**(i + 1))
        vhat = v / (1 - b2**(i + 1))
        x = x - step_sizes[i] * mhat / (np.sqrt(vhat) + eps)
    return x

hypergrad_fun = grad_named(adam, 'step_sizes')

Forward mode built from two reverse passes in Autograd:

from autograd import make_vjp
from autograd.core import vspace, getval  # import locations assume 2017-era Autograd

def make_jvp(fun, argnum=0):
    def jvp_maker(*args, **kwargs):
        vjp, y = make_vjp(fun, argnum)(*args, **kwargs)
        vjp_vjp, _ = make_vjp(vjp)(vspace(getval(y)).zeros())  # dummy vals
        return vjp_vjp  # vjp_vjp is just jvp by linearity
    return jvp_maker

The same trick in (graph-mode) TensorFlow:

import tensorflow as tf

def fwd_gradients(ys, xs, d_xs):
    v = tf.placeholder(ys.dtype, shape=ys.get_shape())  # dummy variable
    g = tf.gradients(ys, xs, grad_ys=v)
    return tf.gradients(g, v, grad_ys=d_xs)
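A quick sanity check of make_jvp (a sketch; the test function and points are made up):

import autograd.numpy as np

f = lambda x: np.sin(x)
x = np.array([0.0, 1.0, 2.0])
u = np.array([1.0, 0.5, -1.0])

jvp_fun = make_jvp(f)(x)  # returns u -> J(x) u
print(jvp_fun(u))         # should match the directional derivative...
print(np.cos(x) * u)      # ...cos(x) * u, elementwise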
Solutions, optima, and fixed points

x^*(a) = \arg\min_x f(a, x), \qquad \nabla x^*(a) = ?

Or solve g(a, x) = 0 for x:

g(a, x^*(a)) = 0, \qquad \nabla x^*(a) = ?
The implicit function theorem

g(a, x^*(a)) = 0

Differentiating both sides with respect to a:

\nabla_a g(a, x^*) + \nabla x^*(a) \, \nabla_x g(a, x^*) = 0

\nabla x^*(a) = -\nabla_a g(a, x^*) \, \nabla_x g(a, x^*)^{-1}

Differentiating solutions / optima ⟺ solving linearized systems.

Can we automatically generate a linear solver from the forward solver?
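A minimal numerical sketch of this formula (the cubic g, the bracketing interval, and SciPy's brentq root-finder are my additions, not from the talk):

import numpy as np
from scipy.optimize import brentq

# g(a, x) = x**3 + a*x - 1;  x*(a) solves g(a, x) = 0.
def g(a, x): return x**3 + a*x - 1

a = 2.0
x_star = brentq(lambda x: g(a, x), 0.0, 1.0)  # forward solve

# Implicit function theorem (scalar case): dx*/da = -(dg/da) / (dg/dx) at (a, x*).
dg_da = x_star                # dg/da = x
dg_dx = 3 * x_star**2 + a     # dg/dx = 3x^2 + a
dxstar_da = -dg_da / dg_dx

# Finite-difference check against re-solving at a perturbed a.
eps = 1e-6
x_star_eps = brentq(lambda x: g(a + eps, x), 0.0, 1.0)
print(dxstar_da, (x_star_eps - x_star) / eps)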
Differentiating fixed points

x^*(a) solves x = f(a, x) for x
Implementing fixed_point as an Autograd primitive (from the talk's code handout, lightly cleaned up; note the backward iteration must be seeded with the incoming gradient g_fp):

import autograd.numpy as np
from functools import partial
from autograd import primitive, make_vjp, make_tuple
from autograd.util import flatten

@primitive
def fixed_point(f, a, init, converged, max_iter):
    update = partial(f, a)
    current, prev = update(init), init
    for _ in range(max_iter):
        if converged(current, prev): break
        current, prev = update(current), current
    else:
        print('fixed point iteration limit reached')
    return current

def normsq(flat_val): return np.dot(flat_val, flat_val)

def grad_fixed_point(g_fp, fp, vs, gvs, f, a, init, converged, max_iter):
    vjp, _ = make_vjp(lambda args: f(*args))(make_tuple(a, fp))
    g_a_flat, unflatten = flatten(vs.zeros())
    g = g_fp  # seed the reverse iteration with the incoming gradient
    for _ in range(max_iter):
        if normsq(flatten(g)[0]) < 1e-6: break
        term, g = vjp(g)  # pullbacks through a and through x, respectively
        g_a_flat = g_a_flat + flatten(term)[0]
    else:
        print('backward fixed point iteration limit reached')
    return unflatten(g_a_flat)

fixed_point.defvjp(grad_fixed_point, 1)  # gradient flows into argument a
fixed_point.defvjp_is_zero([2])          # no gradient with respect to init
[Slide diagram: the unrolled iteration x_init → x_1 → x_2 → x_3 → ⋯ → x_{n-2} → x_{n-1} → x_n, with every step reading the parameter a. As n → ∞ the iterates coincide: x^* = x_n = x_{n-1} = x_{n-2} = ⋯]
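Connecting the unrolled picture to the backward loop above (a standard derivation, spelled out here for clarity): differentiating the fixed-point equation and expanding the inverse as a geometric series gives exactly the series that grad_fixed_point accumulates.

% x^*(a) = f(a, x^*(a)); differentiate with respect to a:
\nabla x^*(a) = \nabla_a f + \nabla x^*(a)\,\nabla_x f
\quad\Longrightarrow\quad
\nabla x^*(a)
  = \nabla_a f \,\bigl(I - \nabla_x f\bigr)^{-1}
  = \nabla_a f \sum_{k=0}^{\infty} \bigl(\nabla_x f\bigr)^{k}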
• Inherits structure from the forward iteration
• Forward is Newton ⟹ reverse requires only one step
• Forward is block coordinate descent ⟹ reverse is block Gauss–Seidel
• May be preferable to decouple forward and reverse
• Then choose any linear solver for the implicit linearized system (see the sketch below)
• Can reuse dual variables from the forward solver
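A hypothetical sketch of that decoupling (the function names, the GMRES choice, and the LinearOperator wiring are mine, not from the talk): solve the implicit linearized system (I − ∂f/∂x)ᵀ w = g_fp with an off-the-shelf solver, then push w through the pullback for a.

import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def fixed_point_vjp_via_solver(vjp_x, vjp_a, g_fp):
    # vjp_x(v) computes (df/dx)^T v; vjp_a(v) computes (df/da)^T v.
    n = g_fp.size
    A = LinearOperator((n, n), matvec=lambda v: v - vjp_x(v))
    w, info = gmres(A, g_fp)  # any linear solver could stand in here
    assert info == 0, 'linear solve did not converge'
    return vjp_a(w)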
Second-order optimization

import autograd.numpy as np
from autograd import grad, make_vjp
from autograd.core import vspace, getval  # import locations assume 2017-era Autograd

def make_hvp(fun, argnum=0):
    """Builds a function for evaluating the Hessian-vector product at a point,
    which may be useful when evaluating many Hessian-vector products at the same
    point while caching the results of the forward pass."""
    def hvp_maker(*args, **kwargs):
        return make_vjp(grad(fun, argnum), argnum)(*args, **kwargs)[0]
    return hvp_maker

def make_ggnvp(f, g=lambda x: 1./2*np.sum(x**2, axis=-1), f_argnum=0):
    """Builds a function for evaluating generalized-Gauss-Newton-vector products
    at a point. Slightly more expensive than mixed-mode."""
    def ggnvp_maker(*args, **kwargs):
        f_vjp, f_x = make_vjp(f, f_argnum)(*args, **kwargs)
        g_hvp, grad_g_x = make_vjp(grad(g))(f_x)
        f_vjp_vjp, _ = make_vjp(f_vjp)(vspace(getval(grad_g_x)).zeros())
        def ggnvp(v): return f_vjp(g_hvp(f_vjp_vjp(v)))
        return ggnvp
    return ggnvp_maker
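A quick usage sketch for make_hvp (the quadratic objective and test vectors are made up for illustration):

import autograd.numpy as np

A = np.array([[2.0, 0.5],
              [0.5, 1.0]])  # symmetric, so the Hessian of f is exactly A
f = lambda x: 0.5 * np.dot(x, np.dot(A, x))

x = np.array([1.0, -1.0])
v = np.array([0.3, 0.7])

hvp = make_hvp(f)(x)  # caches the forward pass at x
print(hvp(v))         # should match the exact Hessian-vector product...
print(np.dot(A, v))   # ...A @ v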
Thanks!
References
• Dougal Maclaurin. Modeling, Inference and Optimization with Composable Differentiable Procedures. Harvard Physics Ph.D. Thesis, 2016. URL: https://dougalmaclaurin.com/phd-thesis.pdf
• github.com/hips/autograd