Page 1: Researcher Edition
Page 2: Researcher Edition

Researcher Edition

Adam Paszke, Sam Gross, Soumith Chintala, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy & Team

Page 3: Researcher Edition

What is PyTorch?

Page 4: Researcher Edition

What is PyTorch?

Ndarray library with GPU support

automatic differentiation engine

gradient based optimization package

Deep Learning

Reinforcement Learning

NumPy alternative

Utilities (data loading, etc.)

Page 5: Researcher Edition

ndarray library
• np.ndarray <-> torch.Tensor
• 200+ operations, similar to numpy
• very fast acceleration on NVIDIA GPUs
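A minimal sketch of what this looks like in practice (illustrative code, not taken from the slides; the shapes are arbitrary):

import torch

x = torch.randn(5, 3)        # create a tensor much like a numpy ndarray
y = torch.ones(5, 3)

z = x + y                    # elementwise ops, numpy-style
z = torch.mm(x.t(), y)       # matrix multiply: (3x5) x (5x3) -> (3x3)
print(z.size())              # torch.Size([3, 3])
print(x.mean(), x.max())     # a tiny sample of the 200+ operations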

Page 6: Researcher Edition

ndarray library

[side-by-side code comparison: NumPy vs. PyTorch]

Page 7: Researcher Edition

ndarray / Tensor library

Page 8: Researcher Edition

ndarray / Tensor library

Page 9: Researcher Edition

ndarray / Tensor library

Page 10: Researcher Edition

ndarray / Tensor library

Page 11: Researcher Edition

NumPy bridge

Page 12: Researcher Edition

NumPy bridge

Zero memory-copy, very efficient
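A small sketch of the bridge (illustrative; assumes nothing beyond torch and numpy):

import numpy as np
import torch

a = np.ones(5)
t = torch.from_numpy(a)      # no copy: the tensor shares a's memory

a += 1                       # modify the numpy array in place...
print(t)                     # ...and the torch tensor sees the change

b = t.numpy()                # the reverse direction also shares memory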

Page 13: Researcher Edition

NumPy bridge

Page 14: Researcher Edition

NumPy bridge

Page 15: Researcher Edition

Seamless GPU Tensors
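A hedged sketch of moving work onto the GPU (the sizes are arbitrary; the cuda()/cpu() calls are the standard tensor API):

import torch

x = torch.randn(1000, 1000)
y = torch.randn(1000, 1000)

if torch.cuda.is_available():
    x, y = x.cuda(), y.cuda()    # move the tensors to the GPU
    z = torch.mm(x, y)           # the matmul now runs on the GPU
    z = z.cpu()                  # bring the result back when needed
else:
    z = torch.mm(x, y)           # same code path on the CPU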

Page 16: Researcher Edition

automatic differentiation engine for deep learning and reinforcement learning

Page 17: Researcher Edition

PyTorch Autograd

from torch.autograd import Variable

Page 18: Researcher Edition

from torch.autograd import Variable

x = Variable(torch.randn(1, 10))
prev_h = Variable(torch.randn(1, 20))
W_h = Variable(torch.randn(20, 20))
W_x = Variable(torch.randn(20, 10))

PyTorch Autograd

Page 19: Researcher Edition

from torch.autograd import Variable

x = Variable(torch.randn(1, 10))
prev_h = Variable(torch.randn(1, 20))
W_h = Variable(torch.randn(20, 20))
W_x = Variable(torch.randn(20, 10))

i2h = torch.mm(W_x, x.t())
h2h = torch.mm(W_h, prev_h.t())

[graph: MM, MM]

PyTorch Autograd

Page 20: Researcher Edition

from torch.autograd import Variable

x = Variable(torch.randn(1, 10))
prev_h = Variable(torch.randn(1, 20))
W_h = Variable(torch.randn(20, 20))
W_x = Variable(torch.randn(20, 10))

i2h = torch.mm(W_x, x.t())
h2h = torch.mm(W_h, prev_h.t())
next_h = i2h + h2h

[graph: MM, MM]

PyTorch Autograd

Page 21: Researcher Edition

from torch.autograd import Variable

x = Variable(torch.randn(1, 10))
prev_h = Variable(torch.randn(1, 20))
W_h = Variable(torch.randn(20, 20))
W_x = Variable(torch.randn(20, 10))

i2h = torch.mm(W_x, x.t())
h2h = torch.mm(W_h, prev_h.t())
next_h = i2h + h2h

[graph: MM, MM → Add]

PyTorch Autograd

Page 22: Researcher Edition

from torch.autograd import Variable

x = Variable(torch.randn(1, 10))
prev_h = Variable(torch.randn(1, 20))
W_h = Variable(torch.randn(20, 20))
W_x = Variable(torch.randn(20, 10))

i2h = torch.mm(W_x, x.t())
h2h = torch.mm(W_h, prev_h.t())
next_h = i2h + h2h
next_h = next_h.tanh()

[graph: MM, MM → Add → Tanh]

PyTorch Autograd

Page 23: Researcher Edition

from torch.autograd import Variable

x = Variable(torch.randn(1, 10))
prev_h = Variable(torch.randn(1, 20))
W_h = Variable(torch.randn(20, 20))
W_x = Variable(torch.randn(20, 10))

i2h = torch.mm(W_x, x.t())
h2h = torch.mm(W_h, prev_h.t())
next_h = i2h + h2h
next_h = next_h.tanh()

next_h.backward(torch.ones(20, 1))  # gradient must match next_h's (20, 1) shape

[graph: MM, MM → Add → Tanh]

PyTorch Autograd
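One detail the slide leaves implicit: after backward(), gradients are accumulated in the .grad attribute of leaf Variables, and only for leaves created with requires_grad=True. A minimal sketch using the same shapes as above:

import torch
from torch.autograd import Variable

W_h = Variable(torch.randn(20, 20), requires_grad=True)
W_x = Variable(torch.randn(20, 10), requires_grad=True)
x = Variable(torch.randn(1, 10))
prev_h = Variable(torch.randn(1, 20))

next_h = (torch.mm(W_x, x.t()) + torch.mm(W_h, prev_h.t())).tanh()
next_h.backward(torch.ones(20, 1))

print(W_x.grad.size())       # gradients accumulated on the leaf Variables
print(W_h.grad.size())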

Page 24: Researcher Edition

Neural Networks
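A minimal sketch of how a model is defined with the torch.nn module (the layer sizes and batch size here are arbitrary, not from the slides):

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(784, 128)   # e.g. flattened 28x28 images
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        return self.fc2(F.relu(self.fc1(x)))

model = Net()
out = model(torch.randn(32, 784))        # forward pass on a batch of 32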

Page 25: Researcher Edition

Optimization package: SGD, Adagrad, RMSProp, LBFGS, etc.
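A minimal training-step sketch with torch.optim (the model, data, and hyperparameters below are placeholders):

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    inputs = torch.randn(32, 10)             # placeholder batch
    targets = torch.randint(0, 2, (32,))
    optimizer.zero_grad()                    # clear old gradients
    loss = loss_fn(model(inputs), targets)
    loss.backward()                          # compute new gradients
    optimizer.step()                         # update the parameters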

Page 26: Researcher Edition

• Research workflows: what do they look like in the deep learning space
• Pain points: how does PyTorch deal with them?
• Core philosophy (of PyTorch)
• Upcoming features

Page 27: Researcher Edition

Research Workflows

Page 28: Researcher Edition

Research Workflow

idea / theory → design experiments, pick datasets / environments → implement models → training & validation → add results to paper

Page 29: Researcher Edition

Research Workflow

idea / theory → design experiments, pick datasets / environments → implement models → training & validation → add results to paper

Literature review

Page 30: Researcher Edition

Research Workflow

idea / theory → design experiments, pick datasets / environments → implement models → training & validation → add results to paper

Literature review

"I'm watching you"

Page 31: Researcher Edition

Work items in practice

• Writing dataset loaders
• Building models
• Implementing the training loop
• Checkpointing models
• Interfacing with environments
• Dealing with GPUs
• Building optimizers
• Building baselines

Page 32: Researcher Edition

Work items in practice

• Writing dataset loaders
• Building models
• Implementing the training loop
• Checkpointing models
• Interfacing with environments
• Dealing with GPUs
• Building optimizers
• Building baselines

Python + PyTorch - an environment to do all of this

Page 33: Researcher Edition

Writing Data Loaders

• every dataset is formatted slightly differently

Page 34: Researcher Edition

Writing Data Loaders

• every dataset is formatted slightly differently
• each has to be preprocessed and normalized differently

Page 35: Researcher Edition

• every dataset is formatted slightly differently
• each has to be preprocessed and normalized differently
• you need a multithreaded data loader to feed GPUs fast enough

Writing Data Loaders

Page 36: Researcher Edition

PyTorch solution: • share data loaders across the community!

Writing Data Loaders

Page 37: Researcher Edition

PyTorch solution: • share data loaders across the community!

Writing Data Loaders

Page 38: Researcher Edition

PyTorch solution: • use regular Python to write Datasets:

leverage existing Python code

Writing Data Loaders
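A sketch of what "regular Python" means here: a Dataset is just __len__ plus __getitem__, and DataLoader handles batching and parallel workers. MyDataset and its random contents are hypothetical stand-ins:

import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, n=1000):
        self.x = torch.randn(n, 10)          # stand-in for real data
        self.y = torch.randint(0, 2, (n,))

    def __len__(self):
        return len(self.x)

    def __getitem__(self, i):
        return self.x[i], self.y[i]

# num_workers > 0 gives the parallel loading needed to keep GPUs fed
loader = DataLoader(MyDataset(), batch_size=32, shuffle=True, num_workers=4)
for xb, yb in loader:
    pass                                     # feed each batch to the model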

Page 39: Researcher Edition

PyTorch solution: • use regular Python to write Datasets:

leverage existing Python code (example: ParlAI)

Writing Data Loaders

Page 40: Researcher Edition

PyTorch solution: • Code in practice

Writing Data Loaders

Page 41: Researcher Edition

PyTorch solution: • Code in practice


Writing Data Loaders

Page 42: Researcher Edition

PyTorch solution: • Code in practice


Writing Data Loaders

Page 43: Researcher Edition

Interfacing with environments

Cars, video games, the Internet

Page 44: Researcher Edition

Interfacing with environments

Cars, video games

Pretty much every environment provides a Python API

Page 45: Researcher Edition

Interfacing with environments

Cars, video games

Natively interact with the environment directly
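A sketch of the point using the classic OpenAI Gym API as an example (gym and CartPole-v0 are not part of the slides; newer gym/gymnasium versions return a 5-tuple from step):

import gym
import torch

env = gym.make('CartPole-v0')
obs = env.reset()
done = False
while not done:
    state = torch.from_numpy(obs).float()       # observation straight into a tensor
    action = env.action_space.sample()          # replace with a policy network
    obs, reward, done, info = env.step(action)  # classic 4-tuple Gym API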

Page 46: Researcher Edition

Debugging
• PyTorch is a Python extension

Page 47: Researcher Edition

Debugging
• PyTorch is a Python extension
• Use your favorite Python debugger

Page 48: Researcher Edition

Debugging
• PyTorch is a Python extension
• Use your favorite Python debugger
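Because the forward pass is ordinary Python, a standard debugger such as pdb works anywhere in it (the toy forward function below is just for illustration):

import pdb
import torch

def forward(x, w):
    h = torch.mm(x, w)
    pdb.set_trace()      # drop into the debugger mid-forward-pass:
                         # inspect h.size(), h.mean(), step line by line, ...
    return h.tanh()

forward(torch.randn(1, 10), torch.randn(10, 20))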

Page 49: Researcher Edition

Debugging
• PyTorch is a Python extension
• Use your favorite Python debugger
• Use the most popular debugger:

Page 50: Researcher Edition

Debugging
• PyTorch is a Python extension
• Use your favorite Python debugger
• Use the most popular debugger:

print(foo)

Page 51: Researcher Edition

Identifying bottlenecks
• PyTorch is a Python extension
• Use your favorite Python profiler

Page 52: Researcher Edition

Identifying bottlenecks
• PyTorch is a Python extension
• Use your favorite Python profiler: line_profiler

Page 53: Researcher Edition

Compilation Time
• PyTorch is written for the impatient

Page 54: Researcher Edition

Compilation Time
• PyTorch is written for the impatient
• Absolutely no compilation time when writing your scripts whatsoever

Page 55: Researcher Edition

Compilation Time
• PyTorch is written for the impatient
• Absolutely no compilation time when writing your scripts whatsoever

• All core kernels are pre-compiled

Page 56: Researcher Edition

Ecosystem
• Use the entire Python ecosystem at will

Page 57: Researcher Edition

Ecosystem
• Use the entire Python ecosystem at will
• Including SciPy, scikit-learn, etc.

Page 58: Researcher Edition

Ecosystem
• Use the entire Python ecosystem at will
• Including SciPy, scikit-learn, etc.

Page 59: Researcher Edition

Ecosystem
• Use the entire Python ecosystem at will
• Including SciPy, scikit-learn, etc.
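For example (an illustrative pairing, not from the slides), features computed in PyTorch can go straight into scikit-learn and back:

import torch
from sklearn.decomposition import PCA

features = torch.randn(100, 64)                      # e.g. hidden activations
reduced = PCA(n_components=2).fit_transform(features.numpy())
back = torch.from_numpy(reduced)                     # and straight back into torch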

Page 60: Researcher Edition

Ecosystem
• A shared model-zoo:

Page 61: Researcher Edition

Ecosystem
• A shared model-zoo:
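One concrete form of this is torchvision's pretrained models (torchvision and the specific model are an arbitrary choice here, not named on the slide):

import torch
import torchvision.models as models

resnet = models.resnet18(pretrained=True)   # downloads weights from the model zoo
resnet.eval()

out = resnet(torch.randn(1, 3, 224, 224))   # 1000-way ImageNet logits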

Page 62: Researcher Edition

Linear style of programming
• PyTorch is an imperative / eager computational toolkit

Page 63: Researcher Edition

Linear style of programming
• PyTorch is an imperative / eager computational toolkit

Page 64: Researcher Edition

Linear style of programming
• PyTorch is an imperative / eager computational toolkit

- Not unique to PyTorch

Page 65: Researcher Edition

Linear style of programming
• PyTorch is an imperative / eager computational toolkit

- Not unique to PyTorch
- Chainer, DyNet, MXNet-Imperative, TensorFlow-imperative, TensorFlow-eager, etc.

Page 66: Researcher Edition

Linear style of programming
• PyTorch is an imperative / eager computational toolkit

- Not unique to PyTorch
- Chainer, DyNet, MXNet-Imperative, TensorFlow-imperative, TensorFlow-eager, etc.

- Least overhead, designed with this in mind
- 20 to 30 microseconds overhead per node creation
- vs. several milliseconds / seconds in other options
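Because execution is eager, ordinary Python control flow works on tensors directly. A small sketch (values are arbitrary):

import torch

x = torch.randn(5)
if x.sum().item() > 0:        # each line runs immediately, so you can branch...
    y = x * 2
else:
    y = -x
for _ in range(3):            # ...and loop in plain Python
    y = y.tanh()
print(y)                      # values are already there, no separate compile/run step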

Page 67: Researcher Edition

Go through an example

Page 68: Researcher Edition

The Philosophy of PyTorch

Page 69: Researcher Edition

The Philosophy of PyTorch

• Stay out of the way
• Cater to the impatient
• Promote linear code-flow
• Full interop with the Python ecosystem
• Be as fast as anything else

Page 70: Researcher Edition

Upcoming features

Page 71: Researcher Edition

Distributed PyTorch
• MPI-style distributed communication
• Broadcast Tensors to other nodes
• Reduce Tensors among nodes, for example: sum gradients among all nodes
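A hedged sketch of what this could look like with the torch.distributed package (the gloo backend, addresses, and launch details are assumptions; each participating process would call run(rank, world_size)):

import os
import torch
import torch.distributed as dist

def run(rank, world_size):
    os.environ.setdefault('MASTER_ADDR', '127.0.0.1')   # single-machine illustration
    os.environ.setdefault('MASTER_PORT', '29500')
    dist.init_process_group('gloo', rank=rank, world_size=world_size)

    grad = torch.randn(10)
    dist.broadcast(grad, src=0)                  # send rank 0's tensor to every node
    dist.all_reduce(grad, op=dist.ReduceOp.SUM)  # e.g. sum gradients among all nodes
    dist.destroy_process_group()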

Page 72: Researcher Edition

Higher order derivatives
• grad(grad(grad(grad(grad(grad(torch.norm(x)))))))

Page 73: Researcher Edition

Higher order derivatives
• grad(grad(grad(grad(grad(grad(torch.norm(x)))))))
• Useful to implement crazy ideas
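A sketch with the autograd API: create_graph=True keeps the graph of the backward pass so the result can itself be differentiated again (the tensor and its size are arbitrary):

import torch

x = torch.randn(5, requires_grad=True)
y = torch.norm(x)

(g,) = torch.autograd.grad(y, x, create_graph=True)          # first derivative
(gg,) = torch.autograd.grad(g.sum(), x, create_graph=True)   # derivative of the derivative
print(gg)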

Page 74: Researcher Edition

Broadcasting and Advanced Indexing
• NumPy-style broadcasting
• NumPy-style indexing that covers advanced cases
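A short sketch of both features (the values are arbitrary):

import torch

a = torch.randn(4, 3)
b = torch.randn(3)
c = a + b                        # broadcasting: b is expanded across the rows

rows = torch.tensor([0, 2])      # advanced indexing with index tensors
cols = torch.tensor([1, 0])
print(c[rows, cols])             # picks c[0, 1] and c[2, 0]

mask = c > 0
print(c[mask])                   # boolean mask indexing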

Page 75: Researcher Edition

JIT Compilation
• Possible in imperative frameworks
• The key idea is deferred or lazy evaluation

y = x + 2
z = y * y
# nothing is executed yet, but the graph is being constructed
print(z)  # now the entire graph is executed: z = (x+2) * (x+2)

• We can do just-in-time compilation on the graph before execution


Page 76: Researcher Edition

Lazy Evaluation


from torch.autograd import Variable

x = Variable(torch.randn(1, 10))
prev_h = Variable(torch.randn(1, 20))
W_h = Variable(torch.randn(20, 20))
W_x = Variable(torch.randn(20, 10))

i2h = torch.mm(W_x, x.t())
h2h = torch.mm(W_h, prev_h.t())
next_h = i2h + h2h
next_h = next_h.tanh()

[graph: MM, MM → Add → Tanh]

Page 77: Researcher Edition

Lazy Evaluation


from torch.autograd import Variable

x = Variable(torch.randn(1, 10))
prev_h = Variable(torch.randn(1, 20))
W_h = Variable(torch.randn(20, 20))
W_x = Variable(torch.randn(20, 10))

i2h = torch.mm(W_x, x.t())
h2h = torch.mm(W_h, prev_h.t())
next_h = i2h + h2h
next_h = next_h.tanh()

[graph: MM, MM → Add → Tanh]

Graph built but not actually executed

Page 78: Researcher Edition

Lazy Evaluation


from torch.autograd import Variable

x = Variable(torch.randn(1, 10))
prev_h = Variable(torch.randn(1, 20))
W_h = Variable(torch.randn(20, 20))
W_x = Variable(torch.randn(20, 10))

i2h = torch.mm(W_x, x.t())
h2h = torch.mm(W_h, prev_h.t())
next_h = i2h + h2h
next_h = next_h.tanh()

print(next_h)

[graph: MM, MM → Add → Tanh]

Data accessed. Execute graph.

Page 79: Researcher Edition

Lazy Evaluation


• A little bit of time between building and executing the graph
• Use it to compile the graph just-in-time

[graph: MM, MM → Add → Tanh]

Page 80: Researcher Edition

JIT Compilation


• Fuse and optimize operations

[graph: MM, MM → Add → Tanh]

Fuse operations. Ex:

Page 81: Researcher Edition

JIT Compilation


• Cache subgraphs

[graph: MM, MM → Add → Tanh]

"I've seen this part of the graph before, let me pull up the compiled version from cache"

Page 82: Researcher Edition

Compilation benefits: out-of-order execution, kernel fusion (e.g. Conv2d + BatchNorm + ReLU), and automatic work placement across nodes and GPUs

Page 83: Researcher Edition

http://pytorch.org
Released Jan 18th
100,000+ downloads
950+ community repos
8200+ user posts
245 contributors

With ❤ from