
Page 1

What is the State of Neural Network Pruning?

Davis Blalock*, Jose Javier Gonzalez*, Jonathan Frankle, John V. Guttag

*equal contribution

Page 2

Blalock & Gonzalez

Overview

Meta-analysis of neural network pruning: we aggregated results across 81 pruning papers and pruned hundreds of networks in controlled conditions

• Some surprising findings…

ShrinkBench: an open-source library to facilitate development and standardized evaluation of neural network pruning methods

Page 3

Part 0: Background

Page 4

Neural Network Pruning

• Neural networks are often accurate but large
• Pruning: systematically removing parameters from a network
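As a concrete illustration of "removing parameters," here is a minimal numpy sketch of magnitude pruning, the classic heuristic the talk later dates to Janowsky (1989). The function name and example tensor are my own illustration, not code from the paper:

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, fraction: float) -> np.ndarray:
    """Zero out the `fraction` of weights with smallest absolute value."""
    k = int(fraction * weights.size)
    if k == 0:
        return weights.copy()
    # Threshold = k-th smallest absolute value across the whole tensor
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

w = np.array([[0.5, -0.1, 2.0],
              [-0.05, 1.5, -0.3]])
pruned = magnitude_prune(w, fraction=0.5)
# pruned == [[0.5, 0.0, 2.0], [0.0, 1.5, 0.0]]
```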

Page 5

Typical Pruning Pipeline

Data + Model → Pruning Algorithm → Finetuning → Evaluation

Many design choices:
• Scoring importance of parameters
• Schedule of pruning, training / finetuning
• Structure of induced sparsity
• Finetuning details: optimizer, duration, hyperparameters
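One of the design choices listed, the structure of induced sparsity, is easy to make concrete. A sketch contrasting unstructured (per-weight) and structured (per-channel) masks on a hypothetical 4×8 weight tensor; the variable names are mine:

```python
import numpy as np

np.random.seed(0)
w = np.random.randn(4, 8)  # hypothetical weight tensor: 4 output channels

# Unstructured sparsity: mask individual weights below the median magnitude
unstructured_mask = (np.abs(w) > np.median(np.abs(w))).astype(float)

# Structured sparsity: remove entire channels (rows) with small L2 norm
norms = np.linalg.norm(w, axis=1)
keep_channel = norms >= np.median(norms)
structured_mask = np.tile(keep_channel[:, None].astype(float), (1, w.shape[1]))

# Both masks zero out about half the weights, but only the structured one
# leaves dense sub-tensors that standard hardware can exploit directly.
```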

Page 6

Evaluating Neural Network Pruning

• Goal: increase efficiency of the network as much as possible with minimal drop in quality
• Metrics:
  • Quality = accuracy
  • Efficiency = FLOPs, compression, latency, …
• Must use comparable tradeoffs

[Plot: accuracy of pruned network vs. efficiency metric]
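The two efficiency metrics most common in this literature are simple ratios. A sketch under one common convention (the function names are my own, not from the talk, and as the later pop quizzes show, papers do not all agree on these conventions):

```python
def compression_ratio(total_params: int, nonzero_params: int) -> float:
    """Original parameter count divided by remaining (nonzero) count."""
    return total_params / nonzero_params

def theoretical_speedup(total_flops: float, remaining_flops: float) -> float:
    """FLOPs of the dense model divided by FLOPs after pruning."""
    return total_flops / remaining_flops

compression_ratio(10_000_000, 2_000_000)  # → 5.0
theoretical_speedup(1.0e9, 2.5e8)         # → 4.0
```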

Page 7

Part 1: Meta-Analysis

Page 8

Overview of Meta-Analysis

• We aggregated results across 81 pruning papers
• Mostly published in top venues
• Corpus closed under experimental comparison

Venue       | # of Papers
arXiv only  | 22
NeurIPS     | 16
ICLR        | 11
CVPR        | 9
ICML        | 4
ECCV        | 4
BMVC        | 3
IEEE Access | 2
Other       | 10

Page 9

Robust Findings

• Pruning works
  • Almost any heuristic improves efficiency with little performance drop
  • Many methods better than random pruning
• Don't prune all layers uniformly
• Sparse models better for fixed # of parameters
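The "don't prune all layers uniformly" finding can be illustrated with global magnitude pruning, which applies one threshold across the whole network and therefore prunes layers unevenly. The layer names and scales below are hypothetical:

```python
import numpy as np

np.random.seed(0)
layers = {"conv1": np.random.randn(10) * 0.1,   # layer with small-magnitude weights
          "fc":    np.random.randn(10) * 1.0}   # layer with large-magnitude weights

# Global pruning: one threshold over all weights (here, prune ~50% globally)
all_magnitudes = np.concatenate([np.abs(v) for v in layers.values()])
threshold = np.median(all_magnitudes)
per_layer_kept = {name: int((np.abs(v) > threshold).sum())
                  for name, v in layers.items()}

# The small-magnitude layer loses far more weights than the large one,
# i.e. the resulting per-layer sparsities are far from uniform.
```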

Page 10

Better Pruning vs Better Architecture

Page 11

Ideal Results Over Time

(Dataset, Architecture, X metric, Y metric, Hyperparameters) → Curve

[Plot: accuracy vs. compression ratio, one curve per year, 2015–2019]

Page 12

Ideal Results Over Time

[Plots for VGG-16, AlexNet, and ResNet-50 on ImageNet: accuracy vs. compression ratio and vs. theoretical speedup, one curve per year, 2015–2019]

Page 13

Actual Results Over Time

[Plots for VGG-16, AlexNet, and ResNet-50 on ImageNet: accuracy vs. compression ratio and vs. theoretical speedup, one curve per year, 2015–2019]

Page 14

Quantifying the Problem

• Among 81 papers:
  • 49 datasets
  • 132 architectures
  • 195 (dataset, architecture) pairs
• Vicious cycle: extreme burden to compare to existing methods

All (dataset, architecture) pairs used in at least 4 papers:

Dataset  | Architecture  | # of Papers Using Pair
ImageNet | VGG-16        | 22
CIFAR-10 | ResNet-56     | 14
ImageNet | ResNet-50     | 14
ImageNet | CaffeNet      | 11
ImageNet | AlexNet       | 9
CIFAR-10 | CIFAR-VGG     | 8
ImageNet | ResNet-34     | 6
ImageNet | ResNet-18     | 6
CIFAR-10 | ResNet-110    | 5
CIFAR-10 | PreResNet-164 | 4
CIFAR-10 | ResNet-32     | 4

Page 15

Dearth of Reported Comparisons

• Presence of comparisons:
  • Most papers compare to at most 1 other method
  • 40% of papers have never been compared to
  • Pre-2010s methods almost completely ignored
• Reinventing the wheel:
  • Magnitude-based pruning: Janowsky (1989)
  • Gradient times magnitude: Mozer & Smolensky (1989)
  • "Reviving" pruned weights: Tresp et al. (1997)

Page 16

Pop quiz!

• Alice's network has 10 million parameters. She prunes 8 million of them. What compression ratio might she report in her paper?
  A. 80%
  B. 20%
  C. 5x
  D. No reported compression ratio

Page 17

Pop quiz!

• Alice's network has 10 million parameters. She prunes 8 million of them. What compression ratio might she report in her paper?
  A. 80%
  B. 20%
  C. 5x
  D. No reported compression ratio
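Options A through C each correspond to a reporting convention that can describe Alice's run, which is part of why results across papers are hard to compare. The arithmetic, using the numbers from the quiz:

```python
total_params = 10_000_000
pruned_params = 8_000_000
remaining = total_params - pruned_params  # 2 million weights survive

fraction_pruned = pruned_params / total_params   # 0.8 → reported as "80%"
fraction_remaining = remaining / total_params    # 0.2 → reported as "20%"
ratio = total_params / remaining                 # 5.0 → reported as "5x"
```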

Page 18

Pop quiz!

• According to the literature, how many FLOPs does it take to run inference using AlexNet on ImageNet?
  A. 371 million
  B. 500 million
  C. 724 million
  D. 1.5 billion

Page 19

Pop quiz!

• According to the literature, how many FLOPs does it take to run inference using AlexNet on ImageNet?
  A. 371 million
  B. 500 million
  C. 724 million
  D. 1.5 billion

Page 20

Part 2: ShrinkBench

Page 21

Why ShrinkBench?

• Want to hold everything but the pruning algorithm constant
  • Improved rigor, development time

Data + Model → Pruning Algorithm → Finetuning → Evaluation
(potential confounding factors at every stage)

Page 22

Masking API

Model (+ Data) → Pruning Masks → Accuracy Curve

[Diagram: a 4×4 weight tensor, e.g. [[-2.1, 4.6, 0.8, -0.1], [0.2, 1.5, -4.9, 2.3], [-2.5, 2.7, 4.2, -1.1], [-0.3, 5.0, 3.1, 4.7]], paired with binary 0/1 masks that select which weights survive]

• Lets algorithm return arbitrary masks for weight tensors
• Standardizes all other aspects of training and evaluation
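A minimal numpy sketch of the masking idea; the function and dictionary layout are my own illustration, not ShrinkBench's actual API, and the example values are taken from the slide's weight matrix:

```python
import numpy as np

def apply_masks(weights: dict, masks: dict) -> dict:
    """Zero out each weight wherever its binary mask is 0."""
    return {name: w * masks[name] for name, w in weights.items()}

weights = {"layer0": np.array([[-2.1, 4.6, 0.8, -0.1],
                               [ 0.2, 1.5, -4.9, 2.3]])}
masks   = {"layer0": np.array([[0., 1., 0., 0.],
                               [0., 0., 1., 0.]])}

pruned = apply_masks(weights, masks)
# pruned["layer0"] == [[0.0, 4.6, 0.0, 0.0], [0.0, 0.0, -4.9, 0.0]]
```

Because the algorithm only has to produce masks, everything else (data, initial weights, finetuning, evaluation) can be held fixed across methods.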

Page 23

Crucial to Vary Amount of Pruning & Architecture

[Plots: accuracy vs. amount of pruning for CIFAR-VGG and ResNet-56]

Page 24

Compression and Speedup Are Not Interchangeable

[Plot: ResNet-18 on ImageNet]

Page 25

Using Identical Initial Weights Is Essential

[Plot: ResNet-56 on CIFAR-10]

Page 26

Conclusion

• Pruning works
  • But not as well as improving the architecture
• But we have no idea which methods work best
  • Field suffers from extreme fragmentation in experimental setups
• We introduce a library/benchmark to address this
  • Faster progress in the future, interesting findings already

https://github.com/jjgo/shrinkbench

Page 27

Questions?