neural optimizer search with reinforcement learning2017/11/09 · 1(op 1);u 2(op 2)) irwan bello,...
TRANSCRIPT
Neural Optimizer Search with Reinforcement Learning
Irwan Bello1 Barret Zoph1 Vijay Vasudevan1 Quoc V. Le1
1Google Brain
ICLR, 2017/ Presenter: Anant Kharkar
Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 1
/ 20
Outline
1 IntroductionMotivationApproach
2 MethodsDomain-Specific LanguageController RNN
3 ExperimentsOptimizer DiscoveryTransfer Experiment
4 Summary
Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 2
/ 20
Outline
1 IntroductionMotivationApproach
2 MethodsDomain-Specific LanguageController RNN
3 ExperimentsOptimizer DiscoveryTransfer Experiment
4 Summary
Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 3
/ 20
Motivation
Classical optimizers:
SGD
SGD w/Momentum
Adam
RMSProp
Combination of stochastic methods and heuristic approximations
Want to automate process of generating update rulesProduce equation, not just numerical updates
Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 4
/ 20
Outline
1 IntroductionMotivationApproach
2 MethodsDomain-Specific LanguageController RNN
3 ExperimentsOptimizer DiscoveryTransfer Experiment
4 Summary
Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 5
/ 20
Approach
RNN controller produces update rule string
Controller updated based on performance of optimizer
RL approach to training
How to generate update rules? First define space of update rules
Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 6
/ 20
Previous Work
LSTM for numerical updates (Andrychowicz et al., 2016)
Equations are more transferrable
Genetic programming for update equations (Orchard & Wang, 2016)
Slow and needs heuristics
Neural Architecture Search (Zoph & Le, 2017) - seen earlier
RNN produces network architecture
Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 7
/ 20
Outline
1 IntroductionMotivationApproach
2 MethodsDomain-Specific LanguageController RNN
3 ExperimentsOptimizer DiscoveryTransfer Experiment
4 Summary
Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 8
/ 20
Domain-Specific Language
Each optimizer has computational graph - binary expression tree
Components:
2 operands
Unary function for each operand
Binary function to combine
∆w = λ ∗ b(u1(op1), u2(op2))
Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 9
/ 20
Outline
1 IntroductionMotivationApproach
2 MethodsDomain-Specific LanguageController RNN
3 ExperimentsOptimizer DiscoveryTransfer Experiment
4 Summary
Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 10
/ 20
Controller RNN
Trained with Adam
Objective function: J(θ) = E∆∼pθ(.)[R(∆)]Optimize reward (accuracy of target model)
Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 11
/ 20
Search Space
Operands
Gradients: g , g2, g3, sign(g)Moving averages: m, v , y , sign(m)Weights: 10−4w , 10−3w , 10−2w , 10−1wADAM, RMSProp, 1, small noise
Unary Functions
x ,−x , ex , log |x |, clip, drop, signBinary Functions
x + y , x − y , x ∗ y , xy+ε
Optimizers tested on 3x3 ConvNet (32 filters) for 5 epochs
Favors early progress
Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 12
/ 20
Outline
1 IntroductionMotivationApproach
2 MethodsDomain-Specific LanguageController RNN
3 ExperimentsOptimizer DiscoveryTransfer Experiment
4 Summary
Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 13
/ 20
Optimizer Discovery
Recurring element:esign(g)∗sign(m) ∗ g
If sign(g) agrees with running average, scale e - g keeps decreasing
Else scale 1e - gradient direction changed
Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 14
/ 20
Outline
1 IntroductionMotivationApproach
2 MethodsDomain-Specific LanguageController RNN
3 ExperimentsOptimizer DiscoveryTransfer Experiment
4 Summary
Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 15
/ 20
Rosenbrock Function
Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 16
/ 20
CIFAR-10
Wide ResNet (Zagoruyko & Komodakis, 2016) - 300 epochs
Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 17
/ 20
CIFAR-10
Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 18
/ 20
Neural Machine Translation
Completely different model & task: WMT 2014 English → German taskGNMT model - 8 LSTM layers
Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 19
/ 20
Summary
RNN generates optimizer equations
Train RNN via RL setup
Optimizers tested on small ConvNet
New optimizers on par with state of the art
Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 20
/ 20