neural optimizer search with reinforcement learning2017/11/09 · 1(op 1);u 2(op 2)) irwan bello,...

Neural Optimizer Search with Reinforcement Learning

Irwan Bello1 Barret Zoph1 Vijay Vasudevan1 Quoc V. Le1

1Google Brain

ICLR, 2017/ Presenter: Anant Kharkar

Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 1

/ 20

Outline

1 IntroductionMotivationApproach

2 MethodsDomain-Specific LanguageController RNN

3 ExperimentsOptimizer DiscoveryTransfer Experiment

4 Summary


/ 20

Outline




4 Summary


/ 20

Motivation

Classical optimizers:

SGD

SGD w/Momentum

Adam

RMSProp

Combination of stochastic methods and heuristic approximations

Want to automate process of generating update rulesProduce equation, not just numerical updates


/ 20

Outline




4 Summary


/ 20

Approach

RNN controller produces update rule string

Controller updated based on performance of optimizer

RL approach to training

How to generate update rules? First define space of update rules


/ 20

Previous Work

LSTM for numerical updates (Andrychowicz et al., 2016)

Equations are more transferrable

Genetic programming for update equations (Orchard & Wang, 2016)

Slow and needs heuristics

Neural Architecture Search (Zoph & Le, 2017) - seen earlier

RNN produces network architecture


/ 20

Outline




4 Summary


/ 20

Domain-Specific Language

Each optimizer has computational graph - binary expression tree

Components:

2 operands

Unary function for each operand

Binary function to combine

∆w = λ ∗ b(u1(op1), u2(op2))


/ 20

Outline




4 Summary


/ 20

Controller RNN

Trained with Adam

Objective function: J(θ) = E∆∼pθ(.)[R(∆)]Optimize reward (accuracy of target model)


/ 20

Search Space

Operands

Gradients: g , g2, g3, sign(g)Moving averages: m, v , y , sign(m)Weights: 10−4w , 10−3w , 10−2w , 10−1wADAM, RMSProp, 1, small noise

Unary Functions

x ,−x , ex , log |x |, clip, drop, signBinary Functions

x + y , x − y , x ∗ y , xy+ε

Optimizers tested on 3x3 ConvNet (32 filters) for 5 epochs

Favors early progress


/ 20

Outline




4 Summary


/ 20

Optimizer Discovery

Recurring element:esign(g)∗sign(m) ∗ g

If sign(g) agrees with running average, scale e - g keeps decreasing

Else scale 1e - gradient direction changed


/ 20

Outline




4 Summary


/ 20

Rosenbrock Function


/ 20

CIFAR-10

Wide ResNet (Zagoruyko & Komodakis, 2016) - 300 epochs


/ 20

CIFAR-10


/ 20

Neural Machine Translation

Completely different model & task: WMT 2014 English → German taskGNMT model - 8 LSTM layers


/ 20

Summary

RNN generates optimizer equations

Train RNN via RL setup

Optimizers tested on small ConvNet

New optimizers on par with state of the art


/ 20

neural optimizer search with reinforcement learning2017/11/09 · 1(op 1);u 2(op 2)) irwan bello,...

Documents