![Page 1: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/1.jpg)
Confidential + Proprietary
Optimization Perspectives on Robot Learning, Perception, and Control
Vikas Sindhwani
Google Brain, [email protected]
![Page 2: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/2.jpg)
There have been stunning advances in Machine Learning and Perception recently.
![Page 3: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/3.jpg)
Computer Vision: The ImageNet Challenge
![Page 4: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/4.jpg)
Speech Recognition
5.1% ~ Human
![Page 5: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/5.jpg)
Machine Translation: Japanese to English
![Page 6: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/6.jpg)
Mastering Go and Chess -- from scratch
● Mastering the Game of Go without Prior Knowledge
● Mastering Chess and Shogi by Self-Play with a General RL algorithm
Train p(next-move|board), value(board) networks.
![Page 7: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/7.jpg)
What's behind this success?
● Tons of Data
● Distributed Computation
● Optimization!
● Complex end-to-end pipelines
  ○ Still an art with lots of open questions.
  ○ Mainly supervised learning.
![Page 8: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/8.jpg)
What does this mean for the emerging world of Robotics?
![Page 9: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/9.jpg)
Robots in Factories
● Body Shop
  ○ 380 robots, 450 humans
  ○ 6 hours per car (7500 spot welds)
● Paint shop
  ○ 100 robots
● Assembly line
  ○ ? robots, mostly manual
● 1400 cars per day
BMW Manufacturing Plant, Spartanburg, South Carolina
SENSE-PLAN-ACT paradigm
![Page 10: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/10.jpg)
The Real World
PERCEPTION
LEARNING & ADAPTATION
CONTROL
![Page 11: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/11.jpg)
Robotics at Alphabet
Self-driving Cars
Arm Farm
Self-flying Vehicles
● Early Rider Program in Phoenix, AZ
● 3M+ miles driven, 1B+ miles in simulation (2016)
● 1.25M road deaths (2014), 32K in the US
● ~22K+ involving speeding, alcohol, or distraction
![Page 12: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/12.jpg)
From Smartphones to Robots: the Safety Challenge
Asimov’s “Three Laws of Robotics”, Handbook of Robotics, 2058 (1942)
● A robot may not injure a human being or, through inaction, allow a human being to come to harm.
● A robot must obey orders given it by human beings except where such orders would conflict with the First Law.
● A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
![Page 13: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/13.jpg)
Preliminaries and Background
![Page 14: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/14.jpg)
7 things about Deep Learning
1. Training and test data are drawn i.i.d. from the same distribution.
2. Optimization Problem
![Page 15: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/15.jpg)
7 things about Deep Learning
3. “Shallow” vs “Deep” Predictors
4. Computation Graphs & Backprop
Tensors flowing (B x H x W x D)
![Page 16: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/16.jpg)
7 things about Deep Learning
5. Stochastic Gradient Descent
6. Mature Autodiff Libraries
E.g., TensorFlow, PyTorch, Caffe, ...
Parallel computation; parameter-server (PS) architecture
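To make item 5 concrete, here is a minimal sketch of minibatch stochastic gradient descent on a toy least-squares problem. The data, the learning rate, and the function name `sgd_linear_regression` are assumptions for illustration; the gradients are written by hand here, whereas the autodiff libraries of item 6 would compute them automatically.

```python
import random

def sgd_linear_regression(data, lr=0.05, epochs=200, batch_size=4, seed=0):
    """Fit y ~ w*x + b by minibatch SGD on the mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the examples in a fresh random order
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            # Gradient of the minibatch mean squared error w.r.t. w and b.
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Noise-free toy data from y = 3x - 1, with x in [-1, 1].
toy_data = [(k / 10.0, 3.0 * k / 10.0 - 1.0) for k in range(-10, 11)]
w_fit, b_fit = sgd_linear_regression(toy_data)
```

On this toy problem the fitted `w_fit` and `b_fit` should land close to 3 and -1; a deep network changes the predictor and the gradient computation, not the update rule.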
![Page 17: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/17.jpg)
And the Last thing
7. “Alchemy”
- Choice of loss function
- Choice of architectures: CNNs, RNNs, DNNs, ...
- Orchestration of the Optimization (e.g., learning rates)
7. It works!
Surprisingly effective despite non-convexity, with millions of parameters -- many local minima of similar quality.
![Page 18: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/18.jpg)
Robots (+world) as Dynamical Systems
● States
● Controls
● Dynamics
![Page 19: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/19.jpg)
Preliminaries: Robots (+world) as Dynamical Systems
RL lingo: “Agent”
![Page 20: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/20.jpg)
Ingredients: Perception-Control Loop
Steering
Throttle
Brake
Policy
Dynamics/“Model”
Rewards/Costs
Controls/Actions
Observation
![Page 21: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/21.jpg)
Where are the Learning Problems?
Steering
Throttle
Brake
Policy
Dynamics/“Model”
Rewards/Costs
Controls/Actions
Observation
![Page 22: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/22.jpg)
Learning and Optimization
(from Sergey Levine’s slides.)
![Page 23: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/23.jpg)
Many Sources of Complexity
● What is the current state?
  ○ State estimation from high-dimensional observations from noisy sensors
● What are the dynamics of the system?
  ○ Known (games, factories)
  ○ Stochastic
  ○ Discontinuous -- problems involving contact
  ○ May be completely unknown
● What costs/rewards should we use to elicit desired behavior?
● What policy parameterization to use?
● Exploitation vs. exploration?
● How to optimize?
![Page 24: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/24.jpg)
Optimization for Robot Learning, Perception, Control
Behavior Cloning & Imitation Learning
Wing, Arm Farm, Self-driving cars
Non-convex optimization
Nonlinear Optimal Control
Iterative LQR, TROSS, MPC
Guided Policy Search
Structured Nonlinear Programming
Learning policies in Simulation
Derivative Free Optimization
Safety and Stability
Stability of Dynamical Systems
Polynomial Safety Shields
Polynomial Optimization & SDPs
Reinforcement Learning
Value-based methods
Policy Gradient
Model-based RL
Dynamic Programming, Optimization & Sampling
Research Tools
![Page 25: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/25.jpg)
Imitation Learning and Behavior Cloning
● Learning policies by mimicking human decisions and behaviors
Driving: large datasets
Manipulation (Daydream coffee study)
Doesn't work for all problems, e.g. bipedal locomotion
● Does not need a dynamics model -- and supervised learning works well!
![Page 26: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/26.jpg)
Imitation Learning and Behavior Cloning
● Optimization Problem -- supervised learning
Demonstrations
Training
SGD-based Optimization
Distributed Training: Parameter-server architecture
![Page 27: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/27.jpg)
Naive Supervised Learning
Ross, Gordon and Bagnell, 2011 -- kart racing experiments
![Page 28: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/28.jpg)
Imitation Learning =/= Supervised Learning
● Training/Test Distribution mismatch
  ○ With perception in the loop, encountered states become functions of the policy -- contrast with supervised learning (e.g. photo tagging).
● Dealing with cascading errors
● DAGGER algorithm
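The dataset-aggregation idea can be sketched on a toy problem. Everything below is an assumption made for illustration: the one-dimensional system `x_{t+1} = x_t + u_t`, the saturated proportional `expert_action`, and the linear learner `u = k * x`. It also simplifies the published algorithm by running the pure learner policy after the first iteration instead of a mixture with the expert.

```python
def expert_action(x):
    # Hypothetical expert: proportional controller with saturated output.
    return max(-1.0, min(1.0, -0.5 * x))

def rollout(policy, x0, horizon=10):
    """Return the states visited when running `policy` on x_{t+1} = x_t + u_t."""
    states, x = [], x0
    for _ in range(horizon):
        states.append(x)
        x = x + policy(x)
    return states

def fit_linear_policy(dataset):
    """Least-squares fit of u = k * x to (state, action) pairs."""
    num = sum(x * u for x, u in dataset)
    den = sum(x * x for x, _ in dataset)
    return num / den

def dagger(start_states, iterations=5):
    # Iteration 0: plain behavior cloning on states visited by the expert.
    dataset = [(x, expert_action(x))
               for x0 in start_states for x in rollout(expert_action, x0)]
    k = fit_linear_policy(dataset)
    for _ in range(iterations):
        learner = lambda x, k=k: k * x
        # Visit states under the learner, but label them with the expert.
        for x0 in start_states:
            dataset += [(x, expert_action(x)) for x in rollout(learner, x0)]
        k = fit_linear_policy(dataset)  # refit on the aggregated dataset
    return k

k_learned = dagger(start_states=[-4.0, -2.0, 1.0, 3.0])
```

The point of the loop is that the training states come from the learner's own rollouts, so the training distribution tracks the states the learned policy actually encounters.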
![Page 29: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/29.jpg)
NVIDIA Self-driving
How do they deal with cascading errors?
72 hours of driving data, 250K parameters, 2 interventions every 10 minutes
![Page 30: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/30.jpg)
Safer Air Sensing on Self-flying Vehicles go/aerial_robotics
Pitot Tube (Henri Pitot, 1695-1771)
● Accurate sensing of motion relative to the air is critical for safe & efficient control of UAVs on high-speed outdoor missions.
● Pitot tubes are airspeed sensors, prone to failure (Air France 2009) and hard to maintain on small UAVs.
● Can we clone the Pitot tube using a neural net trained on hundreds of flight logs?
![Page 31: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/31.jpg)
Safer Air Sensing on Self-flying Vehicles go/aerial_robotics
![Page 32: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/32.jpg)
The Arm Farm: Self supervision
Training on 800K grasp attempts from 14 robots.
![Page 33: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/33.jpg)
Tools 1: Supervised Learning in TensorFlow
● Colaboratory
● TensorFlow
![Page 34: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/34.jpg)
Optimal Control and Trajectory Optimization
states
controls
dynamics
subject to:
(example due to tassa@)
dynamics: stochastic, discontinuous, complex simulations, unknown
![Page 35: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/35.jpg)
Obstacle Avoidance with Safety Shields
obstacle
obstacle
START
GOAL
● Nested Optimization
● Nonlinear Programming (e.g., SQP) with rich structure.
● Need for real-time optimization (model predictive control)
![Page 36: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/36.jpg)
Constructing Safety Shields in 3D Environments (RSS-2017)
convex; increasing degree, increasing non-convexity
Polynomial Optimization ⇒ Semi-definite Programming relaxations.
![Page 37: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/37.jpg)
Back to Optimal Control: Linear Quadratic Regulators
subject to:
![Page 38: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/38.jpg)
LQR: Value Function
Define sequence of functions,
that are the minimum values achieved for the “tail subproblems” from a given state,
Notice,
![Page 39: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/39.jpg)
LQR Dynamic Programming
Bellman Equation
Assume:
- cost at time t for taking action u at state z.
- min cost-to-go for where you land at t+1 as a consequence
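For reference, the two terms described above combine into the following recursion. The slide's own equations did not survive extraction, so the symbols here are assumed notation rather than the author's: $V_t$ for the minimum cost-to-go, $c_t$ for the stage cost, $f$ for the dynamics, and $A$, $B$, $Q$, $R$, $P_t$ for the usual LQR matrices.

```latex
% Bellman equation: stage cost plus minimum cost-to-go from the next state
V_t(z) = \min_u \Big[ c_t(z, u) + V_{t+1}\big(f(z, u)\big) \Big]

% LQR case: c_t(z,u) = z^\top Q z + u^\top R u, \quad f(z,u) = A z + B u,
% and the assumption V_{t+1}(z) = z^\top P_{t+1} z
V_t(z) = \min_u \Big[ z^\top Q z + u^\top R u
         + (A z + B u)^\top P_{t+1} (A z + B u) \Big]

% Setting the gradient with respect to u to zero gives a linear feedback law
u_t^\star = -\big(R + B^\top P_{t+1} B\big)^{-1} B^\top P_{t+1} A \, z
```

Under these assumptions the minimizer is linear in the state and the minimum value is again quadratic in $z$, which is what lets the recursion be carried backward in time.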
![Page 40: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/40.jpg)
LQR Algebra
![Page 41: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/41.jpg)
LQR solution
![Page 42: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/42.jpg)
Extensions: Time-varying LQR, Infinite-horizon LQR
Steady State Solution for Infinite Horizon Problems: Algebraic Riccati Equation
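A scalar example shows how the finite-horizon backward recursion settles to the steady-state solution. This is a sketch under assumed notation (dynamics `z' = a*z + b*u`, stage cost `q*z**2 + r*u**2`, cost-to-go `p[t]*z**2`); none of these symbols are taken from the slides.

```python
def riccati_recursion(a, b, q, r, horizon):
    """Backward pass for scalar finite-horizon LQR.

    Dynamics z' = a*z + b*u, stage cost q*z**2 + r*u**2, terminal cost q*z**2.
    Returns cost-to-go coefficients p[t] (V_t(z) = p[t]*z**2) and gains k[t]
    (optimal control u_t = -k[t]*z_t).
    """
    p = [0.0] * (horizon + 1)
    k = [0.0] * horizon
    p[horizon] = q
    for t in range(horizon - 1, -1, -1):
        # Minimize q*z^2 + r*u^2 + p[t+1]*(a*z + b*u)^2 over u.
        k[t] = a * b * p[t + 1] / (r + b * b * p[t + 1])
        p[t] = q + a * a * p[t + 1] - a * b * p[t + 1] * k[t]
    return p, k

p, k = riccati_recursion(a=1.0, b=1.0, q=1.0, r=1.0, horizon=50)
```

With `a = b = q = r = 1`, the fixed point of the recursion satisfies `p**2 - p - 1 = 0`, so far from the terminal time `p[0]` approaches the golden ratio (about 1.618) and `k[0]` approaches about 0.618. That fixed point is the scalar form of the algebraic Riccati equation.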
![Page 43: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/43.jpg)
Tools: Cartpole Balancing on OpenAI Gym
● Power of Linearization
● Surprisingly large basins of attraction
● Why does it fail outside that basin?
![Page 44: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/44.jpg)
Other Gym Tasks
● Average reward per episode over N episodes
● Total number of episodes
![Page 45: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/45.jpg)
Iterative LQR/Differential Dynamic Programming
Nonlinear Optimal Control Problem:
Equivalent unconstrained problem:
Newton’s Method:
![Page 46: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/46.jpg)
Iterative LQR/DDP
● Initialize
● For k=0, ... until convergence
  ○ Simulate
  ○ Linearize Dynamics around current trajectory:
  ○ Solve Time-Varying LQR Problem
  ○ Set
![Page 47: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/47.jpg)
Examples: with an extension of iLQR to handle constraints.
path length
control sparsity
obstacle avoidance
Car parking with control limits.
In Mujoco: Torque control of a manipulator.
![Page 48: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/48.jpg)
Model Predictive Control / Receding Horizon Control
● Due to errors in the dynamics model or changes in the environment, executing “open-loop” controls may no longer be optimal.
● Replanning into the future at every time-step defines a closed-loop policy:
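A minimal receding-horizon sketch, assuming a scalar linear system whose planning model (`a_model`, `b_model`) deliberately differs from the "true" system (`a_true`, `b_true`). All numbers and names are made up for illustration. At each step the controller solves a short finite-horizon LQR problem from the measured state and applies only the first control.

```python
def first_lqr_gain(a, b, q, r, horizon):
    """Gain of the first step of a finite-horizon scalar LQR plan."""
    p = q  # terminal cost-to-go coefficient
    gain = 0.0
    for _ in range(horizon):
        gain = a * b * p / (r + b * b * p)
        p = q + a * a * p - a * b * p * gain
    return gain

def mpc_rollout(z0, steps, horizon=10):
    a_model, b_model = 1.0, 1.0   # model used for planning (assumed)
    a_true, b_true = 1.2, 0.8     # "real" system, unknown to the planner
    z, trajectory = z0, [z0]
    for _ in range(steps):
        # Replan from the measured state and apply only the first control.
        u = -first_lqr_gain(a_model, b_model, 1.0, 1.0, horizon) * z
        z = a_true * z + b_true * u
        trajectory.append(z)
    return trajectory

trajectory = mpc_rollout(z0=5.0, steps=20)
```

In this time-invariant toy the replanned gain is the same at every step, so the policy reduces to fixed state feedback. What matters is that each control is computed from the measured state, which is why the state still decays toward zero despite the model mismatch.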
![Page 49: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/49.jpg)
Model-based Reinforcement Learning
● Collect trajectories by executing, e.g., random controls
● Fit a Dynamics Model
● Solve Optimal Control wrt current dynamics
● Collect new trajectories
● Refit new dynamics model
● Repeat
Nagabandi et al, 2017
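The loop above can be sketched on a scalar linear system. This is an illustrative toy, not the method of the cited paper: the hidden dynamics `A_TRUE` and `B_TRUE`, the least-squares model fit, and the scalar LQR solver are all assumptions standing in for the neural-network dynamics model and the planner used in practice.

```python
import random

A_TRUE, B_TRUE = 1.1, 0.5  # hypothetical "real" dynamics, hidden from the learner

def step(z, u):
    return A_TRUE * z + B_TRUE * u

def fit_dynamics(transitions):
    """Least-squares fit of z' = a*z + b*u via the 2x2 normal equations."""
    szz = sum(z * z for z, u, zn in transitions)
    szu = sum(z * u for z, u, zn in transitions)
    suu = sum(u * u for z, u, zn in transitions)
    szn = sum(z * zn for z, u, zn in transitions)
    sun = sum(u * zn for z, u, zn in transitions)
    det = szz * suu - szu * szu
    return (szn * suu - sun * szu) / det, (sun * szz - szn * szu) / det

def lqr_gain(a, b, q=1.0, r=1.0, iters=100):
    """Steady-state scalar LQR gain from iterating the Riccati recursion."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)

def collect(policy, z0, steps):
    data, z = [], z0
    for _ in range(steps):
        u = policy(z)
        z_next = step(z, u)
        data.append((z, u, z_next))
        z = z_next
    return data

rng = random.Random(0)
# Initial data from random controls.
data = collect(lambda z: rng.uniform(-1.0, 1.0), z0=1.0, steps=20)
for _ in range(3):
    a_hat, b_hat = fit_dynamics(data)   # fit / refit the dynamics model
    gain = lqr_gain(a_hat, b_hat)       # optimal control w.r.t. the current model
    # New trajectories under the current controller, with small exploration noise.
    data += collect(lambda z: -gain * z + rng.uniform(-0.1, 0.1), z0=1.0, steps=20)
a_hat, b_hat = fit_dynamics(data)
```

Because the toy system is noise-free and exactly linear, the fit recovers the hidden parameters after the first batch; with a richer model class, the repeated collect-and-refit step is what corrects the model in the states the controller actually visits.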
![Page 50: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/50.jpg)
Guided Policy Search: Vision from Optimal Control
End-to-End Training of Deep Visuomotor Policies
Key idea: Use perfect-state optimal controllers to supervise learning of visual policies.
![Page 51: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/51.jpg)
Derivative-Free Optimization
● Optimization problems where gradients are unavailable are pervasive.
  ○ Complex simulators
  ○ Legacy Code
  ○ Inner optimization routines
● How can we still do gradient descent?
![Page 52: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/52.jpg)
DFO for Policy Optimization and Optimal Control
● Policy Optimization
● Optimal Control: Need Jacobians for Linearization Step
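total_return = 0.0  # `return` is a Python keyword, so the accumulator is renamed
x = initial_state
for t in range(T):
    action = policy(x)  # assumed: policy evaluated at the current state (right-hand side missing in the source)
    reward, x = env.step(action)
    total_return = total_return + reward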
![Page 53: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/53.jpg)
DFO: Finite Differences
Taylor Approximation: For any perturbation direction p,
So,
Classic Finite Differences:
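As a concrete instance, the sketch below estimates a gradient by forward differences along the coordinate directions, using only function evaluations. The test function `quadratic` and the step size `eps` are assumptions chosen for illustration.

```python
def finite_difference_gradient(f, x, eps=1e-6):
    """Forward-difference estimate of the gradient of f at x (a list of floats)."""
    fx = f(x)
    grad = []
    for i in range(len(x)):
        x_pert = list(x)
        x_pert[i] += eps  # perturb along coordinate direction e_i
        grad.append((f(x_pert) - fx) / eps)
    return grad

def quadratic(x):
    return x[0] ** 2 + 3.0 * x[0] * x[1] + 2.0 * x[1] ** 2

grad = finite_difference_gradient(quadratic, [1.0, -2.0])
```

The exact gradient at `[1.0, -2.0]` is `[-4, -5]`, and the estimate agrees to within the truncation error of order `eps`. The cost is one extra function evaluation per dimension, which becomes expensive when each evaluation is a full simulator rollout.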
![Page 54: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/54.jpg)
DFO: Linear Regression
● Choose a set of perturbation directions
● Compute finite differences
● Solve a least-squares regression problem; priors such as sparsity can improve sample efficiency.
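The three steps can be sketched as follows, assuming random Gaussian directions and plain (unregularized) least squares solved through the normal equations. The helper names are hypothetical, and the sparsity prior mentioned on the slide is not implemented here.

```python
import random

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def regression_gradient(f, x, num_directions=8, eps=1e-5, seed=0):
    """Estimate grad f(x) by regressing finite differences on random directions."""
    rng = random.Random(seed)
    n = len(x)
    fx = f(x)
    directions = [[rng.gauss(0.0, 1.0) for _ in range(n)]
                  for _ in range(num_directions)]
    # Finite difference along each direction p_i: y_i ~ p_i . grad f(x)
    y = [(f([xj + eps * pj for xj, pj in zip(x, p)]) - fx) / eps
         for p in directions]
    # Normal equations (P^T P) g = P^T y of the least-squares problem.
    ptp = [[sum(p[i] * p[j] for p in directions) for j in range(n)]
           for i in range(n)]
    pty = [sum(p[i] * yi for p, yi in zip(directions, y)) for i in range(n)]
    return solve_linear_system(ptp, pty)

def quadratic(x):
    return x[0] ** 2 + 3.0 * x[0] * x[1] + 2.0 * x[1] ** 2

g = regression_gradient(quadratic, [1.0, -2.0])
```

With more directions than dimensions, the regression averages out errors in individual finite differences; the exact gradient of this test function at `[1.0, -2.0]` is `[-4, -5]`.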
![Page 55: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/55.jpg)
Training Neural Net Policies with DFO
Evolution Strategies as a Scalable Alternative to RL
http://blog.otoro.net/2017/10/29/visual-evolution-strategies/
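A stripped-down sketch of the evolution-strategies idea: perturb the parameters with Gaussian noise, evaluate the objective at mirrored perturbations, and combine the results into a search direction. This is a simplified illustration rather than the algorithm from the work cited above; the `loss` function stands in for a negative episode return, and the population size, `sigma`, and learning rate are assumptions.

```python
import random

def es_gradient(f, theta, rng, num_pairs=20, sigma=0.1):
    """Antithetic Gaussian-smoothing estimate of the gradient of f at theta."""
    n = len(theta)
    grad = [0.0] * n
    for _ in range(num_pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
        f_plus = f([t + sigma * e for t, e in zip(theta, eps)])
        f_minus = f([t - sigma * e for t, e in zip(theta, eps)])
        scale = (f_plus - f_minus) / (2.0 * sigma * num_pairs)
        for i in range(n):
            grad[i] += scale * eps[i]
    return grad

def es_minimize(f, theta, steps=200, lr=0.05, seed=0):
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(steps):
        g = es_gradient(f, theta, rng)
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta

def loss(theta):
    # Stand-in for a negative episode return from a simulator.
    return sum(t * t for t in theta)

theta_start = [1.0, -2.0, 0.5]
theta_final = es_minimize(loss, theta_start)
```

Each perturbation only requires evaluating `f`, so the evaluations within a step are independent of one another and can be farmed out to parallel workers.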
![Page 56: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/56.jpg)
Quadruped Locomotion with DFO: Sim2Reality Transfer
● Improved Finite-difference derivative approximations (ICRA-2018)
  ○ Quadruped Locomotion with Speed Limits
S = A sin(t v)
E = B sin(t v + phi_leg)
![Page 57: Robot Learning, Perception, and Control Optimization ...aaa/Public/Teaching/ORF363_COS...Robots in Factories Body Shop 380 robots, 450 humans 6 hours per car (7500 spot welds) Paint](https://reader033.vdocuments.net/reader033/viewer/2022042005/5e6fb0540e3b3b211a215cbd/html5/thumbnails/57.jpg)
Summary
● Exciting advances in Machine Learning
● Robotics is a great source of ML and Optimization Problems
● Topics we discussed
  ○ Imitation Learning
    ■ Supervised learning with care
  ○ Optimal Control
    ■ Structured QPs, Nonlinear Programming
  ○ Model-based Reinforcement Learning
    ■ Iterative Learning and Optimization
  ○ Derivative Free Optimization
    ■ Common situation when working with simulators