Control of a Quadrotor with Reinforcement Learning
Jemin Hwangbo, Inkyu Sa, Roland Siegwart, and Marco Hutter
Robotic Systems Lab, ETH Zurich
Presented by Nicole McNabb
University of Waterloo
June 27, 2018
Overview
1 Introduction
2 The Method
3 Empirical Results
4 Summary and Future Work
Introduction
What is a quadrotor?
Figure: Quadrotor [1]
High-level goal:
Train the quadrotor to perform tasks with varying initializations
A policy optimization problem.
Related Approaches
Deep Deterministic Policy Gradient (DDPG)
Actor-critic architecture
Off-policy, model-free
Deterministic
Insufficient exploration
Very slow (if any) convergence
Trust Region Policy Optimization (TRPO)
Actor-critic architecture
On-policy, model-free
Stochastic
Computationally intensive
Slow, unreliable convergence
A New Approach
Goal: A deterministic policy with
Fast and stable convergence
Model-free training
Extensive exploration
Solution: A method combining the actor-critic architecture with an on-policy deterministic policy gradient algorithm and a new exploration strategy.
The Method
Setup
Continuous State-Action Space
State Space
18-D states, which model:
Orientation (or rotation)
Position
Linear velocity of system
Angular velocity of system
Action Space
4-D actions, which dictate the thrust of each rotor
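As a rough illustration of how the 18 state dimensions might add up, the sketch below assumes the orientation is stored as a flattened 3x3 rotation matrix (9 values), followed by position, linear velocity, and angular velocity (3 values each). The slide does not state the layout, so the names `make_state` and `clip_action`, the ordering, and the thrust range are assumptions for illustration only.

```python
# Hypothetical layout of the 18-D state and 4-D action; ordering and ranges are assumptions.

def make_state(rotation, position, linear_velocity, angular_velocity):
    """Flatten a 3x3 rotation matrix and three 3-vectors into one 18-element list."""
    if len(rotation) != 3 or any(len(row) != 3 for row in rotation):
        raise ValueError("rotation must be a 3x3 matrix")
    state = [value for row in rotation for value in row]  # 9 orientation entries
    for vector in (position, linear_velocity, angular_velocity):
        if len(vector) != 3:
            raise ValueError("each vector must have 3 components")
        state.extend(vector)  # 3 entries each
    return state


def clip_action(thrusts, low=0.0, high=1.0):
    """Clamp a 4-D action (one thrust command per rotor) to an assumed range."""
    if len(thrusts) != 4:
        raise ValueError("expected one thrust per rotor (4 values)")
    return [min(max(t, low), high) for t in thrusts]


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
hover_state = make_state(identity, [0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0])
print(len(hover_state))  # 9 + 3 + 3 + 3 = 18
```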
Exploration
Figure: Exploration Strategy [2]
Network Training
Figure: Value Network [2]
Value function training: Approximate with Monte Carlo samples obtained from the current trajectory
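A minimal sketch of how Monte Carlo targets for the value network could be computed from one trajectory. The discount factor `gamma`, the use of rewards rather than costs, and the optional `terminal_value` bootstrap are assumptions; the slide only says that Monte Carlo samples from the current trajectory are used.

```python
def discounted_returns(rewards, gamma=0.99, terminal_value=0.0):
    """Return the Monte Carlo (discounted) return at each step of one trajectory."""
    returns = [0.0] * len(rewards)
    running = terminal_value  # assumed value estimate after the last step
    # Walk backwards so each return reuses the one computed for the next step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


targets = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
print(targets)  # [1.75, 1.5, 1.0]
```

Each entry of `targets` would then serve as a regression target for the value network at the corresponding state.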
Figure: Policy Network [2]
Policy optimization: Same idea as TRPO, replacing the KL divergence with the Mahalanobis metric
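One way to read a trust-region update under a Mahalanobis metric is sketched below. It is not taken from the paper: the function name `trust_region_step`, the damping term, the choice to maximize an objective, and the per-state action Jacobians are all assumptions. The step maximizes a linearized objective subject to the average squared Mahalanobis change in the policy's actions staying at or below `delta`.

```python
import numpy as np


def trust_region_step(grad, jacobians, action_cov, delta=0.01, damping=1e-8):
    """Step d maximizing grad @ d subject to d^T H d <= delta.

    jacobians has shape (num_states, action_dim, num_params) and holds the
    derivative of the action with respect to the policy parameters per state.
    """
    cov_inv = np.linalg.inv(action_cov)
    # H = mean over states of J^T Sigma^{-1} J, so d^T H d approximates the
    # average squared Mahalanobis distance between old and new actions.
    metric = np.mean([J.T @ cov_inv @ J for J in jacobians], axis=0)
    metric += damping * np.eye(metric.shape[0])  # keeps H invertible
    direction = np.linalg.solve(metric, grad)    # H^{-1} g
    quad = grad @ direction                      # g^T H^{-1} g
    if quad <= 0.0:
        return np.zeros_like(grad), metric
    return np.sqrt(delta / quad) * direction, metric


rng = np.random.default_rng(0)
jacobians = rng.normal(size=(5, 4, 3))  # 5 states, 4 actions, 3 parameters
grad = np.array([1.0, -2.0, 0.5])
step, metric = trust_region_step(grad, jacobians, 0.1 * np.eye(4), delta=0.01)
print(step @ metric @ step)  # close to delta
```

The scaling places the step on the boundary of the trust region, which is the same role the KL constraint plays in TRPO.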
Learning Algorithm
Algorithm 1 Policy optimization
Input: Initial value function approximation, initial policy
for j = 1, 2, ... do
    Perform exploration, take action
    Compute MC estimates from current trajectory
    Do approximate value function update
    Do policy gradient update
end for
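The loop in Algorithm 1 can be sketched as a Python skeleton. Every callable here (`collect_trajectory`, `fit_value`, `improve_policy`) is a hypothetical placeholder, as are the discount factor and the toy stubs in the usage example; only the order of the steps follows the algorithm.

```python
def optimize_policy(collect_trajectory, fit_value, improve_policy,
                    value_fn, policy, iterations=10, gamma=0.99):
    """Skeleton of the Algorithm 1 loop; all callables are hypothetical placeholders."""
    for _ in range(iterations):
        # Perform exploration and take actions with the current policy.
        states, rewards = collect_trajectory(policy)
        # Compute Monte Carlo estimates from the current trajectory.
        returns, running = [], 0.0
        for reward in reversed(rewards):
            running = reward + gamma * running
            returns.append(running)
        returns.reverse()
        # Approximate value function update, then policy gradient update.
        value_fn = fit_value(value_fn, states, returns)
        policy = improve_policy(policy, value_fn, states)
    return value_fn, policy


# Toy stubs: the "policy" is a single number that each update halves.
def collect(policy):
    return [0, 1, 2], [-abs(policy)] * 3


def fit(value_fn, states, returns):
    return sum(returns) / len(returns)


def improve(policy, value_fn, states):
    return 0.5 * policy


value_fn, policy = optimize_policy(collect, fit, improve, 0.0, 1.0,
                                   iterations=3, gamma=0.5)
print(policy)  # 0.125
```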
Empirical Results
Training done in simulation
Testing on two main tasks done on a real quadrotor
Summary and Future Work
Summary
Primary contributions:
A new model-free method for training a deterministic neural network policy to control a quadrotor
Stable and reliable performance on hard tasks, even under harsh initial conditions
Future Research
Compare the method against PPO as well
Introduce a more accurate model of the system into the simulation
Train an RNN to adapt to model errors automatically
References
[1] https://www.seeedstudio.com/Crazyflie-2.0-p-2103.html
[2] Jemin Hwangbo, Inkyu Sa, Roland Siegwart, and Marco Hutter. Control of a Quadrotor with Reinforcement Learning. IEEE Robotics and Automation Letters, June 2017.
Questions?