
Page 1

Machine Learning and Robotics

ICML 2011 tutorial, Bellevue, 28th June 2011

Marc Toussaint, FU Berlin

Page 2

• Please ask questions!

Machine Learning & Robotics tutorial = mission impossible!

– impossible to cover all topics → biased selection
– impossible to cover all literature → sorry if I miss your work!

• no emphasis on SLAM & vision
– more emphasis on control, articulated robots, manipulation

• Goals of this tutorial:
– Provide an overview of learning problems in Robotics
... where ML can/has contributed, and mention literature
– Encourage Machine Learners to think more about Robotics
... to understand inherent problems in robotics
... to consider formalizing the specific structure of robotic problems

2/82

Page 3

First, two little comments on Roboticists vs. MLers...

3/82

Page 4

[cartoon: Roboticist: "I'm tired of programming my robot! Can't you make it learn?" – MLer: "Sure! Where's the data?" – "What data do you need?" – "Shouldn't you know?"]

4/82

Page 5

• Robotics is about interaction with the environment

– Collected data depends on actions
– Goal of learning: enable behavior!

(sequential decision making, long horizon control)

5/82

Page 6

• Implications:
– Benchmarking a method involves running the system!
– Different to ML in Computer Vision (Pascal challenge, Middlebury) or other standard benchmarking in ML

– no pipeline: application expert → data → ML expert → results

– only a few examples where pure "learning from a data set" is useful in robotics (e.g., calibration, system identification, SLAM)

• This slows learning research in Robotics!

6/82

Page 7

2011 IEEE International Conference on Robotics and Automation

ICRA 2011 Technical Program Tuesday May 10, 2011

[program grid: Tuesday May 10 – Thursday May 12, four time slots per day across roughly 15 parallel tracks; session titles range from Aerial Robotics, Localization and Mapping, Motion and Path Planning, and Grasping to Medical Robots, Visual Servoing, and Learning and Adaptive Systems I–IV]

Sessions with "learning papers" (keyword in title or abstract): Learning & Adaptive Systems, Recognition, Motion & Path Planning, Grasping, Adaptive Control
7/82

Page 8

• The field of robotics is huge!

If Robotics were a huge pizza... ML = the chillies; the rest = "infrastructure"

• bits and pieces of learning here and there – ML on the system level?

• Implications:
– good: many possibilities for adaptivity and learning
– the integrated system is hard to "formalize" / to make subject to ML

8/82

Page 9

Outline
Part I: Learning problems in Robotics – the RL view

• Introduction to some basics
– Markov Decision Processes and Stochastic Optimal Control
– Kinematics and Dynamics

• Five Approaches to Learning in Robotics
1. Model learning (model-based RL)
2. Value learning (model-free RL)
3. Policy search
4. Imitation learning
5. Inverse RL

• ...plus two more:
6. Exploration
7. Probabilistic Inference for Control & Planning

Part II: Interacting with a world of objects & Discussion
– Statistical Relational Learning for Robots
– Discussion

9/82

Page 10

Part I:

Learning problems in Robotics – the RL view

10/82

Page 11

• "The Reinforcement Learning view": viewed in the framework of Markov Decision Processes / Stochastic Control

→ unifying notation, accessible to MLers

• BUT: solving the Mountain Car problem ≠ solving Robotics

(→ Discussion)

11/82

Page 12

Markov Decision Process & Optimal Control

[graphical model: states x0 … xT, controls u0 … uT, rewards/costs r0 … rT]

P(x0:T, u0:T, r0:T; π) = P(x0) P(u0 | x0; π) P(r0 | u0, x0) ∏_{t=1}^T P(xt | ut-1, xt-1) P(ut | xt; π) P(rt | ut, xt)

State xt
Control/Action ut
Process P(x0) and P(xt+1 | ut, xt)
Reward/Cost rt(xt, ut) or ct(xt, ut)
Control policy π(ut | xt), or ut = π(xt) (deterministic)

12/82
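The factorization above is easy to exercise in code. A minimal sketch, assuming a small finite MDP with made-up transition and reward tables, that samples one trajectory under a fixed stochastic policy:

import numpy as np

rng = np.random.default_rng(0)
nX, nU, T = 3, 2, 5
P0 = np.array([1.0, 0.0, 0.0])                   # P(x0)
P = rng.dirichlet(np.ones(nX), size=(nU, nX))    # P(x_{t+1} | u, x), shape (nU, nX, nX)
pi = np.full((nX, nU), 1.0 / nU)                 # stochastic policy pi(u | x)
r = rng.uniform(0.0, 1.0, (nX, nU))              # reward r(x, u)

x = rng.choice(nX, p=P0)
for t in range(T + 1):
    u = rng.choice(nU, p=pi[x])                  # u_t ~ pi(. | x_t)
    print(t, x, u, round(r[x, u], 3))
    x = rng.choice(nX, p=P[u, x])                # x_{t+1} ~ P(. | u_t, x_t)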

Page 13

Markov Decision Process & Optimal Control

• typical MDP case:
– infinite time horizon: T → ∞
– stationary world & rewards: P(x′ | u, x) and r(x, u) indep. of t
– discounting: total return Σ_{t=0}^T γ^t r(xt, ut)

→ stationary policy π(u | x) indep. of t

• typical Stochastic Optimal Control case:
– finite time horizon T
– costs depend on absolute time: ct(xt, ut)
– total costs non-discounted: C(x0:T, u0:T) = Σ_{t=0}^T ct(xt, ut)

→ non-stationary control policy πt(ut | xt)

13/82

Page 14

One way to introduce Robotics basics is to consider three basic control problems:

• 1-step kinematic process

• 1-step dynamic process

• T-step dynamic process

14/82

Page 15

Kinematics

• We consider a 1-step kinematic control problem, where we know the state at time t and optimize controls ut to minimize costs at t+1.

State xt ∈ R^n : joint angles

Control ut ∈ R^n : commanded change in joint angles

Process (deterministic): xt+1 = xt + ut

Costs in the 1-step kinematic case are of the form
c(xt+1, ut) = ||ut||²_W + ||φ(xt+1) − y∗||²/σ²

15/82

Page 16

Kinematics
c(xt+1, u) = ||u||²_W + ||φ(xt+1) − y∗||²/σ²

• The word kinematics refers to φ(q):
– defines the to-be-controlled state features (task variables)

[figure: humanoid with task variables such as left-hand position and right-hand position]

– φ is determined by the geometry ("kinematics") of the robot
– Roboticists know how to compute φ(q) and the Jacobian J = ∂φ(q)/∂q
– y∗ says what the targets are
16/82

Page 17

Kinematics

• Using a local linearization of φ, this has a simple solution:

c(xt+1, u) = ||u||²_W + ||φ(xt+1) − y∗||²/σ² ,   xt+1 = xt + u
c(u) = ||u||²_W + ||φ(xt + u) − y∗||²/σ²
     = ||u||²_W + ||φ(xt) − y∗ + J u||²/σ² ,   J := ∂φ/∂q

argmin_u c(u) = (JᵀJ + σ²W)⁻¹ Jᵀ (y∗ − φ(xt))

This choice of control is called inverse kinematics.

MLers note: the problem and its solution are identical to those of Ridge regression.

17/82
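A minimal numpy sketch of this closed-form step, iterated on a hypothetical planar 2-link arm; the arm geometry, target, and the values of W and σ are illustrative assumptions, not from the slides:

import numpy as np

def ik_step(q, phi, J, y_star, W, sigma):
    # u* = (J^T J + sigma^2 W)^{-1} J^T (y* - phi(q));  q_{t+1} = q_t + u*
    u = np.linalg.solve(J.T @ J + sigma**2 * W, J.T @ (y_star - phi(q)))
    return q + u

def phi(q):  # end-effector position of a planar 2-link arm with unit links
    return np.array([np.cos(q[0]) + np.cos(q[0] + q[1]),
                     np.sin(q[0]) + np.sin(q[0] + q[1])])

def jacobian(q, eps=1e-6):  # finite-difference J = d phi / d q
    J = np.zeros((2, 2))
    for i in range(2):
        dq = np.zeros(2); dq[i] = eps
        J[:, i] = (phi(q + dq) - phi(q - dq)) / (2 * eps)
    return J

q, target = np.array([0.3, 0.5]), np.array([1.2, 0.8])
for _ in range(50):  # iterate the 1-step solution toward the target
    q = ik_step(q, phi, jacobian(q), target, W=np.eye(2), sigma=0.1)
print(phi(q), target)  # end-effector should be close to the target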

Page 18

Kinematics

[demo: roboticsCourse2 ./x.exe -mode 2 3 4]

What can ML contribute?

18/82

Page 19

Learning the kinematics

• If the kinematics φ are unknown, learn them from data!

Literature:

Todorov: Probabilistic inference of multi-joint movements, skeletal parameters and marker attachments from diverse sensor data (IEEE Transactions on Biomedical Engineering 2007)

Deisenroth, Rasmussen & Fox: Learning to Control a Low-Cost Manipulator using Data-Efficient Reinforcement Learning (RSS 2011)

19/82

Page 20

Todorov: Probabilistic inference of multi-joint movements, skeletal parameters and marker attachments from diverse sensor data (IEEE Transactions on Biomedical Engineering 2007)

Deisenroth, Rasmussen & Fox: Learning to Control a Low-Cost Manipulator using Data-Efficient Reinforcement Learning (RSS 2011)

20/82

Page 21

Dynamics

• We consider a 1-step dynamic control problem, where we know the state at time t = 0 and optimize the controls ut to minimize costs at t = 1.

State xt = (qt, q̇t) ∈ R^{2n} : joint angles & velocities

Control ut ∈ R^n : torques (angular forces) applied in each joint

Process (deterministic): xt+1 = (I + A) xt + B ut + a
(A and B are local linearizations of the system dynamics, see appendix)

Costs in the 1-step dynamic case are of the form
c(xt+1, u) = ||u||²_H + ||φ(xt+1) − y∗||²/s²

y∗ determines (e.g.) desired accelerations in the state features φ

21/82

Page 22

Dynamics

• Using the local linearizations of the process and the kinematics φ, this has a simple solution:

c(xt+1, u) = ||u||²_H + ||φ(xt+1) − y∗||²/s²
           = ||u||²_H + ||φ(xt) + J(A xt + B u + a) − y∗||²/s²

argmin_u c(u) = (BᵀJᵀJB + s²H)⁻¹ BᵀJᵀ [y∗ − φ(xt) − J(A xt + a)]

This is optimal 1-step dynamic control.

(It includes so-called optimal operational space control as a special case; see also Peters et al: A unifying framework for the control of robotics systems (IROS 2005))

22/82
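A minimal sketch of this control law; the linearization matrices A, B, a, the Jacobian J, and the cost parameters H, s are illustrative placeholders that a real system would obtain from its dynamics model:

import numpy as np

def one_step_dynamic_control(x, y_star, phi_x, J, A, B, a, H, s):
    # u* = (B^T J^T J B + s^2 H)^{-1} B^T J^T [y* - phi(x) - J (A x + a)]
    M = B.T @ J.T @ J @ B + s**2 * H
    return np.linalg.solve(M, B.T @ J.T @ (y_star - phi_x - J @ (A @ x + a)))

n = 2                                    # joints; state (q, qdot) has dim 2n
x = np.zeros(2 * n)
A = np.zeros((2 * n, 2 * n))             # local linearization (placeholder)
B = np.vstack([np.zeros((n, n)), 0.01 * np.eye(n)])  # torques act on qdot
a = np.zeros(2 * n)                      # drift term (gravity etc.)
J = np.random.default_rng(0).standard_normal((2, 2 * n))  # task Jacobian
u = one_step_dynamic_control(x, np.array([0.1, -0.2]), np.zeros(2),
                             J, A, B, a, H=np.eye(n), s=0.1)
print(u)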

Page 23

Dynamics

[demo: roboticsCourse3 ./x.exe -control 0 1]

What can ML contribute?

23/82

Page 24

Learning the dynamics

• If the dynamics ẋ = f(x, u) are unknown, learn them from data!

Literature:

Moore: Acquisition of Dynamic Control Knowledge for a Robotic Manipulator (ICML 1990)

Atkeson, Moore & Schaal: Locally weighted learning for control. Artificial Intelligence Review, 1997.

Schaal, Atkeson & Vijayakumar: Real-Time Robot Learning with Locally Weighted Statistical Learning (ICRA 2000)

Vijayakumar et al: Statistical learning for humanoid robots. Autonomous Robots, 2002.

24/82

Page 25

(Schaal, Atkeson, Vijayakumar)

• Use a simple regression method (locally weighted Linear Regression) to estimate ẋ = f(x, u)

25/82
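A minimal sketch of locally weighted linear regression for this purpose, assuming a Gaussian kernel; the bandwidth and the stand-in dynamics are illustrative:

import numpy as np

def lwr_predict(query, X, Y, bandwidth=0.5):
    # Gaussian kernel weights centered at the query point
    w = np.exp(-np.sum((X - query)**2, axis=1) / (2 * bandwidth**2))
    Xb = np.hstack([X, np.ones((len(X), 1))])          # append bias feature
    WX = Xb * w[:, None]
    beta = np.linalg.solve(Xb.T @ WX + 1e-8 * np.eye(Xb.shape[1]), WX.T @ Y)
    return np.append(query, 1.0) @ beta                # local linear prediction

# learn xdot = f(x, u): inputs are (x1, x2, u), targets are (x1dot, x2dot)
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (200, 3))
Y = np.stack([X[:, 1], -X[:, 0] + X[:, 2]], axis=1)    # stand-in dynamics
print(lwr_predict(np.array([0.1, 0.2, 0.0]), X, Y))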

Page 26

Beyond 1-step horizon: Motion Planning & Skills

• The 1-step processes with quadratic costs can, with local linearization, be solved analytically.
– Basic role of learning: learning the model (kinematics or dynamics)

• Planning (multi-step processes) cannot be solved in that way – we need the notion of a value function, or cost-to-go function.

26/82

Page 27

Stochastic Optimal Control

• In the multi-step case with horizon T, the cost function is of the form

C(x0:T, u0:T) = Σ_{t=0}^T ct(xt, ut)

• Define the optimal value function (aka cost-to-go function)

Vt(xt) = min_{ut:T} Σ_{k=t}^T ⟨ck(xk, uk)⟩_{xk | ut:k, xt}
       = min_{ut} [ ct(xt, ut) + min_{ut+1:T} Σ_{k=t+1}^T ⟨ck(xk, uk)⟩_{xk | ut:k, xt} ]
       = min_{ut} [ ct(xt, ut) + ∫ P(xt+1 | ut, xt) Vt+1(xt+1) dxt+1 ]

u∗t = argmin_{ut} [ ct(xt, ut) + ∫ P(xt+1 | ut, xt) Vt+1(xt+1) dxt+1 ]

(Bellman optimality principle)

• Dynamic Programming: compute Vt(x) backward, starting with V_{T+1}(x) = 0

27/82
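A minimal sketch of this backward recursion on a small made-up discrete MDP (transition and cost tables are random placeholders):

import numpy as np

# finite-horizon DP: V_t(x) = min_u [ c(x,u) + sum_x' P(x'|u,x) V_{t+1}(x') ]
nX, nU, T = 5, 2, 10
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(nX), size=(nU, nX))    # P[u, x, x']
c = rng.uniform(0, 1, (nX, nU))                  # c(x, u)
c[0, :] = 0.0                                    # state 0 is a cheap "goal"

V = np.zeros((T + 2, nX))                        # V[T+1] = 0
pi = np.zeros((T + 1, nX), dtype=int)
for t in range(T, -1, -1):
    Q = c + np.einsum('uxy,y->xu', P, V[t + 1])  # Q_t(x, u)
    pi[t] = np.argmin(Q, axis=1)                 # greedy policy pi_t(x)
    V[t] = np.min(Q, axis=1)
print(V[0])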

Page 28

Stochastic Optimal Control

[demo: git/mlr/share/robot/10-optimizationBenchmarks]

(Here, we optimized the control using probabilistic inference – see later.)

What can ML contribute?

28/82

Page 29

Five approaches to learning optimal control

[diagram: the five approaches as data-flow pipelines.
From experience data D = {(xt, ut, ct)}_{t=0}^T:
– Model-based RL: learn model P(x′ | u, x), c(x, u) → dynamic programming → Vt(x) → policy πt(x)
– Model-free RL: learn value fct. V(x) → policy π(x)
– Policy Search: optimize policy π(x) directly
From demonstration data D = {(x0:T, u0:T)^d}_{d=1}^n:
– Imitation Learning: learn/copy policy π(x)
– Inverse RL: learn latent costs c(x) → dynamic programming → V(x) → policy π(x)]

29/82

Page 30

Outline
Part I: Learning problems in Robotics – the RL view

• Introduction to some basics
– Markov Decision Processes and Stochastic Optimal Control
– Kinematics and Dynamics

• Five Approaches to Learning in Robotics
1. Model learning (model-based RL)
2. Value learning (model-free RL)
3. Policy search
4. Imitation learning
5. Inverse RL

• ...plus two more:
6. Exploration
7. Probabilistic Inference for Control & Planning

Part II: Interacting with a world of objects & Discussion
– Statistical Relational Learning for Robots
– Discussion

30/82

Page 31

1. Model learning (model-based RL)

D = {(xt, ut, ct)}_{t=0}^T  −learn→  P(x′ | u, x)  −DP→  Vt(x)  →  πt(x)

Literature:

Exactly as for Learning Dynamics & Kinematics in the 1-step case

31/82

Page 32

2. Model-free RL

D = {(xt, ut, rt)}_{t=0}^T  −learn→  Vt(x)  →  πt(x)

• Use ML to directly estimate the value function V (x)

Literature:

Gordon: Stable function approximation in dynamic programming. DTIC Document, 1995.

Lagoudakis & Parr: Least-Squares Policy Iteration (JMLR 2003).

Rasmussen & Kuss: Gaussian Processes in Reinforcement Learning (NIPS 2004)

Engel, Mannor & Meir: Reinforcement Learning with Gaussian Processes. (ICML 2005)

Mahadevan & Maggioni: Proto-Value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes (JMLR 2007)

32/82

Page 33

LSPI: Least Squares Policy Iteration
Lagoudakis & Parr: Least-Squares Policy Iteration (JMLR 2003).

(I'll explain it here in terms of the value function instead of the Q-function.)

• The value function fulfils

V(x) = r(x, π(x)) + γ Σ_{x′} P(x′ | π(x), x) V(x′)

• If we have n data points D = {(xt, ut, rt, xt+1)}_{t=1}^n, we require that this equation holds (approximately) for these n data points:

∀t : V(xt) ≈ rt + γ V(xt+1)

• Written in vector notation: V = R + γV′, with n-dim data vectors V, R, V′

• Written as optimization: minimize the Bellman residual error

L(V) = Σ_{t=1}^n [V(xt) − rt − γV(xt+1)]² = ||V − R − γV′||²

33/82

Page 34

LSPI: Least Squares Policy Iteration
• Approximate V(x) as linear in k features φj:

V(x) = Σ_{j=1}^k φj(x) βj = φ(x)ᵀβ

Then

V = Φβ with Φ_{tj} = φj(xt) ∈ R^{n×k}
V′ = Φ′β with Φ′_{tj} = φj(xt+1)

• the loss becomes

L(β) = ||R − (Φ − γΦ′)β||²

→ has an analytic solution!

• Like regression, but: squared error in supervised learning → Bellman residual error

details on simplifications made: see Appendix
34/82
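A minimal sketch of this least-squares fit, for the simplified value-function variant on the slide (real LSPI works with Q-functions inside a policy-iteration loop); the feature matrices and rewards are random placeholders:

import numpy as np

def lspi_value_fit(Phi, Phi_next, R, gamma=0.95):
    # minimize ||R - (Phi - gamma * Phi_next) beta||^2
    A = Phi - gamma * Phi_next
    beta, *_ = np.linalg.lstsq(A, R, rcond=None)
    return beta

rng = np.random.default_rng(2)
n, k = 500, 8
Phi = rng.standard_normal((n, k))        # features phi(x_t)
Phi_next = rng.standard_normal((n, k))   # features phi(x_{t+1})
R = rng.standard_normal(n)               # observed rewards r_t
beta = lspi_value_fit(Phi, Phi_next, R)
print(beta.shape)  # (8,)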

Page 35

LSPI: Riding a bike

from Lagoudakis & Parr (JMLR 2003)
35/82

Page 36

LSPI: Riding a bike

from Lagoudakis & Parr (JMLR 2003)
36/82

Page 37

3. Policy Search

D = {(xt, ut, ct)}_{t=0}^T  −optimize→  πt(x)

• Use ML to directly optimize the policy π(u|x) based on data

Literature:

Peters & Schaal: Reinforcement Learning of Motor Skills with Policy Gradients (Neural Networks 2008)

Kober & Peters: Policy Search for Motor Primitives in Robotics (NIPS 2008)

Moriarty, Schultz & Grefenstette: Evolutionary algorithms for reinforcement learning (JAIR 1999)

37/82

Page 38

Policy Search using policy gradients

• In the continuous state/action case, represent the policy as linear in arbitrary state features:

π(x) = Σ_{j=1}^k φj(x) βj = φ(x)ᵀβ   (deterministic)
π(u | x) = N(u | φ(x)ᵀβ, Σ)   (stochastic)

with k features φj.

• Given data D = {(xt, ut, rt)}_{t=0}^T we want to estimate the policy gradient ∂V(β)/∂β

38/82

Page 39

Policy Search using policy gradients
• One approach is called REINFORCE:

∂V(β)/∂β = ∂/∂β ∫ P(ξ | β) R(ξ) dξ
         = ∫ P(ξ | β) [∂/∂β log P(ξ | β)] R(ξ) dξ
         = E_{ξ|β}{ [∂/∂β log P(ξ | β)] R(ξ) }
         = E_{ξ|β}{ Σ_{t=0}^T γ^t [∂ log π(ut | xt)/∂β] Σ_{t′=t}^T γ^{t′−t} r_{t′} }

where the inner sum Σ_{t′=t}^T γ^{t′−t} r_{t′} is an estimate of Q^π(xt, ut, t).

• PoWER (Kober & Peters) and Monte Carlo EM (Vlassis & Toussaint) are similar, but try to make a full "M-step" instead of only a gradient step.

See: Peters & Schaal (2008): Reinforcement learning of motor skills with policy gradients, Neural Networks.

Kober & Peters: Policy Search for Motor Primitives in Robotics, NIPS 2008.

Vlassis, Toussaint (2009): Learning Model-free Robot Control by a Monte Carlo EM Algorithm. Autonomous Robots 27, 123-130.
39/82
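A minimal sketch of the REINFORCE estimator above, for a hypothetical linear-Gaussian policy; episodes are assumed given as (x, u, r) triples with made-up values:

import numpy as np

def reinforce_gradient(episodes, grad_logpi, gamma=0.98):
    # dV/dbeta ~ mean over episodes of
    #   sum_t gamma^t  d log pi(u_t|x_t)/dbeta  *  sum_{t'>=t} gamma^{t'-t} r_t'
    g = 0.0
    for ep in episodes:
        rewards = np.array([r for (_, _, r) in ep])
        for t, (x, u, _) in enumerate(ep):
            q = np.sum(gamma ** np.arange(len(ep) - t) * rewards[t:])  # Q estimate
            g = g + (gamma ** t) * grad_logpi(x, u) * q
    return g / len(episodes)

# hypothetical linear-Gaussian policy pi(u|x) = N(u | phi(x)^T beta, sigma^2)
beta, sigma = np.zeros(2), 0.5
phi = lambda x: np.array([x, 1.0])
grad_logpi = lambda x, u: (u - phi(x) @ beta) * phi(x) / sigma**2

rng = np.random.default_rng(0)
episodes = [[(rng.uniform(), rng.normal(), rng.uniform()) for _ in range(10)]
            for _ in range(50)]
print(reinforce_gradient(episodes, grad_logpi))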

Page 40

Policy Search using policy gradients

Kober & Peters: Policy Search for Motor Primitives in Robotics, NIPS 2008.

40/82

Page 41

Five approaches to learning optimal control

[diagram: the five approaches as data-flow pipelines.
From experience data D = {(xt, ut, ct)}_{t=0}^T:
– Model-based RL: learn model P(x′ | u, x), c(x, u) → dynamic programming → Vt(x) → policy πt(x)
– Model-free RL: learn value fct. V(x) → policy π(x)
– Policy Search: optimize policy π(x) directly
From demonstration data D = {(x0:T, u0:T)^d}_{d=1}^n:
– Imitation Learning: learn/copy policy π(x)
– Inverse RL: learn latent costs c(x) → dynamic programming → V(x) → policy π(x)]

41/82

Page 42

4. Imitation Learning

D = {(x0:T, u0:T)^d}_{d=1}^n  −learn/copy→  πt(x)

• Use ML to imitate demonstrated state trajectories x0:T

Literature:

Atkeson & Schaal: Robot learning from demonstration (ICML 1997)

Schaal, Ijspeert & Billard: Computational approaches to motor learning by imitation (Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, 2003)

Grimes, Chalodhorn & Rao: Dynamic Imitation in a Humanoid Robot through Nonparametric Probabilistic Inference (RSS 2006)

Rüdiger Dillmann: Teaching and learning of robot tasks via observation of human performance (Robotics and Autonomous Systems, 2004)

42/82

Page 43

Imitation Learning

• There are many ways to imitate/copy the observed policy:

Learn a density model P(ut | xt) P(xt) (e.g., with a mixture of Gaussians) from the observed data and use it as the policy (Billard et al.)

Or trace observed trajectories by minimizing perturbation costs (Atkeson & Schaal 1997)

43/82

Page 44

Imitation Learning

Atkeson & Schaal
44/82

Page 45

5. Inverse RL

D = {(x0:T, u0:T)^d}_{d=1}^n  −learn→  r(x, u)  −DP→  Vt(x)  →  πt(x)

• Use ML to “uncover” the latent reward function in observed behavior

Literature:

Pieter Abbeel & Andrew Ng: Apprenticeship learning via inverse reinforcement learning (ICML 2004)

Andrew Ng & Stuart Russell: Algorithms for Inverse Reinforcement Learning (ICML 2000)

Nikolay Jetchev & Marc Toussaint: Task Space Retrieval Using Inverse Feedback Control (ICML 2011)

45/82

Page 46

Inverse RL (Apprenticeship Learning)

• Given: demonstrations D = {x^d_{0:T}}_{d=1}^n

• Try to find a reward function that discriminates demonstrations from other policies
– Assume the reward function is linear in some features: R(x) = wᵀφ(x)
– Iterate:
1. Given a set of candidate policies {π0, π1, ...}
2. Find weights w that maximize the value margin ξ between the teacher and all other candidates:

max_{w,ξ} ξ
s.t. ∀πi : wᵀ⟨φ⟩_D ≥ wᵀ⟨φ⟩_{πi} + ξ   (value of demonstrations ≥ value of πi plus margin)
     ||w||² ≤ 1

3. Compute a new candidate policy πi that optimizes R(x) = wᵀφ(x) and add it to the candidate list.

(Abbeel & Ng, ICML 2004)
46/82
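A toy sketch of step 2 for 2-D features: rather than a proper QP (or Abbeel & Ng's projection method), it scans unit-norm weight vectors and keeps the one maximizing the margin; the feature expectations are made-up numbers:

import numpy as np

def max_margin_weights(phi_demo, phi_candidates, n_angles=360):
    # search ||w|| = 1 for the w maximizing min_i w . (phi_demo - phi_i)
    best_w, best_margin = None, -np.inf
    for ang in np.linspace(0, 2 * np.pi, n_angles, endpoint=False):
        w = np.array([np.cos(ang), np.sin(ang)])
        margin = min(w @ (phi_demo - phi_c) for phi_c in phi_candidates)
        if margin > best_margin:
            best_w, best_margin = w, margin
    return best_w, best_margin

phi_demo = np.array([0.8, 0.6])                      # <phi>_D (demonstrations)
phi_candidates = [np.array([0.2, 0.1]), np.array([0.5, 0.9])]
w, xi = max_margin_weights(phi_demo, phi_candidates)
print(w, xi)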

Page 47

47/82

Page 48

Outline
Part I: Learning problems in Robotics – the RL view

• Introduction to some basics
– Markov Decision Processes and Stochastic Optimal Control
– Kinematics and Dynamics

• Five Approaches to Learning in Robotics
1. Model learning (model-based RL)
2. Value learning (model-free RL)
3. Policy search
4. Imitation learning
5. Inverse RL

• ...plus two more:
6. Exploration
7. Probabilistic Inference for Control & Planning

Part II: Interacting with a world of objects & Discussion
– Statistical Relational Learning for Robots
– Discussion

48/82

Page 49

6. Exploration

• Active Learning is a form of ML where the algorithm can query the next data point.

→ explore where the "smoothed" empirical distribution is low

49/82

Page 50

Exploration

• Exploration in robotics is tricky: one can't just pick the next data point, but needs to control the system into the state of interest.

Literature (very diverse selection):

Schaal & Atkeson: Robot juggling: An implementation of memory-based learning [Shifting Setpoint Algorithm] (Control Systems Magazine 1994)

Nouri & Littman: Dimension reduction and its application to model-based exploration in continuous spaces (Machine Learning 2010)

Jong & Stone: Model-Based Exploration in Continuous State Spaces (SARA 2007)

Katz, Pyuro & Brock: Learning to Manipulate Articulated Objects in Unstructured Environments Using a Grounded Relational Representation (RSS 2008)

Hsiao, Kaelbling & Lozano-Perez: Grasping POMDPs (ICRA 2007)

Saxena et al: Learning to grasp novel objects using vision (ISER 2006)

Oudeyer et al: The playground experiment: Task-independent development of a curious robot (Symposium on Developmental Robotics 2005)

50/82

Page 51

Exploration

• R-max is a simple exploration strategy that assigns high value to state-action pairs that have not been visited often (optimism)

• We can use ML to approximate this optimistic value function.

Fitted R-max on the Mountain Car problem:

Jong & Stone: Model-Based Exploration in Continuous State (SARA 2007)

51/82
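A minimal sketch of the R-max idea itself: below a visit-count threshold m, a state-action pair is treated as maximally rewarding, so any planner run on optimistic_reward is drawn toward the unknown; all constants are illustrative:

import numpy as np

nX, nU, m = 5, 2, 3          # m = visit threshold for "known" pairs
counts = np.zeros((nX, nU))
r_emp = np.zeros((nX, nU))   # running empirical mean reward
R_MAX = 1.0

def optimistic_reward(x, u):
    # unknown pairs look maximally rewarding, driving exploration toward them
    return R_MAX if counts[x, u] < m else r_emp[x, u]

def update(x, u, r):
    counts[x, u] += 1
    r_emp[x, u] += (r - r_emp[x, u]) / counts[x, u]

update(0, 1, 0.3); update(0, 1, 0.5); update(0, 1, 0.4)
print(optimistic_reward(0, 1), optimistic_reward(2, 0))  # known vs. optimistic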

Page 52

Physical Exploration
Dov Katz and Oliver Brock 2010: Manipulating Articulated Objects With Interactive Perception

52/82

Page 53

7. Probabilistic Inference for Control & Planning

• Bellman's optimality principle is one approach to optimal control:

Vt(x) = min_{ut} [ ct(x, ut) + E_{x′ | ut, x}{ Vt+1(x′) } ]

• Reductions to Probabilistic Inference:

Toussaint & Goerick: Probabilistic inference for structured planning in robotics (IROS 2007)

Todorov: General duality between optimal control and estimation (Decision and Control, 2008)

Toussaint: Robot Trajectory Optimization using Approximate Inference (ICML 2009)

Kappen, Gomez & Opper: Optimal control as a graphical model inference problem (arXiv:0901.0633, 2009)

Rawlik, Toussaint & Vijayakumar: Approximate Inference and Stochastic Optimal Control (arXiv:1009.3958, 2010)

53/82

Page 54

Approximate Inference Control

[graphical model: states x0 … xT, controls u0 … uT, rewards/costs r0 … rT]

• Introduce a binary auxiliary variable zt with P(zt = 1 | ut, xt) = exp{−ct(xt, ut)}

• For a given trajectory x0:T, u0:T:
log P(z0:T = 1 | x0:T, u0:T) = −C(x0:T, u0:T)

• W.r.t. a distribution q(x0:T, u0:T): expected log-likelihood = expected neg-costs

⟨log P(z0:T = 1 | x0:T, u0:T)⟩_{q(x0:T, u0:T)} = −⟨C(x0:T, u0:T)⟩_{q(x0:T, u0:T)}

Expectation Maximization ↔ Stochastic Optimal Control

54/82
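The cost-to-likelihood mapping suggests a simple EM-flavored procedure: sample trajectories, weight them by exp(−C), and re-estimate the controls under those weights. A toy sketch in that spirit (scalar dynamics and all constants are illustrative; this is not the full message-passing AICO method):

import numpy as np

rng = np.random.default_rng(3)

def trajectory_cost(us, x0=0.0, target=1.0):
    x, C = x0, 0.0
    for u in us:
        x = x + u                # trivial scalar "dynamics"
        C += 0.1 * u**2          # control costs
    return C + (x - target)**2   # final task cost

T, K = 10, 200
mean_u = np.zeros(T)
for _ in range(20):
    U = mean_u + 0.2 * rng.standard_normal((K, T))   # sample control sequences
    w = np.exp(-np.array([trajectory_cost(us) for us in U]))  # ~ P(z=1 | traj)
    w /= w.sum()
    mean_u = w @ U               # re-estimate controls under the weights
print(trajectory_cost(mean_u))   # should approach the minimum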

Page 55

Approximate Inference Control

[second build of the same slide: the graphical model now shows the binary auxiliary variables z0 … zT in place of the rewards; the bullet points are identical to the previous page]

54/82

Page 56

Approximate Inference Control

[graphical model: states x0 … xT, controls u0 … uT, auxiliary variables z0 … zT]

• Distinguish 3 different processes:

prior        P(x0:T, u0:T) := P(x0) ∏_{t=0}^T P(ut | xt) ∏_{t=1}^T P(xt | ut-1, xt-1)

controlled   q^π(x0:T, u0:T) := P(x0) ∏_{t=0}^T δ_{ut = πt(xt)} ∏_{t=1}^T P(xt | ut-1, xt-1)

posterior    p(x0:T, u0:T) := [P(x0) / P(z0:T = 1)] ∏_{t=0}^T P(ut | xt) ∏_{t=1}^T P(xt | ut-1, xt-1) ∏_{t=0}^T exp{−ct(xt, ut)}

For uniform P(ut | xt):

D(q^π || p) = log P(z0:T = 1) + E_{q^π(x0:T)}{ C(x0:T, π(x0:T)) }

Rawlik, Toussaint & Vijayakumar: Approximate Inference and Stochastic Optimal Control (arXiv:1009.3958, 2010)

55/82

Page 57

Approximate Inference Control

Toussaint et al: Integrated motor control, planning, grasping and high-level reasoning in a blocks world using probabilistic inference (ICRA 2010)

56/82

Page 58

Outline
Part I: Learning problems in Robotics – the RL view

• Introduction to some basics
– Markov Decision Processes and Stochastic Optimal Control
– Kinematics and Dynamics

• Five Approaches to Learning in Robotics
1. Model learning (model-based RL)
2. Value learning (model-free RL)
3. Policy search
4. Imitation learning
5. Inverse RL

• ...plus two more:
6. Exploration
7. Probabilistic Inference for Control & Planning

Part II: Interacting with a world of objects & Discussion
– Statistical Relational Learning for Robots
– Discussion

57/82

Page 59

• Most (if not all) examples concerned control of the robot's own or attached DoFs

What about really controlling a natural environment?

58/82

Page 60

• That might be a natural environment. How can we control (e.g., clean) this?

59/82

Page 61

Learning a model from object interactions

D = { grab(c) :  box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,d) inhand(nil) ...
              → box(a) box(b) ball(c) table(d) on(a,b) on(b,d) ¬on(c,d) inhand(c) ...
      puton(a) : box(a) box(b) ball(c) table(d) on(a,b) on(b,d) ¬on(c,d) inhand(c) ...
              → box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,a) inhand(nil) ...
      puton(b) : box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,a) inhand(nil) ...
              → box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,a) inhand(nil) ...
      grab(b) :  box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,a) inhand(nil) ...
              → box(a) box(b) ball(c) table(d) on(a,d) ¬on(b,d) on(c,d) inhand(b) ...
      ... }

• How can we learn a predictive model P(x′ | u, x) for this data?

60/82

Page 62

Learning a model from object interactions

• Differences to model learning in motion control:
– symbolic representation of state
– exponential state space

Object Abstraction Assumption: The world is made up of objects, and the effects of actions on these objects generally depend on their attributes rather than their identities.

Pasula, Zettlemoyer & Kaelbling (ICAPS 2004)

• Wanted: generalization across objects

→ Statistical Relational Learning

61/82

Page 63

Statistical Relational Learning (SRL)

• See the ECML/PKDD 2007 tutorial by Lise Getoor:
http://www.ecmlpkdd2007.org/CD/tutorials/T3_Getoor/Getoor_CD.pdf

• “Probabilistic learning & inference on 1st order logic representations”

– very strong generalization across objects
– in my view: currently the only way to express & learn uncertain knowledge about environments with objects & properties/relations

SRL + Robotics = perfect match!

62/82

Page 64

Learning a model from interaction data

D = { grab(c) :  box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,d) inhand(nil) ...
              → box(a) box(b) ball(c) table(d) on(a,b) on(b,d) ¬on(c,d) inhand(c) ...
      puton(a) : box(a) box(b) ball(c) table(d) on(a,b) on(b,d) ¬on(c,d) inhand(c) ...
              → box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,a) inhand(nil) ...
      puton(b) : box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,a) inhand(nil) ...
              → box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,a) inhand(nil) ...
      grab(b) :  box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,a) inhand(nil) ...
              → box(a) box(b) ball(c) table(d) on(a,d) ¬on(b,d) on(c,d) inhand(b) ...
      ... }

• How can we learn a predictive model P(x′ | u, x) for this data?

63/82

Page 65

A form of Statistical Relational Learning
Pasula, Zettlemoyer & Kaelbling: Learning probabilistic relational planning rules (ICAPS 2004)

• compress this data into probabilistic relational rules (Pasula et al.):

pickup(X, Y) : on(X,Y), clear(X), inhand(NIL), block(Y)
  → .7 : inhand(X), ¬clear(X), ¬inhand(NIL), ¬on(X,Y), clear(Y)
    .2 : on(X,TABLE), ¬on(X,Y), clear(Y)
    .1 : no change

pickup(X, TABLE) : on(X,TABLE), clear(X), inhand(NIL)
  → .66 : inhand(X), ¬clear(X), ¬inhand(NIL), ¬on(X,TABLE)
    .34 : no change

puton(X, Y) : clear(Y), inhand(X), block(Y)
  → .7 : inhand(NIL), ¬clear(Y), ¬inhand(X), on(X,Y), clear(X)
    .2 : on(X,TABLE), clear(X), inhand(NIL), ¬inhand(X)
    .1 : no change

puton(X, TABLE) : inhand(X)
  → .8 : on(X,TABLE), clear(X), inhand(NIL), ¬inhand(X)
    .2 : no change

• Find a rule set that maximizes (likelihood − description length)
• Opportunities for reducing description length:
– Frame Assumption: actions only influence few predicates; most predicates remain unchanged
– Abstraction: introducing novel predicates
– Uncertainty!
64/82
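A minimal sketch of how one such rule is used: match its context against the current state (here pre-grounded, so no variable unification is shown), then sample one of the weighted outcomes; the set-of-strings state encoding is an illustrative assumption:

import random

def apply_rule(state, context, outcomes):
    # state, context: sets of ground literals; outcomes: list of (prob, add, delete)
    if not context <= state:
        return state             # context does not match: rule does not apply
    r, acc = random.random(), 0.0
    for prob, add, delete in outcomes:
        acc += prob
        if r <= acc:
            return (state - delete) | add
    return state

# the pickup(X,Y) rule from the slide, hand-grounded to X=a, Y=b
state = {'on(a,b)', 'clear(a)', 'inhand(NIL)', 'block(b)'}
context = {'on(a,b)', 'clear(a)', 'inhand(NIL)', 'block(b)'}
outcomes = [
    (0.7, {'inhand(a)', 'clear(b)'}, {'clear(a)', 'inhand(NIL)', 'on(a,b)'}),
    (0.2, {'on(a,TABLE)', 'clear(b)'}, {'on(a,b)'}),
    (0.1, set(), set()),         # no change
]
print(apply_rule(state, context, outcomes))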

Page 66

Role of uncertainty in these rules

[figure: two blocks-world situations (a) and (b)]

⇒ uncertainty ↔ regularization ↔ compression & abstraction

• Introducing uncertainty in the rules not only allows us to model stochastic worlds, it enables us to compress/regularize and thereby learn strongly generalizing models!

65/82

Page 67

Planning by inference in relational domains

• Once the model is learnt, using it (planning) is hard

• SST & UCT do not scale with the # objects

→ Use Planning-by-Inference:

[diagram: depending on the situation and relevance, the learned rule-based model is grounded into a rule-based DBN – one representation good for learning, another good for planning]

(Lang & Toussaint, JAIR 2010)
66/82

Page 68

Planning by inference in relational domains

(we’re using factored frontier for approx. inference)

→ Advances in Lifted Inference could translate to better robot manipulation planning.

67/82

Page 69

Application

Random exploration:

Planning:

Real-world:

Lang & Toussaint: Planning with Noisy Probabilistic Relational Rules (JAIR 2010)

Toussaint et al: Integrated motor control, planning, grasping and high-level reasoning in a blocks world using probabilistic inference (ICRA 2010)

68/82

Page 70

Relational Exploration

• The state space is inherently exponential in the # objects. How could we realize strategies like E3 or R-max in relational domains?

• Key insight:

strong generalization of the model
↔
strong implications for what is considered novel / is explored

For instance, if you've seen a red, green and yellow ball rolling, will you explore whether the blue ball also rolls? Or rather explore something totally different, like dropping a blue box?

69/82

Page 71

Relational Exploration

• Transfer Explicit Explore or Exploit (E3) to Relational Domains

• Representations to formulate an "empirical distribution" (non-novelty):

propositional    P(s) ∝ c_D(s)
distance-based   P_d(s) ∝ exp{ −min_{(s_e, a_e, s′_e) ∈ D} d(s, s_e)² }
predicate-based  P_p(s) ∝ c_p(s) I(s ⊨ p) + c_¬p(s) I(s ⊨ ¬p)
context-based    P_Φ(s) ∝ Σ_{φ∈Φ} c_D(φ) I(∃σ : s ⊨ σ(φ))

(contexts ↔ set of LHSs of rules)

70/82
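A minimal sketch of the distance-based measure P_d(s): states far from everything in the experience data score near zero, i.e. count as novel; representing states as feature vectors is an illustrative simplification:

import numpy as np

def non_novelty(s, visited, scale=1.0):
    # P_d(s) up to normalization: exp(-min squared distance to the data D)
    d2 = min(float(np.sum((s - se)**2)) for se in visited)
    return np.exp(-d2 / scale)

D_states = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]
for s in (np.array([0.1, 0.0]), np.array([3.0, 3.0])):
    print(s, non_novelty(s, D_states))  # the far-away state is novel (score ~ 0)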

Page 72

Application

Online relational explore-exploit:

Lang, Toussaint & Kersting: Exploration in Relational Worlds (ECML 2010)

71/82

Page 73

Outline
Part I: Learning problems in Robotics – the RL view

• Introduction to some basics
– Markov Decision Processes and Stochastic Optimal Control
– Kinematics and Dynamics

• Five Approaches to Learning in Robotics
1. Model learning (model-based RL)
2. Value learning (model-free RL)
3. Policy search
4. Imitation learning
5. Inverse RL

• ...plus two more:
6. Exploration
7. Probabilistic Inference for Control & Planning

Part II: Interacting with a world of objects & Discussion
– Statistical Relational Learning for Robots
– Discussion

72/82

Page 74

Three concluding comments:

– Scaling RL?
– Relational Learning in Robotics
– Whole Pizza vs. Chillies

73/82

Page 75

Scaling RL?

• Recommended: Satinder Singh's "Myths of RL":
http://umichrl.pbwiki.com/Myths-of-Reinforcement-Learning

1. Large state spaces are hard for RL
2. RL is slow
3. RL does not have (m)any success stories since TD-Gammon
4. RL does not work well with function approximation
5. Value function approximation does not work
6. Non-Markovianness invalidates standard RL methods
7. POMDPs are hard for RL to deal with
8. RL is about learning optimal policies

• The first half of this tutorial discussed many success stories of the RLapproach to Robotics.

74/82

Page 76

Wolfgang Köhler (1917): Intelligenzprüfungen an Menschenaffen ("The Mentality of Apes")

[movie]

75/82

Page 77

Scaling RL?

The real world is not a scaled-up version of the Mountain Car Problem.

• In other terms: what should we scale with?
– The size of the state space?
It'll always be exponential – is this the right view?

– The number of objects?
How many objects can humans mentally manipulate (= plan with)?

– Scaling with the planning horizon?
But on the right level of abstraction, horizons are short.

• What we need in Robotics are exploration, learning, and goal-directed behavior that exploit the structure of natural environments.

76/82

Page 78

Relational Learning in Robotics

• Many of the successful learning methods concern motion control. There are (in my impression) fewer methods for learning to control/manipulate worlds of many objects – worlds as they are classically described in AI.

• What Robotics needs is a fusion of “this type of AI”, Machine Learning and Control methods.

77/82


Relational Learning in Robotics

• A popular science article: “I, algorithm: A new dawn for artificial intelligence” (Anil Ananthaswamy, New Scientist, January 2011)

It talks of “probabilistic programming, which combines the logical underpinnings of the old AI with the power of statistics and probability.” It cites Stuart Russell: “It’s a natural unification of two of the most powerful theories that have been developed to understand the world and reason about it.” and Josh Tenenbaum: “It’s definitely spring”.

• My impression: Exactly these kinds of developments give new hope for Robots to explore, learn and plan in our natural world, composed of objects.

78/82


Whole Pizza vs. Chillies

[Figure: ICRA 2011 schedule – the full ICRA 2011 Technical Program, May 10–12, 2011: three days of roughly 15 parallel session tracks per time slot, covering everything from Aerial Robotics, SLAM, Grasping and Motion Planning to Medical Robots, Rehabilitation Robotics, Haptics and Visual Servoing]

79/82


Whole Pizza vs. Chillies

• Integrated robotic systems are huge in terms of
– lines of code
– methods/disciplines/formalizations involved

Beetz et al: The Assistive Kitchen – A Demonstration Scenario for Cognitive Technical Systems

• Existing learning methods tend to address only isolated aspects – identified and formalized by the engineer.

• Machine Learning is used to working with a “full formalization” of the domain. Will this ever be possible for Robotics? Can we apply ML on the full system level?

80/82


Robotics as big graphical model?
(where “graphical model” would include relational structures...)

• Many aspects (Computer Vision, Perception, Feedback Control, Stochastic Optimal Control, MDPs) can already be formalized in terms of probabilistic models and inference.

Would it help to view it all as one big graphical model?

81/82


Thanks for your attention!

82/82


Appendix

• Robot dynamics can be described by the differential equation
$$M(q)\,\ddot q + C(q,\dot q)\,\dot q + F(q) = u$$
with mass matrix $M(q)$, Coriolis forces $C(q,\dot q)$ and gravity forces $F(q)$. The Newton-Euler algorithm can efficiently (numerically) compute $M$, $C$ and $F$ for any specific robot configuration $(q,\dot q)$. Using (e.g.) Leap Frog integration with time step size $\tau$, the process becomes
$$q_{t+1} = q_t + \tau(\dot q_{t+1} + \dot q_t)/2\,, \qquad \dot q_{t+1} = \dot q_t + \tau M^{-1}(u_t - C\dot q_t - F)\,,$$
$$\begin{pmatrix} q_{t+1} \\ \dot q_{t+1} \end{pmatrix} = (I+A)\begin{pmatrix} q_t \\ \dot q_t \end{pmatrix} + B\,u_t + a\,,$$
$$A = \begin{pmatrix} 0 & \tau - \tau^2 M^{-1}C/2 \\ 0 & -\tau M^{-1}C \end{pmatrix},\quad B = \begin{pmatrix} \tau^2 M^{-1}/2 \\ \tau M^{-1} \end{pmatrix},\quad a = \begin{pmatrix} -\tau^2 M^{-1}F/2 \\ -\tau M^{-1}F \end{pmatrix}$$
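A minimal Python sketch of this integration step, assuming a user-supplied `newton_euler(q, qdot)` routine that returns $(M, C, F)$ for the current configuration (the routine name is illustrative, not from the slides):

```python
import numpy as np

def leapfrog_step(q, qdot, u, tau, newton_euler):
    """One step of M(q) q'' + C(q,q') q' + F(q) = u.

    newton_euler(q, qdot) -> (M, C, F)   # assumed user-supplied routine
    """
    M, C, F = newton_euler(q, qdot)
    # q'' = M^{-1}(u - C q' - F), solved without forming M^{-1} explicitly
    qdd = np.linalg.solve(M, u - C @ qdot - F)
    qdot_next = qdot + tau * qdd
    # position update uses the average of old and new velocities
    q_next = q + tau * (qdot_next + qdot) / 2.0
    return q_next, qdot_next
```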

• LSPI simplifications made: Actually, LSPI estimates the Q(s, a)-function instead of the V(s)-function, which represents the expected return given the current state and the selected action. Second, I skipped explaining Policy Iteration: once you have estimated the Q-function, you need to update the policy (perhaps collect new data) and iterate, re-estimating the Q-function.
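As a minimal sketch of the loop just described — linear features and a discrete action set; the names `phi`, `data` and `actions` are illustrative assumptions, not part of the slides:

```python
import numpy as np

def lspi(data, phi, actions, gamma=0.95, iters=20):
    """Least-Squares Policy Iteration via LSTD-Q.

    data    : list of (s, a, r, s_next) transitions
    phi     : feature map phi(s, a) -> np.ndarray of shape (k,)
    actions : discrete action set to maximize over
    """
    k = phi(data[0][0], data[0][1]).shape[0]
    w = np.zeros(k)                      # Q(s, a) ~ phi(s, a) @ w

    def policy(s):                       # greedy policy w.r.t. current w
        return max(actions, key=lambda a: float(phi(s, a) @ w))

    for _ in range(iters):
        A = 1e-6 * np.eye(k)             # small ridge term for invertibility
        b = np.zeros(k)
        for s, a, r, s_next in data:     # re-estimate Q for current policy
            f = phi(s, a)
            f_next = phi(s_next, policy(s_next))
            A += np.outer(f, f - gamma * f_next)
            b += r * f
        w = np.linalg.solve(A, b)        # LSTD-Q fixed point
    return w, policy                     # alternate: estimate Q, improve policy
```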

83/82