Humanoid Robot Showdown and Artificial Intelligence: Exploring the Gap Between Practice and Theory

Akihiko Yamaguchi*

* Robotics Institute, Carnegie Mellon University

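A brief self-introduction

• -2006: Kyoto University, Matsuyama Laboratory (Prof. Matsuyama)
• 2006-2008: NAIST and ATR Computational Neuroscience Laboratories (Dr. Kawato)
• 2008-2011: NAIST, Robotics Laboratory (Prof. Ogasawara)
• 2011-2014: NAIST, Specially Appointed Assistant Professor
• 2014-current: CMU, postdoctoral researcher

What do I work on?
• Keywords: robot learning, machine learning, robotics, artificial intelligence, motion planning, manipulation, ...
• Artificial intelligence is in fashion lately, so saying "AI for robots" seems to go over well (with non-researchers), since it sounds cutting-edge.
• While artificial intelligence was out of favor (until around 2012?), I called it robot learning.
• What I really want: to build something with human-level intelligence, using robots as the subject.

http://akihikoy.net/ (Text underlined in blue is a link you can click to open; the same applies below.)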


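Purpose of this lecture

I will talk about topics that are currently in fashion, plus the technology behind them.

• DRC (DARPA Robotics Challenge): an NHK special explained Japan's poor showing as a gap in autonomy (US > Japan). Is that really so?
• Deep learning: everyone and their professor is doing it. It is just a neural network (I am not saying that is bad).
• Self-driving cars, Watson, an AI that passes the University of Tokyo entrance exam: is everything "artificial intelligence" now?

All of this from the viewpoint of robots and artificial intelligence, i.e. robot learning.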

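Buzzwords

Artificial intelligence, deep learning, big data

I regard these as "political terms"; overusing them is embarrassing.
• Companies take the bait
• Papers attract more attention
• Grants are easier to win...?

I do not think using buzzwords is a bad thing: money circulates, and some of it reaches us researchers too.

To those who are studying now: take care not to be misled.

[wiki/バズワード] A buzzword is a keyword that seems persuasive at first glance but lacks specifics and has no clear, agreed definition. However, because the definition of the term "buzzword" is itself vague, some argue that "buzzword is itself a buzzword."

Criticism along the lines of "deep learning is not (strong) AI" is starting to appear, but that goes without saying. Do not let critical opinions intimidate you.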

AI and robots

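Do AI and robots connect?

[Figure: areas of AI: deep learning, image recognition, speech recognition, translation, natural language processing, knowledge bases, inference, machine learning. Caption: "I want to put all of them in!"]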

• clarifai (a demo that tags a given image) http://www.clarifai.com/
• Deep Pose http://static.googleusercontent.com/media/research.google.com/ja//pubs/archive/42237.pdf https://drive.google.com/open?id=0B5U3jvqDZnxNSGw2UGpycGFQX2s
• "[Shocking] The pictures drawn by Google's artificial intelligence are incredible! Viewers say 'Aaaaagh, too scary!!' and 'Artistic!!'" http://buzzplus.com/article/2015/06/22/googleart/
• Playing games (Deep Q-network; discussed later)

• The question-answering system "Watson" takes on a quiz show! http://www.ibm.com/smarterplanet/jp/ja/ibmwatson/quiz/
• IBM's artificial intelligence "Watson" to release a cookbook containing recipes it devised http://japan.cnet.com/news/service/35063063/
• Cognitive Cooking with Chef Watson http://www.ice.edu/aboutus/ibmcognitivecookingwithchefwatsonpartnership
• Changing cancer treatment, and running on Pepper too: the potential of the artificial intelligence "IBM Watson" http://japan.cnet.com/news/business/35068179/
• "This looks like a medical revolution in the making. IBM adds a vast collection of medical images to its artificial intelligence Watson" (Gizmodo Japan) http://www.gizmodo.jp/2015/08/ibmmerge_healthcare10watson.html
• The artificial intelligence "Watson" serves as a "tennis expert" at Wimbledon http://japan.cnet.com/news/service/35066278/
• THE AI BEHIND WATSON - THE TECHNICAL ARTICLE http://www.aaai.org/Magazine/Watson/watson.php


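Chess and motion

AI vs. humans
• Chess: in 1996 it won a game against the world champion, and in 1997 it won the match (IBM Deep Blue) https://ja.wikipedia.org/wiki/%E3%83%87%E3%82%A3%E3%83%BC%E3%83%97%E3%83%BB%E3%83%96%E3%83%AB%E3%83%BC_(%E3%82%B3%E3%83%B3%E3%83%94%E3%83%A5%E3%83%BC%E3%82%BF)
• Shogi: in 2015, "already at the same level, and an overwhelming win within a few years." Computer shogi project to declare an end, saying it has "achieved its purpose" (NHK News) http://www3.nhk.or.jp/news/html/20151010/k10010265711000.html

Robot motion planning (action planning)
• Along which path can a mobile robot (a Roomba, say) be driven to its goal without hitting anything?
• How can a robot arm be moved under a table without hitting anything?

Both are inference.
• An inference algorithm everyone knows: A* search
• It can solve puzzles
• It can also do motion planning

So what happens if the "goal" becomes "walk" or "make a hamburger steak"?

Video: or_ompl - OpenRAVE bindings for OMPL https://www.youtube.com/watch?v=6qRRbvNzHG8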
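As a rough illustration of the mobile-robot question on this slide, here is a minimal A* search sketch. The grid map, the 4-connected unit-cost moves, and the function name `astar` are assumptions made for this example; they are not taken from the lecture.

```python
import heapq

def astar(grid, start, goal):
    """Shortest 4-connected path on a grid; cells equal to 1 are obstacles."""
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start)]  # entries: (f = g + h, g, cell)
    parent = {start: None}
    best_g = {start: 0}
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        if g > best_g[cell]:
            continue  # stale queue entry, a cheaper route was found later
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best_g.get(nxt, float("inf")):
                    best_g[nxt] = ng
                    parent[nxt] = cell
                    heapq.heappush(frontier, (ng + h(nxt), ng, nxt))
    return None  # no collision-free path exists

# Hypothetical map: a wall with a single gap on the right-hand side.
grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
path = astar(grid, (0, 0), (2, 0))
```

The same search applies to a robot arm if the cells are discretized joint angles instead of floor positions, which is where the number of cells starts to explode.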

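Basic elements of inference

State space and action space
• State: variables that describe the current situation. Chess: the arrangement of pieces on the board. Robot: position and orientation, joint angles, etc.
• Action: variables that the AI can choose freely and that change the state. Chess: how to move a piece. Robot: target joint angles, etc.

Dynamics (state transition)
• A function that governs state transitions: (state, action) → next state. Chess: ... Robot: ...

Evaluation function
• Define a reward or cost function for each (state, action, next state) and take the sum as the evaluation function, or define some evaluation function over the whole action sequence, etc.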
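The three elements above can be sketched in a few lines. This is a toy under stated assumptions: a hypothetical 1-D mobile robot whose state is its position, whose action is a displacement, and whose per-step cost (distance to a goal plus a small effort penalty) is invented for illustration.

```python
def dynamics(state, action):
    """State transition: (state, action) -> next state, for a 1-D robot."""
    return state + action

def cost(state, action, next_state, goal=5):
    """Per-step cost: distance to the goal plus a small effort penalty."""
    return abs(next_state - goal) + 0.1 * abs(action)

def evaluate(initial_state, actions):
    """Evaluation function: sum of the per-step costs along the rollout."""
    state, total = initial_state, 0.0
    for action in actions:
        next_state = dynamics(state, action)
        total += cost(state, action, next_state)
        state = next_state
    return state, total

# Roll out one candidate action sequence and score it.
final_state, total_cost = evaluate(0, [2, 2, 1])
```

Inference then amounts to searching over action sequences for the one with the best evaluation.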

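Difficulties with robots

The state and action spaces are enormous (the curse of dimensionality)
• State: position, orientation, joint angles, vision sensor input, force sensor input, audio sensor input, etc. Using raw images and the like directly is currently difficult; they have to be processed into "meaningful information."
• Action: humanoid-class robots often have 30 or more degrees of freedom. This looks small next to the state space, but it is fatally large for machine learning and inference algorithms.

Analytical modeling of the dynamics is difficult
• Until now: rigid bodies. Perfectly rigid bodies do not exist, and modeling contact forces is a frequent problem among walking researchers and others.
• Deformable objects (strings, clothes, liquids, powders, ...)

Yet humans do all this with ease. What kind of "intelligence" is that? → This is the motivation for robot learning!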
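A back-of-the-envelope calculation shows why 30 degrees of freedom is "fatally large." The sketch below assumes each variable is discretized into 10 bins; that resolution is an arbitrary choice for illustration.

```python
def grid_cells(bins_per_dim, dims):
    """Number of cells when each of `dims` variables gets `bins_per_dim` bins."""
    return bins_per_dim ** dims

# A planar mobile robot, a 6-DoF arm, and a 30-DoF humanoid.
for dims in (2, 6, 30):
    print(dims, grid_cells(10, dims))
```

With 10 bins per variable the 30-DoF case already has 10^30 cells, far beyond what exhaustive search or a lookup table can cover.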

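Do AI and robots connect?

[Figure: areas of AI (deep learning, image recognition, speech recognition, translation, natural language processing, knowledge bases, inference, machine learning), captioned "I want to put all of them in!"]

• The world above is certainly needed for robots.
• But it is not enough for a truly intelligent robot: we need AI that has a real body, interacts with the real world, and can achieve goals in the real world.
• Hence AI research that includes robots is the broadest kind of AI research, and realizing true AI should involve many researchers.
• Understanding the real world (grounding symbols in real objects, understanding dynamics) also advances research in the world above (example: think about the dynamics involved in Watson's cooking).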

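AI for robots - robot learning

• How do humans do it? → Imitation learning of skills
• The dynamics are unknown → Reinforcement learning
• Other robot learning: using machine learning for part of the system, etc.
• Motion planning (inference): RRT, dynamic programming, optimization

[Measures for evaluating AI]
• Autonomy (no human needs to control it; no human needs to program it): AI-like!
• Learning ability (the robot learns on its own): AI-like!
• Versatility and generalization: the more versatile and general something is, the more AI-like it is. There is a "cost of versatility."

Walking is also AI in a sense ← weak AI. (Walking researchers do not call it that, because it cannot be applied as-is to other tasks, i.e. its versatility is low.) For robot AI, arguing over whether something is AI or not is not very meaningful (it depends on the viewpoint). → That is presumably why everyone calls it AI...

A robot that "can do" something and a robot with which someone "tried" something are very different. → Do not be misled by researchers!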

Various approaches to action planning

Motion planning (inference)
• Action planning is possible even with a simple optimization algorithm (e.g. A* search).
• Problem: when the dimensionality is large, memory may run out or the search may take an enormous amount of time.

RRT (Rapidly exploring random tree) https://en.wikipedia.org/wiki/Rapidly_exploring_random_tree

Dynamic programming
• Differential Dynamic Programming https://en.wikipedia.org/wiki/Differential_dynamic_programming
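To give a feel for how RRT sidesteps a full grid, here is a minimal 2-D sketch. Everything concrete in it is an assumption for illustration: a point robot in the unit square, a single disc obstacle, a 10% goal bias, and the names `rrt` and `is_free`. It also checks only each new node for collision, not the edge leading to it, which a real planner would have to do.

```python
import math
import random

def rrt(start, goal, is_free, step=0.05, goal_tol=0.05, max_iters=5000, seed=0):
    """Grow a tree from `start` by random sampling until a node is near `goal`."""
    rng = random.Random(seed)
    nodes = [start]
    parent = {0: None}
    for _ in range(max_iters):
        # Sample a random point in the unit square, with a 10% bias to the goal.
        target = goal if rng.random() < 0.1 else (rng.random(), rng.random())
        # Nearest tree node by linear scan (real planners use a k-d tree).
        i_near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], target))
        near = nodes[i_near]
        d = math.dist(near, target)
        if d == 0.0:
            continue
        # Move at most `step` from the nearest node toward the sample.
        scale = min(step, d) / d
        new = (near[0] + scale * (target[0] - near[0]),
               near[1] + scale * (target[1] - near[1]))
        if not is_free(new):
            continue  # only the new node is checked, not the whole edge
        nodes.append(new)
        parent[len(nodes) - 1] = i_near
        if math.dist(new, goal) <= goal_tol:
            path, i = [], len(nodes) - 1
            while i is not None:
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None  # gave up within the iteration budget

def is_free(p):
    # Hypothetical workspace: unit square with a disc obstacle in the middle.
    inside = 0.0 <= p[0] <= 1.0 and 0.0 <= p[1] <= 1.0
    return inside and math.dist(p, (0.5, 0.5)) > 0.2

path = rrt((0.1, 0.1), (0.9, 0.9), is_free)
```

The tree stores only the configurations it actually visited, so memory grows with the number of samples rather than with the size of a discretized space. The returned path is feasible but generally not optimal.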

Review of machine learning

Supervised learning https://en.wikipedia.org/wiki/Supervised_learning
• Support vector machine https://en.wikipedia.org/wiki/Support_vector_machine
• Gaussian process
• Neural network https://en.wikipedia.org/wiki/Artificial_neural_network

Unsupervised learning https://en.wikipedia.org/wiki/Unsupervised_learning
• PCA
• Clustering https://en.wikipedia.org/wiki/Cluster_analysis

Reinforcement learning https://en.wikipedia.org/wiki/Reinforcement_learning
• Sutton, Barto: Reinforcement Learning: An Introduction, The MIT Press, 1998. https://webdocs.cs.ualberta.ca/~sutton/book/ebook/the-book.html

[Figure: artificial neural network, from https://en.wikipedia.org/wiki/Artificial_neural_network]


Imitation learning (learning from demonstration)

• Skill: knowledge for doing something well
• Skill library: a collection of skills
• A demonstration is more than a trajectory. (Capturing the motion of the human and the manipulated object to build a skill model is one approach, but it is not the essence.)
• Currently the most powerful means of handling the complexity of robot learning (the curse of dimensionality, etc.)
• Robot (learning) researchers are always asking "how would a person do it?"
• Imitation learning also has a large influence on human intelligence
• Two learning stages: (1) the robot learns from a person (transfer); (2) the robot learns further on its own

• Yasuo Kuniyoshi, Masayuki Inaba, and Hirochika Inoue: Learning by Watching: Extracting Reusable Task Knowledge from Visual Observation of Human Performance, IEEE Transactions on Robotics and Automation, 1994.
• Tetsunari Inamura, Iwaki Toshima, Hiroaki Tanie, and Yoshihiko Nakamura: Embodied Symbol Emergence Based on Mimesis Theory, The International Journal of Robotics Research, 2004.
• Jakel, R., Schmidt-Rohr, S.R., Losch, M., and Dillmann, R.: Representation and constrained planning of manipulation strategies in the context of Programming by Demonstration, ICRA 2010.
• Aude Billard and Daniel Grollman: Robot learning by demonstration, Scholarpedia, Vol. 8, No. 12, 2013. http://www.scholarpedia.org/article/Robot_learning_by_demonstration

[Figure from: Akihiko Yamaguchi, Christopher G. Atkeson, and Tsukasa Ogasawara: Pouring Skills with Planning and Learning Modeled from Human Demonstrations, International Journal of Humanoid Robotics, Vol.12, No.3, July, 2015.]

Reinforcement learning

Textbook
• Sutton, Barto: Reinforcement Learning: An Introduction, The MIT Press, 1998. https://webdocs.cs.ualberta.ca/~sutton/book/ebook/the-book.html

Model-based
• S. Schaal and C. Atkeson, "Robot juggling: implementation of memory-based learning," in the IEEE International Conference on Robotics and Automation (ICRA'94), vol. 14, no. 1, 1994, pp. 57-71.
• J. Morimoto, G. Zeglin, and C. Atkeson, "Minimax differential dynamic programming: Application to a biped walking robot," in the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'03), vol. 2, 2003, pp. 1927-1932.

Model-free
• J. Kober and J. Peters, "Policy search for motor primitives in robotics," Machine Learning, vol. 84, no. 1-2, pp. 171-203, 2011.
• E. Theodorou, J. Buchli, and S. Schaal, "Reinforcement learning of motor skills in high dimensions: A path integral approach," in the IEEE International Conference on Robotics and Automation (ICRA'10), May 2010, pp. 2397-2403.
• D. Ernst, P. Geurts, and L. Wehenkel, "Tree-based batch mode reinforcement learning," Journal of Machine Learning Research, vol. 6, pp. 503-556, 2005.
• P. Kormushev, S. Calinon, and D. G. Caldwell, "Robot motor skill coordination with EM-based reinforcement learning," in the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'10), 2010, pp. 3232-3237.
• A. Yamaguchi, J. Takamatsu, and T. Ogasawara, "DCOB: Action space for reinforcement learning of high dof robots," Autonomous Robots, vol. 34, no. 4, pp. 327-346, 2013.
• J. Kober, A. Wilhelm, E. Oztop, and J. Peters, "Reinforcement learning to adjust parametrized motor primitives to new situations," Autonomous Robots, vol. 33, pp. 361-379, 2012.
• S. Levine, N. Wagener, and P. Abbeel, "Learning contact-rich manipulation skills with guided policy search," in the IEEE International Conference on Robotics and Automation (ICRA'15), 2015.

Combined
• R. S. Sutton, "Integrated architectures for learning, planning, and reacting based on approximating dynamic programming," in the Seventh International Conference on Machine Learning. Morgan Kaufmann, 1990, pp. 216-224.
• R. S. Sutton, C. Szepesvári, A. Geramifard, and M. Bowling, "Dyna-style planning with linear function approximation and prioritized sweeping," in Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence, 2008, pp. 528-536.

[Figure from: Yamaguchi et al., "DCOB: Action space for reinforcement learning of high DoF robots", Autonomous Robots, 2013]

YouTube: RL_MotionLearning (my own videos) https://www.youtube.com/playlist?list=PL41MvLpqzOg8FF0xekWT9NXCdjzN_8PUS
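To make the "model-free" category concrete, here is a minimal tabular Q-learning sketch. It is not taken from any of the papers listed; the 5-cell corridor task, the reward of 1 at the last cell, and all hyperparameters are assumptions chosen for illustration. The learner only samples transitions and never sees the dynamics as a function.

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.2, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor; goal is the last cell."""
    rng = random.Random(seed)
    actions = (-1, +1)
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # step limit per episode
            if rng.random() < epsilon:
                a = rng.choice(actions)  # explore
            else:
                best = max(q[(s, b)] for b in actions)
                a = rng.choice([b for b in actions if q[(s, b)] == best])
            # The environment is sampled here; the learner has no model of it.
            s_next = min(max(s + a, 0), goal)
            reward = 1.0 if s_next == goal else 0.0
            if s_next == goal:
                target = reward  # terminal state: no future value
            else:
                target = reward + gamma * max(q[(s_next, b)] for b in actions)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s_next
            if s == goal:
                break
    return q

q = q_learning()
# Greedy policy for the non-terminal cells.
policy = {s: max((-1, +1), key=lambda a: q[(s, a)]) for s in range(4)}
```

A model-based method would instead fit `s_next = f(s, a)` from data and plan with it; the Dyna-style "combined" approaches do both.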

Reinforcement learning

• J. Kober and J. Peters: Learning Motor Primitives for Robotics (Ball-in-cup) http://www.ausy.tu-darmstadt.de/Research/LearningMotorPrimitives https://www.youtube.com/watch?v=cNyoMVZQdYM
• P. Kormushev et al.: "Video: Robot Arm Wants Nothing More Than To Master The Art Of The Flapjack-Flip" http://www.popsci.com/technology/article/2010-07/after-50-attempts-hard-working-flapjack-bot-learns-flip-pancakes-video http://programming-by-demonstration.org/showPubli.php?publi=3018 https://vimeo.com/13387420#at=NaN

Other robot learning

• E. Magtanong, A. Yamaguchi, K. Takemura, J. Takamatsu, and T. Ogasawara, "Inverse kinematics solver for android faces with elastic skin," in Latest Advances in Robot Kinematics, Innsbruck, Austria, 2012, pp. 181-188.

About the DRC

DARPA Robotics Challenge (DRC)

• DARPA Robotics Challenge Finals: Rules and Course http://spectrum.ieee.org/automaton/robotics/humanoids/drc-finals-course
• DARPA Robotics Challenge (DRC) http://www.darpa.mil/program/darpa-robotics-challenge
• DRC Trials http://archive.darpa.mil/roboticschallengetrialsarchive/
• DRC Finals http://www.theroboticschallenge.org/

Operator room

Video: WPI-CMU DRC Finals Day 1: Time Lapse X20 https://www.youtube.com/watch?v=AvyGzqwOPSM

Trials, December 2013 (points, team):
• 27 Schaft
• 20 IHMC
• 18 CHIMP
• 16 MIT
• 14 RoboSimian
• 11 TRACLabs
• 11 WPI-CMU
• 9 Trooper
• 8 Thor
• 8 Vigir
• 8 KAIST
• 3 HKU
• 3 DRC-HUBO-UNLV

Finals, June 2015 (points, team):
• 8 KAIST
• 8 IHMC
• 8 CHIMP
• 7 NimbRo
• 7 RoboSimian
• 7 MIT
• 7 WPI-CMU
• 6 DRC-HUBO UNLV
• 5 TRACLabs
• 5 AIST-NEDO
• 4 NEDO-JSK

Source: Team WPI-CMU: Darpa Robotics Challenge http://www.cs.cmu.edu/~cga/drc/ (cmu-drc-final-public.zip)

DRC Finals teams: http://www.theroboticschallenge.org/teams

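Example of the technology used (WPI-CMU)

• Did well (14/16 points over 2 days, drill)
• Did not fall
• Did not require physical human intervention

Walking control
• Rough terrain is detected with an LRF
• Multi-level (hierarchical) optimization
• Footstep optimization
• Trajectory optimization
• Optimization-based inverse dynamics (QP over the whole body)

[Figure: LIPM Trajectory Optimization]

Source: Team WPI-CMU: Darpa Robotics Challenge http://www.cs.cmu.edu/~cga/drc/ (cmu-drc-final-public.zip, dw1.pptx)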


Example of the technology used (AIST-NEDO)

Shin'ichiro Nakaoka, Mitsuharu Morisawa, Kenji Kaneko, Shuuji Kajita, and Fumio Kanehiro: "Development of an Indirect-type Teleoperation Interface for Biped Humanoid Robots," 2014. http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7028105

Video: A Compilation of Robots Falling Down at the DARPA Robotics Challenge https://www.youtube.com/watch?v=g0TaYhjpOfo

Why did so many robots fail?

AIST
• Day 1: fell just before opening the door ...
• Day 2: fell just before finishing the rough-terrain walk ...

University of Tokyo (JAXON)
• A false detection of the valve: the robot assumed it was gripping the valve when it was not, executed the motion anyway, and as a result its elbow hit the valve.

WPI-CMU
• Day 1: the drill was dropped because of a bug in the state machine.
• Day 2: the drill was dropped because an arm actuator overheated and shut down.

Operator errors
• IHMC, CHIMP, MIT, WPI-CMU, ...

ref. What Happened at the DARPA Robotics Challenge? http://www.cs.cmu.edu/~cga/drc/events/
ref. DARPA Robotics Challenge Finals 2015 (in Japanese) http://akihikoy.net/notes/?article%2FDRC-finals-2015
cf. Nikkei Robotics, August 2015 (No. 01) and September 2015 (No. 02) issues http://techon.nikkeibp.co.jp/ROBO/

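Can learning improve this?

In principle, YES. But the causes of failure are compound, so it is not that simple.
• Modeling errors of the robot → learning can deal with this (much robot learning research addresses it).
• A joint overheats and shuts down → a model of when shutdown happens could possibly be learned, and planning could then avoid it (improving the hardware is quicker; cf. JAXON uses liquid cooling).
• Measurement errors of the environment → the error probability distribution is complex.
• Program bugs → depends on the bug, but generally complex.

↑ Many of these are issues that come before AI, machine learning, or control.

It is also hard to collect many "samples" of failure. Some people are building robots that do not break when they fall.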

DRC and AI

How much autonomy was there?
• At least enough to compensate for the communication delay and bandwidth limits on the link to the operator room.
• Many teams used motion planning.
• However, the operator's instructions mattered: where to go, what to do (e.g. grasp the valve), and the rough position of the object to manipulate (e.g. the valve position).

How much learning technology was used?
• Robot learning specialists took part (e.g. Christopher Atkeson, Russ Tedrake).
• C. G. Atkeson et al.: "NO FALLS, NO RESETS: Reliable Humanoid Behavior in the DARPA Robotics Challenge" http://www.cs.cmu.edu/~cga/drc/paper14.pdf
• "The absence of horizontal force and yaw torque sensing in the Atlas feet limited our ability to avoid foot slip, reduce the risk of falling, and optimize gait using learning."
• "Learning to plan better is hard. To be computationally feasible for real-time control, a cascade of smaller optimizations is usually favored, which for us and many others eventually boiled down to reasoning about long term goals using a simplified model and a one-step constrained convex optimization that uses the full kinematics and dynamics"

How versatile was the AI technology that was used?
• Motion planning
• Optimization algorithms

What researchers learned from the DRC

• No robot used handrails or walls, although bracing against a handrail or wall stabilizes a robot. Multi-contact planning has recently become a more frequent topic; cf. the Humanoids 2015 technical program http://www.humanoids2015.org/sub/sub03_12.asp and the Humanoids 2015 Workshop on Whole-Body Multi-Task Multi-Contact Humanoid Control http://cs.stanford.edu/groups/manips/humanoid2015/index.html
• IK still needs a lot of improvement.
• Cooperation between operator and robot (HRI) is important. The operators (robotics professionals) made many mistakes too! Software should be able to detect operator errors.
• Sensing and state estimation are important (more so than AI and control?). Put cameras on the wrists and knees as well.
• Handling overheating is very important (SCHAFT: water cooling; Hubo: air cooling; Atlas: the wrist motors overheated all the time).
• Designs that allow for recovery from failure are important.

ref. Humanoids 2015 Panel: Lessons Learned, http://www.cs.cmu.edu/~cga/drc/


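Why is there such a gap?

Problems are piled high, in AI and machine learning and in robotics alike:
• Enormous state and action spaces
• Diversity of constraints
• Not enough samples can be collected
• Open problems in IK
• Modeling contact forces
• Multi-contact planning: very hard when whole-body balance has to be considered
• Modeling and manipulating deformable objects
• Accounting for diversity
• ...

What is the decisive solution? → "We do not really know." What we do know is that there is no "magic method." People who believe in a "magic method" pin their hopes on deep learning, but...

[Figure: what AI and machine learning researchers are thinking about, and what robotics researchers are thinking about]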

Back to AI

So, what is deep learning?

• A neural network (with deep layers?)
• It has swept many machine learning competitions. Applications: image recognition, speech recognition, translation, ...
• Apparently it is not well understood why it works (when the layers are deep). Cf. deep vs. shallow: Lei Jimmy Ba, Rich Caruana: Do Deep Nets Really Need to be Deep?, NIPS 2014.

What is the secret of its success?
• Convolution
• Dropout (stochastically ignoring hidden-layer outputs) → prevents overfitting
• ReLU (Rectified Linear Unit; max(x,0)) worked well? It is a nonlinear activation function.
• LSTM (RNN)
• (Pre-training of the hidden layers, autoencoders → techniques for training when the layers are deep)
• Big data, e.g. ImageNet

What is so great about it?
• Before: image → feature extraction → neural network
• DNN (Deep Neural Network): image → neural network
• Designing the feature extraction is no longer needed (though tricks such as convolution are still required?)

• Jürgen Schmidhuber: Deep Learning in Neural Networks: An Overview, Technical Report IDSIA-03-14 / arXiv:1404.7828 v2 [cs.NE], 2014. http://arxiv.org/abs/1404.7828
• Basic Vocabulary of Deep Learning (in 40 minutes! with figures!), by NAIST (in Japanese) http://www.phontron.com/slides/neubig14deeplunch11-ja.pdf
• Large Scale Deep Learning by Jeff Dean (Google) http://static.googleusercontent.com/media/research.google.com/ja//people/jeff/CIKM-keynote-Nov2014.pdf
• Hinton (talk): Brains, Sex, and Machine Learning https://youtu.be/DleXA5ADG78
• Takayuki Okatani (PFN): Deep Learning and Image Recognition: Fundamentals and Recent Trends (in Japanese) http://www.orsj.or.jp/archive2/or60-4/or60_4_198.pdf
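Two of the ingredients named on this slide, ReLU and dropout, are small enough to sketch directly. The version below works on plain Python lists for readability. The 1/(1-p) rescaling of surviving units ("inverted dropout") is a common convention that is assumed here; the slide itself does not specify it.

```python
import random

def relu(xs):
    """ReLU activation: max(x, 0) applied element-wise."""
    return [max(x, 0.0) for x in xs]

def dropout(xs, p, rng, training=True):
    """Zero each hidden output with probability p during training.

    Survivors are scaled by 1/(1-p) so that the expected output is unchanged
    (an assumed convention). At test time the layer is the identity.
    """
    if not training:
        return list(xs)
    return [0.0 if rng.random() < p else x / (1.0 - p) for x in xs]

rng = random.Random(0)
hidden = relu([-1.5, 0.0, 2.0, 3.0])       # negative inputs are clipped to 0
dropped = dropout(hidden, p=0.5, rng=rng)  # random units silenced in training
```

Because a different random subset of units is silenced on every training step, no single hidden unit can be relied on, which is the intuition behind the overfitting claim.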

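Deep Q-network: DNN × reinforcement learning

V. Mnih, et al.: Playing Atari with Deep Reinforcement Learning, NIPS Deep Learning Workshop, 2013.

• Fitted Q iteration: a stable method for learning the action-value function Q(x,a); any regression method can be used as-is as the approximator of the action-value function. Damien Ernst, Pierre Geurts, and Louis Wehenkel: Tree-Based Batch Mode Reinforcement Learning, Journal of Machine Learning Research, Vol.6, pp.503-556, 2005.
• Neural fitted QI: a variant of fitted QI that uses a neural network. Martin Riedmiller: Neural fitted Q iteration -- first experiences with a data efficient neural reinforcement learning method, In 16th European Conference on Machine Learning, pp.317-328, 2005.
• DQN: (simply) uses a DNN as the function approximator in fitted Q iteration.
• Thanks to the DNN, learning works even when the input state is high-dimensional, like an image (this makes good use of the properties of DNNs).
• Conversely, note that the action space is not very complex (it is quite doubtful that this can be applied as-is to difficult robot tasks).

Input: 84 x 84 images, 4 frames
Output: a value for each command (action)
Learns Q(x,a) = Q(image sequence, command) (i.e. value-function-based reinforcement learning)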
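The structure of fitted Q iteration can be sketched as a loop that repeatedly turns a batch of transitions into a regression problem. This is a toy under stated assumptions: the regressor `fit_table` is a stand-in that averages targets per input, and the 5-cell corridor data set is hypothetical. In DQN the slot taken by `fit_table` would be filled by a deep network over image inputs.

```python
from collections import defaultdict

def fit_table(inputs, targets):
    """Stand-in regressor: the average target for each distinct input.

    Any regression method (trees, a neural network, ...) could take its place.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in zip(inputs, targets):
        sums[x] += y
        counts[x] += 1
    return lambda x: sums[x] / counts[x] if counts[x] else 0.0

def fitted_q_iteration(transitions, actions, fit, gamma=0.9, n_iters=20):
    """transitions: list of (state, action, reward, next_state, done)."""
    q = lambda x: 0.0  # Q_0 is identically zero
    for _ in range(n_iters):
        inputs, targets = [], []
        for s, a, r, s_next, done in transitions:
            future = 0.0 if done else max(q((s_next, a2)) for a2 in actions)
            inputs.append((s, a))
            targets.append(r + gamma * future)
        q = fit(inputs, targets)  # supervised regression on the whole batch
    return q

# Hypothetical batch: every (state, action) pair of a 5-cell corridor, once.
actions = (-1, +1)
goal = 4
transitions = []
for s in range(goal):
    for a in actions:
        s_next = min(max(s + a, 0), goal)
        done = s_next == goal
        transitions.append((s, a, 1.0 if done else 0.0, s_next, done))

q = fitted_q_iteration(transitions, actions, fit_table)
```

Note that the maximization over `actions` enumerates every command, which is cheap for a game controller but not for a 30-DoF continuous action space. That is the caveat this slide raises.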

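Background and lessons of deep learning's success

• Some attribute the success to advances in computing and to big data, but it also rests on the accumulated research in image recognition and in machine learning.
• The convolution layer is (seen as) a generalization of feature point extraction.
• I think this chain of research came together and led to the success of DL.
• Highly versatile (more intelligent?) methods are not born overnight; they emerge from solving many problems one by one. → To build AI that is "usable" on robots too, we need to keep taking on tasks like the DRC, uncovering problems, and solving them.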

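Where are AI and robots headed?

[Figure: deep learning, image recognition, speech recognition, translation, natural language processing, knowledge bases, inference, machine learning]

Value solving individual "challenging" problems
• There is a pile of robotics problems that AI and machine learning methods cannot solve → that is where the real problems should lie hidden.

Problem-driven rather than method-driven

Keep the connections in mind
• Am I spending too much time on problems outside my focus?
• How will the research expand in the future?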

Research by Beetz et al.

• Lars Kunze, Michael Beetz: Envisioning the qualitative effects of robot manipulation actions using simulation-based projections, Artificial Intelligence, 2014.
• Karinne Ramirez-Amaro, Michael Beetz, and Gordon Cheng: Transferring skills to humanoid robots by extracting semantic representations from observations of human activities, Artificial Intelligence, 2015.
• The RoboHow project https://robohow.eu/videos
• Video: The RoboHow project https://youtu.be/0eIryyzlRwA

Robot pouring

• Akihiko Yamaguchi, Christopher G. Atkeson, and Tsukasa Ogasawara: Pouring Skills with Planning and Learning Modeled from Human Demonstrations, International Journal of Humanoid Robotics, Vol.12, No.3, pp.1550030, July, 2015. http://akihikoy.net/info/wdocs/Yamaguchi,Atkeson,2015-Pouring%20Skills%20with%20Planning%20and%20Learning..-IJHR.pdf video: https://www.youtube.com/watch?v=GjwfbOur3CQ
• Akihiko Yamaguchi, Christopher G. Atkeson: Differential Dynamic Programming with Temporally Decomposed Dynamics, in Proceedings of the 15th IEEE-RAS International Conference on Humanoid Robots (Humanoids2015), Seoul, 2015. https://www.researchgate.net/publication/282157952_Differential_Dynamic_Programming_with_Temporally_Decomposed_Dynamics video: https://youtu.be/OrjTHw0CHew

Image sources:
• http://reflectionsintheword.files.wordpress.com/2012/08/pouring-water-into-glass.jpg
• http://schools.graniteschools.org/edtech-canderson/files/2013/01/heinz-ketchup-old-bottle.jpg
• http://old.post-gazette.com/images2/20021213hosqueeze_230.jpg
• http://img.diytrade.com/cdimg/1352823/17809917/0/1292834033/shampoo_bottle_bodywash_bottle.jpg
• http://www.nescafe.com/upload/golden_roast_f_711.png

Research by Gupta et al.

• Lerrel Pinto, Abhinav Gupta: Supersizing Self-supervision: Learning to Grasp from 50K Tries and 700 Robot Hours, arXiv:1509.06825 [cs.LG]. http://arxiv.org/abs/1509.06825
• Video: Supersizing Self-supervision: Learning to grasp from 50K Tries and 700 Robot Hours https://www.youtube.com/watch?v=oSqHc0nLkm8

Research by Abbeel et al.

• S. Levine, C. Finn, T. Darrell, and P. Abbeel: End-to-End Training of Deep Visuomotor Policies, arXiv:1504.00702, 2015.
• Sergey Levine, Nolan Wagener, and Pieter Abbeel: Learning Contact-Rich Manipulation Skills with Guided Policy Search, ICRA 2015.
• Jeremy Maitin-Shepard, Marco Cusumano-Towner, Jinna Lei, and Pieter Abbeel: Cloth Grasp Point Detection based on Multiple-View Geometric Cues with Application to Robotic Towel Folding, ICRA 2010.

Video: Test on a pile of 5 randomly-dropped towels (50X) https://www.youtube.com/watch?v=gy5g33S0Gzo

Discussion

Why did this happen?

http://i.imgur.com/V2u11ZP.gifv

Do robots lie?

Should killer robots be banned?
• The case against a ban: "No, we should not ban autonomous weapons" http://spectrum.ieee.org/automaton/robotics/artificial-intelligence/we-should-not-ban-killer-robots
• The case for a ban: "Yes, we should ban autonomous weapons" http://spectrum.ieee.org/automaton/robotics/artificial-intelligence/why-we-really-should-ban-autonomous-weapons

Will robots take people's jobs?
• If YES, is that a negative thing? (How can unhappy outcomes be avoided?)

https://www.youtube.com/watch?v=dIF-Ho_v-Nc

Figure sources

• Team WPI-CMU: Darpa Robotics Challenge http://www.cs.cmu.edu/~cga/drc/ (cmu-drc-final-public.zip)
• http://blog.fashionsealhealthcare.com/ibm-watson-impacting-healthcare
• http://scyfer.nl/wp-content/uploads/2014/05/Deep_Neural_Network.png
• http://www.darpa.mil/DDM_Gallery/DARPARoboticsChallenge-RobotTask-619-316.jpg
• Lars Kunze, Michael Beetz: Envisioning the qualitative effects of robot manipulation actions using simulation-based projections, Artificial Intelligence, 2014.
• Artificial neural network https://en.wikipedia.org/wiki/Artificial_neural_network


