
Research Statement – Chelsea Finn – UC Berkeley

The ultimate goal of my research is to enable robots to be generalists, capable of going into any new environment and performing a wide variety of useful tasks. Such a capability would have a transformative effect on how and where robots can be deployed in our society. For example, we would no longer require an entire team of engineers to deploy robots in a new factory, nor specialized hardware for individual tasks such as harvesting strawberries or extracting pineapple juice. Furthermore, a generalist robot would acquire common sense through experience from numerous tasks and environments, making it easier to acquire new skills and adapt to changing situations. Thus, I argue, generalist robots will be effective in unstructured, real-world environments, unlike the specialist robots that exist today.

Hand-coding a robot's behavior to handle a range of real-world skills and settings is an insurmountable task. To become a generalist, a robot instead needs to learn from experience. When considering open-world settings, we cannot rely on human-designed representations for robustness, nor can we assume that simple linear models will suffice. For this reason, machine learning techniques using deep, flexible models with minimal hand-engineering have shown great success in open-world problems such as visual and speech recognition. However, these methods excel most in passive settings with large, labeled datasets, whereas robotic learning typically requires active data collection with limited supervision, among other challenges. My work focuses on robot learning methods that are suitable for deep models with minimal hand-engineering. I will argue that three capabilities are important for developing robot generalists: (1) effectively learning skills from low-level perceptual inputs, (2) continually learning and adapting to changes in goals and the environment, and (3) inferring goals from humans in real-world settings. I will discuss the algorithmic contributions that I have made on each of these fronts, as well as my research agenda towards developing generalist robots that can continually learn from low-level perceptual inputs in real-world settings.

Skill Learning from Raw Perceptual Inputs: The world is complex and diverse. At any given point in time, a robot might consider interacting with a wide range of objects, rigid or deformable, or things that aren't quite objects, like liquids or sand. A crucial question in such a diverse world is: what is the right representation of the world? The representation for a robot specialist can be hand-designed for a particular task, whereas a sufficiently generic representation cannot, and instead should be acquired through experience, from raw perceptual inputs such as image pixels, joint encoder readings, and haptic signals.

Figure 1: Skills trained end-to-end to map raw sensory inputs to motor torques: rll.berkeley.edu/deeplearningrobotics

In my work, I showed for the first time that it is possible for a robot to learn manipulation primitives from raw image pixels and joint encoder readings, learning a representation suitable for visuomotor control [7, 5]. For example, as shown in Figure 1, the robot learned to screw a cap onto a bottle and to insert a block into a shape-sorting cube, each using 2-3 hours of experience and computation. Moreover, using a learned representation optimized for task performance led to significantly higher task performance than using a conventional computer vision system. Thus, even for a single, sufficiently challenging task, learned representations lead to more robust behavior.
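
To make the end-to-end idea concrete, below is a minimal PyTorch sketch of the kind of architecture this paragraph describes: a convolutional network mapping camera pixels and joint-encoder readings directly to motor torques, with a spatial-softmax layer that summarizes each feature map by its expected image location. The layer sizes here are illustrative assumptions, not the architecture of [7, 5], and the actual system is trained with guided policy search, which is not shown.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VisuomotorPolicy(nn.Module):
    """Illustrative sketch: raw pixels + joint angles -> motor torques.

    Layer sizes are made up; the system in [7, 5] is trained with
    guided policy search, which is not shown here.
    """

    def __init__(self, num_joints=7, num_torques=7):
        super().__init__()
        # Convolutional layers extract visual features from raw pixels.
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=5), nn.ReLU(),
        )
        # Fully connected layers fuse vision with proprioception.
        self.fc = nn.Sequential(
            nn.Linear(2 * 32 + num_joints, 64), nn.ReLU(),
            nn.Linear(64, num_torques),
        )

    def spatial_softmax(self, features):
        # Summarize each feature map by the expected (x, y) location of
        # its activation: a compact "where" representation for control.
        n, c, h, w = features.shape
        probs = F.softmax(features.view(n, c, h * w), dim=-1).view(n, c, h, w)
        xs = torch.linspace(-1.0, 1.0, w)
        ys = torch.linspace(-1.0, 1.0, h)
        ex = (probs.sum(dim=2) * xs).sum(dim=-1)  # expected x per channel
        ey = (probs.sum(dim=3) * ys).sum(dim=-1)  # expected y per channel
        return torch.cat([ex, ey], dim=1)         # shape (n, 2 * c)

    def forward(self, image, joint_angles):
        points = self.spatial_softmax(self.conv(image))
        return self.fc(torch.cat([points, joint_angles], dim=1))

policy = VisuomotorPolicy()
torques = policy(torch.randn(1, 3, 64, 64), torch.randn(1, 7))

Training such a policy end-to-end lets the control objective shape the visual features themselves, which is the source of the robustness gains described above.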

Continual Learning in the Real World: Most research on learning for robotic manipulation entails learning a single skill in a single environment, starting from scratch. This is an important first step, as discussed previously, but to enable robots to learn many real-world skills in a variety of environments, robots need to reuse experience across tasks and continually learn in the real-world settings in which they are deployed. A key challenge is learning about new, unknown environments without supervision.

Figure 2: Self-supervised robot learning: sites.google.com/site/robotforesight

In my research, I developed a method for learning predictive models of raw sensory observations, enabling robot learning from unsupervised "play" [2, 3]. By developing deep models for video prediction that are suitable for control, my work enabled robots to learn to push objects to goal positions using only unlabeled, raw data from the robot's sensors, with minimal human involvement and no object-level supervision. As seen in Figure 2, we applied this method to an array of ten robot arms to learn about a wide variety of objects and how to interact with new, unseen objects. This represents an important first step towards self-supervised learning in a variety of real-world settings.
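
The planning loop that such a model enables can be sketched in a few lines of Python. The model below is a hypothetical stand-in for the learned video-prediction network of [2, 3], and scoring predictions by raw pixel distance is a simplification: the actual method scores the predicted motion of user-designated pixels.

import numpy as np

class DummyModel:
    """Hypothetical stand-in for a learned video-prediction network."""
    def predict(self, frame, actions):
        # A real model would return the predicted frame after the actions.
        return frame

def plan_action(model, current_frame, goal_image, horizon=5, num_samples=100):
    """One planning step: sample action sequences, roll each through the
    learned model, and return the first action of the best sequence."""
    best_cost, best_actions = np.inf, None
    for _ in range(num_samples):
        # Sample a random candidate sequence of 2D pushing actions.
        actions = np.random.uniform(-1.0, 1.0, size=(horizon, 2))
        predicted = model.predict(current_frame, actions)
        # Score by distance between the predicted and desired outcome.
        cost = np.mean((predicted - goal_image) ** 2)
        if cost < best_cost:
            best_cost, best_actions = cost, actions
    # Execute only the first action, then replan (model-predictive control).
    return best_actions[0]

frame, goal = np.zeros((64, 64, 3)), np.ones((64, 64, 3))
action = plan_action(DummyModel(), frame, goal)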


Inferring Goals in Unstructured Environments: Many robotic learning methods rely on a scalar reward function to provide feedback for learning. These rewards usually require detailed information about the world that is not readily available in real-world, unstructured settings. Thus, we need other means of providing learning objectives. In my work, we proposed the problem setting of semi-supervised reinforcement learning, where a reward is available in some known environments but not in others [6]. A complementary approach is to provide human demonstrations of a task, e.g. by guiding a robot arm through the correct motion, and then infer the objective underlying the demonstrations (known as inverse reinforcement learning). Unfortunately, existing methods for inverse RL have largely been applied to simple, low-dimensional problems with known dynamics and known task features. In my work, we developed a method for inverse RL that scales to robotic manipulation problems on real robots (see Figure 3), with unknown dynamics and reward functions with thousands of parameters [4], making inverse RL more applicable to real-world problems.

Figure 3: The robot infers the task objective from demonstration: rll.berkeley.edu/gcl
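
At its core, this approach fits a cost function so that demonstrations look cheaper than samples from the current policy. Below is a simplified sketch of one such update under a linear cost assumption; [4] instead uses a neural-network cost with thousands of parameters, importance-weighted samples, and alternating policy optimization, none of which are shown here.

import numpy as np

def irl_cost_update(theta, features_demo, features_policy, lr=0.01):
    """One gradient step of sample-based maximum-entropy inverse RL, for a
    linear trajectory cost c(tau) = theta . f(tau) (a simplification of [4])."""
    # Demonstrated trajectories pull the cost down on expert behavior...
    grad_demo = features_demo.mean(axis=0)
    # ...while policy samples approximate the partition function and push
    # the cost up elsewhere; softmin weights emphasize the samples the
    # current cost deems most likely.
    weights = np.exp(-(features_policy @ theta))
    weights /= weights.sum()
    grad_policy = weights @ features_policy
    # Descend the maximum-entropy IRL negative log-likelihood.
    return theta - lr * (grad_demo - grad_policy)

theta = irl_cost_update(np.zeros(4), np.random.randn(10, 4), np.random.randn(50, 4))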

Research Agenda

My aim is to develop methods for robots to continually learn a variety of skills in any real-world environment. Two areas are critical to this goal and are natural next steps given previous results: first, learning complex skills with long time horizons, and second, continual learning by incorporating past experiences.

Complex Skills: In my work, I have considered short, 5-second skills. Acquiring more complex skills with longer time horizons will require more sophisticated computation than simple feedforward policies. My work includes preliminary research on incorporating memory (e.g. [8]) and planning (e.g. [3]) when learning behavior. In the future, I plan to explore ways of combining high-level predictive modeling, planning, and learned low-level reactive policies. The key required contributions will be in acquiring the right representation for planning and in determining the interface between high-level plans and low-level controls.

Continual Learning: To continually learn in real-world settings, robots need to build upon what they have learned previously, combining existing knowledge with new experiences when learning new skills. In some of my recent work, we developed an algorithm enabling such incorporation of previous experience to quickly learn new, related tasks [1]. Our results demonstrate, for the first time, the ability to effectively adapt simulated robot behavior towards different goals in only one or a few gradient-based updates, using very little experience. To enable continual improvement, I plan to build new algorithms that not only learn representations useful for tasks related to those seen previously (as in [1]), but also quickly incorporate experience from very different tasks, implicitly drawing connections between previous, current, and future tasks. This direction is important for agents to solve a curriculum of increasingly difficult tasks.
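
To illustrate the idea behind [1], here is a minimal sketch of one meta-training step of model-agnostic meta-learning for a linear regression model. This is a first-order simplification for readability: the exact MAML gradient also differentiates through the inner update, and [1] applies the idea to deep networks and reinforcement learning.

import numpy as np

def maml_step(theta, tasks, inner_lr=0.01, outer_lr=0.001):
    """One (first-order) meta-training step of MAML [1] for a linear model
    y = x @ theta with squared-error loss. Each task is a tuple
    (x_train, y_train, x_test, y_test)."""
    meta_grad = np.zeros_like(theta)
    for x_tr, y_tr, x_te, y_te in tasks:
        # Inner loop: adapt to the task with one gradient step.
        grad_tr = 2 * x_tr.T @ (x_tr @ theta - y_tr) / len(x_tr)
        theta_task = theta - inner_lr * grad_tr
        # Outer loop: evaluate the adapted parameters on held-out data,
        # so that theta is optimized to be easy to adapt.
        meta_grad += 2 * x_te.T @ (x_te @ theta_task - y_te) / len(x_te)
    return theta - outer_lr * meta_grad / len(tasks)

task = (np.random.randn(5, 3), np.random.randn(5),
        np.random.randn(5, 3), np.random.randn(5))
theta = maml_step(np.zeros(3), [task])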

With systems capable of continual learning and adaptation from raw perceptual inputs, we will be closer to generalist robots with common sense. It is widely believed that interaction and embodiment are key components of intelligence, perhaps even required for human-level intelligence. By developing learning methods for generalist robots, I expect these advances to be applicable to other subfields of artificial intelligence, particularly those involving sequential decision making, such as dialog interaction, medicine, and education. Furthermore, I believe that advances in generalist robots will bring us significantly closer to developing human-level artificial intelligence, and perhaps even further our understanding of human intelligence.

References

[1] C. Finn, P. Abbeel, and S. Levine. Model-agnostic meta-learning for fast adaptation of deep networks. International Conference on Machine Learning (ICML), 2017.
[2] C. Finn, I. Goodfellow, and S. Levine. Unsupervised learning for physical interaction through video prediction. Neural Information Processing Systems (NIPS), 2016.
[3] C. Finn and S. Levine. Deep visual foresight for planning robot motion. International Conference on Robotics and Automation (ICRA), 2017.
[4] C. Finn, S. Levine, and P. Abbeel. Guided cost learning: Deep inverse optimal control via policy optimization. International Conference on Machine Learning (ICML), 2016.
[5] C. Finn, X. Y. Tan, Y. Duan, T. Darrell, S. Levine, and P. Abbeel. Deep spatial autoencoders for visuomotor learning. International Conference on Robotics and Automation (ICRA), 2016.
[6] C. Finn, T. Yu, J. Fu, P. Abbeel, and S. Levine. Generalizing skills with semi-supervised reinforcement learning. International Conference on Learning Representations (ICLR), 2017.
[7] S. Levine, C. Finn, T. Darrell, and P. Abbeel. End-to-end training of deep visuomotor policies. Journal of Machine Learning Research (JMLR), 2016.
[8] M. Zhang, Z. McCarthy, C. Finn, S. Levine, and P. Abbeel. Learning deep neural network policies with continuous memory states. International Conference on Robotics and Automation (ICRA), 2016.