
CS 330

Lifelong Learning

Logistics

Project milestone due Wednesday.

Two guest lectures next week: Jeff Clune & Sergey Levine.

Plan for Today

- The lifelong learning problem statement
- Basic approaches to lifelong learning
- Can we do better than the basics?
- Revisiting the problem statement from the meta-learning perspective

A brief review of problem statements.

Multi-Task Learning: learn to solve a set of tasks.
(learn tasks → perform tasks)

Meta-Learning: given an i.i.d. task distribution, learn a new task efficiently.
(learn to learn tasks → quickly learn a new task)

In contrast, many real-world settings look like tasks arriving sequentially over time:

[timeline figure contrasting Multi-Task Learning (learn tasks → perform tasks) and Meta-Learning (learn to learn tasks → quickly learn a new task)]

Some examples:

- a student learning concepts in school
- a deployed image classification system learning from a stream of images from users
- a robot acquiring an increasingly large set of skills in different environments
- a virtual assistant learning to help different users with different tasks at different points in time
- a doctor's assistant aiding in medical decision-making

Our agents may not be given a large batch of data/tasks right off the bat!

Some Terminology

Sequential learning settings: online learning, lifelong learning, continual learning, incremental learning, streaming data

(distinct from sequence data and from sequential decision-making)

Exercise: What is the lifelong learning problem statement?

1. Pick an example setting.
2. Discuss the problem statement with your neighbor:
   (a) how would you set up an experiment to develop & test your algorithm?
   (b) what are desirable/required properties of the algorithm?
   (c) how do you evaluate such a system?

Example settings:

A. a student learning concepts in school
B. a deployed image classification system learning from a stream of images from users
C. a robot acquiring an increasingly large set of skills in different environments
D. a virtual assistant learning to help different users with different tasks at different points in time
E. a doctor's assistant aiding in medical decision-making

What is the lifelong learning problem statement?

Substantial variety in the problem statement!

Some considerations:

- computational resources
- memory
- model performance
- data efficiency

Problem variations:

- task/data order: i.i.d. vs. predictable vs. curriculum vs. adversarial
- discrete task boundaries vs. continuous shifts (vs. both)
- known task boundaries/shifts vs. unknown
- others: privacy, interpretability, fairness, test-time compute & memory

What is the lifelong learning problem statement?

General [supervised] online learning problem:

for t = 1, …, n:
    observe x_t
    predict ŷ_t
    observe label y_t

i.i.d. setting: x_t ∼ p(x), y_t ∼ p(y|x), where p is not a function of t

otherwise: x_t ∼ p_t(x), y_t ∼ p_t(y|x)
(if task boundaries are observable: also observe a task descriptor z_t)

streaming setting: cannot store (x_t, y_t), motivated by
- lack of memory
- lack of computational resources
- privacy considerations
- wanting to study neural memory mechanisms
(true in some cases, but not in many! recall: replay buffers)
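The protocol above can be sketched as a minimal loop. The linear model, squared loss, and synthetic i.i.d. stream below are illustrative assumptions, not part of the lecture:

```python
import numpy as np

def online_learn(stream, dim, lr=0.1):
    """Minimal online supervised learning loop: at each round,
    observe x_t, predict y_hat_t, observe y_t, then update."""
    theta = np.zeros(dim)
    cumulative_loss = 0.0
    for x_t, y_t in stream:
        y_hat = theta @ x_t                     # predict before seeing the label
        cumulative_loss += (y_hat - y_t) ** 2   # pay the squared loss this round
        theta -= lr * 2 * (y_hat - y_t) * x_t   # one SGD step on (x_t, y_t)
    return theta, cumulative_loss

# Illustrative i.i.d. setting: x_t ~ p(x), y_t = <theta_star, x_t>
rng = np.random.default_rng(0)
theta_star = np.array([1.0, -2.0])
stream = [(x, x @ theta_star) for x in rng.normal(size=(200, 2))]
theta, cum_loss = online_learn(stream, dim=2)
```

In the i.i.d. setting the learner converges toward a single good predictor; the non-i.i.d. variants below are what make the problem interesting.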

What do you want from your lifelong learning algorithm?

Minimal regret that grows slowly with t.

Regret: cumulative loss of the learner minus cumulative loss of the best learner in hindsight (cannot be evaluated in practice, but useful for analysis):

Regret_T := Σ_{t=1}^T ℒ_t(θ_t) − min_θ Σ_{t=1}^T ℒ_t(θ)

Regret that grows linearly in t is trivial. Why? Even an algorithm that never learns can incur a constant excess loss per round, which already gives linear regret.
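As a concrete (hypothetical) instance of the definition, here is the regret of online gradient descent on a stream of quadratic losses ℒ_t(θ) = (θ − c_t)². With a 1/t step size this learner is exactly the running mean of the c_t seen so far, and its average regret shrinks toward zero, i.e., regret is sub-linear:

```python
import numpy as np

rng = np.random.default_rng(0)
c = rng.normal(size=1000)   # round-t loss: L_t(theta) = (theta - c_t)^2

theta, learner_losses = 0.0, []
for t, c_t in enumerate(c, start=1):
    learner_losses.append((theta - c_t) ** 2)   # pay the loss, then update
    # OGD step of size 1/(2t) on the gradient 2*(theta - c_t);
    # this makes theta exactly the running mean of c_1..c_t.
    theta -= (1.0 / t) * (theta - c_t)

best_theta = c.mean()                            # best fixed parameter in hindsight
regret_T = sum(learner_losses) - ((best_theta - c) ** 2).sum()
```

Here `regret_T / T` is tiny even for T = 1000, whereas a learner stuck at a bad fixed θ would accumulate a constant gap per round.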

What do you want from your lifelong learning algorithm?

Positive & negative transfer:

- positive forward transfer: previous tasks cause you to do better on future tasks, compared to learning future tasks from scratch
- positive backward transfer: current tasks cause you to do better on previous tasks, compared to learning past tasks from scratch
- negative transfer: as above, with "better" replaced by "worse"

Plan for Today

- The lifelong learning problem statement
- Basic approaches to lifelong learning
- Can we do better than the basics?
- Revisiting the problem statement from the meta-learning perspective

Approaches

Store all the data you've seen so far, and train on it.
→ the follow the leader algorithm
+ will achieve very strong performance
− computation intensive
− can be memory intensive [depends on the application]
(continuous fine-tuning can help)

Take a gradient step on the datapoint you observe.
→ stochastic gradient descent
+ computationally cheap
+ requires 0 memory
− subject to negative backward transfer ("forgetting", sometimes referred to as catastrophic forgetting)
− slow learning

Can we do better?
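A toy contrast of the two approaches (the one-parameter linear-regression tasks are illustrative assumptions): follow the leader refits on everything seen, while plain SGD fine-tunes through the stream and forgets the earlier task.

```python
import numpy as np

rng = np.random.default_rng(0)
xs = rng.normal(size=100)
task_a = [(float(x), 2.0 * x) for x in xs]    # task A: y = 2x
task_b = [(float(x), -2.0 * x) for x in xs]   # task B: y = -2x (the stream shifts)

def sgd(stream, theta=0.0, lr=0.1):
    """Online SGD on a 1-parameter linear model."""
    for x, y in stream:
        theta -= lr * 2 * (theta * x - y) * x
    return theta

def ftl(all_data):
    """Follow the leader: refit on all data seen so far (least squares)."""
    X = np.array([x for x, _ in all_data])
    Y = np.array([y for _, y in all_data])
    return float(X @ Y) / float(X @ X)

theta_sgd = sgd(task_b, theta=sgd(task_a))   # SGD through A, then B: ends near -2
theta_ftl = ftl(task_a + task_b)             # FTL on all data: compromises near 0

def loss_on_a(th):
    return float(np.mean((th * xs - 2.0 * xs) ** 2))
# SGD suffers negative backward transfer ("forgetting") on task A;
# FTL retains much better performance on A, at the cost of storing everything.
```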

Plan for Today

- The lifelong learning problem statement
- Basic approaches to lifelong learning
- Can we do better than the basics?
- Revisiting the problem statement from the meta-learning perspective

Case Study: Can we use meta-learning to accelerate online learning?

Recall: model-based meta-RL

[figure: a robot's experience over time: motor malfunction, gradual terrain change]

online adaptation = few-shot learning; tasks are temporal slices of experience

Example online learning problem:

[figure: over time: motor malfunction, gradual terrain change, icy terrain]

k time steps are not sufficient to learn an entirely new terrain.

Continue to run SGD?
+ will be fast with a MAML initialization
− what if the ice goes away? (subject to forgetting)

Nagabandi, Finn, Levine. Deep Online Learning via Meta-Learning. ICLR '19

Online inference problem: infer the latent "task" variable at each time step.

Maintain a mixture of neural networks over the task variable 𝒯, adapted continually. Alternate between:

E-step: estimate the latent "task" variable at each time step given the data,

P(𝒯_t = 𝒯_i | x_t, y_t) ∝ p_{θ(𝒯_i)}(y_t | x_t, 𝒯_t = 𝒯_i) · P(𝒯_t = 𝒯_i)

(the likelihood of the data under task 𝒯_i, times the prior)

M-step: update the mixture of network parameters, taking a gradient step on each mixture element, weighted by task probability.

Note: if the neural net were randomly initialized, this procedure would be too slow.

Nagabandi, Finn, Levine. Deep Online Learning via Meta-Learning. ICLR '19
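A scalar caricature of this E-step/M-step loop. Everything here — the two linear "experts", the Gaussian likelihood with an assumed noise scale, the uniform prior, and the scripted task switch — is a hypothetical simplification of the paper's method, meant only to show the alternation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stream: the true mapping switches from y = 2x to y = -2x halfway
# (a discrete task change without an announced boundary).
xs = rng.normal(size=300)
stream = [(float(x), (2.0 if t < 150 else -2.0) * x + 0.1 * rng.normal())
          for t, x in enumerate(xs)]

sigma, lr = 0.25, 0.1
thetas = np.array([1.0, -1.0])   # two linear "experts", initialized apart for clarity

for x, y in stream:
    # E-step: posterior over the latent task (uniform prior, Gaussian likelihood)
    log_lik = -0.5 * ((y - thetas * x) / sigma) ** 2
    post = np.exp(log_lik - log_lik.max())
    post /= post.sum()
    # M-step: gradient step on each mixture element, weighted by task probability
    thetas -= lr * post * 2 * (thetas * x - y) * x
```

After the run, one component has latched onto each regime, so the old task's expert survives the shift instead of being overwritten.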

Does it work? Crawler with crippled legs.

[figure comparing: online learning with MAML initialization; SGD with MAML initialization; MAML (always reset to prior + 1 gradient step); model-based, no adaptation (no meta-learning); model-based, gradient steps (no meta-learning)]

[figure: latent task distribution during online learning]

Nagabandi, Finn, Levine. Deep Online Learning via Meta-Learning. ICLR '19

Case Study: Can we modify vanilla SGD to avoid negative backward transfer?

Idea (Lopez-Paz & Ranzato. Gradient Episodic Memory for Continual Learning. NeurIPS '17):

(1) store a small amount of data per task in memory ℳ_k
(2) when making updates for new tasks, ensure that they don't unlearn previous tasks

How do we accomplish (2)?

Learn a predictor ŷ_t = f_θ(x_t, z_t), keeping a memory ℳ_k for each task z_k.

For t = 0, …, T:
    minimize_θ  ℒ( f_θ( ⋅ , z_t), (x_t, y_t) )
    subject to  ℒ( f_θ, ℳ_k ) ≤ ℒ( f_θ^{t−1}, ℳ_k )  for all z_k < z_t

(i.e., subject to the loss on previous tasks not getting worse)

Assume local linearity, so the constraints become gradient-alignment conditions:

⟨g_t, g_k⟩ := ⟨ ∂ℒ( f_θ, (x_t, y_t) ) / ∂θ ,  ∂ℒ( f_θ, ℳ_k ) / ∂θ ⟩ ≥ 0  for all z_k < z_t

Can formulate & solve as a QP.

Lopez-Paz & Ranzato. Gradient Episodic Memory for Continual Learning. NeurIPS '17
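With a single previous task, the constrained update has a simple closed form: when the new gradient conflicts with the memory gradient, project out the conflicting component. This sketch is a simplification; with many stored tasks the projection is the solution of a QP over all constraints.

```python
import numpy as np

def gem_project(g, g_mem):
    """Project the current-task gradient g so the update does not increase
    loss on one previous task with memory gradient g_mem (closed form of
    the single-constraint QP; g is returned unchanged if <g, g_mem> >= 0)."""
    dot = float(g @ g_mem)
    if dot >= 0:          # constraint satisfied: no interference, no change
        return g
    # remove the component of g that points against the memory gradient
    return g - (dot / float(g_mem @ g_mem)) * g_mem

g = np.array([1.0, -2.0])       # gradient on the current task
g_mem = np.array([1.0, 1.0])    # gradient on stored data from a previous task
g_tilde = gem_project(g, g_mem)
```

The projected gradient is orthogonal to the memory gradient (so, to first order, the old task's loss does not increase) while remaining a descent direction for the current task.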

Experiments

Problems:
- MNIST permutations
- MNIST rotations
- CIFAR-100 (5 new classes/task)

Total memory size: 5012 examples
(BWT: backward transfer, FWT: forward transfer)

If we take a step back… do these experimental domains make sense?

Can we meta-learn how to avoid negative backward transfer?
Javed & White. Meta-Learning Representations for Continual Learning. NeurIPS '19

Plan for Today

- The lifelong learning problem statement
- Basic approaches to lifelong learning
- Can we do better than the basics?
- Revisiting the problem statement from the meta-learning perspective

What might be wrong with the online learning formulation?

Online Learning (Hannan '57, Zinkevich '03): perform a sequence of tasks while minimizing static regret.

[timeline: perform, perform, perform, … — zero-shot performance at every step]

More realistically:

[timeline: learn, learn, learn, … — slow learning at first, rapid learning later]

What might be wrong with the online learning formulation?

Online Learning (Hannan '57, Zinkevich '03): perform a sequence of tasks while minimizing static regret.
[timeline: perform, perform, … — zero-shot performance]

Online Meta-Learning (Finn*, Rajeswaran*, Kakade, Levine. ICML '18): efficiently learn a sequence of tasks from a non-stationary distribution.
[timeline: learn, learn, … — evaluate performance after seeing a small amount of data]

Primarily a difference in evaluation, rather than in the data stream.

The Online Meta-Learning Setting

Goal: a learning algorithm with sub-linear

Regret_T := Σ_{t=1}^T ℓ_t(Φ_t(θ_t)) − min_{θ∈Θ} Σ_{t=1}^T ℓ_t(Φ_t(θ))

(loss of the algorithm, minus the loss of the best algorithm in hindsight)

(Finn*, Rajeswaran*, Kakade, Levine. ICML '18)

for task t = 1, …, n:
    observe 𝒟_t^tr
    use the update procedure to produce parameters ϕ_t = Φ(θ_t, 𝒟_t^tr)
    observe x_t
    predict ŷ_t = f_{ϕ_t}(x_t)
    observe label y_t

(an extension of the standard online learning setting)

Can we apply meta-learning in lifelong learning settings?

Recall the follow the leader (FTL) algorithm: store all the data you've seen so far, and train on it.

Follow the meta-leader (FTML) algorithm:
1. Store all the data you've seen so far, and meta-train on it.
2. Run the update procedure on the current task.
3. Deploy the model on the current task.

What meta-learning algorithms are well-suited for FTML? What if p_t(𝒯) is non-stationary?

Experiments

Experiment with sequences of tasks:
- colored, rotated, scaled MNIST
- 3D object pose prediction (example tasks: plane, car, chair)
- CIFAR-100 classification
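A stripped-down sketch of the FTML idea on a stream of related 1-D regression tasks. The task family, the few-step "update procedure", and the Reptile-style first-order meta-update (standing in for full MAML meta-training on the whole stored buffer) are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(slope):
    """Hypothetical task: 1-D linear regression y = slope * x, 10 datapoints."""
    xs = rng.normal(size=10)
    return [(float(x), float(slope * x)) for x in xs]

def adapt(theta, task, lr=0.1, epochs=5):
    """Update procedure Phi(theta, D_tr): a few SGD epochs from theta."""
    for x, y in task * epochs:
        theta = theta - lr * 2 * (theta * x - y) * x
    return theta

def task_loss(theta, task):
    return float(np.mean([(theta * x - y) ** 2 for x, y in task]))

theta_meta = 0.0
losses_ftml, losses_scratch = [], []
for t in range(20):
    task = make_task(3.0 + 0.1 * rng.normal())   # stream of related tasks
    # Evaluate adaptation from the meta-learned init vs. from scratch (one epoch each).
    losses_ftml.append(task_loss(adapt(theta_meta, task, epochs=1), task))
    losses_scratch.append(task_loss(adapt(0.0, task, epochs=1), task))
    # Meta-train: a Reptile-style first-order update on the newest task
    # (FTML proper meta-trains with MAML on all stored data).
    theta_meta += 0.5 * (adapt(theta_meta, task) - theta_meta)
```

After a handful of tasks, adapting from the meta-learned initialization reaches much lower error than adapting from scratch with the same amount of data, which is the qualitative behavior the experiments below measure.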

Experiments

[figures: learning efficiency (# datapoints needed) and learning proficiency (error) vs. task index, on Rainbow MNIST and pose prediction]

Comparisons:
- TOE (train on everything): train on all data so far
- FTL (follow the leader): train on all data so far, fine-tune on current task
- From Scratch: train from scratch on each task

Follow the meta-leader learns each new task faster & with greater proficiency, approaching the few-shot learning regime.

Takeaways

- Many flavors of lifelong learning, all under the same name.
- Defining the problem statement is often the hardest part.
- Meta-learning can be viewed as a slice of the lifelong learning problem.
- A very open area of research.

Reminders

Project milestone due Wednesday.

Two guest lectures next week: Jeff Clune & Sergey Levine.