TRANSCRIPT
Exploiting Cognitive Constraints To Improve Machine-Learning Memory Models
Michael C. Mozer, Department of Computer Science, University of Colorado, Boulder
Why Care About Human Memory?
The neural architecture of human vision has inspired computer vision. Perhaps the cognitive architecture of memory can inspire the design of RAM systems.
Understanding human memory is essential for ML systems that predict what information will be accessible or interesting to people at any moment.
E.g., selecting material for students to review to maximize long-term retention (Lindsey et al., 2014)
The World’s Most Boring Task
Stimulus X → Response a
Stimulus Y → Response b
[Figure: histograms of frequency vs. response latency]
Sequential Dependencies
Dual Priming Model (Wilder, Jones, & Mozer, 2009; Jones, Curran, Mozer, & Wilder, 2013)
Recent trial history leads to expectation of next stimulus
Response latencies are fast when reality matches expectation
Expectation is based on exponentially decaying traces of two different stimulus properties
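The mechanism in the bullets above can be sketched in a few lines. This is an illustrative toy, not the fitted model: the actual Dual Priming Model tracks two different stimulus properties, whereas this sketch simply runs two decay rates over one 0/1 stimulus code, with made-up parameters.

```python
# Illustrative sketch: two exponentially decaying traces of recent
# trial history combine into an expectation for the next stimulus.
# Decay rates and the 0/1 stimulus coding are assumptions for the
# demo, not the paper's fitted values.

def update_trace(trace, stimulus, decay):
    """Blend the current stimulus into an exponentially decaying trace."""
    return decay * trace + (1.0 - decay) * stimulus

history = [1, 1, 0, 1, 0, 0, 1]   # e.g., 1 = stimulus X, 0 = stimulus Y
fast, slow = 0.5, 0.5             # traces start uninformative
for s in history:
    fast = update_trace(fast, s, decay=0.5)   # tracks local repetitions
    slow = update_trace(slow, s, decay=0.9)   # tracks longer-run base rate

expectation = 0.5 * fast + 0.5 * slow  # expected probability of X next
```

A fast response would then be predicted whenever the actual next stimulus matches the side of 0.5 that `expectation` falls on.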
Examining Longer-Term Dependencies (Wilder, Jones, Ahmed, Curran, & Mozer, 2013)
Declarative Memory
Cepeda, Vul, Rohrer, Wixted, & Pashler (2008)
[Figure: study–test timeline]
Forgetting Is Influenced By The Temporal Distribution Of Study
Spaced study produces more robust & durable learning than massed study
Experimental Paradigm To Study Spacing Effect
Cepeda, Vul, Rohrer, Wixted, & Pashler (2008)
[Figure: % recall vs. intersession interval (days)]
Optimal Spacing Between Study Sessionsas a Function of Retention Interval
Predicting The Spacing Curve
Inputs to the Multiscale Context Model: a characterization of the student and domain, the intersession interval, and forgetting after one session; output: predicted recall
[Figure: predicted % recall vs. intersession interval (days)]
Multiscale Context Model (Mozer et al., 2009)
Neural network
Explains spacing effects
Multiple Time Scale Model (Staddon, Chelaru, & Higa, 2002)
Cascade of leaky integrators
Explains rate-sensitive habituation
Kording, Tenenbaum, Shadmehr (2007)
Kalman filter
Explains motor adaptation
Key Features Of Models
Each time an event occurs in the environment…
A memory of this event is stored via multiple traces
Traces decay exponentiallyat different rates
Memory strength isweighted sum of traces
Slower scales are downweighted relative to faster scales
Slower scales store memory (learn) only when faster scales fail to predict event
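The key features listed above can be sketched as follows, with illustrative decay rates, weights, and update rule (assumptions for the demo, not any of the three models' fitted equations). The cascaded update lets a slower scale store an event only to the extent that faster scales failed to predict it, and the demo checks that spaced study then beats massed study at a long retention interval.

```python
# Illustrative multiscale-trace sketch: traces decay exponentially at
# different rates; memory strength is a weighted sum with slower
# scales downweighted; on an event, faster scales update first and
# slower scales absorb only the residual prediction error.
DECAYS  = [0.50, 0.90, 0.99]   # fast, medium, slow (per-step retention)
WEIGHTS = [0.50, 0.30, 0.20]   # slower scales downweighted

def step(traces, event):
    traces = [d * tr for d, tr in zip(DECAYS, traces)]
    if event:
        for i in range(len(traces)):        # fast -> slow
            predicted = sum(w * tr for w, tr in zip(WEIGHTS, traces))
            error = 1.0 - predicted          # what faster scales left unexplained
            traces[i] = min(1.0, traces[i] + error)
    return traces

def strength(traces):
    return sum(w * tr for w, tr in zip(WEIGHTS, traces))

def run(schedule, steps):
    traces = [0.0, 0.0, 0.0]
    for t in range(steps):
        traces = step(traces, t in schedule)
    return strength(traces)

massed = run({0, 1, 2}, steps=60)    # three back-to-back study events
spaced = run({0, 20, 40}, steps=60)  # same events, spread out
assert spaced > massed               # spacing yields more durable memory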
[Figure: trace strength over time for fast, medium, and slow traces; memory strength is their weighted sum]
[Figure: trace strength vs. time, with event occurrences marked]
Exponential Mixtures ➜ Scale Invariance
Infinite mixture of exponentials gives exactly power function
Finite mixture of exponentials gives good approximation to power function
With , can fit arbitrary power functions
[Figure: sum of three exponential decay curves approximating a power function]
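The exactness claim above can be checked numerically. One standard construction (an illustrative choice, not necessarily the slide's derivation): mix exponential decays exp(−rt) whose rates r are drawn from a Gamma(a, 1) density. The integral works out to exactly (1 + t)^(−a), so a finely discretized mixture should reproduce the power function:

```python
import math

def gamma_mixture(t, a, n=2000, r_max=20.0):
    """Approximate an infinite mixture of exponential decays exp(-r*t),
    with decay rates r weighted by a Gamma(a, 1) density. Analytically
    the mixture equals the power function (1 + t)**(-a)."""
    dr = r_max / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr                      # midpoint rule
        density = r ** (a - 1) * math.exp(-r) / math.gamma(a)
        total += math.exp(-r * t) * density * dr
    return total

a = 1.5
for t in [0.0, 1.0, 5.0, 20.0]:
    approx = gamma_mixture(t, a)
    exact = (1.0 + t) ** (-a)
    assert abs(approx - exact) < 1e-3           # mixture ≈ power law
```

A small finite mixture, as on the slide, trades this exactness for a good approximation over a bounded time range.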
Relationship To Memory Models In Ancient NN Literature
Focused back prop (Mozer, 1989), LSTM (Hochreiter & Schmidhuber, 1997)
Little/no decay
Multiscale backprop (Mozer, 1992), Tau net (Nguyen & Cottrell, 1997)
Learned decay constants
No enforced dominance of fast scales over slow scales
Hierarchical recurrent net (El Hihi & Bengio, 1995)
Fixed decay constants
History compression (Schmidhuber, 1992; Schmidhuber, Mozer, & Prelinger, 1993)
Event based, not time based
Sketch of Multiscale Memory Module
xt: activation of ‘event’ in input to be remembered, in [0,1]
mt: memory trace strength at time t
Activation rule (memory update) based on the error between the input and the current memory state
Activation rule consistent with the 3 models (for the Kording model, ignore KF uncertainty)
This update is differentiable ➜ can backprop through the memory module
Redistributes activation across time scales in a manner that is dependent on temporal distribution of input events
Could add output gate as well to make it even more LSTM-like
[Diagram: memory module — fixed decay ∆ per scale, learned read-out weights, input xt, output mt]
Sketch of Multiscale Memory Module
Pool of self-recurrent neurons with fixed time constants
Input is the response of a feature-detection neuron
This memory module stores the particular feature that is detected
When the feature is detected at time t, the memory updates: the memory state is compared to the input, and a correction based on the error is made so that the memory represents the input strongly
[Diagram: pool of self-recurrent memory neurons with fixed decay ∆ and learned weights]
Why Care About Human Memory?
Understanding human memory is essential for ML systems that predict what information will be accessible or interesting to people at any moment.
E.g., shopping patterns
E.g., pronominal reference
E.g., music preferences