![Page 1: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/1.jpg)
Overcoming Temptation: Theory and Practice
Michael Mozer, Computer Science Dept. and Institute of Cognitive Science, University of Colorado Boulder
Adrian F. Ward, McCombs School of Business, University of Texas Austin
John Lynch, Leeds School of Business, University of Colorado Boulder
Brett Israelsen, Ian Smith, Computer Science, University of Colorado Boulder
Shruthi Sukumar, Electrical & Computer Engineering, University of Colorado Boulder
Shabnam Hakimi, Institute of Cognitive Science, University of Colorado Boulder
![Page 2: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/2.jpg)
Retirement Planning Fail
Among US 55-64 year olds
62% have retirement assets
median savings for those who have assets: $42k
Pre-retirement defection in the US
For every $1 contributed to the accounts of savers under age 55, $0.40 simultaneously flows out of the 401(k)/IRA system, not counting loans (Argento, Bryant, & Sabelhaus, 2014)
National Institute on Retirement Security
![Page 3: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/3.jpg)
Can Financial Education Change Behavior?
US Government and nonprofits spent $670M on financial education in 2013.
Financial education explains 0.1% of the variance in financial outcomes (Fernandes, Lynch, & Netemeyer, 2015)
[Figure: Effectiveness of Educational Interventions (r²). x-axis: Domain (Social Sciences, Finance); y-axis: Effect Size (r²), 0.0 to 1.0; reference levels: Large, Medium, Small; annotation: r² = .0011]
![Page 4: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/4.jpg)
Behavioral Control Problem
Agent acts in the world
Some actions can lead to immediate payoffs
e.g., buy a new car
Other actions can lead to delayed payoffs
e.g., increase contributions to retirement account
How do you incentivize people to stay focused on the long-term?
![Page 5: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/5.jpg)
Other Domains
Dieting
Exercise
Cleaning house
Waiting for bus / elevator
Listening to a research talk
![Page 6: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/6.jpg)
Delay Discounting Paradigm
A way to quantify preference for now vs. later rewards
Find point of subjective indifference
Yields hyperbolic discounting
Would you rather have $100 now or $X in Y days?
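As a rough illustration of the indifference point, the sketch below assumes the commonly used hyperbolic form V = X / (1 + kD) and, for comparison, exponential discounting V = X · γ^D. The function names and the parameter values for `k` and `gamma` are hypothetical, chosen only for the example; the slides do not give them.

```python
def hyperbolic_indifference(amount_now, delay_days, k):
    """Delayed amount X judged equal to amount_now under V = X / (1 + k * D)."""
    return amount_now * (1.0 + k * delay_days)


def exponential_indifference(amount_now, delay_days, gamma):
    """Delayed amount X judged equal to amount_now under V = X * gamma ** D."""
    return amount_now / (gamma ** delay_days)


if __name__ == "__main__":
    # Hypothetical discount parameters, for illustration only.
    for days in (1, 30, 365):
        x_hyp = hyperbolic_indifference(100.0, days, k=0.01)
        x_exp = exponential_indifference(100.0, days, gamma=0.99)
        print(f"{days:>3} days: hyperbolic ${x_hyp:.2f}, exponential ${x_exp:.2f}")
```

Under these assumptions the hyperbolic indifference amount grows linearly with the delay, while the exponential one grows geometrically.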
![Page 7: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/7.jpg)
Delayed Gratification Paradigm
Marshmallow Test (Mischel and Ebbesen, 1970)
Delay Discounting: one-shot decision; reveals intrinsic future value
Delayed Gratification: continuous decision; future value confounded with grit
![Page 8: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/8.jpg)
Grit, Willpower, and Self Control
All refer to tendency to sustain interest and effort toward a goal
Grit
enduring personality trait
Willpower (= self control)
depends on grit but also varies as a function of mood, time of day, food and beverage intake, ego depletion
![Page 9: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/9.jpg)
Formalization Of Delayed Gratification Task
Choice at every instant to
grab small reward ⟸ end
wait for later large reward ⟸ continue
Finite-state machine (FSM) representation
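[Figure: FSM with states 1, 2, 3, 4, 5, ..., τ. From each state, continuing yields reward μc and advances to the next state; ending yields the small reward μe; the large reward κμe is obtained at state τ.]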
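The FSM can be sketched in a few lines. The code below is an illustrative reading of the diagram, not the authors' implementation: it assumes that ending in state τ pays the large reward κμe, that ending in any earlier state pays μe, and that rewards are discounted by γ per step. All function names are hypothetical.

```python
def sequence_return(k, tau, mu_e, mu_c, kappa, gamma):
    """Discounted return of continuing k times, then ending in state k + 1.

    Assumption: ending in state tau pays kappa * mu_e (large reward);
    ending in any earlier state pays mu_e (small reward).
    """
    waiting = sum(gamma ** i * mu_c for i in range(k))
    final = kappa * mu_e if k == tau - 1 else mu_e
    return waiting + gamma ** k * final


def all_sequence_returns(tau, mu_e, mu_c, kappa, gamma):
    """Map each action sequence (C = continue, E = end) to its return."""
    return {
        "C" * k + "E": sequence_return(k, tau, mu_e, mu_c, kappa, gamma)
        for k in range(tau)
    }


if __name__ == "__main__":
    returns = all_sequence_returns(tau=10, mu_e=1.0, mu_c=0.0, kappa=2.0, gamma=0.95)
    for actions, value in returns.items():
        print(f"{actions:>10}: {value:.3f}")
```

With τ states there are only τ distinct action sequences, so the whole task can be enumerated directly.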
![Page 10: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/10.jpg)
What Is The Optimal Policy?
Optimal policy chooses the action at time t that maximizes cumulative summed reward
Or cumulative discounted reward
more discounting -> agent more likely to succumb to temptation
Form of discounting
Exponential vs. hyperbolic
![Page 11: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/11.jpg)
Value Function = Policy
In state S1
Choose action A if V(S2) > V(S3)
Choose action B otherwise
S1
S3
S2A
B
![Page 12: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/12.jpg)
Dynamic Programming
Efficient way of computing value function of optimal policy
The delayed gratification FSM has a particularly restricted structure, leading to only a few possible state sequences (C = continue, E = end): E, CE, CCE, CCCE, CCCCE, CCCCCE, CCCCCCE, CCCCCCCE, CCCCCCCCE, CCCCCCCCCE
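[Figure: FSM with states 1, 2, 3, 4, 5, ..., τ. From each state, continuing yields reward μc and advances to the next state; ending yields the small reward μe; the large reward κμe is obtained at state τ.]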
![Page 13: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/13.jpg)
Dynamic Programming
Dynamic programming finds the value function that satisfies
Depending on discount rate γ, this yields policy that either
ends at time 1
continues until time τ
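[Figure: FSM with states 1, 2, 3, 4, 5, ..., τ. From each state, continuing yields reward μc and advances to the next state; ending yields the small reward μe; the large reward κμe is obtained at state τ.]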
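A minimal backward-induction sketch of this computation follows. It assumes the standard recursion V(t) = max(μe, μc + γ·V(t+1)) with V(τ) = κμe; the recursion and the name `solve_chain` are assumptions made for illustration, since the equation on the slide did not survive extraction.

```python
def solve_chain(tau, mu_e, mu_c, kappa, gamma):
    """Backward induction over states 1..tau of the delayed gratification FSM.

    Returns (values, policy), where values[t - 1] is V(t) and policy[t - 1]
    is "C" (continue) or "E" (end) for state t.
    """
    values = [0.0] * (tau + 1)  # index 0 is unused so indices match states
    policy = [""] * (tau + 1)
    values[tau] = kappa * mu_e  # assumption: state tau pays the large reward
    policy[tau] = "E"
    for t in range(tau - 1, 0, -1):
        end_value = mu_e
        continue_value = mu_c + gamma * values[t + 1]
        if continue_value > end_value:
            values[t], policy[t] = continue_value, "C"
        else:
            values[t], policy[t] = end_value, "E"
    return values[1:], policy[1:]


if __name__ == "__main__":
    for gamma in (0.95, 0.80):
        values, policy = solve_chain(tau=10, mu_e=1.0, mu_c=0.0, kappa=2.0, gamma=gamma)
        print(f"gamma={gamma}: V(1)={values[0]:.3f}, policy={''.join(policy)}")
```

Under these assumptions, γ = 0.95 gives a policy that continues from state 1 all the way to τ, whereas γ = 0.80 ends immediately in state 1, matching the two outcomes listed on the slide.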
![Page 14: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/14.jpg)
Modeling Human Behavior
DP is not a good model of human behavior
People may wait a while and then succumb to temptation
If you test the same person in the same situation, they may not behave identically each time
What do we need to better model people?
Willpower!
W(t) ~ Gaussian(0, σ²)
σ²: grit (smaller σ² corresponds to more grit)
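[Figure: two FSMs over states 1, 2, 3, 4, 5, ..., τ. First: continuing yields μc, ending yields the small reward μe, and the large reward is κμe. Second: the same structure, but the reward for ending at time t becomes μe - w(t), i.e., μe - w(1), μe - w(2), μe - w(3), ...]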
![Page 15: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/15.jpg)
Willpower Model
State consists of
t: current time
w: agent’s current willpower level
Agent plans optimally given state {t,w}
Takes deterministic action
However, variability in behavior each time task is performed due to fluctuations in w
![Page 16: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/16.jpg)
Dynamic Programming With State Uncertainty
Agent can only partially predict future states
Value function is based on expectation over this uncertainty
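One way to make this concrete is sketched below. It assumes willpower is drawn independently at each step, w(t) ~ Gaussian(0, σ²), and that the payoff for ending at time t is μe - w(t), so the expected value of a state is E[max(μe - w, c)] with c = μc + γ·E[V(t+1)]. For a Gaussian this expectation has the closed form c·Φ(z) + μe·(1 - Φ(z)) + σ·φ(z) with z = (c - μe)/σ. The function names and the definition of temptation resistance as the probability of continuing at every step are assumptions for illustration.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal probability density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def temptation_resistance(tau, mu_e, mu_c, kappa, gamma, sigma):
    """Probability of waiting through states 1..tau-1 (requires sigma > 0).

    Assumption: independent willpower noise at each step, and the agent
    continues whenever mu_e - w(t) does not exceed the value of continuing.
    """
    expected_value = kappa * mu_e  # expected value of state tau
    p_wait_all = 1.0
    for _ in range(tau - 1, 0, -1):
        continue_value = mu_c + gamma * expected_value
        z = (continue_value - mu_e) / sigma
        p_continue = norm_cdf(z)
        # Closed form of E[max(mu_e - w, continue_value)] for Gaussian w.
        expected_value = (
            continue_value * p_continue
            + mu_e * (1.0 - p_continue)
            + sigma * norm_pdf(z)
        )
        p_wait_all *= p_continue
    return p_wait_all


if __name__ == "__main__":
    for sigma in (0.10, 0.25, 0.50):
        p = temptation_resistance(10, 1.0, 0.0, 2.0, gamma=0.95, sigma=sigma)
        print(f"sigma={sigma:.2f}: P(wait until tau) = {p:.3f}")
```

The random-walk version of willpower would need w carried in the state, which this sketch does not attempt.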
![Page 17: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/17.jpg)
Expectation Has A (Mostly) Intuitive Form
measure of temptation
![Page 18: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/18.jpg)
Theory Predicts Agent’s Temptation Resistance
![Page 19: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/19.jpg)
Two Limitations on Human Behavior
Stochastic fluctuations in willpower
parameter σ
Exponential discounting
parameter γ
Agent is optimal subject to these constraints
Canonical notion of grit: Small σ + large γ
![Page 20: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/20.jpg)
Simulation
Finish line effect
Low grit moderates effect of discount rate
10 time steps (τ)
delayed reward is 2 × immediate (κ)
![Page 21: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/21.jpg)
Temptation Resistance as a Function of γ and σ
[Figure: temptation resistance as a function of γ and σ; scale labeled from high to low]
![Page 22: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/22.jpg)
What Magnitude Delayed Reward Leads To Temptation Resistance?
Given a wait time for the delayed reward, what relative magnitude does the reward have to be in order for there to be a 50% chance the agent will wait for it?
Effective discount rate is exponential, as reflected by the log-linear scaling of the delayed reward
Although γ determines the discount rate, σ determines a time-invariant multiplicative factor
[Figure panels: γ = 0.89 and γ = 0.95; curves for low grit and high grit]
![Page 23: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/23.jpg)
Prize-Linked Savings Accounts:Incentivizing A Long-Term Focus
“For every $100 you put in your retirement account, we’ll give you one ticket for a lottery for a $10000 prize.”
Potential of an immediate reward for focusing on long-term goal
One size fits all solution
Maybe different individuals would benefit from different reward structures: frequent small rewards vs. infrequent large rewards
![Page 24: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/24.jpg)
Simulating Prize-Linked Savings Account
Borrow η reward units from delayed reward as incentive
At each time t, hold a lottery for reward ω(t) obtained with probability ρ(t)
![Page 25: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/25.jpg)
Risk
Our reward-maximizing framework is risk neutral.
lottery(ρ,ω) is equivalent to lottery(ρ’,ω’) if ρω = ρ’ω’
Risk seeking vs. risk averse behavior
Prospect theory (Kahneman & Tversky, 1979)
When gains are being considered, people underestimate high probabilities and overestimate low probabilities
Risk-sensitive RL (Shen, Tobia, Sommer, & Obermayer, 2013)
replace ρ with a subjective probability,
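The risk-neutral equivalence can be checked with a short sketch. It only illustrates the expected-value statement ρω = ρ'ω'; the function names are hypothetical, and the subjective-probability transformation is deliberately left out because its form is not given here.

```python
import math


def expected_payout(rho, omega):
    """Expected value of a lottery paying omega with probability rho."""
    return rho * omega


def risk_neutral_equivalent(lottery_a, lottery_b):
    """True if two (rho, omega) lotteries have the same expected payout."""
    return math.isclose(expected_payout(*lottery_a), expected_payout(*lottery_b))


if __name__ == "__main__":
    rare_large = (0.01, 100.0)   # infrequent large reward
    frequent_small = (0.5, 2.0)  # frequent small reward
    print(risk_neutral_equivalent(rare_large, frequent_small))
```

A risk-neutral agent treats the two lotteries in the usage example as interchangeable, which is why a subjective probability is needed before the model can prefer one reward structure over another.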
![Page 26: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/26.jpg)
Incorporating Lottery Into Model
Assumes lottery at every time step (TBD)
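[Figure: FSM with lottery, states 1, 2, 3, 4, 5, ..., τ. Continuing yields μc(1), μc(2), μc(3), μc(4), ..., μc(τ-1), μc(τ); ending at time t yields the small reward μe - w(t); the large reward is κμe - η.]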
![Page 27: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/27.jpg)
Optimization Problem
Given an agent with discount rate γ and grit σ, what is the lottery
L = {ρ(t), ω(t): t = 1 …τ}
that maximizes agent’s temptation resistance?
![Page 28: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/28.jpg)
[Figure: results for varying discount rate (γ), with other parameters fixed (σ = .10, η = .40, ρ = .01, κ = 2). Panels: γ = 0.950, 0.942, 0.932, 0.921, 0.907, 0.892, 0.874, 0.853, 0.829, 0.800]
![Page 29: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/29.jpg)
Interesting Ideas
We can analyze delayed-gratification tasks as an MDP
Grit is helpful if agent does not heavily discount the future; but it can be harmful if the agent does.
behavioral noise can improve performance
Optimal incentive structures depend on an agent’s discount parameter and grit
![Page 30: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/30.jpg)
Experimental Explorations of the Model
1. Develop a laboratory task for adults that
involves choice between smaller-sooner and larger-later rewards
requires continual decision making
induces impulsive behavior
2. Demonstrate that model accounts for human behavior
3. Use the model to optimize human behavior
i.e., resist temptation
![Page 31: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/31.jpg)
Experiment
Demo
Reward per unit time
short: 1.0 points
long: 1.5 points
![Page 32: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/32.jpg)
Experiment
Four minute duration
Mechanical Turk participants
Up to 25% bonus payment depending on score
25 participants in control condition
![Page 33: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/33.jpg)
Accumulated Points Over Time
![Page 34: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/34.jpg)
Defection To Short Line
Model parameters: γ = 0.84, σ = 0.25
![Page 35: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/35.jpg)
Two Versions Of Model
• Willpower at successive moments is independent
• Willpower follows a random walk
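The two versions can be contrasted with a small sampling sketch. It assumes Gaussian noise with standard deviation σ in both cases and a random walk that starts at 0 and adds one noise draw per step; these details and the function names are illustrative assumptions rather than the authors' specification.

```python
import random


def independent_willpower(n_steps, sigma, rng):
    """Willpower drawn afresh at every step: w(t) ~ Gaussian(0, sigma^2)."""
    return [rng.gauss(0.0, sigma) for _ in range(n_steps)]


def random_walk_willpower(n_steps, sigma, rng):
    """Willpower that drifts: w(t) = w(t - 1) + Gaussian(0, sigma^2) noise."""
    trace, level = [], 0.0
    for _ in range(n_steps):
        level += rng.gauss(0.0, sigma)
        trace.append(level)
    return trace


if __name__ == "__main__":
    rng = random.Random(0)
    print([round(w, 2) for w in independent_willpower(10, 0.25, rng)])
    print([round(w, 2) for w in random_walk_willpower(10, 0.25, rng)])
```

In the random-walk version a lapse in willpower tends to persist across steps, whereas in the independent version each moment is a fresh draw.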
![Page 36: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/36.jpg)
Current Directions
Now that we have a model that fits our population, can we determine an incentive structure that boosts the likelihood of waiting in the long line?
![Page 37: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/37.jpg)
Current Directions
Fit parameters of model to an individual’s data
Correlate model parameters with standard assessments like the delay discounting paradigm.
Extend theory to handle
uncertainty in the arrival time of the delayed reward (e.g., marshmallow task)
non-terminal temptations (e.g., Starbucks)
compound interest (e.g., retirement savings)
human learning from experience (e.g., recency effects)
![Page 38: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/38.jpg)
Thank You!
![Page 39: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/39.jpg)
Game seems to be more interesting than we were intending.
Original intention was to simulate a series of independent episodes, but episodes are interdependent due to variations in line length from one episode to the next.
![Page 40: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/40.jpg)
Retirement Planning Fail
Among US 55-64 year olds
62% have retirement assets
median savings for those who have assets: $42k
Among Canadian 55-64 year olds
81% have retirement assets (RRSP or EEP)
median savings for those who have assets: $245k
24% contributed to RRSP in 2011
Pre-retirement defection in the US
For every $1 contributed to the accounts of savers under age 55, $0.40 simultaneously flows out of the 401(k)/IRA system, not counting loans (Argento, Bryant, & Sabelhaus, 2014)
statcan.gc.ca
cbc.ca
National Institute on Retirement Security
![Page 41: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/41.jpg)
Optimization Problem
Formalization of delayed gratification task
μe: reward for ending early
κ: relative magnitude of delayed reward
τ: wait time for delayed reward
η: expected lottery payout, Σρ(t)ω(t) = η
Given an agent with discount rate γ and grit σ, what is the lottery L = {ρ(t), ω(t): t = 1 …τ} that maximizes agent’s temptation resistance?
![Page 42: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/42.jpg)
Experiment
short vs. long: 1.0 vs. 1.5 points per time step
[Figure panels: control condition, bonus condition]
![Page 43: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/43.jpg)
![Page 44: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/44.jpg)
Hazard Function For Each Line Length
Defection increases with line length
Finish line effect
Seems like fewer defections in bonus condition
![Page 45: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/45.jpg)
Issue With The Game
Intention was to simulate a series of independent episodes but they are interdependent because
time limitation
information provided about next episode’s line length
![Page 46: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/46.jpg)
Optimization Problem
Formalization of delayed gratification task
μc: reward for continuing
μe: reward for ending early
κ: relative magnitude of delayed reward
τ: wait time for delayed reward
η: expected lottery payout, Σρ(t)ω(t) = η
Given an agent with discount rate γ and grit σ, what is the lottery L = {ρ(t), ω(t): t = 1 …τ} that maximizes agent’s temptation resistance?
Simplified
ρ(i) = ρ(j) = ρ
ω(i) = ω(j) = η/ρ
![Page 47: Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian](https://reader036.vdocuments.net/reader036/viewer/2022062322/56649ea05503460f94ba36ee/html5/thumbnails/47.jpg)
Varying η and ρ
γ = .92, σ = .10