Overcoming Temptation: Theory and Practice

Michael Mozer
Computer Science Dept. and Institute of Cognitive Science, University of Colorado Boulder

Adrian F. Ward
McCombs School of Business, University of Texas Austin

John Lynch
Leeds School of Business, University of Colorado Boulder

Brett Israelsen, Ian Smith
Computer Science, University of Colorado Boulder

Shruthi Sukumar
Electrical & Computer Engineering, University of Colorado Boulder

Shabnam Hakimi
Institute of Cognitive Science, University of Colorado Boulder

Retirement Planning Fail

Among US 55-64 year olds

62% have retirement assets

median savings for those who have assets: $42k

Pre-retirement defection in the US

For every $1 contributed to the accounts of savers under age 55, $0.40 simultaneously flows out of the 401(k)/IRA system, not counting loans (Argento, Bryant, & Sabelhaus, 2014)

National Institute on Retirement Security

Can Financial Education Change Behavior?

US Government and nonprofits spent $670M on financial education in 2013.

Financial education explains 0.1% of variance in financial outcomes (Fernandes, Lynch, & Netemeyer, 2015)

[Figure: Effectiveness of Educational Interventions (r²). Effect size (r²) by domain (Social Sciences, Finance), with reference levels for large, medium, and small effects. Annotation: r² = .0011]

Behavioral Control Problem

Agent acts in the world

Some actions can lead to immediate payoffs

e.g., buy a new car

Other actions can lead to delayed payoffs

e.g., increase contributions to retirement account

How do you incentivize people to stay focused on the long-term?

Other Domains

Dieting

Exercise

Cleaning house

Waiting for bus / elevator

Listening to a research talk

Delay Discounting Paradigm

A way to quantify preference for now vs. later rewards

Find point of subjective indifference

Yields hyperbolic discounting

Would you rather have $100 now

or $X in Y days?
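
As a rough illustration of the indifference point, the sketch below computes the delayed amount $X that a hyperbolic discounter would treat as equal to $100 now, alongside an exponential discounter for contrast. The functional forms are the standard textbook ones; the function names and the rate `k = 0.01` per day are hypothetical choices for illustration, not values from the talk.

```python
def hyperbolic_value(amount, delay_days, k):
    """Present value under hyperbolic discounting: amount / (1 + k * delay)."""
    return amount / (1.0 + k * delay_days)


def exponential_value(amount, delay_days, gamma):
    """Present value under exponential discounting: amount * gamma ** delay."""
    return amount * gamma ** delay_days


def indifference_amount(now_amount, delay_days, k):
    """Delayed amount $X whose hyperbolic present value equals now_amount."""
    return now_amount * (1.0 + k * delay_days)


# Hypothetical per-day discount rate, chosen only for illustration.
k = 0.01
for days in (7, 30, 365):
    x = indifference_amount(100.0, days, k)
    print(f"$100 now is matched by ${x:.2f} in {days} days")
```

Under the hyperbolic form the indifference amount grows linearly with the delay, whereas under the exponential form it would grow geometrically.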

Delayed Gratification Paradigm

Marshmallow Test (Mischel & Ebbesen, 1970)

Delay Discounting vs. Delayed Gratification

Delay discounting: one-shot decision; reveals intrinsic future value

Delayed gratification: continuous decision; future value confounded with grit

https://www.youtube.com/watch?v=QX_oy9614HQ

Grit, Willpower, and Self Control

All refer to tendency to sustain interest and effort toward a goal

Grit

enduring personality trait

Willpower (= self control)

depends on grit but also varies as a function of mood, time of day, food and beverage intake, ego depletion

Formalization Of Delayed Gratification Task

Choice at every instant to

grab small reward ⟸ end

wait for later large reward ⟸ continue

Finite-state machine (FSM) representation

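[Figure: FSM with states 1, 2, 3, 4, 5, ..., τ. Each "continue" transition is labeled μc, each "end" transition is labeled μe (small reward), and the final state yields κμe (large reward).]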

What Is The Optimal Policy?

Optimal policy chooses action at time t that maximizes cumulative summed reward

Or cumulative discounted reward

more discounting -> agent more likely to succumb to temptation

Form of discounting

Exponential vs. hyperbolic

Value Function = Policy

In state S1

Choose action A if V(S2) > V(S3)

Choose action B otherwise

[Figure: from state S1, action A leads to S2 and action B leads to S3]

Dynamic Programming

Efficient way of computing value function of optimal policy

The Delayed Gratification FSM has a particularly restricted structure, leading to only a few possible state sequences (E = end, C = continue):

E, CE, CCE, CCCE, CCCCE, CCCCCE, CCCCCCE, CCCCCCCE, CCCCCCCCE, CCCCCCCCCE

[Figure: delayed gratification FSM with states 1, ..., τ; continue reward μc, end reward μe (small), final reward κμe (large)]

Dynamic Programming

Dynamic programming finds the value function that satisfies

Depending on discount rate γ, this yields policy that either

ends at time 1

continues until time τ
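
The two outcomes above can be reproduced with a small backward-induction sketch. It assumes one particular reading of the FSM, which is not spelled out in the slides: in states 1 to τ−1 the agent either ends (reward μe) or continues (reward μc, taken to be 0 by default), and reaching state τ pays κμe. The function name `solve_fsm` and the parameter values are hypothetical.

```python
def solve_fsm(tau, mu_e, kappa, gamma, mu_c=0.0):
    """Backward induction over states 1..tau.

    Assumed convention: in states 1..tau-1 the agent either ends (reward mu_e)
    or continues (reward mu_c, then moves to the next state); reaching state
    tau pays kappa * mu_e. Returns (values, policy) keyed by state.
    """
    values = {tau: kappa * mu_e}
    policy = {}
    for t in range(tau - 1, 0, -1):
        continue_value = mu_c + gamma * values[t + 1]
        if continue_value > mu_e:
            values[t], policy[t] = continue_value, "continue"
        else:
            values[t], policy[t] = mu_e, "end"
    return values, policy


# tau = 10 and kappa = 2 mirror the simulation settings; the gammas are examples.
for gamma in (0.95, 0.90):
    _, policy = solve_fsm(tau=10, mu_e=1.0, kappa=2.0, gamma=gamma)
    print(f"gamma = {gamma}: action in state 1 is '{policy[1]}'")
```

Because the discounted value of the large reward only grows as τ approaches, an agent that prefers to continue in state 1 prefers to continue in every later state, which is why the deterministic policy never defects partway through.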

[Figure: delayed gratification FSM with states 1, ..., τ; continue reward μc, end reward μe (small), final reward κμe (large)]

Modeling Human Behavior

DP is not a good model of human behavior

People may wait a while and then succumb to temptation

If you test the same person in the same situation, they may not behave identically each time

What do we need to better model people?

Willpower!

W(t) ~ Gaussian(0, σ²)

σ²: grit

[Figure: the delayed gratification FSM without and with willpower. With willpower, the end reward at time t changes from μe to μe − w(t); the continue reward μc and the final reward κμe are unchanged.]

Willpower Model

State consists of

t: current time

w: agent’s current willpower level

Agent plans optimally given state {t,w}

Takes deterministic action

However, variability in behavior each time task is performed due to fluctuations in w
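
A minimal sketch of how such an agent might be evaluated, under assumptions that go beyond the slides: willpower is drawn independently at each step, the end reward at time t is μe − w(t), decisions occur in states 1 to τ−1, and μc defaults to 0. The agent ends whenever the current end reward exceeds the expected discounted value of continuing, where the expectation is over future willpower. All function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def temptation_resistance(tau, mu_e, kappa, gamma, sigma, mu_c=0.0):
    """Probability of continuing at every decision point (sigma must be > 0).

    Assumptions: w(t) ~ Gaussian(0, sigma^2) independently at each step, the
    end reward at time t is mu_e - w(t), and the agent ends whenever that
    exceeds the expected discounted value of continuing.
    """
    expected_next = kappa * mu_e           # value of reaching state tau
    p_wait = 1.0
    for _ in range(tau - 1):               # decision points tau-1 down to 1
        c = mu_c + gamma * expected_next   # value of continuing
        z = (c - mu_e) / sigma
        p_continue = normal_cdf(z)         # P(mu_e - w(t) <= c)
        p_wait *= p_continue
        # Closed form of E[max(mu_e - w, c)] for Gaussian w.
        expected_next = (c * p_continue + mu_e * (1.0 - p_continue)
                         + sigma * normal_pdf(z))
    return p_wait


print(temptation_resistance(tau=10, mu_e=1.0, kappa=2.0, gamma=0.95, sigma=0.10))
```

Given a fixed willpower draw the action is deterministic; the variability across repetitions comes entirely from the draws of w(t).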

Dynamic Programming With State Uncertainty

Agent can only partially predict future states

Value function is based on expectation over this uncertainty

Expectation Has A (Mostly) Intuitive Form

measure of temptation

Theory Predicts Agent’s Temptation Resistance

Two Limitations on Human Behavior

Stochastic fluctuations in willpower

parameter σ

Exponential discounting

parameter γ

Agent is optimal subject to these constraints

Canonical notion of grit: Small σ + large γ

Simulation

Finish line effect

Low grit moderates effect of discount rate

10 time steps (τ)

delayed reward is 2 × immediate (κ)

[Figure: simulation results; axis shows time steps 1, 2, 3, 4, 5, ...]

Temptation Resistance as a Function of γ and σ

[Figure: temptation resistance over γ and σ; scale runs from high to low]

What Magnitude Delayed Reward Leads To Temptation Resistance?

Given a wait time for the delayed reward, what relative magnitude does the reward have to be in order for there to be a 50% chance the agent will wait for it?

Effective discount rate is exponential, as reflected by the log-linear scaling of the delayed reward

Although γ determines the discount rate, σ determines a time-invariant multiplicative factor
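
The log-linear scaling can be seen in the noise-free baseline. The sketch below covers only that baseline (σ → 0, μc = 0, decisions in states 1 to τ−1, all assumptions made here for illustration): the agent is indifferent in state 1 when γ^(τ−1)·κ = 1. It does not model the σ-dependent multiplicative factor. The function name is hypothetical.

```python
import math


def indifference_kappa(tau, gamma):
    """Reward ratio kappa at which a noise-free agent is indifferent in state 1.

    Assumes mu_c = 0 and that the large reward is tau - 1 steps away.
    """
    return gamma ** -(tau - 1)


for gamma in (0.89, 0.95):
    logs = [math.log(indifference_kappa(tau, gamma)) for tau in range(2, 7)]
    steps = [b - a for a, b in zip(logs, logs[1:])]
    # Each extra time step adds the same amount, -log(gamma), to log(kappa).
    print(gamma, [round(s, 4) for s in steps])
```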

[Figure: panels for γ = 0.89 and γ = 0.95, each with curves for low grit and high grit]

Prize-Linked Savings Accounts: Incentivizing A Long-Term Focus

“For every $100 you put in your retirement account, we’ll give you one ticket for a lottery for a $10000 prize.”

Potential of an immediate reward for focusing on long-term goal

One size fits all solution

Maybe different individuals would benefit from different reward structures: frequent small rewards vs. infrequent large rewards

Simulating Prize-Linked Savings Account

Borrow η reward units from delayed reward as incentive

At each time t, hold a lottery for reward ω(t) obtained with probability ρ(t)

Risk

Our reward-maximizing framework is risk neutral.

lottery(ρ,ω) is equivalent to lottery(ρ’,ω’) if ρω = ρ’ω’

Risk seeking vs. risk averse behavior

Prospect theory (Kahneman & Tversky, 1979)

When gains are being considered, people underestimate high probabilities and overestimate low probabilities

Risk-sensitive RL (Shen, Tobia, Sommer, & Obermayer, 2013)

replace ρ with a subjective probability,
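
To make the risk discussion concrete, the sketch below contrasts the risk-neutral expected value ρω with one possible subjective probability. The weighting function is the one-parameter form from Tversky and Kahneman (1992), with their reported estimate of 0.61 for gains; it is swapped in here purely as an illustration, and the slides do not say which subjective probability the model actually uses. The function names are hypothetical.

```python
def expected_value(rho, omega):
    """Risk-neutral value of a lottery paying omega with probability rho."""
    return rho * omega


def weighted_probability(p, curvature=0.61):
    """Tversky-Kahneman (1992) probability weighting for gains.

    Assumed form, used only for illustration; not taken from the slides.
    """
    num = p ** curvature
    return num / (num + (1.0 - p) ** curvature) ** (1.0 / curvature)


# Two lotteries with the same rho * omega are equivalent to a risk-neutral agent.
print(expected_value(0.01, 40.0), expected_value(0.02, 20.0))

# With probability weighting they are no longer equivalent.
for rho, omega in ((0.01, 40.0), (0.02, 20.0)):
    print(rho, weighted_probability(rho) * omega)
```

The weighting function overweights small probabilities and underweights large ones, matching the qualitative pattern described for gains.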

Incorporating Lottery Into Model

Assumes lottery at every time step (TBD)

[Figure: FSM with lottery, states 1, ..., τ. The continue reward at time t is μc(t), the end reward at time t is μe − w(t), and the final (large) reward is κμe − η.]

Optimization Problem

Given an agent with discount rate γ and grit σ, what is the lottery

L = {ρ(t), ω(t): t = 1 …τ}

that maximizes agent’s temptation resistance?

[Figure: results for varying discount rate γ = 0.950, 0.942, 0.932, 0.921, 0.907, 0.892, 0.874, 0.853, 0.829, 0.800; other parameters fixed (σ = .10, η = .40, ρ = .01, κ = 2)]

Interesting Ideas

We can analyze delayed-gratification tasks as an MDP

Grit is helpful if agent does not heavily discount the future; but it can be harmful if the agent does.

behavioral noise can improve performance

Optimal incentive structures depend on an agent’s discount parameter and grit

Experimental Explorations of the Model

1. Develop a laboratory task for adults that

involves choice between smaller-sooner and larger-later rewards

requires continual decision making

induces impulsive behavior

2. Demonstrate that model accounts for human behavior

3. Use the model to optimize human behavior

i.e., resist temptation

Experiment

Demo

Reward per unit time

short: 1.0 points

long: 1.5 points

Experiment

Four minute duration

Mechanical Turk participants

Up to 25% bonus payment depending on score

25 participants in control condition

Accumulated Points Over Time

Defection To Short Line

Model parameters: γ = 0.84, σ = 0.25

Two Versions Of Model

Willpower at successive moments is independent

Willpower follows a random walk
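
The difference between the two versions can be sketched by generating willpower sequences under each. The random-walk details are assumptions made here (start at 0, Gaussian increments with standard deviation σ); the slides do not specify them. The function names are hypothetical.

```python
import random


def independent_willpower(steps, sigma, rng):
    """w(t) drawn independently from Gaussian(0, sigma^2) at each step."""
    return [rng.gauss(0.0, sigma) for _ in range(steps)]


def random_walk_willpower(steps, sigma, rng):
    """w(t) = w(t-1) + Gaussian(0, sigma^2) increment, starting from 0 (assumed)."""
    w, path = 0.0, []
    for _ in range(steps):
        w += rng.gauss(0.0, sigma)
        path.append(w)
    return path


rng = random.Random(0)  # fixed seed so the example is repeatable
print([round(w, 2) for w in independent_willpower(5, 0.25, rng)])
print([round(w, 2) for w in random_walk_willpower(5, 0.25, rng)])
```

In the independent version a moment of low willpower says nothing about the next moment, whereas in the random-walk version low willpower tends to persist.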

Current Directions

Now that we have a model that fits our population, can we determine incentive structure that boosts likelihood of waiting in long line?

Current Directions

Fit parameters of model to an individual’s data

Correlate model parameters with standard assessments like the delay discounting paradigm.

Extend theory to handle

uncertainty in the arrival time of the delayed reward (e.g., marshmallow task)

non-terminal temptations (e.g., Starbucks)

compounded interest (e.g., retirement savings)

human learning from experience (e.g., recency effects)

Thank You!

Game seems to be more interesting than we were intending.

Original intention was to simulate a series of independent episodes, but episodes are interdependent due to variations in line length from one episode to the next.

Retirement Planning Fail

Among US 55-64 year olds

62% have retirement assets


Among Canadian 55-64 year olds

81% have retirement assets (RRSP or EEP)


24% contributed to RRSP in 2011

Pre-retirement defection in the US

For every $1 contributed to the accounts of savers under age 55, $0.40 simultaneously flows out of the 401(k)/IRA system, not counting loans (Argento, Bryant, & Sabelhaus, 2014)

statcan.gc.ca

cbc.ca

National Institute on Retirement Security


Formalization of delayed gratification task

μe: reward for ending early

κ: relative magnitude of delayed reward

τ: wait time for delayed reward

η: expected lottery payout, Σρ(t)ω(t) = η

Given an agent with discount rate γ and grit σ, what is the lottery L = {ρ(t), ω(t): t = 1 …τ} that maximizes agent’s temptation resistance?

Experiment

short vs. long: 1.0 vs. 1.5 points per time step

control condition

bonus condition

Hazard Function For Each Line Length

Defection increases with line length

Finishing line effect

Seems like fewer defections in bonus condition
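
For reference, an empirical hazard function of this kind can be computed as the fraction of episodes that defect at step t among those still waiting at the start of step t. The sketch below is one plausible way to do that, not the analysis used in the talk; the function name and the toy defection times are hypothetical and are not experimental data.

```python
def hazard(defection_times, horizon):
    """Empirical hazard h(t) for t = 1..horizon.

    defection_times holds one entry per episode: the step at which the
    participant defected, or None if they waited to the end.
    """
    h = []
    at_risk = len(defection_times)
    for t in range(1, horizon + 1):
        events = sum(1 for d in defection_times if d == t)
        h.append(events / at_risk if at_risk else 0.0)
        at_risk -= events  # defectors leave the at-risk set
    return h


# Hypothetical toy episodes, for illustration only.
toy_times = [1, 3, 3, None, None, 2, None, None]
print(hazard(toy_times, horizon=4))
```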

Issue With The Game

Intention was to simulate a series of independent episodes but they are interdependent because

time limitation

information provided about next episode’s line length


Formalization of delayed gratification task

μc: reward for continuing

μe: reward for ending early

κ: relative magnitude of delayed reward

τ: wait time for delayed reward

η: expected lottery payout, Σρ(t)ω(t) = η

Given an agent with discount rate γ and grit σ, what is the lottery L = {ρ(t), ω(t): t = 1 …τ} that maximizes agent’s temptation resistance?

Simplified

ρ(i) = ρ(j) = ρ

ω(i) = ω(j) = η/ρ

Varying η and ρ

γ = .92, σ = .10
