e cient consolidation of incentives for education and ...rpaluszy/human_capital.pdf · several...
TRANSCRIPT
Efficient Consolidation of Incentives for Education andRetirement Savings∗
Radoslaw Paluszynski† Pei Cheng Yu‡
This version:March 8, 2020
Abstract
We study optimal tax policies with human capital investment and retirement savings forpresent-biased agents. Agents are heterogeneous in their innate ability and make riskyeducation investments which determines their labor productivity. We demonstrate thatthe optimal distortions vary with education status. In particular, the optimal policyencourages human capital investment with savings incentives. Our implementation usesincome-contingent student loans and existing retirement policies, augmented by a new taxinstrument that subsidizes retirement savings for college graduates. The instrument mimicsthe latest policy proposals by allowing employers to offer 401(k) matching contributionsproportional to student loans repayment.
Keywords: Present bias, Human capital, Retirement, Sequential screening
∗The authors would like to thank Gonzalo Castex, Taha Choukhmane, German Cubas, Mikhail Golosov,David Rahman, Satoshi Tanaka, Kei Mu Yi and Hsin-Jung Yu.†University of Houston (e-mail: [email protected])‡University of New South Wales (e-mail: [email protected])
1 Introduction
The average cost of higher education in the US has been growing nearly eight times
faster than median household income over the last two decades. Due to the lack of insur-
ance against labor market uncertainties, this rise in college costs can reduce investment in
higher education. At the same time, policymakers have been concerned about the seemingly
insufficient amount of private retirement savings. Raising the welfare of retirees with more
generous social security benefits would require imposing distortionary taxes, which makes
policies that increase private savings preferable. Though human capital investment and re-
tirement savings are usually treated as separate policy issues, this paper argues that they
are interdependent when people are present-biased.
There are currently multiple policy proposals in the US that suggest making retirement
savings contingent on student loan repayment, which establishes a link between retirement
and education policies. These proposals are based on a pathbreaking IRS ruling in 2018
that allowed a company to make contributions to the retirement plans of employees who are
paying off their student debt even if they do not make any 401(k) contributions.1 In essence,
individuals automatically save for retirement while repaying student loans. Other private
employers have since offered similar benefits. Following these developments, the Retirement
Parity for Student Loans Act and the Retirement Security and Savings Act were introduced
in Congress. Both bills allow employer 401(k) matching based on student loan payments.
Despite the enthusiasm of policymakers, the benefit of connecting student loan repayment
to retirement savings is not apparent. This paper provides a theoretical guidance in linking
education with retirement savings—two seemingly unrelated areas of government policy.
We study a Mirrlees life-cycle model with present-biased agents. We focus on present-
biased agents to capture the self-control problem documented in recent empirical studies on
the underinvestment in education (Cadena and Keys, 2015) and insufficient retirement sav-
ings (Angeletos et al., 2001; Laibson et al., 2017). In our framework, agents initially differ in
their innate ability which could either be high or low. Based on their innate ability, agents
choose their level of education: college or high school. Afterwards, they work before they re-
tire. The likelihood of having higher productivity when working increases with innate ability
and education status. Both innate ability and productivity are the agents’ private informa-
tion, so the government sequentially screens the agents and designs policies conditioned on
the observed education investment and income. Crucially, the government separates agents
so that high innate ability agents go to college while low innate ability agents do not. The
government also attempts to paternalistically offset the present bias.
1It was revealed that the company involved in the ruling was Abbott Laboratories, a health care company.
2
Our theoretical framework characterizes the optimal interdependence between retirement
savings and education investment. When agents are saving for retirement, college graduates
are rewarded with a retirement savings subsidy. Intuitively, present-biased agents who are
deciding on their education investment want to prevent their future-selves from under-saving
for retirement. Therefore, the optimal retirement savings policy incentivizes human capital
investment by providing college graduates with a savings vehicle that offsets their present
bias. For non-college graduates, the retirement savings subsidy is lower and it is monoton-
ically decreasing in income. Since high innate ability agents are more likely to earn higher
income, a savings subsidy for non-college graduates that decreases with income discourages
high innate ability agents from entering the workforce without a college degree.
We also show that the usual inverse Euler equation for time-consistent agents does not
hold. When agents invest in higher education, the inverse marginal utility of consumption
is strictly higher than the working period’s expected inverse marginal utility. This implies
that consumption is more front-loaded for college graduates compared to the time-consistent
case. As a result, the optimal student loans are more generous to gratify the present-biased
agents and encourage the ones with high innate ability to invest in college.
We also derive the optimal labor wedge for our environment. Though the theoretical
characterization differs from those obtained with time-consistent agents, we show quantita-
tively that the optimal labor wedge with present-biased agents is very close to the one for
time-consistent agents. More precisely, two new opposing forces are introduced when agents
are present-biased. First, present-biased agents are less sensitive to future incentives. As
a result, the labor distortion decreases to mitigate the present-biased agents’ tendency to
undervalue future incentives. Second, by helping agents commit to saving for retirement, the
government is distorting intertemporal decisions from the present-biased agents’ perspective.
Therefore, the introduction of commitment increases labor distortions. In the quantitative
analysis, we show that these two economic forces approximately offset each other.
We consider two ways of decentralizing the constrained efficient allocations. For one
of them, we introduce retirement accounts with income and education contingent savings
subsidies. We also consider one where non-college graduates rely mainly on social security
benefits during retirement, while college graduates are supplemented with deposits worth a
fraction of their student loan repayments in their retirement savings accounts. This latter
implementation is inspired by the recent IRS ruling and policy proposals that treat student
loans repayment as salary reduction contribution to retirement accounts. For both schemes,
the government provides individuals with income-contingent student loans.
We bring our model to the US data by calibrating the structural parameters and by
approximating the current tax system to infer realistic distributions of skills among high
3
school and college graduates.2 We show that our theoretical predictions are quantitatively
significant. The optimal tax schedules involve extensive use of the intertemporal wedge
during working life which, crucially, differs across income and education groups. College
graduates are offered savings subsidies to smooth their consumption over the life cycle, which
ex-ante incentivizes them to choose college education. The difference in savings subsidies
between the two education groups declines with income, because the utility is close to linear
at high levels of income which results in low gains from consumption smoothing. Finally, we
show that the welfare gains from our optimal tax are potentially significant, exceeding 1% of
lifetime consumption relative to the world with optimal policies dedicated to time-consistent
agents.
1.1 Related Literature
This paper contributes to the literature on optimal human capital policies. Bovenberg
and Jacobs (2005) studies optimal education and income policies in an environment where
schooling increases productivity. However, human capital investment in their environment is
riskless. This is contrary to empirical studies that find returns to human capital investments
to be risky (Cunha and Heckman, 2007). This paper captures the risky returns to education
by modeling productivity as a random draw from a distribution determined by human capital.
There are other papers that have studied how risk from human capital investments affects the
design of optimal policy. Anderberg (2009) finds that how human capital affects the degree
of wage risk matters for optimal policy. Grochulski and Piskorski (2010) focuses on the
optimal capital taxation in an environment where agents share the same innate ability and
human capital investment is unobservable. Craig (2019) studies a setting where employers
observe informative but imperfect signals to infer the human capital investment of ex-ante
heterogeneous workers. In contrast, our paper focuses on how initial differences in innate
ability affects the design of policies when investment in education is observable.
Several papers have also examined the optimal policy for human capital acquisition over
the working age. Bohacek and Kapicka (2008) and Kapicka (2015) study the optimal tax
policy when human capital investment is deterministic while the agent works. Stantcheva
(2017) studies an environment where agents make monetary investments in each period to
build up their stock of human capital. Makris and Pavan (2019) examine the learning-by-
doing aspect of human capital accumulation, so human capital is acquired stochastically as
a by-product from labor effort. Kapicka and Neira (2019) considers risky but unobservable
human capital investment, so tax policies are not conditional upon this investment. In
2Our calibration uses an extended definition of college that includes Master’s, Doctoral and Professionaldegrees.
4
contrast, our work focuses on human capital acquired before agents enter the labor force.
Gary-Bobo and Trannoy (2015) and Findeisen and Sachs (2016) consider environments
most similar to ours. They examine optimal education and income tax policies in a setting
where agents differ in initial ability and make risky investments in education before they
enter the labor market. Our paper models initial ability and the risk from human capital in-
vestment in a similar fashion to their paper. However, we consider present-biased agents. In
our setting, we demonstrate the importance of linking retirement policies to education out-
comes. For time-consistent agents, as in their papers, introducing distortions in retirement
savings does not increase investments in human capital.
Our paper contributes to the literature on Mirrlees taxation when agents have behavioral
biases. Farhi and Gabaix (2019) use sparse maximization (Gabaix, 2014) to study optimal
taxation of behavioral agents in a static setting. Lockwood (2018) studies optimal income
taxation with present-biased agents where wages depend on past work effort. He shows how
present bias has a potentially large effect on the optimal marginal income tax rate. We find
an economic force similar to the one in Lockwood (2018). However, we also discover a novel
opposing force that quantitatively negates the effect of present bias on the optimal labor
wedge. There are also papers that focus on the design of retirement savings policies for
time-inconsistent agents in a Mirrlees setting. Moser and de Souza e Silva (2019) consider
a multi-dimensional screening environment, where agents are heterogeneous in present bias
and productivity. Yu (2019a) examines a multi-dimensional screening environment, where
the agents’ degree of present bias, productivity and level of sophistication are hidden.
The rest of the paper is organized as follows. Section 2 presents the dynamic life-cycle
model and Section 3 characterizes the optimal savings and labor wedges. In Section 4, we
calibrate the model and present the quantitative results and welfare analysis. Section 5
demonstrates two policies that decentralizes the optimum. Section 6 extends the theory to
include differences in present bias and also considers the case with naıve agents.
2 Model
We consider a life-cycle model with three periods: t = 0, 1, 2. At t = 0, agents learn their
innate ability γ ∈ {H,L} with H > L, and proceed to choose their education investment
e ∈ {eL, eH} where eH ≥ eL. We will refer to agents with innate ability γ as γ-agents. The
share of γ-agents is πγ ∈ (0, 1) with πH + πL = 1. The level of education investment e
represents the binary decision of whether to invest in higher-education (invest eH) or not
(invest eL). Human capital depends on both γ and e, which we denote as κ (e, γ) . We
assume κ is strictly increasing in both arguments. Intuitively, this captures the fact that
5
education helps raise human capital, and also how H-agents are more effective in human
capital accumulation than L-agents. The government observes e while κ and γ are the
agents’ private information. We refer to γ as the ex-ante private information.
At t = 1, agents enter the labor market, and privately learn their productivity θ ∈ Θ =[θ, θ]⊂ R+. Productivity is drawn from a differentiable distribution with c.d.f. F (θ|κ) ,
which depends on human capital κ and is ranked according to first order stochastic domi-
nance: if κ > κ′, F (θ|κ) < F (θ|κ′) ,∀θ ∈ Θ. Also, let f (θ|κ) denote the p.d.f. and assume
f (θ|κ) is bounded away from zero for any θ and κ. This models the riskiness of human
capital investment, where agents with higher human capital are more likely to be produc-
tive. An agent with productivity θ who provides work effort l produces output y = θl. The
government observes output y, but not productivity θ nor labor supply l. We refer to θ as
the ex-post private information. Finally, at t = 2, agents retire and consume their savings.
To model present bias, we adopt the quasi-hyperbolic discounting model (Laibson, 1997).
Let β < 1 denote the short-run discount factor, which represents the degree of present bias.
Let δ denote the long-run discount factor. Agents of type (γ, θ) have the following utility at
t = 1 :
U1 (c1, c2, y; γ, θ, e) = u (c1)− h(yθ
)+ βδ2u (c2) .
The flow utilities u and h are defined for consumption ct ≥ 0 and output y ≥ 0, respectively.
Utility from consumption u is twice differentiable, strictly increasing, and strictly concave:
u′,−u′′ > 0. Disutility from labor h (l) is twice differentiable, strictly increasing, and strictly
convex: h′, h′′ > 0, with h (0) = 0. Agents with innate ability γ have the following utility at
t = 0 :
U0 ({ct} , e, y; γ) = δ0 (e)u (c0) + βδ1 (e)
∫Θ
[u (c1)− h
(yθ
)+ δ2u (c2)
]f (θ|κ (e, γ)) dθ.
The length of each period is different, so the long-run discount factor δt is determined
by the annual discount factor and the number of years in that period. Furthermore, the
length of the schooling period (t = 0) is different across education groups. Therefore, the
long-run discount factors δ0 and δ1 are functions of e to reflect how the number of years in
school affects the length of t = 0. We assume that all agents work the same number of years
in t = 1, so δ2 is constant across education groups. Under our specification, the flow utility
and allocations are in annual terms. For example, (c1, y) is the annual consumption-output
bundle during the working period (t = 1). More details are provided in Section 4.
Crucially, since β < 1, present-biased agents discount the immediate future more than
the distant future. As a result, agents in t = 0 dislike the fact that their future-selves in
t = 1 under-save for retirement.
6
2.1 Planning Problem
To characterize the constrained efficient allocation, we use the direct mechanism—agents
report their private information to the government. This theoretical characterization is
useful because, in later sections, we will use it as a blueprint to decentralize the optimum as
a competitive equilibrium. The government designs
P ={c0 (γ) , [c1 (γ, θ) , c2 (γ, θ) , y (γ, θ)]θ∈Θ
}γ∈{H,L} .
Since agents privately learn their innate ability γ and productivity θ sequentially, we require
P to be incentive compatible for each period (Myerson, 1986). Let U1 (θ′; γ, θ) denote the
utility of a type (γ, θ) agent who reported γ truthfully and reports θ′ ∈ Θ in t = 1. The
ex-post incentive compatibility constraints ensure the agents report θ truthfully: ∀θ, θ′ ∈ Θ,
U1 (γ, θ) ≡ U1 (θ; γ, θ) ≥ U1 (θ′; γ, θ) . (1)
Let the utility in t = 0 of γ-agents who reported innate ability γ′ be denoted as
U0 (γ′; γ) = δ0 (eγ′)u (c0 (γ′)) + βδ1 (eγ′)
∫Θ
[U1 (γ′, θ) + (1− β)δ2u (c2 (γ′, θ))] dF (θ|κγ′,γ) ,
where κγ′,γ = κ (eγ′ , γ) and let κγ,γ = κγ. Then, the ex-ante incentive compatibility con-
straints ensure that the agents report γ truthfully at t = 0 : for any innate ability γ, γ′,
U0 (γ) ≡ U0 (γ; γ) ≥ U0 (γ′; γ) . (2)
The government is paternalistic in that it treats present bias as an error and attempts to
correct it. The basis for this is because β 6= 1 reflects a self-control problem that agents dis-
approve of in every other period (O’Donoghue and Rabin, 1999). The government attempts
to increase investment in education and raise retirement savings by maximizing the sum of
long-run utilities:
∑γ
πγ
{δ0 (eγ)u (c0 (γ))+δ1 (eγ)
∫Θ
[u (c1 (γ, θ))− h
(y (γ, θ)
θ
)+ δ2u (c2 (γ, θ))
]f (θ|κγ) dθ
}
subject to the ex-post incentive constraints (1), the ex-ante incentive constraints (2) and the
resource constraint
∑γ
πγ
{−c0 (γ)− eγR0 (eγ)
+1
R1 (eγ)
∫Θ
[y (γ, θ)− c1 (γ, θ)− 1
R2
c2 (γ, θ)
]f (θ|κγ) dθ
}≥ 0,
7
where Rt denotes the gross rate of return. We will assume that δtRt = 1.
It is worth emphasizing that, apart from the inherent investment risk, education is costly
for two additional reasons. First, it is costly in terms of resources. Second, it is costly in
terms of time, because receiving education delays entry into the labor market. Due to these
reasons, present-biased agents are less willing to invest in human capital.
2.2 Characterizing Incentive Compatibility
Here, we derive a lemma that simplifies ex-post incentive compatibility and discuss
the difficulties in theoretically characterizing ex-ante incentive compatibility. The follow-
ing lemma characterizes the set of policies that are ex-post incentive compatible. Its proof
is standard so it is omitted.
Lemma 1 For any γ, P is ex-post incentive compatible if and only if (i.) y (γ, θ) is
non-decreasing in θ, and (ii.) U1 (γ, θ) is absolutely continuous in θ, with ∂U1(γ,θ)∂θ
=y(γ,θ)θ2
h′(y(γ,θ)θ
).
There are three main difficulties in characterizing ex-ante incentive compatibility for
time-inconsistent agents. First, local ex-ante incentive compatibility does not necessarily
imply global ex-ante incentive compatibility when agents are time-inconsistent (Halac and
Yared, 2014; Galperti, 2015; Yu, 2019b). This paper simplifies the problem by examining
the case with two levels of innate ability.
The second difficulty lies in the direction of the relevant deviation at t = 0. Usually, the
relevant deviation is downwards when agents are time consistent. Findeisen and Sachs (2016)
showed that part of the sufficient condition for this to be true requires output y (γ, θ) to be
weakly increasing with innate ability γ. However, Yu (2019b) showed that the optimal alloca-
tions are usually non-monotonic with respect to ex-ante information. The non-monotonicity
helps relax the ex-ante incentive constraints when agents are time-inconsistent. Therefore,
it is unclear in which direction the ex-ante incentive constraints binds. For our theoreti-
cal analysis, we focus on the case where only the downward ex-ante incentive compatibility
constraint—the incentive constraint for H-agents—binds. Then, in our quantitative analysis,
we verify that the downward ex-ante incentive constraint is indeed the relevant constraint.
Finally, it is unclear whether the government should screen ex-ante private information in
the first place. Given the parameters of the model, it is difficult to theoretically determine
who should go to college: everyone, no one, or only the H-agents. For our theoretical
analysis, we will focus on the case where only the H-agents go to college, and verify that
this is indeed optimal given the cost of college in our quantitative analysis.
8
2.3 Wedges
To understand how present-bias and informational frictions affect the optimal tax policy,
the paper will focus on characterizing the optimal intertemporal and labor wedges.
The intertemporal wedge in t = 0 for innate ability γ is
τ k0 (γ) = 1− u′ (c0 (γ))
Eθ [u′ (c1 (γ, θ)) |γ],
and the intertemporal wedge in t = 1 for type (γ, θ) is
τ k1 (γ, θ) = 1− u′ (c1 (γ, θ))
u′ (c2 (γ, θ)).
When τ k 6= 0, then consumption is not smoothed across time. More specifically, if τ kt > 0,
then savings is restricted in t. Similarly, if τ kt < 0, then savings is distorted upwards in t.
The labor wedge in t = 1 for type (γ, θ) is
τw(γ, θ) = 1−h′(y(γ,θ)θ
)θu′ (c1 (γ, θ))
.
Since agents’ equilibrium wage is equal to their productivity θ in a competitive labor market,
if τw 6= 0, then agents are not working at the efficient level. In particular, if τw > 0, then
there is an under-supply of labor given the market wage. On the other hand, if τw < 0, then
agents are over-supplying labor given the market wage.
3 Theoretical Results
In this section, we derive the optimal intertemporal and labor wedges, which are crucial
for determining the optimal tax policies.
3.1 Intertemporal Wedges
The following proposition provides the inverse Euler equations for present-biased agents.
Proposition 1 The constrained efficient allocation satisfies (i.) the inverse Euler equation
in aggregate: ∑γ
πγu′ (c0 (γ))
=∑γ
πγEθ(
1
u′ (c1 (γ, θ))
∣∣∣∣γ) , (3)
9
(ii.) for any γ ∈ {H,L} ,
Eθ(
1
u′ (c1 (γ, θ))
∣∣∣∣γ) = Eθ(
1
u′ (c2 (γ, θ))
∣∣∣∣γ) , (4)
and (iii.) for any θ ∈ Θ,
1
βu′ (c2 (H, θ))=
1
u′ (c1 (H, θ))+
(1− ββ
)(πH + βµ
πH + µ
)1
u′ (c0 (H))(5)
1
βu′ (c2 (L, θ))=
1
u′ (c1 (L, θ))+
(1− ββ
)πL − βµ(f(θ|κL,H)f(θ|κL)
)πL − µ
1
u′ (c0 (L)), (6)
where µ = [u′ (c0 (L))− u′ (c0 (H))][u′(c0(L))
πL+ u′(c0(H))
πH
]−1
.
Proposition 1 follows from considering variations around any incentive compatible allo-
cation that preserve incentive compatibility. The optimal allocation minimizes the resources
expended, which satisfies (3) and (4).
To understand Proposition 1, first consider the standard inverse Euler equation at t = 0
for time-consistent agents:
1
u′ (c0 (γ))= Eθ
(1
u′ (c1 (γ, θ))
∣∣∣∣γ) for any γ.
By Jensen’s inequality, the inverse Euler equation for time-consistent agents implies
u′ (c0 (γ)) < Eθ [u′ (c1 (γ, θ))] for any γ. Due to informational constraints, the transfer of
consumption from t = 0 to t = 1 for time-consistent agents is restricted regardless of their
ex-ante private information.3 By restricting savings, the government can induce effort in
t = 1 at a lower cost, which relaxes the ex-post incentive constraint. Therefore, the in-
tertemporal wedge τ k0 is strictly positive for any innate ability γ.
With present-biased agents, intertemporal distortions have an additional effect on welfare.
The government can relax the ex-ante incentive compatibility constraint by increasing the
intertemporal wedge for H-agents while decreasing the wedge for L-agents at t = 0. To see
this, if we take expectation of (5) and (6) with respect to θ, then by (4) we can derive the
following inverse Euler inequalities:
1
u′ (c0 (H))> Eθ
(1
u′ (c1 (H, θ))
∣∣∣∣H)3See Golosov et al. (2003) for more on the inverse Euler equation for time-consistent agents.
10
and1
u′ (c0 (L))< Eθ
(1
u′ (c1 (L, θ))
∣∣∣∣L) .Comparing it with the standard inverse Euler equation, the consumption for H-agents is even
more front-loaded while the consumption is relatively back-loaded for L-agents.4 In other
words, the government exacerbates the intertemporal distortion for H-agents to satisfy their
temptation, encouraging them to accumulate human capital. Furthermore, the less front-
loaded consumption path for L-agents helps discourage downward deviations. As a result,
the best the government can do is to choose consumption such that the inverse marginal
utility is equalized in aggregate, which is implied by (3).
Next, note that the optimal intertemporal decision at t = 1 for time-consistent agents
satisfies the standard Euler equation:
u′ (c1 (γ, θ)) = u′ (c2 (γ, θ)) for any γ, θ.
This implies that it is optimal for time-consistent agents to smooth consumption between
work and retirement periods. This is because there is no additional uncertainty beyond
t = 1, so the government does not face informational constraints in the future. As a result,
there is no need to distort the intertemporal margin at t = 1.
From the government’s perspective, present-biased agents save too little for their retire-
ment. In essence, if present-biased agents were allowed to save freely, then u′ (c1 (γ, θ)) =
βu′ (c2 (γ, θ)) , so the intertemporal wedge would be τ k1 (γ, θ) = 1 − β > 0 for any type
(γ, θ) . In contrast, by (5) and (6), it is optimal for the intertemporal wedge τ k1 to depend
on reported innate ability and productivity. This is because, with present-biased agents, an
intertemporal wedge in t = 1 that depends on the reported innate ability relaxes the ex-ante
incentive constraint. To see why, recall that agents at t = 0 are concerned that they will not
save enough at t = 1. Thus, they have demand for a commitment device that incentivizes
them to save more. By introducing a wedge on retirement savings that depends on past
reports, the government is able to influence the incentives for education attainment.
More specifically, by (5), we immediately notice that
u′ (c1 (H, θ))
u′ (c2 (H, θ))> β.
4Grochulski and Piskorski (2010) found that the inverse marginal utility of consumption is a strictsupermartingale when agents are time consistent and ex-ante identical. In their paper, human capitalinvestments are unobservable, so under investing in education is complementary to shirking in future periods.Hence, in addition to the usual distortion to deter over-saving, the optimal policy makes the intertemporaldistortion worse at the education stage to deter under-investing in education. If education investment wasobservable in their environment, like ours, then the intertemporal distortion disappears.
11
In other words, the government helps the H-agents save for retirement. The government
essentially rewards H-agents for going to college with a commitment device that helps them
smooth consumption across work and retirement. This commitment device helps substitute
part of the information rent to H-agents.
By (6), for L-agents, the marginal rate of intertemporal substitution is
u′ (c1 (L, θ))
u′ (c2 (L, θ))
> β if πL > βµ
f(θ|κL,H)f(θ|κL)
= β if πL = βµf(θ|κL,H)f(θ|κL)
< β if πL < βµf(θ|κL,H)f(θ|κL)
.
Notice that the retirement savings for L-agents depend on the likelihood ratiof(θ|κL,H)f(θ|κL)
. If
f(θ|κL,H)f(θ|κL)
is relatively large, meaning that the observed productivity is likely to have come
from an agent with high innate ability, then it is optimal to distort the retirement savings
such that the present bias is exacerbated. The government uses this additional intertemporal
distortion to deter the H-agents from under-investing in education. It is also a cost effective
method since L-agents are unlikely to have that level of productivity. On the other hand, iff(θ|κL,H)f(θ|κL)
is relatively small, meaning that the observed productivity is unlikely to have come
from a H-agent, then the government helps offset the present bias.
Assumption 1 f satisfies the monotone likelihood ratio property: f(θ|κ)f(θ|κ′) is increasing in θ
for any κ > κ′.
Assumption 1 implies that higher productivity θ is more likely to come from higher
accumulated human capital κ. When Assumption 1 holds, the optimal intertemporal wedge
for L-agents increases with productivity. This implies that the government helps the L-
agents who are less productive with their retirement savings, while the retirement savings
of L-agents who are highly productive are restricted. This is because Assumption 1 implies
that H-agents who do not invest in higher education are more likely than L-agents to be
productive. As a result, the government exacerbates the present bias of low-educated and
productive agents to relax the ex-ante incentive constraint and induce H-agents to increase
education attainment. We assume Assumption 1 holds for the rest of the paper.5
The intertemporal distortion of τ k1 described above provides us with the theoretical basis
for linking retirement savings to education investent. Hence, it is worth noting that the
intertemporal wedge τ k1 is distorted because γ is the private information of present-biased
5Assumption 1 implies first order stochastic dominance.
12
agents. If the government can observe γ, then the standard Euler equation would hold:
u′ (c1 (γ, θ)) = u′ (c2 (γ, θ)) for any γ and θ. As was mentioned before, if agents were time-
consistent instead, then the standard Euler equation for t = 1 also holds. As a result, the
interdependence between retirement savings and education investment is solely used to relax
the ex-ante incentive compatibility constraint of present-biased agents.
Also, if productivity θ is a deterministic function of human capital κ, so human capital
investment is riskless, then there will also be no intertemporal distortions. In fact, Yu (2019a)
showed that the government can implement the full-information efficient optimum through
the use of off-path policies when there is no dynamic private information.
3.2 Labor Wedge
The dynamic incentive problem and the agents’ present bias also affects the labor wedge.
To separate the economic forces that determine the optimal labor distortions, we define
Aγ(θ) =1− F (θ|κγ)θf (θ|κγ)
,
Bγ(θ) = 1 +
y(γ,θ)θh′′(y(γ,θ)θ
)h′(y(γ,θ)θ
) ,
Cγ(θ) =
∫ θ
θ
u′ (c1 (γ, θ))
u′ (c1 (γ, x))
[1− u′ (c1 (γ, x))
φ
]f (x|κγ)
1− F (θ|κγ)dx,
Dγ(θ) = u′ (c1 (γ, θ))
[1
u′ (c0 (γ))− 1
φ
],
Eγ(θ) =
[u′ (c1 (γ, θ))
βu′ (c2 (γ, θ))− 1
]−(
1− ββ
)u′ (c1 (γ, θ))
φ,
where φ > 0 is the shadow price on the resource constraint.
Proposition 2 The labor wedge for any θ ∈ Θ satisfies
τw (H, θ)
1− τw (H, θ)= AH (θ)BH (θ) [CH (θ)−DH (θ) + EH (θ)] , (7)
τw (L, θ)
1− τw (L, θ)= AL (θ)BL (θ)
[CL (θ)−
(1− F (θ|κL,H)
1− F (θ|κL)
)DL (θ) +
g (θ|κL)
g (θ|κL,H)EL (θ)
], (8)
where g (θ|κ) = f(θ|κ)1−F (θ|κ)
and 1φ
= Eγ[Eθ(
1u′(c1(γ,θ))
∣∣∣∣γ)] .13
Proposition 2 presents the optimal labor wedge for present-biased agents in a sequential
screening environment. Following Golosov et al. (2016), we decompose the economic forces
into three distinct components: intratemporal, intertemporal and present-bias components.
The intratemporal component summarizes the trade-off between production efficiency and
insurance against productivity differences. The intertemporal component captures how labor
distortions affect the education decision in the previous period. Unique to our paper, the
present-bias component encompasses the effects of time inconsistency on the optimal labor
distortions. We rewrite (7) and (8) to pinpoint each component:
τw (H, θ)
1− τw (H, θ)= AH (θ)BH (θ)CH (θ)︸ ︷︷ ︸
intratemporal component
−AH (θ)BH (θ)DH (θ)︸ ︷︷ ︸intertemporal component
+AH (θ)BH (θ)EH (θ)︸ ︷︷ ︸present-bias component
,
τw (L, θ)
1− τw (L, θ)= AL (θ)BL (θ)CL (θ)︸ ︷︷ ︸
intratemporal component
−(
1− F (θ|κL,H)
1− F (θ|κL)
)AL (θ)BL (θ)DL (θ)︸ ︷︷ ︸
intertemporal component
+g (θ|κL)
g (θ|κL,H)AL (θ)BL (θ)EL (θ)︸ ︷︷ ︸
present-bias component
.
All components are affected by Aγ (θ) and Bγ (θ) . To understand these terms, first note
that by introducing a labor wedge for type (γ, θ) agents, their labor supply changes according
to their Frisch elasticity of labor supply, which is Bγ (θ) . Furthermore, an increase in the
labor distortion for agents of type (γ, θ) decreases their total output in proportion to θf (θ|κ) .
Meanwhile, the incentive constraints for higher productivity agents of mass 1− F (θ|κ) are
relaxed. This trade-off is captured by Aγ (θ) .
Without dynamic information, the optimal labor wedge is determined by the intratem-
poral component, which summarizes the economic forces in static models, such as Diamond
(1998) and Saez (2001). In addition to Aγ (θ) and Bγ (θ) , the intratemporal component
also consists of Cγ (θ) , which captures the strength of the government’s insurance motive
against the productivity shock. In static Mirrlees, the inverse marginal utility is the cost of
a marginal increase in utility in consumption terms, so the cost of a marginal increase in
average utility in t = 1 is 1φ. Hence, if the cost of increasing average utility is small relative
to the cost of increasing the utility of (γ, x) agents ( 1φ< 1
u′(c1(γ,x))), then Cγ (θ) is positive.
This is because the benefits of increasing the labor wedge of type (γ, θ) agents to relax the
ex-post incentive constraints of higher productivity agents (x ≥ θ types) outweighs the cost.
Furthermore, the degree of labor distortion increases with consumption inequality, which is
represented by u′(c1(γ,θ))u′(c1(γ,x))
.
14
When there is dynamic information and agents are time-consistent, then the labor wedge
is shaped by both the intratemporal and intertemporal components. This is similar to the
labor distortions in Findeisen and Sachs (2016). The intertemporal component contains
the term Dγ (θ) and is augmented by1−F(θ|κL,H)
1−F (θ|κL)for L-agents. Notice that Dγ (θ) can be
rewritten as [u′(c0(γ))]−1−φ−1
[u′(c1(γ,θ))]−1 . Therefore, by Proposition 1, we have DH (θ) > 0 and DL (θ) < 0.
This implies that the government can encourage investment in education through promising
a smaller labor wedge τw (H, θ) rather than raising c0 (H) . Similarly, it increases the labor
wedge of L-agents to discourage H-agents from working without a college degree. To that
end, the government also exploits the fact that H-agents who mimicked L-agents are more
likely to have higher productivity than actual L-agents, which is captured by1−F(θ|κL,H)
1−F (θ|κL).
This shows how the optimal labor distortion for non-college grads leverages the difference in
productivity distribution between actual L-agents and H-agents who eschewed college.
When agents are present-biased, the present-bias component highlights the additional
force that influences the labor wedge. The present-bias component is comprised of two
potentially off-setting forces: the disagreement and myopic components,
Eγ(θ) =
[u′ (c1 (γ, θ))
βu′ (c2 (γ, θ))− 1
]︸ ︷︷ ︸
disagreement component
−(
1− ββ
)u′ (c1 (γ, θ))
φ︸ ︷︷ ︸myopic component
.
The disagreement component characterizes how the labor distortion is affected by the gov-
ernment’s policy of increasing retirement savings. The myopic component captures the fact
that present-biased agents do not fully internalize the returns from working. Observe that
when β = 1, the intertemporal wedge at t = 1 is zero so Eγ (θ) = 0 for any γ and θ. Fur-
thermore, the present-bias component for L-agents is amplified by g(θ|κL)
g(θ|κL,H), which is greater
than 1 by Assumption 1.6 Again, this shows how the difference in productivity distribution
for agents with varying innate abilities is used to relax the ex-ante incentive constraint.
To understand the disagreement component, recall from Proposition 1, where we showed
how the H-agents are offered a commitment device in t = 1 to relax the ex-ante incentive
constraints. It also provides commitment to L-agents conditional on their realized productiv-
ity. However, from the present-biased agents’ perspective in t = 1, the optimal consumption
path satisfies u′ (c1) = βu′ (c2) . Hence, the commitment the government provides is per-
ceived as an intertemporal distortion by agents in t = 1, which distorts their labor supply by
decreasing their incentives to work. This is reflected in the disagreement component which
increases the labor wedge when the government provides more commitment. As a result,
6Assumption 1 also implies hazard rate dominance: g (θ|κ) ≤ g (θ|κ′) for all θ when κ > κ′.
15
there is a trade-off between the provision of work incentives and commitment. This suggests
that the consumption of more productive agents would be more front-loaded, because the
gains from higher production efficiency outweighs the cost of lower retirement savings.
To understand the myopic component, note that labor supply is incentivized with more
consumption in both t = 1 and t = 2. However, present-biased agents underestimate the
benefits of work because they discount retirement consumption more heavily. As a result,
the myopic component shows how the optimal labor wedge is decreased to correct for the
tendency of present-biased agents to undervalue the rewards for effort. Furthermore, due
to insurance motives, the myopic component decreases the labor wedge more for agents
with lower working-period consumption c1. So, less productive agents receive more help in
internalizing the benefits of working. The optimal labor wedge in Lockwood (2018) also con-
tains an economic force similar to the myopic component. However, Lockwood (2018) does
not characterize the optimal labor wedge with dynamic consumptiom, so the disagreement
component is absent in his characterization.
Lastly, notice that the myopic component always decreases the labor distortion, while the
disagreement component increases it unless the government does not provide commitment:u′(c1)u′(c2)
< β. As a result, the present-bias component is shaped by two opposing economic
forces. For H-agents, we can show that the disagreement component dominates the myopic
component for all levels of productivity. Meanwhile, for L-agents, the myopic component
dominates the disagreement component. To see this, notice that by (5) and (6) we can
rewrite the present-bias component as
EH (θ) =
((1− β)µ
πH
)u′ (c1 (H, θ))
φ,
EL (θ) = −
(1− β)µf(θ|κL,H)f(θ|κL)
πL
u′ (c1 (L, θ))
φ.
Beyond the government’s paternalistic motives, providing H-agents with commitment also
encourages them to enroll in college, so the disagreement component dominates for H-agents.
At the same time, the value of providing L-agents with commitment is smaller to the gov-
ernment, so the myopic component dominates.
Since all of the components are composed of endogenous variables, we rely on our quan-
titative exercise to characterize the labor wedge. Appendix C quantifies the decomposition
of the optimal labor wedge. As we show there, the shape of the labor wedge is mostly de-
termined by our assumptions on the distribution of skills and the size of the intratemporal
component. By contrast, the present-bias component plays a minor role quantitatively.
16
4 Quantitative Analysis
In this section, we quantify the model by imposing specific functional forms and cali-
brating their parameters. Then, we measure the quantitative significance of the theoretical
results presented in Section 3, as well as the welfare gains under the optimal tax system.
Table 1 presents the calibrated parameter values. We select the standard functional
forms for the utility of consumption u(c) = c1−σ
1−σ and the disutility of labor h(`) = `1+ 1
η
1+ 1η
, and
we assume they are constant over time. The risk aversion and the Frisch elasticity of labor
supply are then set to standard values of 2 and 0.5, respectively. The short- and long-run
discount factors are calibrated based on the values proposed by Nakajima (2012). The short-
term discount factor is 0.7, in the ballpark of the empirical estimates of Laibson et al. (2017).
The long-run discount factors are based on the annual rate of 0.9852 and compounded to
take into account the relative length of different periods. In the subsequent analysis we will
also make comparisons with a variant of our model for time-consistent agents (i.e. β = 1).
In that case, following Nakajima (2012), we recalibrate the effective discount factors based
on the annual rate of 0.9698. The purpose of such a recalibration is to separate the effect of
time-inconsistency in agents’ behavior from their effectively increased impatience.
In our calibrated model, we expand the definition of high school and college graduates by
admitting a wide range of real-world education outcomes. We associate the former with all
individuals who hold an Associate’s degree or less. The share of such low types in the 2015
Current Population Survey is 0.68. We associate the latter with all individuals who hold a
Bachelor’s, Master’s, Professional or Doctoral degree. We assume that t = 0 begins at age
18 and lasts 5.12 years for the high types (reflecting a weighted average across all degree
durations), or 0 years otherwise (hence, δ0 (eL) = 0). Agents work for 45 years and then
retire and live for 20 years in retirement. The annual cost of higher education is calculated
to be $15,700. Section B.2 in the Appendix discusses more details of our calibration.
In order to calibrate the distributions of skills for agents of different innate ability and
education, we create a separate model which we refer to as the “current policies” world.
This model is described in detail in Appendix B. We take this model to the data, solve
for optimal behavior and simulate large population of agents from each of the four groups:
(i.) factual high school graduates, (ii.) high school graduates, had they gone to college
(high school counterfactual), (iii.) factual college graduates, and (iv.) college graduates, had
they not gone to college (college counterfactual). We select the parameters governing the
distributions of skills such that the simulated distribution of lifetime earnings for each group
matches the one reported by Cunha and Heckman (2007). In particular, this study uses
a variation of the Roy model to infer counterfactual distributions of earnings for both high
17
Table 1: Parameter values in the model
Symbol Meaning Value
π0(L) Share of low type 0.68π0(H) Share of high type 0.32σ Risk aversion 2η Frisch elasticity 0.5eH Cost of higher education 1.57
Discount factors: present bias
β Short-term discount factor 0.7δ0(eL) High school period 0 long-term discount factor 0.00δ1(eL) High school period 1 long-term discount factor 1.00δ0(eH) College period 0 long-term discount factor 0.15δ1(eH) College period 1 long-term discount factor 0.93δ2 Retirement discount factor 0.27
Discount factors: time-consistent benchmark
δ0(eL) High school period 0 long-term discount factor 0.00δ1(eL) High school period 1 long-term discount factor 1.00δ0(eH) College period 0 long-term discount factor 0.19δ1(eH) College period 1 long-term discount factor 0.85δ2 Retirement discount factor 0.15
school and college graduates had they made the opposite education decision. Also, to correct
for the under-representation of high-end earnings in the data, we add an upper Pareto-tail
to each distribution such that the upper 10% of the mass is distributed according to a shape
parameter of 1.5, as in Saez (2001). Figure 1 presents the four distributions backed out as a
result of this procedure.
In what follows, we discuss our quantitative results. We begin with Table 2 which shows
the optimal intertemporal wedge in t = 0. In line with the hallmark dynamic Mirrlees result,
the government finds it optimal to tax savings in t = 0 in order to induce higher labor effort
from agents in the next period. Notice also that the optimal wedge amounts are in the
ballpark of the model with time-consistent agents, which is a result of our calibration that
holds the effective discount factor constant across the two models. Importantly though, the
wedge for present-biased agents is slightly higher, raising the consumption of college students
and providing additional incentive to make the college investment.7
7The intertemporal wedge for L-agents τk0 (L) is not shown since for our quantitative exercise, we assumedL-agents do not have a student period (δ0 (eL) = 0).
18
0
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0.16
0.18
5 10 15 20 25 30
productivity
PDF of skill distributions in the model
high school factualhigh school counterfactual
college factualcollege counterfactual
Figure 1: Calibrated distributions of skills for the four groups of agents
Table 2: Intertemporal wedge in period zero: present-bias vs. time-consistent case
Present-biased Time-consistent
τ k0 (H) 0.3244 0.2805
Figure 2 shows the optimal intertemporal wedges in t = 1 and conveys a key quantitative
result. The wedges are negative for a wide interval of low incomes, and are always smaller
than 1− β, which implies that the government offers savings subsidies.8 This is an expected
outcome in a model with paternalistic policies and agents who suffer from present bias. More
importantly, the wedges are significantly different for the two education groups. College
graduates enjoy a much higher subsidy than high school graduates at all income levels,
with the difference eventually disappearing for higher incomes. The government does so in
order to provide them with incentives to invest in college education ex-ante. Without such
incentives, H-agents worry that additional education will not deliver a sufficient increase in
their welfare, because their own present bias will prevent them from smoothing their higher
working-age income across the life cycle. By contrast, notice that in the variant of our model
with time-consistent agents, the optimal intertemporal wedge in the working-age period is
equal to zero for both education groups. This is because time-consistent agents are able to
raise retirement savings on their own.
8The theoretical result where the government decreases savings for sufficiently high θ is not quantitativelysignificant since the distributions f (θ|κ) and f (θ|κL,H) are similar.
19
−0.8
−0.6
−0.4
−0.2
0
0.2
30 60 90 120 150
average annual income (in thousands of 2015 USD)
Intertemporal wedge in period 1
high schoolcollege
Figure 2: Intertemporal wedge in the model with present-biased agents
Figure 3 presents the optimal labor wedges for both education groups according to the
two variants of our model: with present-biased agents or with time-consistent agents. The
optimal labor wedges follow a U-shaped pattern and converge to a constant for top income
levels, which is standard in Mirrlees taxation with Pareto-tailed productivity distributions
(Diamond, 1998; Saez, 2001). It is important to notice that optimal wedges mostly decline
with income and are significantly different for the two education groups. This resembles the
main result of Findeisen and Sachs (2016) and stems from the fact that H-agents must be
offered a separate income tax schedule to provide them with incentives to optimally choose
to go to college. Notice that the differences in optimal wedges between the present-biased
and time-consistent settings are very small, and arise predominantly at lowest incomes. This
implies that the presence of present-biased agents does not alter the normative prescriptions
in terms of the design of income tax schedules that the literature has established so far.
Appendix C reinforces this point by showing that the present-bias component of the optimal
labor wedge, as introduced in Section 3.2, is in general small quantitatively and declines
monotonically with income. This is in contrast to a setting without dynamic consumption,
where the myopic component is not mitigated by the disagreement component and marginal
income tax rates could become negative (Lockwood, 2018).
20
0.2
0.3
0.4
0.5
0.6
0.7
0.8
30 60 90 120 150
average annual income (in thousands of 2015 USD)
Labor wedge in period 1
present biased − high schooltime consistent − high school
time consistent − collegepresent biased − college
Figure 3: Labor wedge in the model with present-biased agents
4.1 Welfare Gains from Optimal Policies
We now turn our attention to the calculation of potential welfare gains arising from the
optimal policies. We will compare our optimum to three separate benchmarks: optimal
policies for time-consistent agents, as well as optimal policies with present-biased agents
when either labor or intertemporal wedges are restricted to be education-independent.
4.1.1 Welfare Gains Relative to Optimal Time-Consistent Policies
As the first benchmark, we use the optimal policies dedicated to time-consistent agents,
for whom β = 1. We consider two possible policy implementations for time-consistent
agents.9 The first one, called the laissez-faire implementation, leaves the agents alone in
their retirement savings decision in period t = 1. Because the policy is dedicated for time-
consistent agents, the government is confident that agents will smooth consumption in line
with their time preferences. This is not the case for present-biased agents though, and we
expect our optimal policies to bring about significant welfare gains relative to this benchmark.
In order to isolate the effect of education-dependent savings incentives from mere subsi-
dization of retirement savings, we also consider a second implementation for time-consistent
agents which features mandatory savings. In this world, agents are forced to smooth their
9The details of these implementations are presented in Appendix D.
21
consumption between working-life and retirement in line with the Euler equation. It does not
make a difference for time-consistent agents who would have made the same choice anyway.
On the other hand, the government helps present-biased agents save for retirement under
this implementation, but without taking advantage of the education-dependent intertempo-
ral wedge.
Table 3: Welfare gains over optimal policies for time-consistent agents
Mandatory savings Laissez-faire
% increase in lifetime consumption 0.96 2.19
Table 3 presents the welfare gains under our baseline parametrization relative to the two
time-consistent benchmarks. Consistent with prior expectations, the gains over laissez-faire
policies are the highest and amount to 2.19% of lifetime consumption. The gains come from
increased retirement savings and also from improved production efficiency. On the other
hand, the gains relative to mandatory savings is 0.96% of lifetime consumption, which is
considerably lower than laissez-faire policies, but still significant. Since mandatory savings
policy already forces agents to smooth their consumption, this implies the welfare gains of
the optimal education-dependent policies largely come from more efficient production.
4.1.2 Welfare Gains from Education-Dependent Wedges
We now turn our attention to the benchmark of optimal policies with present-biased
agents, but where some policies are restricted to be education-independent. Education-
independent policies are policies that are conditioned only on observed income y. We will
calculate the consumption-equivalent welfare gain from moving to the tax system where both
wedges depend on education attainment, relative to either of the two cases: (i.) education-
independent labor income tax and education-dependent savings subsidy and (ii.) education-
independent savings subsidy but education-dependent labor income tax.
Specifically, for case (i.) we solve the government’s problem under an additional con-
straint that, for any θ and θ such that y(H, θ) = y(L, θ) we have
h′(y(L,θ)
θ
)θu′(c1(L, θ)
) =h′(y(H,θ)
θ
)θu′(c1(H, θ)
) (9)
In essence, regardless of education, agents with the same reported income face the same
labor tax. For case (ii.), on the other hand, we require the agents with the same reported
22
income, y(H, θ) = y(L, θ), face the same savings subsidy
u′(c1(L, θ)
)u′(c2(L, θ)
) =u′(c1(H, θ)
)u′(c2(H, θ)
) (10)
Solving for optimal policies under constraints (9) and (10) is non-trivial because these
restrictions are contingent on allocations (declared income) rather than the underlying state
variables (productivity). We overcome this challenge by designing a computational algo-
rithm, described in Appendix E, which allows us to make the constraints conditional on
allocations. Figures 4(a) and 4(b) present the optimal wedges obtained under the two re-
strictions, along with the education-dependent benchmark.
0.25
0.3
0.35
0.4
0.45
0.5
0.55
0.6
0.65
0.7
0.75
30 60 90 120 150
average annual income (in thousands of 2015 USD)
Labor wedges in period 1: optimal vs education-independent
high schoolcollege
independent
(a) Education-independent labor wedge
−0.7
−0.6
−0.5
−0.4
−0.3
−0.2
−0.1
0
0.1
0.2
30 60 90 120 150
average annual income (in thousands of 2015 USD)
Intertemporal wedges in period 1: optimal vs education-independent
high schoolcollege
independent
(b) Education-independent intertemporal wedge
Figure 4: Optimal education-dependent and independent wedges
Table 4: Welfare gains over optimal education-independent policies
Education-independent: labor wedge intertemporal wedge
% increase in lifetime consumption 0.51 0.02
Table 4 shows welfare gains measured as a corresponding percentage increase in lifetime
consumption that would result from moving from a tax system in which one of the wedges
is education-independent to the optimal tax system (where all taxes depend on educational
attainment). First, if labor income taxes are allowed to be conditioned on education, the
resulting welfare gain is equivalent to a 0.51% increase in lifetime consumption. Second,
if savings subsidies are allowed to depend on education, the corresponding gain in lifetime
consumption would be 0.02%.
23
The welfare implications of optimal education-dependent income taxes are in the ballpark
of the numbers reported by previous literature, for example Findeisen and Sachs (2016).
The novel component in this paper, education-dependent savings subsidies, are shown to
have much smaller welfare gains. This result is consistent with the broad literature in
macroeconomics which has found that consumption smoothing yields relatively small welfare
gains, given the standard parameter values. Most notably, Lucas (1987) showed that the
gain from eliminating all post-war business cycle fluctuations in the US would be equivalent
to a 0.05% increase in average consumption.
4.2 Testing Policies Without Screening
As a final step in our quantitative analysis of the model, we test whether the optimal
Mirrleesian screening approach is indeed preferable quantitatively to simpler alternatives. In
particular, we calculate the government’s value derived under the policy that all agents get
higher education, as well as one where no agents do.10
-0.405
-0.4
-0.395
-0.39
-0.385
-0.38
-0.375
-0.37
-0.365
-0.36
-0.355
-0.35
0 5 10 15 20 25 30 35 40
annual cost of college (in thousands of 2015 USD)
values related to different human capital policies
no one goes to collegeeveryone goes to collegeoptimal screening policy
Figure 5: Comparing optimal policies to alternatives without screening
Figure 5 presents the values associated with these alternative policies, along with the
optimal screening one. The values are presented as function of the annual monetary cost of
higher education, ranging from zero up to 40,000 USD (the actual calibrated cost, as Table
10In evaluating these policies, we use the counterfactual distributions of skills presented in Figure 1, aswell as counterfactual values for the discount factors δ0(e) and δ1(e)
24
1 shows, is 15,700 USD). It can immediately be noticed that the optimal Mirrleesian policy
dominates both alternatives at all cost values, including zero. This is due to the fact that
going to college and beyond entails a significant opportunity cost of time spent in education.
It is also worth noticing that for realistic levels of the calibrated cost, sending no one to
college dominates the alternative of sending everyone to college.
5 Implementation
After characterizing the wedges, we are ready to discuss the implications of our findings on
the design of student loans, income taxes and retirement policies. In particular, this section
highlights how to decentralize policies where retirment savings is contingent on education
investment, which is the main innovation of the paper.11
For education policies, we consider a decentralization with student loans and income-
contingent repayment plans. Agents can take out a loan amount of L (e) , which is a function
of the education investment. After agents enter the work force, the loan repayment depends
on realized income. We abstract from parental financial assistance, so students solely rely
on student loans in t = 0.
For retirement savings, we consider two ways of implementing the optimum. First, we
present an implementation where the subsidy for retirement savings is both income and
education contingent. Finally, we examine an implementation with social security and a
retirement savings account where student loan repayments are also considered as contribu-
tions to the account. The latter captures the spirit of the recently proposed bills in the US
Congress, the Retirement Parity for Student Loans Act and the Retirement Security and
Savings Act, which intend to qualify student loan repayments for employer matching.
Before presenting the decentralized economy, it is important to note that we are departing
from the direct revelation mechanism in which agents report their type (γ, θ) . Instead, for
our implementation, policies are based on the observed education investment e, income y,
and savings. To do this, we first need to show that the optimal consumption from the
direct revelation mechanism {c0 (γ) , c1 (γ, θ) , c2 (γ, θ)}γ,θ∈Θ can be expressed as a function
of income y and education e. It is immediate that, by separating the agents according to their
innate ability, the optimal allocations can be rewritten as a function of education instead of
reported innate ability: c0 (γ) = c0 (eγ) and ct (γ, θ) = ct (eγ, θ) . The next lemma shows that
reported productivity can be replaced with income, so the government can implement the
optimum using policies that depend on income and education.
11We present the three period formulation of the problem here. Details on the life-cycle model arepresented in the appendix.
25
Lemma 2 For any e ∈ {eL, eH} , the optimal consumption c1 (e, θ) and c2 (e, θ) are functions
of y (e, θ) : ct (e, θ) = ct (y (e, θ)) for any t ≥ 1.
5.1 Education-Contingent Retirement Savings Subsidy
For our first implementation, agents are offered a student loan L (e) in t = 0. They are
required to make income contingent repayments of [1− τ e (e, y)]L (e) in t = 1, where the
subsidy τ e (e, y) is a function of education expenses and income. In t = 1, agents also face
an income tax T (y) independent of education. Most importantly, agents can save s2 in a
retirement account at t = 1, where the savings are subsidized at a rate τ s (e, y) which is
a function of income and education investment. Furthermore, retirement savings s2 come
from after-tax funds, so the income and education dependent retirement savings account is
similar to a Roth 401(k). Finally, in each period, agents can save via the risk-free bond b,
which are taxed with a history-independent bond savings tax T k (b) .12
Given the proposed policies, at t = 1, agents with education level e and productivity θ
solve the following:
maxc1,y,c2,s2,b2
u (c1)− h(yθ
)+ βδ2u (c2)
subject to
c1 + s2 + b2 + R1 (e) (1− τ e (e, y))L (e) = y − T (y) + R1 (e) b1 − T k (b2) ,
c2 = R2 (1 + τ s (e, y)) s2 +R2b2,
where R1 (e) = R1(e)R0(e)
is the gross interest rate normalized by the difference between the
period lengths of t = 0 and t = 1. For example, R (eL) = 0 since we assumed δ0 (eL) = 0 for
our quantitative analysis. Let {c∗1 (e, θ) , y∗ (e, θ) , c∗2 (e, θ)} denote the solution to the agents’
problem at t = 1 for any θ ∈ Θ and e ∈ {eL, eH} . Also, let U1 (e, θ) denote the value function
for the agents’ problem at t = 1. The agents’ problem with innate ability γ at t = 0 is
maxc0,e,b1
δ0 (e)u (c0) + βδ1 (e)
∫ θ
θ
[U1 (e, θ) + (1− β) δ2u (c∗2 (e, θ))] f (θ|κ (e, γ)) dθ
subject to
c0 + e+ b1 = L (e)− T k (b1) and e ∈ {eL, eH} .
Let P ss ={
[L (e) , τ e (e, y)] , τ s (e, y) ,[T (y) , T k (b)
]}. The following proposition states
12The bond savings tax helps the government deter agents from over-saving while simultaneously under-supplying labor (Werning, 2011).
26
that the optimum can be decentralized with an income-contingent student loans policy
(L (e) , τ e (e, y)) combined with an income and education dependent retirement subsidy
τ s (e, y) and tax policy(T (y) , T k (b)
).
Proposition 3 The optimum can be implemented with P ss.
Figure 6 presents the optimal student loan repayment and retirement savings subsidies
for the two education groups as function of income. Panel 6(a) shows that for the H-agents
with income below 60,000 in present value, the repayment subsidy starts at over 80% and
decreases with income. Once the Pareto tail kicks in, the trend reverts and the optimal
subsidy increases before settling at around 120%. Panel 6(b) shows the savings subsidy
schedules. Notice that the optimal subsidies closely mirror the intertemporal wedges τ k1
depicted in Figure 2, where lower income levels receive more subsidies and the subsidy for
college graduates is higher for virtually all income levels.
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
30 60 90 120 150
average annual income (in thousands of 2015 USD)
Student loan repayment subsidy in period 1
high schoolcollege
(a) Optimal student loan subsidy τe(e, y)
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
30 60 90 120 150
average annual income (in thousands of 2015 USD)
Optimal savings subsidy in period 1
high schoolcollege
(b) Optimal savings subsidy τs(e, y)
Figure 6: Optimal education-contingent subsidies
It is worth pointing out that the optimal student loans subsidy is determined by the labor
wedges. We set the income tax to match the labor wedge for L-agents, while the income
contingent student loans subsidies coupled with the marginal income tax rate replicates the
optimal labor wedge for H-agents. Since the optimal labor wedge for H-agents is larger with
the difference growing until income 60, 000, the student loans subsidy is decreasing up to
that amount. Beyond 60, 000, the difference in the labor wedges decreases initially and with
the optimal labor wedge for L-agents eventually rising above the labor wedge of H-agents,
which causes the student loans subsidy to increase. Also, since we appended the Pareto-tail
to the top 10% of the productivity distribution for each education group, the U -shaped dip
in the labor wedge for H-agents is much higher than the one for L-agents. As a result, the
27
labor wedge for L-agents is much larger than the labor wedge for H-agents with incomes
between 70, 000 and 90, 000. This drives the significant increase in student loans subsidy
beyond 60, 000.
5.2 Student Loan Payment as Contribution to Retirement Savings
In this section, we consider an implementation with social security benefits and retire-
ment savings accounts that are linked to student loan repayments. The advantage of this
decentralization is that it adopts the main features of existing retirement policies. Further-
more, it demonstrates how the aforementioned retirement bills proposed in the US Congress
could be used to implement the optimum.
For this implementation, similar to the current system, all agents receive an income-
contingent social security benefit a (y) upon retirement. The retirement savings account is
defined by the contribution matching rate α ∈ [0, 1] and a contribution limit s. Retirement
account contributions come from pre-tax income and are only lump-sum taxed T ra upon
withdrawal, so the retirement savings account is similar to a traditional 401(k). Furthermore,
similar to current retirement savings accounts, matched contributions are not subject to the
contribution limit s.
The novelty of this implementation is that the amount of student loan repaid r (e, y) is
considered a contribution, so employers can further contribute αr (e, y) into the account. Let
φ (s2, r) denote the amount of assets in the retirement savings account as a function of the
deposit s2 and the student loan repayment r, so we have φ (s2, r) = (1 + α) s2 + αr. Similar
to the previous implementation, L (e) denotes the student loan taken out in t = 0, and T (y)
denotes the income tax in t = 1. Here, the student loan repayment is tax deductible and
reduces income tax by g (r) .
Given the proposed policies, at t = 1, agents with education investment e and produc-
tivity θ solve
maxc1,y,c2,s2,b2
u (c1)− h(yθ
)+ βδ2u (c2)
subject to
c1 + s2 + b2 + r (e, y) = y − T (y − s2) + g (r (e, y)) + R1 (e) b1 − T k (b2) ,
c2 = a (y) +R2φ (s2, r (e, y)) +R2b2 − 1φ>0Tra,
0 ≤ s2 ≤ s,
where 1φ>0 is an indicator function with 1φ>0 = 1 if and only if there are assets in the account,
28
otherwise 1φ>0 = 0. Adopting the notation introduced in the previous implementation, the
agents’ problem with innate ability γ at t = 0 is
maxc0,e,b1
δ0 (e)u (c0) + βδ1 (e)
∫ θ
θ
[U1 (e, θ) + (1− β) δ2u (c∗2 (e, θ))] f (θ|κ (e, γ)) dθ
subject to
c0 + e+ b1 = L (e)− T k (b1) and e ∈ {eL, eH} .
Let P ra ={
[L (e) , r (e, y)] , a (y) , [α, s] ,[T (y) , T k (b) , T ra, g (r)
]}denote the policy instru-
ments for the proposed implementation. The following proposition shows that it is possible
to decentralize the optimum using P ra.
Proposition 4 The optimum can be implemented through P ra where student loan repay-
ments are considered contributions to the retirement savings account.
Figure 7 presents the student loan repayment schedule in our second implementation. The
solid green line shows the face value of the repayment which starts high and then decreases
initially, allowing college-graduates to accumulate additional (and decreasing in income)
contributions in their 401(k) plans. In order to maintain the optimal income-contingency in
loan repayments, these agents are offered the tax deduction, which makes their repayment
schedule increase in income, as represented by the dashed red line. Notice that once the
Pareto tail for high school graduates kicks in, the two trends revert.
0
20
40
60
80
30 60 90 120 150−2
0
2
4
6
average annual income (in thousands of 2015 USD)
Student loan repayment schedules
face value (left axis)effective (right axis)
Figure 7: Optimal student loan repayment schedule r(e, y)
29
The effective student loan repayment schedule, which is determined by the difference
between the repayment schedule and the tax deduction, is largely shaped by the optimal
labor distortion as explained in the previous implementation. What is significant is that this
decentralization uses student loan repayments as a retirement savings vehicle for college-
educated agents. More specifically, our implementation constructs the social security benefits
to match the optimal retirement consumption of high school graduates. The student loan
repayments at its face value are designed to supplement the social security benefits so college-
educated agents consume the optimum during retirement.
At the heart of the implementation is the idea that college graduates can save for retire-
ment while paying off their student loans. The contribution matching rate α that arises in
our proposed implementation amounts to 2.00%, which is in the ballpark of the actual rate
used by the IRS ruling from May 2018.
6 Discussion
6.1 Heterogeneous Present Bias
We extend our results to an environment with heterogeneous present bias by assuming
that agents with innate ability γ have present bias βγ, where 1 ≥ βH > βL. The perfect
correlation between innate ability and the degree of present bias allows us to bypass the
multi-dimensional screening problem, simplifying the analysis.13 Proposition 5 characterizes
the distortions and shows that the link between retirement savings and education investment
persists when the degree of present bias is heterogeneous.
Proposition 5 The constrained efficient allocation with heterogeneous present bias satisfies
i. the inverse Euler equations (3), (4) and for any θ ∈ Θ,
1
βHu′ (c2 (H, θ))=
1
u′ (c1 (H, θ))+
(1− βHβH
)(πH + βHµ
πH + µ
)1
u′ (c0 (H)),
1
βLu′ (c2 (L, θ))=
1
u′ (c1 (L, θ))+
(1− βLβL
)πL − βHµ(f(θ|κL,H)f(θ|κL)
)πL − µ
1
u′ (c0 (L)),
where µ = [u′ (c0 (L))− u′ (c0 (H))][u′(c0(L))
πL+ u′(c0(H))
πH
]−1
.
13This setup is related to Golosov et al. (2013). They consider an environment with time-consistent agentswhere productivity is perfectly correlated with the long-run discount factor.
30
ii. the labor wedge for H-agents satisfies (7) and for L-agents:
τw (L, θ)
1− τw (L, θ)= AL (θ)BL (θ)
×
[CL (θ)−
(1− F (θ|κL,H)
1− F (θ|κL)
)DL (θ) +
βL1−βL
g (θ|κL)βH
1−βHg (θ|κL,H)
EL (θ)
],
where Eγ(θ) =[
u′(c1(γ,θ))βγu′(c2(γ,θ))
− 1]−(
1−βγβγ
)u′(c1(γ,θ))
φand 1
φ= Eγ
[Eθ(
1u′(c1(γ,θ))
∣∣∣∣γ)] .Though the economic forces determining the wedges for H-agents remain unchanged,
Proposition 5 shows us how the optimal policy leverages the difference in β for the L-agents’
wedges. For the intertemporal wedge τ k1 (L, θ) , recall that the optimal policy recommends
front-loading consumption for high-income L-agents. Here, this front-loading could be more
perverse. It takes advantage of the fact that H-agents value retirement consumption more
than L-agents, so a restriction on retirement savings further deters downward deviations
by H-agents. This logic is similar to the key finding in Golosov et al. (2013) which shows
that discouraging the consumption of a good preferred by high types among low types raises
welfare. The labor wedge for L-agents τw (L, θ) also differs from the case with homogeneous
β. Recall that, in the previous sections, the present-bias component EL (θ) for L-agents
is enhanced by the differences in the factual and counterfactual distributions to deter H-
agents from mimicking. Here, the labor distortion for L-agents coming from the present-bias
component is weakened. This is because H-agents are less tempted to mimic L-agents due
to the larger intertemporal distortion, which relieves the labor distortions stemming from
present bias.
A special case is when H-agents are time consistent while only L-agents are present-
biased (βH = 1 > βL). From Proposition 5, the H-agents’ wedges share the same properties
as the wedges for time-consistent agents. Also, the present-bias component EL (θ) no longer
influences the labor wedge of L-agents. Instead, the optimal policy takes advantage of
present-biased L-agents entirely through the intertemporal distortion in retirement savings
τ k1 (L, θ) , which is worsened with time-consistent H-agents. This implies that even though
linking student loan payments to retirement savings is not essential for time-consistent college
graduates, education-dependent savings policies are still optimal. We believe this case is
a theoretical curiosity, since empirical studies have demonstrated pervasive present-biased
behavior among college students (Ariely and Wertenbroch, 2002; Steel, 2007).
31
6.2 Non-Sophistication
The paper has thus far assumed that the agents are sophisticated—fully aware of their
present bias. Sophisticated agents have a demand for commitment to prevent their future-
selves from under-saving. The optimal policy in this paper takes advantage of this demand by
assisting college graduates with their retirement savings to incentivize them to go to college
in the first place. We may also want to investigate the optimal education and retirement
savings policies for non-sophisticated agents.
For non-sophisticated agents, the government can use off-path policies to take advantage
of their incorrect beliefs. Following Yu (2019a), the government can introduce a menu of
savings options in t = 1. One of the options in the menu will be selected by the agents
on the equilibrium path while the other option is a decoy, the off-path policy. The decoy
option features a relatively backloaded consumption path—high retirement consumption but
lower working period consumption—compared to the on-path option. At t = 0, the non-
sophisticated agents underestimate their present bias and thus overestimate the value of
retirement consumption to their future-selves. As a result, they mispredict that they will
select the decoy option in t = 1. In reality, their future-selves prefer the more front-loaded on-
path option instead. Therefore, the government can exploit this incorrect belief by promising
college graduates with high retirement benefits—which never needs to be implemented on the
equilibrium path—to induce investment in higher education. In other words, the inclusion
of a decoy option in the menu can relax the ex-ante incentive constraint. In fact, Yu (2019a)
showed that if the consumption utility is unbounded above and below, then the ex-ante
incentive constraints can be fully relaxed. More details are provided in Appendix F.
Off-path policies are powerful, but the optimal policy should still feature the interdepen-
dence between retirement savings and education investment. This is due to two reasons.
First, the economy is most likely populated by agents with heterogeneous levels of sophisti-
cation. A menu with decoy options would not be able to fool sophisticated agents, so it is
optimal for the government to rely on the present paper’s policies for relatively more sophis-
ticated agents. Future work should explore the optimal combination of these two policies.
Second, governments may object to the use of off-path policies to mislead agents due to moral
or reputational reasons. In this case, it is optimal to implement the education-dependent
retirement savings even for non-sophisticated agents. As long as agents have some demand
for commitment, albeit lower than what is optimal, the government can still take advantage
of this demand by making retirement savings contingent on education investment. However,
this interdependence disappears when agents are naıve—fully unaware of their present bias.
This is because naıve agents believe their future-selves to be time-consistent, so this paper’s
retirement policies would not encourage them to increase investment in education.
32
7 Conclusion
This paper formulates the optimal education and retirement policies in a dynamic Mir-
rlees model with present-biased agents. A novel contribution of this paper is to show that
the optimal retirement savings policy depends on education. More specifically, we show how
linking student loan repayments to retirement savings along with some qualitative changes
to existing policies can implement the optimum. We estimate the welfare gains from these
policies to be significant. Also, the inverse Euler equation does not hold with present-biased
agents, but the labor wedge is quantitatively similar to the case with time-consistent agents.
References
Anderberg, Dan, “Optimal Policy and the Risk Properties of Human Capital Reconsid-
ered,” Journal of Public Economics, 2009, 93, 1017–1026.
Angeletos, George-Marios, David Laibson, Andrea Repetto, Jeremy Tobacman,
and Stephen Weinberg, “The Hyperbolic Consumption Model: Calibration, Simula-
tion, and Empirical Evaluation,” Journal of Economic Perspectives, 2001, 15 (3), 47–68.
Ariely, Dan and Klaus Wertenbroch, “Procrastination, Deadlines, and Performance:
Self-Control by Precommitment,” Psychological Science, 2002, 13 (3), 219–224.
Bohacek, Radim and Marek Kapicka, “Optimal Human Capital Policies,” Journal of
Monetary Economics, 2008, 55, 1–16.
Bovenberg, A. Lans and Bas Jacobs, “Redistribution and Education Subsidies are
Siamese Twins,” Journal of Public Economics, 2005, 89, 2005–2035.
Cadena, Brian and Benjamin Keys, “Human Capital and the Lifetime Costs of Impa-
tience,” American Economic Journal: Economic Policy, 2015, 7 (3), 126–153.
Craig, Ashley, “Optimal Income Taxation with Spillovers from Employer Learning,” Work-
ing Paper, 2019.
Cunha, Flavio and James Heckman, “Identifying and Estimating the Distributions of
Ex Post and Ex Ante Returns to Schooling,” Labour Economics, 2007, 14 (6), 870–893.
Diamond, Peter, “Optimal Income Taxation: An Example with a U-Shaped Pattern of
Optimal Marginal Tax Rates,” American Economic Review, 1998, 96 (3), 694–719.
Farhi, Emmanuel and Xavier Gabaix, “Optimal Taxation with Behavioral Agents,”
American Economic Review, 2019, Forthcoming.
Findeisen, Sebastian and Dominik Sachs, “Education and Optimal Dynamic Taxation:
The Role of Income-Contingent Student Loans,” Journal of Public Economics, 2016, 138,
1–21.
Gabaix, Xavier, “A Sparsity-Based Model of Bounded Rationality,” Quarterly Journal of
33
Economics, 2014, 129 (4), 1661–1710.
Galperti, Simone, “Commitment, Flexibility, and Optimal Screening of Time Inconsis-
tency,” Econometrica, 2015, 83 (4), 1425–1465.
Gary-Bobo, Robert and Alain Trannoy, “Optimal Student Loans and Graduate Tax
under Moral Hazard and Adverse Selection,” The RAND Journal of Economics, 2015, 46
(3), 546576.
Golosov, Mikhail, Maxim Troshkin, Aleh Tsyvinski, and Matthew Weinzierl,
“Preference Heterogeneity and Optimal Capital Income Taxation,” Journal of Public Eco-
nomics, 2013, 97, 160–175.
, , and , “Redistribution and Social Insurance,” American Economic Review, 2016,
106 (2), 359–386.
, Narayana Kocherlakota, and Aleh Tsyvinski, “Optimal Indirect and Capital Tax-
ation,” Review of Economic Studies, 2003, 70 (3), 569–587.
Grochulski, Borys and Tomasz Piskorski, “Risky Human Capital and Deferred Capital
Income Taxation,” Journal of Economic Theory, 2010, 145, 908–943.
Halac, Marina and Pierre Yared, “Fiscal Rules and Discretion Under Persistent Shocks,”
Econometrica, 2014, 82 (5), 1557–1614.
Heathcote, Jonathan and Hitoshi Tsujiyama, “Optimal Income Taxation: Mirrlees
Meets Ramsey,” Working Paper, 2017.
, Kjetil Storesletten, and Giovanni Violante, “Optimal Tax Progressivity: An Ana-
lytical Framework,” Quarterly Journal of Economics, 2017, 132 (4), 1693–1754.
Kapicka, Marek, “Optimal Mirrleesean Taxation in a Ben-Porath Economy,” American
Economic Journal: Macroeconomics, 2015, 7 (2), 219–248.
and Julian Neira, “Optimal Taxation with Risky Human Capital,” American Economic
Journal: Macroeconomics, 2019, 11 (4), 271–309.
Laibson, David, “Golden Eggs and Hyperbolic Discounting,” Quarterly Journal of Eco-
nomics, 1997, 112 (2), 443–477.
, Peter Maxted, Andrea Repetto, and Jeremy Tobacman, “Estimating Discount
Functions with Consumption Choices over the Lifecycle,” Working Paper, 2017.
Lockwood, Benjamin, “Optimal Income Taxation with Present Bias,” Working Paper,
2018.
Lucas, Robert E. Jr., “Models of business cycles,” New York: Basil Blackwell, 1987.
Makris, Miltiadis and Alessandro Pavan, “Taxation Under Learning-by-Doing,” Work-
ing Paper, 2019.
Meara, Ellen, Seth Richards, and David Cutler, “The Gap Gets Bigger: Changes in
Mortality and Life expectancy by Education, 1981−2000,” Health Affairs, 2008, 27 (2),
350–360.
34
Moser, Christian and Pedro Olea de Souza e Silva, “Paternalism vs Redistribution:
Designing Retirement Savings Policies with Behavioral Agents,” Working Paper, 2019.
Myerson, Roger, “Multistage Games with Communication,” Econometrica, 1986, 54 (2).
Nakajima, Makoto, “Rising indebtedness and temptation: A welfare analysis,” Quantita-
tive Economics, 2012, 3, 257–288.
Nigai, Sergey, “A Tale of Two Tails: Productivity Distribution and the Gains from Trade,”
Journal of International Economics, 2017, 104, 44–62.
O’Donoghue, Ted and Matthew Rabin, “Doing It Now or Later,” American Economic
Review, 1999, 89 (1), 103–124.
and , “Choice and Procrastination,” Quarterly Journal of Economics, 2001, 116, 121–
160.
Saez, Emmanuel, “Using Elasticities to Derive Optimal Income Tax Rates,” Review of
Economic Studies, 2001, 68, 205–229.
Stantcheva, Stefanie, “Optimal Taxation and Human Capital Policies over the Life Cycle,”
Journal of Political Economy, 2017, 125 (6), 1931–1990.
Steel, Piers, “The Nature of Procrastination: A Meta-Analytic and Theoretical Review of
Quintessential Self-Regulatory Failure,” Psychological Bulletin, 2007, 133 (1), 65–94.
Werning, Ivan, “Nonlinear Capital Taxation,” Working Paper, 2011.
Yu, Pei Cheng, “Optimal Retirement Policies with Present-Biased Agents,” Working Pa-
per, 2019.
, “Seemingly Exploitative Contracts,” Working Paper, 2019.
35
Appendices (for online publication)
A Derivation of the Theoretical Results
A.1 The Optimization Problem
Given Lemma 1, the relaxed optimal tax problem is
maxP
∑γ
πγ
[δ0 (eγ)u (c0 (γ)) + δ1 (eγ)
∫ θ
θ
[U1 (γ, θ) + (1− β) δ2u (c2 (γ, θ))] f (θ|κγ) dθ
]
subject to
U1 (γ, θ) = u (c1 (γ, θ))− h(y (γ, θ)
θ
)+ βδ2u (c2 (γ, θ)) , (11)
∂U1 (γ, θ)
∂θ=y (γ, θ)
θ2h′(y (γ, θ)
θ
), (12)
δ0 (eH)u (c0 (H)) + βδ1 (eH)
∫ θ
θ
[U1 (H, θ) + (1− β) δ2u (c2 (H, θ))] f (θ|κH) dθ
≥ δ0 (eL)u (c0 (L)) + βδ1 (eL)
∫ θ
θ
[U1 (L, θ) + (1− β) δ2u (c2 (L, θ))] f (θ|κL,H) dθ,
and the resource constraint. As is standard, we ignore the monotonicity constraint—y (γ, θ)
is non-decreasing in θ—and check it later. Also, we assume that the ex-ante incentive
constraint for H-agents binds and show that the incentive constraint for L-agents holds.
Let (λγ (θ) , ξγ (θ) , µ, φ) be the multipliers on (11), (12), ex-ante incentive compatibility
and resource constraint respectively. Using standard Hamiltonian techniques, we derive the
following necessary conditions for optimality(1 +
µ
πH
)u′ (c0 (H)) =
(1− µ
πL
)u′ (c0 (L)) = φ,
(πH + βµ) δ1 (eH) f (θ|κH)− ξ′H (θ) = λH (θ) ,[πL − βµ
(f (θ|κL,H)
f (θ|κL)
)]δ1 (eL) f (θ|κL)− ξ′L (θ) = λL (θ) ,
(1− β) (πH + βµ) δ1 (eH) f (θ|κH) + βλH (θ) =φπHδ1 (eH) f (θ|κH)
u′ (c2 (H, θ)),
(1− β)
[πL − βµ
(f (θ|κL,H)
f (θ|κL)
)]δ1 (eL) f (θ|κL) + βλL (θ) =
φπLδ1 (eL) f (θ|κL)
u′ (c2 (L, θ)),
36
and for all γ, the boundary conditions hold: ξγ (θ) = ξγ(θ)
= 0, and
λγ (θ)u′ (c1 (γ, θ)) = φπγδ1 (eγ) f (θ|κγ) ,
λγ (θ)1
θh′(y (γ, θ)
θ
)+ξγ (θ)
[1
θ2h′(y (γ, θ)
θ
)+y (γ, θ)
θ3h′′(y (γ, θ)
θ
)]= φπγδ1 (eγ) f (θ|κγ) .
Below, we show that the theoretical results follow from these conditions.
A.2 Proofs
Proof of Proposition 1: Conditions (5) and (6) and µ follow from the first order conditions.
The inverse Euler equations (3) and (4) are derived using the perturbation argument.
Let P ={c0 (γ) , [ct (γ, θ) , y (γ, θ)]t>0,θ∈Θ
}γ
be the allocation that solves the constrained
efficient planning problem. We first derive (4) by considering a small increase in c2 (γ, θ)
across θ for a fixed γ. That is, for all θ, define u (c2 (γ, θ)) = u (c2 (γ, θ)) + ∆ for some small
∆. We simultaneously decrease c1 (γ, θ) for all θ such that u (c1 (γ, θ)) = u (c1 (γ, θ))− δ2∆.
Such perturbations do not affect the objective function, the ex-ante incentive compatibility
and the ex-post incentive compatibility. It only affects the resource constraint. Note that the
perturbation must be the same for all θ or else it may violate ex-post incentive compatibility,
which is not the case if β = 1. If P is optimal, then it must be that ∆ = 0 minimizes the
resource used, i.e.,
0 = arg min∆
∫Θ
[−u−1 [u (c1 (γ, θ))− δ2∆]− 1
R2
u−1 [u (c2 (γ, θ)) + ∆]
]f (θ|κγ) dθ.
Evaluating the first order condition of this problem at ∆ = 0 yields (4).
Similarly, to derive (3), we consider a small decrease in c1 (γ, θ) for all θ and γ such that
u (c1 (γ, θ)) = u (c1 (γ, θ))− 1δ1(eγ)
∆ for some small ∆. We simultaneously increase c0 (γ) for
all γ such that u (c0 (γ)) = u (c0 (γ)) + 1δ0(eγ)
∆. Since it is perturbed for all θ, the ex-post
incentive compatibility constraint is not affected. Also, notice that the ex-ante incentive
compatibility constraint and objective function are not affected, but the resource constraint
changes. Crucially, the perturbation must be the same for all γ or it may violate ex-ante
incentive compatibility, which is not the case if β = 1. If P is optimal, then ∆ = 0 solves,
min∆
∑γ
πγ
{−u−1[u (c0 (γ)) + ∆
δ0(eγ)
]R0 (eγ)
− 1
R1 (eγ)
∫Θ
u−1
[u (c1 (γ, θ))− ∆
δ1 (eγ)
]f (θ|κγ) dθ
}.
Evaluating the first order condition of this problem at ∆ = 0 yields (3).
37
Proof of Proposition 2: From the first order conditions, we have
ξH (θ) =
∫ θ
θ
[λH (x)− (πH + βµ) δ1 (eH) f (x|κH)] dx,
ξL (θ) =
∫ θ
θ
[λL (x)− [πLf (x|κL)− βµf (x|κL,H)] δ1 (eL)] dx.
First, we derive (7). Since λγ (θ) = φπγδ1(eγ)f(θ|κγ)
u′(c1(γ,θ)), we rewrite the first order condition on
y (H, θ) as
φπHδ1 (eH) f (θ|κH)
1−1θh′(y(H,θ)θ
)u′ (c1 (H, θ))
=
[1
θ2h′(y (H, θ)
θ
)+y (H, θ)
θ3h′′(y (H, θ)
θ
)]∫ θ
θ
[λH (x)− (πH + βµ) δ1 (eH) f (x|κH)] dx.
Let Aγ(θ) = 1−F (θ|κγ)
θf(θ|κγ)and Bγ(θ) = 1 +
y(γ,θ)θ
h′′( y(γ,θ)θ )h′( y(γ,θ)θ )
, then dividing both sides by
1θh′(y(H,θ)θ
)φπHδ1 (eH) f (θ|κH) yields
1
1θh′(y(H,θ)θ
) − 1
u′ (c1 (H, θ))
= AH (θ)BH (θ)
∫ θ
θ
[λH (x)
φπHδ1 (eH) f (x|κH)− πH + βµ
φπH
]f (x|κH)
1− F (θ|κH)dx.
By definition 1θh′(y(γ,θ)θ
)= (1− τw(γ, θ))u′ (c1 (γ, θ)) and from the first order condition,
λγ(x)
φπγδ1(eγ)f(x|κγ)= 1
u′(c1(γ,x)), so we have
1
u′ (c1 (H, θ))
(τw (H, θ)
1− τw (H, θ)
)= AH (θ)BH (θ)
[∫ θ
θ
1
u′ (c1 (H, x))
f (x|κH)
1− F (θ|κH)dx− πH + βµ
φπH
].
Observe that βµφπH
= µφπH− (1−β)µ
φπH, then by the first order conditions, we can sub-
stitute in µφπH
= 1u′(c0(H))
− 1φ
and (1−β)µφπH
= 1βu′(c2(H,θ))
− 1u′(c1(H,θ))
−(
1−ββ
)1φ. Define
Cγ(θ) =∫ θθu′(c1(γ,θ))u′(c1(γ,x))
[1− u′(c1(γ,x))
φ
]f(x|κγ)
1−F (θ|κγ)dx, Dγ (θ) = u′ (c1 (γ, θ))
[1
u′(c0(γ))− 1
φ
], and
Eγ(θ) =[u′(c1(γ,θ))βu′(c2(γ,θ))
− 1]−(
1−ββ
)u′(c1(γ,θ))
φthen multiplying both sides by u′ (c1 (H, θ)) to
yield (7).
38
Using a similar process as above, we have the following expression for γ = L
τw (L, θ)
1− τw (L, θ)= AL (θ)BL (θ)
[CL (θ)−
∫ θ
θ
(πLf (x|κL)− βµf (x|κL,H)
φπL
)u′ (c1 (L, θ))
1− F (θ|κL)dx
],
and integrating gives us (8). Furthermore, from the first order condition for c0, we have
φ =[
πHu′(c0(H))
+ πLu′(c0(L))
]−1
, combining it with (3) yields φ =
{Eγ[Eθ(
1u′(c1(γ,θ))
∣∣∣∣γ)]}−1
.
Proof of Lemma 2: For a fixed γ, suppose there exists θ and θ such that
y(γ, θ)
= y(γ, θ). Let Φ (γ, θ) = u (c1 (γ, θ)) + βδ2u (c2 (γ, θ)) . There are two cases
to consider. First, suppose Φ(γ, θ)6= Φ
(γ, θ), then clearly the allocations are not
incentive compatible. Next, suppose Φ(γ, θ)
= Φ(γ, θ), and without loss of generality
c1
(γ, θ)> c1
(γ, θ)
and c2
(γ, θ)< c2
(γ, θ). Let π and π denote the measure of
(γ, θ)
and(γ, θ)
agents. Let ut = 1π+π
[πu(ct
(γ, θ))
+ πu(ct
(γ, θ))]
. By assigning these
agents the average utility, the total welfare is unchanged and incentive compatibility is
preserved. However, since u is strictly concave, the consumption level that gives u1 and u2
relaxes the resource constraint. This means that it is not optimal for c1
(γ, θ)> c1
(γ, θ)
and c2
(γ, θ)< c2
(γ, θ)
with Φ(γ, θ)
= Φ(γ, θ). In other words, the consumption paths
are equivalent for agents of the same level of income.
Proof of Proposition 3: First, following Werning (2011), we construct bond savings tax
T k (b) such that agents never purchase bonds. To see how, consider the government assigning
the optimal allocation from the direct revelation mechanism given past and current reports,
while agents are allowed to purchase any desired amount of bonds. Define a fictitious tax
T k1 (b2, r, θ) paid in t = 1 for each productivity realization θ, current bond level b1, past
report rγ, current report rθ, and bond savings b2, where r = (rγ, rθ) . The tax T k1 (b2, r, θ) is
set such that
u(c1 (r) + R1 (e (rγ)) b1 − b2 − T k1 (b2, r, θ)
)− h
(y (r)
θ
)+ βδ2u (c2 (r) +R2b2)
= u (c1 (γ, θ))− h(y (γ, θ)
θ
)+ βδ2u (c2 (γ, θ)) .
Next, by taking the supremum over all θ ∈ Θ, we obtain a bond savings tax T k1 (b2, r) =
supθ∈Θ Tk1 (b2, r, θ) that is independent of productivity. Before we derive the bond savings
39
tax in t = 0, let
V (b1, rγ, θ) = u(c1 (rγ, rθ) + R1 (e (rγ)) b1 − b2 − T k1
(b2, rγ, rθ
))− h
(y (rγ, rθ)
θ
)+ δ2u
(c2 (rγ, rθ) +R2b2
),
where
(rθ, b2
)∈ arg max
rθ,b2
{u(c1 (r) + R1 (e (rγ)) b1 − b2 − T k1 (b2, r)
)− h
(y (r)
θ
)+ βδ2u (c2 (r) +R2b2)
}.
Next, define T k0 (b1, rγ) = supγ∈{H,L} Tk0 (b1, rγ, γ) with T k0 (b1, rγ, γ) chosen such that
δ0 (eγ)u(c0 (rγ)− b1 − T k0 (b2, rγ, γ)
)+ βδ1 (e (rγ))E [V (b1, rγ, θ) |γ] = δ0 (eγ)u (c0 (γ))
+ βδ1 (eγ)
∫ θ
θ
[u (c1 (γ, θ))− h
(y (γ, θ)
θ
)+ δ2u (c2 (γ, θ))
]dF (θ|κ (eγ, γ)) .
Finally, by taking the supremum over all reports, we obtain a bond savings tax T k (b) =
suprt Tkt (b, rt) , where r1 = r and r0 = rγ, that only depends on bond purchases. With
T k (b) , agents do not purchase bonds while misreporting in equilibrium.
Next, we construct the other policy instruments. By Lemma 2, we can define the opti-
mal consumption derived from the direct mechanism as (c0 (e) , c1 (e, y) , c2 (e, y)) . First, we
construct the student loans and its income-contingent repayment schedule along with the
income tax. Let the loan amount be defined as
L (e) =
c0 (e) + e if e ∈ {eL, eH}
0 otherwise,
and the income-contingent repayment subsidy is τ e (eL, y) = 1 and
τ e (eH , y) = 1+1
R1 (eH)L (eH)
[c1 (eH , y)−c1 (eL, y)+
c2 (eH , y)
R2 (1 + τ s (eH , y))− c2 (eL, y)
R2 (1 + τ s (eL, y))
].
Let y (γ, θ) be the optimal output of type (γ, θ) agents in a direct revelation mechanism
and define Y = {y|y = y (γ, θ) with γ ∈ {L,H} and θ ∈ Θ} to be the admissible set of
40
income. The income tax is
T (y) =
y − c1 (eL, y)− c2(eL,y)R2(1+τs(eL,y))
if y ∈ Y
y if y /∈ Y.
Next, we define the income and education contingent retirement savings subsidy as
1 + τ s (e, y) =
u′(c1(e,y))βu′(c2(e,y))
if e ∈ {eL, eH}
0 otherwise.
Finally, we check that the policy instruments implement the optimum. First, notice that
all agents choose e ∈ {eL, eH} , otherwise they will not have any retirement consumption.
Similarly, due to the income tax, all agents produce output y ∈ Y. Next, for any e ∈ {eL, eH}and y ∈ Y, agents at t = 1 choose consumption to satisfy
u′ (c1)
βu′ (c2)= 1 + τ s (e, y) and c1 +
c2
R2 (1 + τ s (e, y))= c1 (e, y) +
c2 (e, y)
R2 (1 + τ s (e, y)).
Clearly, agents optimally choose c1 = c1 (e, y) and c2 = c2 (e, y) . Also, by the taxation
principle, agents with productivity θ choose y = y (e, θ) . For the final step, notice that
given L (e) , agents with innate ability γ optimally choose education level eγ.
Proof of Proposition 4: By Lemma 2, we can define the optimal consumption derived from
the direct mechanism as (c0 (e) , c1 (e, y) , c2 (e, y)) . Similarly, we focus on an implementation
where agents do not purchase bonds due to the bond savings tax T k (b) , which is constructed
in the proof of Proposition 3.
For the policy instruments, we focus on an implementation where none of the agents
save in the retirement savings account, so s2 = 0. Agents with education eL rely on social
security for retirement consumption while agents with education eH depend on social security
benefits plus student loan repayment contributions in the retirement account. Let y (γ, θ)
be the optimal output of type (γ, θ) agents in a direct revelation mechanism and define
Y = {y|y = y (γ, θ) with γ ∈ {L,H} and θ ∈ Θ} to be the set of admissible income. First,
we construct the matching rate α to be
1 + α = infy∈Y,e∈{eL,eH}
u′ (c1 (e, y))
βu′ (c2 (e, y)).
Next, we construct the social security benefit a (y) = c2 (eL, y) . We set the income tax to be
41
T (y − s2) = y − s2 − c1 (eL, y) , and the tax deduction from student loan repayment is
g (r (eH , y)) = r (eH , y)− [c1 (eL, y)− c1 (eH , y)] and g (0) = 0.
Finally, we construct the student loans and its income-contingent repayment schedule along
with the tax on retirement savings account. Let the loan amount be defined as
L (e) =
c0 (e) + e if e ∈ {eL, eH}
0 otherwise,
and the income-contingent repayment schedule is r (eL, y) = 0 and
r (eH , y) =1
αR2
[c2 (eH , y)− c2 (eL, y) + T ra] .
We choose T ra such that r (eH , y) and g (R1r) are weakly positive. Let T ra (y) be a fictitious
tax schedule defined as
T ra (y) = max {0, c2 (eL, y)− c2 (eH , y) , c2 (eL, y)− c2 (eH , y) + αR2 [c1 (eL, y)− c1 (eH , y)]} .
Observe that given T ra (y) , both the repayment schedule and the tax deduction are weakly
positive for any income. Lastly, by taking the supremum over all income, we obtain an
income-independent lump-sum tax:
T ra = supy∈Y
T ra (y) .
For our last step, we check that the policy instruments implement the optimum. First,
notice that all agents would choose e ∈ {eL, eH} , otherwise c0 = 0. Next, due to the low
matching rate, all agents choose s2 = 0. As a result, given the taxes and social security
benefit, agents who invested eL consume c1 = c1 (eL, y) and c2 = c2 (eL, y) . Next, for
agents who invested eH , given the taxes, c1 = y − T (y) + g (r (eH , y)) − r (eH , y) and
c2 = a (y) + αR2r (eH , y)− T ra, so they optimally choose c1 = c1 (eH , y) and c2 = c2 (eH , y) .
Also, by the taxation principle, agents with productivity θ choose y = y (e, θ) . Finally,
notice that given L (e) , agents with innate ability γ optimally choose education level eγ.
Proof of Proposition 5: With heterogeneous β, the government’s problem remains the
42
same except (11) is now
U1 (γ, θ) = u (c1 (γ, θ))− h(y (γ, θ)
θ
)+ βγδ2u (c2 (γ, θ)) ,
for all γ and the ex-ante incentive constraint is
δ0 (eH)u (c0 (H)) + βHδ1 (eH)
∫ θ
θ
[U1 (H, θ) + (1− βH) δ2u (c2 (H, θ))] f (θ|κH) dθ
≥ δ0 (eL)u (c0 (L)) + βHδ1 (eL)
∫ θ
θ
[U1 (L, θ) + (1− βL) δ2u (c2 (L, θ))] f (θ|κL,H) dθ.
The results follow from the procedures outlined in the proofs for Proposition 1 and
Proposition 2.
B Approximating Current Policies
To approximate current income taxes in the United States, we follow Heathcote et al.
(2017) and assume an income tax function T (y) = y − λy1−τ . College students have access
to low-interest federal loans. In t = 0, agents take out loans with gross interest R < R1 (eH)
for loans below L. We model this as if agents who chose eH receive a lump-sum transfer of
T (eH) = RL when they start working. Agents who choose eL do not receive this transfer.
Upon retirement, agents receive social security benefits, which are income-dependent.
The regulation below has been translated to fit the context of our model. To derive an
agent’s social security benefits, first calculate the agent’s average indexed monthly earnings
(AIME) which is defined as AIME = y12
for annual income y. In practice, the social security
administration takes 35 of the highest annual incomes from the 45 years of the agent’s work
life and calculate the average monthly earnings. Next, based on 2015 social security regu-
lations, the agent’s monthly benefit a(AIME) is determined by the following replacement
rates and bend points:
a(AIME) =
0.9× AIME if AIME ≤ 826
743.4 + 0.32× (AIME − 826) if 826 < AIME ≤ 4, 980
2, 072.68 + 0.15× (AIME − 4, 980) if 4, 980 < AIME ≤ 9, 875
2, 806.93 if AIME > 9, 875
.
43
This immediately implies that the agent receives A(y) = 12×a (AIME) every year in social
security benefits.
Using the 2015 regulations, agents are subject to a flat social security tax Ts (y) , which
is defined as
Ts (y) =
0.124× y if y ≤ 118, 500
14, 694 if y > 118, 500.
The tax is capped at an annual income of 118, 500. Furthermore, the social security benefits
are distributed from the social security tax.
We assume that agents accumulate retirement savings in a 401(k) account and a regular
savings account which pays a gross interest of R2. Let s2 denote savings in a 401(k) account
and b2 in the regular savings account. Contributions to the 401(k) account are capped at an
annual amount of 18, 000. We also assume an employer matching rate of 50%. Contributions
to defined contribution plans, such as 401(k), are pre-tax. This means that income tax
payments are deferred upon withdrawal when retiring. However, social security tax is not
deferred. Since contributions to 401(k) are matched, agents would first save in their 401(k)
accounts until the cap binds, before saving in their regular accounts.
B.1 Deriving Allocations for Current Policies
To determine the allocation of present-biased agents under the current policy, we adopt
subgame perfect Nash equilibrium as our solution concept.
B.1.1 The Working Period Problem
By backward induction, agents with productivity θ who took out a loan of b1 in t = 0
and invested e in education solve the following problem:
maxu (c1)− h (l) + βδ2u (c2)
subject to
c1 + b2 + s2 = θl − T (θl − s2)− Ts (θl)− R1 (e)
R0 (e)b1 + 1e=eHT (eH) ,
c2 = 1.5R2s2 +R2b2 + A (θl)− T (1.5R2s2) ,
s2 ≤ c,
44
where c is the upper-bound on contributions to the 401(k) account and 1e=eH = 1 only if
agents invested eH , or else 1e=eH = 0. Let χt (θ) denote the multiplier on the period t budget
constraint for agents who invested eH , and χt(θ) be the multiplier for low-educated agents.
Using Only 401(k): When agents only use 401(k), then it means that agents choose to
save s2 < c.
We first look at agents who invested eH . The first order conditions for consumption and
savings s2 are
u′ (c1) = χ1 (θ) , βδ2u′ (c2) = χ2 (θ) and χ1 (θ) = χ2 (θ) 1.5R2
(θl − s2
1.5R2s2
)τ.
This provides us with the following Euler equation:
u′ (c1) = 1.5β
(θl − s2
1.5R2s2
)τu′ (c2) .
For labor supply, we have four different income regions to consider:
h′ (l) =
χ2 (θ){
1.5R2
(θl−s2
1.5R2s2
)τB (θ, y, s2) + 0.9θ
}if y ≤ 9, 912
χ2 (θ){
1.5R2
(θl−s2
1.5R2s2
)τB (θ, y, s2) + 0.32θ
}if 9, 912 < y ≤ 59, 760
χ2 (θ){
1.5R2
(θl−s2
1.5R2s2
)τB (θ, y, s2) + 0.15θ
}if 59, 760 < y ≤ 118, 500
χ2 (θ) 1.5R2
(1
1.5R2s2
)τθλ (1− τ) if y > 118, 500
,
where B (θ, y, s2) = θλ (1− τ) (y − s2)−τ − 0.124θ As for agents who invested eL, the first
order conditions are the same except for replacing χt (θ) with χt(θ) .
Using Both 401(k) and Savings: When agents start saving in the regular savings
account—b2 > 0, then it means that s2 = c.
We first analyze the case where agents invested eH in t = 0. Suppose the agent has saved
s2 = c, then the agent can only continue to save with the standard savings account. We can
rewrite the sequential budget constraint into its present value terms:
c1 +c2 − λ (1.5R2c)
1−τ − A (θl)
R2
= λ (θl − c)1−τ − Ts (θl)− R1 (eH)
R0 (eH)b1 + T (eH) .
Let χ (θ) denote the multiplier on the present-valued budget constraint. The first order
45
conditions on consumption are
u′ (c1) = χ (θ) and βu′ (c2) = χ (θ) .
The first order condition for labor is
h′ (l) =
χ (θ)[θλ (1− τ) (θl − c)−τ − 0.124θ + 0.9
R2θ]
if y ≤ 9, 912
χ (θ)[θλ (1− τ) (θl − c)−τ − 0.124θ + 0.32
R2θ]
if 9, 912 < y ≤ 59, 760
χ (θ)[θλ (1− τ) (θl − c)−τ − 0.124θ + 0.15
R2θ]
if 59, 760 < y ≤ 118, 500
χ (θ) θλ (1− τ) (θl − c)−τ if y > 118, 500
,
We can derive a similar set of first order conditions for agents who obtained education
level eL.
B.1.2 The Schooling Period Problem
Let (c1 (e, θ) , y (e, θ) , c2 (e, θ)) denote the solution to the problem in Section B.1.1, which
is the optimal consumption path and output agents choose in t = 1 given education e and
producticity θ. Agents with innate ability γ solve the following problem:
maxc0,e,b1
δ0 (e)u (c0) + βδ1 (e)
∫ θ
θ
[u (c1 (e, θ))− h
(y (e, θ)
θ
)+ δ2u (c2 (e, θ))
]f (θ|κ (e, γ)) dθ
subject to
c0 + e = b1 and e ∈ {eL, eH} .
In essence, agents take out a yearly loan of b1 to pay for their schooling and consumption in
t = 1.
B.2 Calibration
In this section we calibrate the model to resemble the “real world” as closely as possible.
The goal is to back out the distribution of productivities across different education groups.
To this extent, we first pick a number of parameters externally and summarize them in Table
5. Then, we calibrate the distributions of skills internally to match the evidence on lifetime
earning provided by Cunha and Heckman (2007).
The values of risk aversion and Frisch elasticity of labor are standard and set to 2 and
46
Table 5: Parameter values in the model
Symbol Meaning Value Source
γ Risk aversion 2}
Standard valuesη Frisch elasticity 0.5
τ Tax progressivity 0.161}
Heathcote andTsujiyama (2017)λ Taxation level 0.839
c 401(k) contribution limit 1.80
Approximatedfrom data
eH Cost of college 1.57rm Commercial interest on student loans 0.1rg Government interest on student loans 0.05bg Cap on government-subsidized student loans 5.75
β Short-term discount factor 0.7
Based onNakajima (2012)
δ0(eL) High school period 0 long-term discount factor 0.00δ1(eL) High school period 1 long-term discount factor 1.00δ0(eH) College period 0 long-term discount factor 0.15δ1(eH) College period 1 long-term discount factor 0.93δ2 Retirement discount factor 0.270
Time-consistent benchmark (β = 1)
Based onNakajima (2012)
δ0(eL) High school period 0 long-term discount factor 0.00δ1(eL) High school period 1 long-term discount factor 1.00δ0(eH) College period 0 long-term discount factor 0.19δ1(eH) College period 1 long-term discount factor 0.85δ2 Retirement discount factor 0.154
Note: All monetary parameters are denominated in 10,000 of 2015 US dollars.
0.5, respectively. Next, we discuss the calibration of the current tax system. The parameters
of the income tax function τ and λ are borrowed from Heathcote and Tsujiyama (2017) and
apply to income level normalized by average income in the economy.14 The upper bound for
401(k) contributions c is set to $18,000 and reflects the present value of lifetime contribution
limits based on the limit in 2015. As for the financing of student loans, we assume for
simplicity that the annual interest rates an agent may obtained through private market and
through a government-subsidized scheme are 10% and 5%, respectively. The amount of
subsidized loan is capped at $57,500, in line with the regulations for Stafford loans in the
US. We further assume that an agent takes ten years to repay the student loans.
14We calculate average income directly using the factual distributions of lifetime income from Cunhaand Heckman (2007) and the shares of high school and college graduates (and beyond) of 0.68 and 0.32,respectively, from the CPS. The average lifetime income amounts to $1,570,900.
47
The annual cost of higher education eH is assumed to be $15,700, which is calculated
for 2015 based on average tuition costs of private and public colleges plus different types of
graduate degrees15 as well as relative enrollment data for both types of college.16 Table 6
presents a breakdown of different higher education outcomes, along with average costs and
durations, which we use to calculate this parameter.
Table 6: Breakdown of higher education outcomes
Degree type % of population Duration Annual cost
Associate’s and less 67.7 0 0Bachelor’s only 20.3 4 15, 396Master’s 8.0 6 16, 140Professional 1.9 8 27, 210Doctoral 2.1 10 6, 158
Total 100 5.12 15, 695
Note: distribution of educational attainment is from CPS 2015. The durations and
annual costs are cumulative. The data on costs of various higher degrees are taken
from NCES, Digest of Education Statistics and expressed in 2015 dollars. We ignore
the cost and duration of Associate’s degrees as those are often combined with jobs.
In calibrating the short- and long-term discount factors we primarily follow Nakajima
(2012) who uses a general equilibrium model with present-biased agents and targets a capital-
output ratio of 3. We adopt his assumed value of the short-term discount factor of 0.7
which places in the midrange of estimates found by Laibson et al. (2017). The annual
long-run discount factor is δannual = 0.9852 following Nakajima (2012) which we in turn use
to calculate effective discount rates across the three periods in our model. These effective
discount rates also reflect the relative lengths of the periods, which may differ across agents
of different education groups. Because high school graduates start working right away, they
never actually experience the education period 0; hence their parameter δ0(eL) is zero and
δ1(eL) is one. On the other hand, college graduates spend 5.12 years in period 0, which
reflects the average duration of undergraduate and graduate studies in the US (Table 6
presents a detailed breakdown), and then another 45 years in period 1. This yields δ0(eH) =1−δ5.12annual
1−δ45annual= 0.15 and δ1(eH) =
δ5.12annual−δ50.12annual
1−δ45annual= 0.93. We assume that both education types
spend 45 years working and 20 years in retirement. This yields a common retirement period
discount factor of δ2 =δ45annual−δ
65annual
1−δ45annual= 0.27.17
15Source: College Board, Annual Survey of Colleges and NCES, Digest of Education Statistics16Source: NCES, Digest of Education Statistics, and Current Population Survey 201517Because the college type first spends four years on education before they start to work, we assume that
they also retire four years later, at age 67, and live four years longer. This is consistent with a significant
48
For our analysis in the main body of the paper we also use the benchmark of time-
consistent agents, i.e. the world where β = 1. For reference, we present here the analogous
derivations of the effective long-run discount factors for that case. Once again following
Nakajima (2012) we assume an annual discount rate δannual = 0.9698. Then, with the same
reasoning we assume δ0(eL) = 0 and δ1(eL) = 1 for high school graduates, compared to
δ0(eH) = 0.19 and δ1(eH) = 0.85 for college graduates. The discount factor for retirement
amounts to δ2 = 0.15.
Having established the external parameters, we turn to the parameters governing the
distribution of skills which are set through solving and simulating the model. For each of
the four groups of agents: (i.) factual high school graduates, (ii.) high school graduates, had
they gone to college, (iii.) factual college graduates, and (iv.) college graduates, had they not
gone to college, we observe the empirical distributions of lifetime earnings reported by Cunha
and Heckman (2007). Roughly speaking, these distributions are obtained by estimating a
Roy-type model on combined NLSY and PSID data and generating counterfactuals for both
education groups. As it is commonly known, panel surveys such as these tend to under-
represent the upper tail of the earnings distribution. For this reason, similar to Findeisen
and Sachs (2016), we add an upper Pareto-tail with the shape parameter of 1.5 (Saez (2001)).
For each distribution, we select an income threshold at which we attach the Pareto tail such
that the upper 10% of the mass is distributed according to it. We pick the scale parameter
such that the (smoothed out) PDF of the empirical distribution of earnings from Cunha and
Heckman (2007) intersects at the threshold with the Pareto PDF Table 7 summarizes the
parameters of the Pareto tail that we add to each of the empirical distribution of lifetime
earnings.
Table 7: Adding a Pareto tail to lifetime income distributions
HS fact. HS counter. COL fact. COL counter.
Threshold 203.7 257.6 288.4 212.8Scale parameter 62.9 80.2 89.4 63.8
Note: The thresholds refer to present value of lifetime earnings and are expressed in
$10,000s of 2015 dollars. Thresholds are selected in each case such that 10% of total
mass is distributed according to Pareto distribution with the shape parameter of 1.5.
To capture the earnings distribution with a fat upper tail in our model, we assume
that agents’ skills θ follow a mixture of two distributions, a normal distribution and a two-
piece distribution (lognormal-Pareto) as described in Nigai (2017). The probability density
body of research which shows college graduates live longer than non-college graduates (Meara et al., 2008).
49
function of our mixture is then given by
f(θ) =p×[
1
2πσ1
exp
{(θ − µ1)2
2σ21
}](13)
+(1− p)×
ρ
Φ(αs(α,ρ)
) 1√2πs(α,ρ)θ
exp{−1
2
(αs(α, ρ)− log(θT )−log(θ)
s(α,ρ)
)}, if θ ∈ (0, θT )
(1− ρ)α(θT )α
θα+1 , if θ ∈ [θT ,∞)
In equation (13), µ and σ are the mean and standard deviation of the normal distribu-
tion, and p is the probability of drawing it. The two-piece lognormal-Pareto distribution
comes with a shape parameter α, which we fix at 1.5, and two scale parameters, ρ and θT .
Intuitively, θT is the threshold value at which the standard lognormal distribution turns into
Pareto, while ρ ∈ (0, 1) represents the fraction of total mass that is distributed according to
lognormal. We have hence 5 parameters to pin down for each of the four groups of agents,
(µ, σ, ρ, θT , p), in order to replicate the empirical distributions of earning provided by Cunha
and Heckman (2007) and augmented with the Pareto tail. To do so, we solve for the optimal
policy functions in each of the four cases and simulate random draws for 100,000 agents. We
use a global optimization algorithm to minimize the distance between the simulated CDF of
lifetime earnings and the targeted one. Table 8 shows all components of our mixture density
defined in (13) matter quantitatively and altogether result in a good fit for model-derived
distributions of earnings in each group. Figures 8-9 depict the CDF of lifetime earnings
in the model and their empirical targets across the four groups of agents. Notice that all
estimations result in an excellent fit to the data, with an exception of High School counter-
factual. However, this distribution does not affect the model solution in any way and it is
only necessary to verify that the low type indeed prefers to reveal truthfully.
Table 8: Parameters of productivity distributions
Symbol MeaningValue
HS fact. HS counter. COL fact. COL counter.
µ Mean of normal 7.91 8.99 10.34 8.25σ St. dev. of normal 2.15 2.44 3.48 2.15θT Threshold for Pareto 8.36 8.45 18.24 8.10ρ Fraction of lognormal 0.39 0.48 0.65 0.43p Probability of normal 0.64 0.60 0.66 0.56
Note: Productivities drawn from these distributions are in annual terms.
50
0
0.2
0.4
0.6
0.8
1
200 400 600 800 1000
lifetime earnings
High school graduates - factual
modeldata
0
0.2
0.4
0.6
0.8
1
200 400 600 800 1000
lifetime earnings
College graduates - factual
modeldata
Figure 8: Distributions of lifetime income - factual
0
0.2
0.4
0.6
0.8
1
200 400 600 800 1000
lifetime earnings
High school graduates - counterfactual
modeldata
0
0.2
0.4
0.6
0.8
1
200 400 600 800 1000
lifetime earnings
College graduates - counterfactual
modeldata
Figure 9: Distributions of lifetime income - counterfactual
C Decomposition of the Labor Wedge
In this section, we quantify the decomposition of the labor wedge introduced in Section
3.2. Figure 10 presents the numerical approximation of the labor wedge components A,C,D
and E as function of income for the high innate ability type.18 The A component depends
on the inverse hazard rate of the distribution of θ and declines at first, before increasing and
converging to a constant due to the presence of a Pareto tail. By contrast, the intratemporal
component C increases and then converges, resulting in the overall convergence of the labor
18We ignore the B components because, given the functional forms we impose, it reduces to a constant. Wealso omit the decomposition for the low innate ability type because in our calibrated model the period-zeroconsumption of L-agents is not pinned down. As a result, the intertemporal component is not well-defined.
51
wedge at the top of the distribution. The offsetting role that comes from the intertemporal
component D is much smaller in size and decreases monotonically.
Decomposition of the labor wedge
0
2
4
6
8
10
12
14
16
30 60 90 120 150
A
2
4
6
8
10
12
14
16
18
30 60 90 120 150
C
0.05 0.1
0.15 0.2
0.25 0.3
0.35 0.4
0.45 0.5
0.55 0.6
30 60 90 120 150
average annual income (in thousands of 2015 USD)
D
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0.16
0.18
30 60 90 120 150
average annual income (in thousands of 2015 USD)
E
Figure 10: Labor wedge components for the high innate ability type
The novel aspect of our paper is the introduction of E, the present bias component. As
can be noticed, this components declines monotonically but its magnitude is also very small
compared to components D, or especially C. Consequently, the labor wedge is generally not
much affected by the present bias, and any difference shows up most prominently at the
lowest levels of income, as evident in Figure 3.
To get a better understanding of this result, we can decompose the present bias element
further into the disagreement component and the myopic component, as explained in Section
3.2. Figure 11 plots these two components on a single graph for the high innate ability
type. Notice immediately that they offset each other for the most part and so the difference
between them, the present bias component of the labor wedge, is quantitatively very small.
This is similar for L-agents, which is plotted in Figure 12.
52
−1.5
−1
−0.5
0
0.5
1
1.5
2
30 60 90 120 150
average annual income (in thousands of 2015 USD)
Decomposition of present bias element E for high type
Disagrement componentMyopic component
Figure 11: Decomposition of the present bias element of the labor wedge for H-agents
−1.5
−1
−0.5
0
0.5
1
1.5
30 60 90 120 150
average annual income (in thousands of 2015 USD)
Decomposition of present bias element E for low type
Disagrement componentMyopic component
Figure 12: Decomposition of the present bias element of the labor wedge for L-agents
D Time-Consistent Benchmarks
In this section, we present details of the implementation of the time-consistent optimal
policies, which we use as benchmark for welfare calculations in Section 4.1. We consider two
ways to implement the optimal allocation for time-consistent agents: mandatory retirement
savings and laissez faire retirement savings. The two different implementations lead to
different measures of welfare improvement.
First, we characterize the optimal allocations for time-consistent agents in a direct mech-
53
anism. Let{c0 (γ) , [ct (γ, θ) , y (γ, θ)]t>0,θ∈Θ
}be the optimal allocation for time-consistent
agents. The optimal allocation for time-consistent agents satisfies the following:
u′ (c1 (γ, θ)) = u′ (c2 (γ, θ)) .
This implies that c1 (γ, θ) = c2 (γ, θ) = c (γ, θ) . For t = 0, the government implements c0 (γ)
by providing agents a student loan of
L (eH) = c0 (H) + eH and L (eL) = c0 (L) + eL.
Next, we proceed to consider two different methods to decentralize the optimal allocations
in t = 1 and t = 2.
D.1 Mandatory Savings
Consider a mandatory minimum savings rule that forces agents to smooth consump-
tion: c1 (γ, θ) = c2 (γ, θ) . For time-consistent agents, the policy implements the optimum.
However, for present-biased agents, the minimum savings rule is not incentive compatible.
To see how the minimum savings rule changes the behavior of present-biased agents, we
first analyze how agents would change their reports of θ. Since for our quantitative exercise,
u (c) = c1−σ
1−σ and h(yθ
)= 1
1+ 1η
(yθ
)1+ 1η . Then, for a given report of innate ability γ and the
time-consistent allocations, present-biased agents choose a report θ to maximize the utility
at t = 0. In essence, a θ-agent solves
maxθu(c1
(γ, θ))− h
y(γ, θ)
θ
+ βδ2u(c2
(γ, θ))
.
From the argument above and the assumptions on the utility function, the problem can be
rewritten as
maxθu(c(γ, θ))− 1
1 + 1η
y(γ, θ)
(1 + βδ2)1
1+ 1η θ
1+ 1η
.
We know that when β = 1, the solution to the problem above is θ = θ, because the mechanism
satisfies incentive compatibility for time-consistent agents by assumption. Thus, we can
54
transform the problem into the following alternative problem:
maxθu(c(γ, θ))− 1
1 + 1η
y(γ, θ)
α (1 + δ2)1
1+ 1η θ
1+ 1η
,
where α =(
1+βδ21+δ2
) 1
1+ 1η . Immediately, we can see that agents optimally report θ = αθ,
because the problem is similar to a time-consistent agent with productivity αθ. As a result,
the present-biased agents with productivity θ do not report truthfully and instead report
(1 + βδ2
1 + δ2
) 1
1+ 1ηθ.
This result is intuitive, because the reward for working is spread evenly between the two
periods with mandatory savings. Since present-biased agents put less weight on retirement
consumption, the mandatory savings policy provides less incentives for them to work. Their
optimal strategy is to under-report their productivity to work less.
Finally, in t = 0, agents know that they will report(
1+βδ21+δ2
) 1
1+ 1η θ in t = 1. As a result,
given the optimal time-consistent allocation, H-agents solve the following:
max
{u (c0 (H))+β
δ1 (eH)
δ0 (eH)
∫ θ
θ
u(c1
(H, θ
))− h
y(H, θ
)θ
+ δ2u(c2
(H, θ
)) dF (θ|κH) ,
δ0 (eL)
δ0 (eH)u (c0 (L))+β
δ1 (eL)
δ0 (eH)
∫ θ
θ
u(c1
(L, θ
))− h
y(L, θ
)θ
+ δ2u(c2
(L, θ
)) dF (θ|κL,H)
},
where θ =(
1+βδ21+δ2
) 1
1+ 1η θ.
D.2 Laissez Faire Savings
Another way to implement the optimum is for the government to allow agents to save
freely for retirement. This is because with time-consistent agents, it is not necessary for
the government to introduce any additional incentives for retirement savings. Hence, to
implement the optimal allocation for time-consistent agents, the government only needs to
introduce appropriate income taxes at t = 1 and student loans in t = 0. However, under
laissez faire savings, present-biased agents do not smooth consumption and it is also not
incentive compatible.
55
To find out how present-biased agents behave, we first derive the income tax T (y) that
implements the optimum for time-consistent agents. At t = 1, time-consistent agents solves
the following:
maxc1,c2,y
u (c1)− h(yθ
)+ δ2u (c2)
subject to
c1 + s2 = y − T (y) and c2 = R2s2.
Let Y be the set of optimal income for TC agents:
Y = {y|y = y (γ, θ) , ∀γ ∈ {L,H} , θ ∈ Θ} .
By Lemma 2, we can rewrite the allocations in terms of income: ct (y (γ, θ)) = c (γ, θ) .
As a result, we can define the following income tax, which implements the optimum for
time-consistent agents:
T (y) =
y if y /∈ Y
y − c (y)− 1R2c (y) if y ∈ Y .
.
For simplicity, we assume that if the government observes an off-path income level that it
didn’t expect, it usurps all of the output and leaves the agent without any consumption.
Next, we outline how present-biased agents behave under laissez faire savings. Given
laissez faire savings and the income tax above, present-biased agents solve the following at
t = 1,
maxc1,c2,y
u (c1)− h(yθ
)+ βδ2u (c2)
subject to
c1 + s2 = y − T (y) and c2 = R2s2.
We can rewrite the problem as
maxc1,c2,y
u (c1)− h(yθ
)+ βδ2u (c2)
subject to
c1 +1
R2
c2 =
(1 +
1
R2
)c (y) and y ∈ Y .
It is clear that agents never choose y /∈ Y , because all of the output would be confiscated.
As a result, for any given y ∈ Y , present-biased agents choose consumption (c1 (y) , c2 (y))
56
to satisfy
u′ (c1 (y)) = βu′ (c2 (y))
and
c1 (y) +1
R2
c2 (y) = c (y) +1
R2
c (y) .
It is obvious that agents do not choose the optimum for time-consistent agents. Hence, there
will be intertemporal inefficiencies. In addition to the intertemporal inefficiencies, the agents
might also choose suboptimal output. Agents choose y by solving the following:
maxy∈Y
u (c1 (y))− h(yθ
)+ βδ2u (c2 (y)) ,
where (c1 (y) , c2 (y)) is defined above.
After solving for the optimal allocations for t = 1, 2, we can solve for the agent’s education
choices in t = 0. The process is the same as the one for mandatory savings.
D.3 Welfare Comparisons
To evaluate the welfare improvement of the paper’s proposed policies, we measure the
change of moving from mandatory savings or laissez faire savings to the policies introduced
in Section 5.
However, this welfare evaluation is not straightforward. We need to guarantee the al-
locations chosen by present-biased agents under mandatory savings or laissez faire savings
are feasible. This is because, from the analysis above, output of present-biased agents is
further distorted under policies designed for TC agents. Therefore, the government budget
constraint does not hold with present-biased agents under mandatory savings or laissez faire
savings.
To facilitate the welfare comparison, we introduce an external government expenditure
G > 0 in the time-consistent setup, so that the resource constraint becomes
∑γ
πγ
{− c0 (γ)
R0 (eγ)− eγ +
1
R1 (eγ)
∫Θ
[y (γ, θ)− c1 (γ, θ)− 1
R2
c2 (γ, θ)
]f (θ|κγ) dθ
}≥ G.
We interpret G as an emergency fund the government uses to supplement the agents’
consumption when total output is lower than expected. Hence, we require G to
be sufficiently large so that the allocations chosen by the present-biased agents,
57
{c0 (γ) , [ct (γ, θ) , y (γ, θ)]t>0,θ∈Θ
}, are feasible:
∑γ
πγ
{− c0 (γ)
R0 (eγ)−eγ+
1
R1 (eγ)
∫Θ
[y (γ, θ)− c1 (γ, θ)− 1
R2
c2 (γ, θ)
]f (θ|κγ) dθ
}≥ 0. (14)
D.4 Quantitative Implementation
In our quantitative exercise, we design a fixed-point algorithm to find the value of G such
that the resource constraint in (14) binds. The algorithm can be summarized as follows:
1. Start with an initial value for government spending G0.
2. Solve for the optimal allocations with time-consistent agents.
3. Use the allocations, implemented either through mandatory savings or laissez-faire
arrangement, to solve for the best response of present-biased agents. Calculate the
resulting gap in the resource constraint which stems from present-biased agents under-
reporting their productivity type. Denote the gap G1.
4. Check if |G0 −G1| < ε, where ε is a tolerance criterion. If yes, we have found a fixed
point. If not, update G0 and go back to step 1.
Table 9 summarizes the fixed-point amount of government spending G under both im-
plementations which balances the resource constraint under present-biased agents.
Table 9: Fixed-point amount of government spending that balances the resource constraint
Mandatory savings Laissez-faire
10.19 11.27
E Solving for Optimal Education-Independent Policies
In this section, we describe the computational algorithm used to solve for the bench-
mark case of optimal allocations conditional on the labor or the intertemporal wedge being
education-independent. The challenge lies in the fact that while the allocations that we
solve for are a function of productivity θ, the education-independence constraint on wedges
is imposed for each observable income y(γ, θ) which itself is an allocation.
To overcome this challenge we adopt the following approach:
58
1. Create an exogenous grid for income consisting of Ny points. This grid is separate
from the original grid for productivity θ on which the allocations are defined.
2. Consider a generic set of allocations {c1(γ, θ), c2(γ, θ), y(γ, θ)}γ∈{H,L}. For each point
on the exogenous income grid defined in step 1, yi, find θi and θi such that:
y(L, θi) = yi = y(H, θi)
We use linear interpolation to evaluate income at off-grid productivity values.
3. Solve for the optimal allocations under an additional set of Ny constraints, each one
defined by a point on the exogenous grid for income from step 1, as follows.
• For the case of education-independent labor wedge:
h′(y(L,θi)
θi
)θiu′
(c1(L, θi)
) =h′(y(H,θi)
θi
)θiu′
(c1(H, θi)
)• For the case of education-independent intertemporal wedge:
u′(c1(L, θi)
)u′(c2(L, θi)
) =u′(c1(H, θi)
)u′(c2(H, θi)
)Once again, we use linear interpolation to evaluate consumption in both period at
off-grid values of productivity.
F Off-Path Policies for Non-Sophisticated Agents
The paper has focused on sophisticated present-biased agents. Sophisticated agents fully
anticipate the behavior of their future-selves, so they have a demand for commitment. On
the other hand, non-sophisticated agents underestimate the severity of their bias and tend
to demand too little commitment. We explore the implications of non-sophistication on the
design of optimal policy in this section.
To model non-sophistication, we follow O’Donoghue and Rabin (2001). Agents at t = 0
perceive their present bias in t = 1 to be β ∈ [β, 1]. Let W1
(c1, c2, y; γ, θ, e, β
)denote the
non-sophisticated agents’ perceived utility in t = 1:
W1
(c1, c2, y; γ, θ, e, β
)= u (c1)− h
(yθ
)+ βδ2u (c2) .
59
If β = β, agents are sophisticated and fully aware of the bias. If β = 1, agent are fully
naıve and believe their future-selves to be time-consistent. Partially naıve agents know
they are present-biased, β < 1, but they underestimate its severity, β > β. For this exten-
sion, we assume all agents are non-sophisticated and have heterogeneous and unobservable
sophistication distributed within support[β, 1], where β ∈ (β, 1] .
Yu (2019a) showed that it is optimal for the government to take advantage of the mis-
specified beliefs of present-biased agents through the preference arbitrage mechanism (PAM).
PAM features off-path allocation used to exploit the incorrect beliefs, which are referred to as
the imaginary allocations and denoted as (cI , yI). The allocation the government implements
on-path is called the real allocations denoted as (c, y) . We assume that u is unbounded below
and above (u (R+) = R). For H-agents, the government designs the menu
PH ={c0 (H) ,
[cI1, y
I , cI2], [c1 (H, θ) , y (H, θ) , c2 (H, θ)]θ∈Θ
}.
At t = 1, H-agents choose between imaginary and real allocations. The consumption path
of the imaginary allocations is backloaded (cI2 > cI1), while the consumption path of the real
allocation is relatively less back-loaded. It is designed this way so that at t = 0, the agents
mistakenly believe their future-selves will choose the imaginary allocations. However, they
end-up selecting the real allocations instead. Since the ex-ante incentive constraints are non-
binding for L-agents, the government does not need to design imaginary allocations for them,
so PL ={c0 (L) , [c1 (L, θ) , y (L, θ) , c2 (L, θ)]θ∈Θ
}. Similar to Yu (2019a), it is not necessary
to design imaginary allocations tailored for each level of sophistication. It is possible to find
a single set of imaginary allocations such that it implements the same real allocations for
agents of any sophistication.
Lemma 3 For non-sophisticated present-biased agents, the ex-ante incentive compatibility
constraint is non-binding at the optimum.
Proof From Yu (2019a), we first choose the imaginary allocations such that it satisfies the
preference arbitrage constraint: for any θ,
u(cI1)−h
(yI
θ
)+ βδ2u
(cI2)≥ max
θ
u(c1
(H, θ
))− h
y(H, θ
)θ
+ βδ2u(c2
(H, θ
)) .
In essence, in t = 0, agents believe their future-selves would choose the imaginary allocation
over the real allocation. Notice that the real allocations may not be incentive compatible
under the erroneous belief. Next, the imaginary allocations have to satisfy the executability
60
constraints to makes sure that agents eventually choose the real allocation: for any θ,
U1 (H, θ) = u (c1 (H, θ))− h(y (H, θ)
θ
)+ βδ2u (c2 (H, θ)) ≥ u
(cI1)− h
(yI
θ
)+ βδ2u
(cI2).
Thus, the ex-ante incentive compatibility constraint is
δ0 (eH)u (c0 (H)) + βδ1 (eH)
∫ θ
θ
[u(cI1)− h
(yI
θ
)+ δ2u
(cI2).
]f (θ|κH) dθ ≥ U0 (L;H) .
Next, we show how the imaginary allocations can be designed such that the ex-ante in-
centive constraint is non-binding for all sophistication levels. Denote I (θ) = u(cI1)−h
(yI
θ
)and set I (θ) = U1 (H, θ)−βδ2u
(cI2)
so the executability constraints are binding. Hence, the
preference arbitrage constraints can be expressed as u(cI2)≥ J
(θ, β), for all θ ∈ Θ where
J(θ, β)
=maxθ
{u(c1(H,θ))−h
(y(H,θ)θ
)+βδ2u(c2(H,θ))
}−U1(H,θ)
(β−β)δ2. Since the preference arbitrage con-
straints need to hold for all productivity realizations and sophistication, it is clear that cI2 is
chosen to satisfy u(cI2)≥ maxθ,β J
(θ, β). Similarly, the ex-ante incentive constraint can be
rewritten as u(cI2)≥ K, where K = 1
(1−β)δ2
{U0(L;H)−δ0(eH)u(c0(H))
βδ1(eH)−∫ θθU1 (H, θ) dF (θ|κH)
}.
Since u is unbounded above, for any real allocation, the imaginary retirement consumption
cI2 can be chosen to satisfy u(cI2)≥ max
{maxθ,β J
(θ, β), K}. Also, since u is unbounded
below, it is possible to adjust cI1 for any given yI so that I (θ) = U1 (H, θ) − βδ2u(cI2). As
a result, it is always possible to find a single set of imaginary allocations for all levels of
sophistication such that the ex-ante incentive constraints are non-binding for any allocation
implemented on the equilibrium path.
To understand Lemma 3, note that non-sophisticated agents at t = 0 overestimate the
value of retirement consumption to their future-selves. PAM takes advantage of incorrect
beliefs by encouraging education investment through an increased imaginary retirement con-
sumption cI2, which H-agents believe they will choose in t = 1. However, their future-selves
forsake it for more immediate gratification—the relatively less back-loaded real allocations.
The following proposition describes the optimal wedges for non-sophisticated agents.
Proposition 6 The constrained efficient allocation for non-sophisticated agents satisfies
i. full insurance in t = 0 : c0 (H) = c0 (L) ,
ii. the inverse Euler equations: for any γ, 1u′(c0(γ))
= Eθ(
1u′(c1(γ,θ))
)= Eθ
(1
u′(c2(γ,θ))
), and,
for any θ, 1βu′(c2(γ,θ))
= 1u′(c1(γ,θ))
+(
1−ββ
)1
u′(c0(γ)),
61
iii. the labor wedge for any γ and θ satisfies τw(γ,θ)1−τw(γ,θ)
= Aγ (θ)Bγ (θ)Cγ (θ) .
Proof By Lemma 3, the optimization problem with non-sophisticated agents is the same as
the original problem except the ex-ante incentive constraints are non-binding, so µ = 0. As
a result, the first order conditions are
u′ (c0 (H)) = u′ (c0 (L)) = φ,
and for all γ,
πγδ1 (eγ) f (θ|κγ)− ξ′γ (θ) = λγ (θ) ,
(1− β) πγδ1 (eγ) f (θ|κγ) + βλγ (θ) =φπγδ1 (eγ) f (θ|κγ)
u′ (c2 (γ, θ)),
λγ (θ)u′ (c1 (γ, θ)) = φπγδ1 (eγ) f (θ|κγ) ,
ξγ (θ) = ξγ(θ)
= 0,
λγ (θ)1
θh′(y (γ, θ)
θ
)+ξγ (θ)
[1
θ2h′(y (γ, θ)
θ
)+y (γ, θ)
θ3h′′(y (γ, θ)
θ
)]= φπγδ1 (eγ) f (θ|κγ) .
By rearranging the first order conditions, the results follow.
Proposition 6 demonstrates the government’s ability to fully insure agents against differ-
ences in innate ability γ. This is not surprising, because Lemma 3 stated that the government
can screen innate ability without distortions by utilizing PAM. As a result, the only distor-
tions in the economy stem from the unobserved productivity θ realized in t = 1.
Since innate ability is screened for free but productivity is not, Proposition 6 shows that
the intertemporal wedge τ k0 (γ) is characterized by the standard inverse Euler equation for
all innate ability types. This is because the government no longer needs the additional
intertemporal distortions illustrated in Proposition 1 on τ k0 (γ) to incentivize investment
in human capital. The imaginary allocations are sufficient for that purpose. However,
productivity remains unobservable by the government, so savings in t = 0 is still restricted
and shaped by the inverse Euler equation to relax the ex-post incentive constraints.
More interestingly, Proposition 6 shows that all agents are provided with a commitment
device: for any γ and θ, u′(c1(γ,θ))u′(c2(γ,θ))
> β. With non-sophisticated agents, the government can
focus on its paternalistic goals since it no longer needs to manipulate retirement savings
to screen innate ability. More specifically, notice for any γ, the expected intertemporal
distortion is
E[τ k1 (γ, θ)
]= (1− β)
[1− E [u′ (c1 (γ, θ))]
u′ (c0 (γ))
].
62
Since the inverse Euler implies u′ (c0 (γ)) < E [u′ (c1 (γ, θ))] , we have E[τ k1 (γ, θ)
]< 0. In
essence, agents over-save for retirement in expectation. This is due to the disagreement
between the paternalistic government and present-biased agents at t = 1. The government
uses this disagreement to deter downward deviations in productivity by back-loading the
consumption of lower productivity types. At the same time, more productive agents who
produce higher output are rewarded with a more front-loaded consumption path. Since it
is more cost effective to increase production efficiency by decreasing working-period con-
sumption c1 of low productivity agents than to increase it for high productivity agents, the
consumption path is slightly back-loaded on average. Without this disagreement, it is most
efficient to motivate agents to work by respecting their intertemporal preferences and screen
through the standard downward labor distortion.
Finally, Proposition 6 shows that the optimal labor distortion is determined solely by
the intratemporal component. This means the economic forces that shape the labor wedge
are essentially static. Recall from Proposition 2 that both the intertemporal and present-
bias components are integral to the optimal provision of dynamic incentives through labor
distortion. Since the ex-ante incentive constraint is non-binding, the forces that determine
the provision of dynamic incentives are absent from the labor wedge. As a result, the
intertemporal and present-bias components no longer influence labor distortion.
63