
Lecture 3

Optimal Income Taxation

Plan of the talk

1. The intensive model of labor supply
   - The setup
   - Two types
   - Continuum of types: the incentive constraints
   - The first order condition

2. The extensive model

Income taxation: the basic (simplified) setup

The economy is made of consumer-workers who differ along a single dimension, their productivity or ability w, distributed with the cdf F on R+. They consume an undifferentiated good C and supply labor L. Their tastes are represented by a utility function U(C, L), often taken separable: u(C) − v(L). Before-tax income of consumer w is (implicit linear production function)

Y (w) = wL(w).

The government observes incomes Y (Mirrlees) and can implement a tax schedule T(Y). The after-tax income of consumer w is Y(w) − T(Y(w)), so that her utility is

U(w) = U(Y (w)− T (Y (w)), L(w)).

The government objective is to maximize an additive utilitarian functional

$$\int \Psi(U(w))\, dF(w),$$

subject to financing a level R of public expenditures

$$\int T(Y(w))\, dF(w) = R.$$

Key Concepts for Taxes/Transfers

- Let T(Y) denote tax liability as a function of earnings Y

1. Transfer benefit with zero earnings −T(0) [sometimes called demogrant or lump sum grant]

2. Marginal tax rate T′(Y): the individual keeps 1 − T′(Y) of each additional unit of earnings (relevant for intensive margin labor supply responses)

3. Participation tax rate τp = [T(Y) − T(0)]/Y: the individual keeps a fraction 1 − τp of earnings when moving from zero earnings to earnings Y:

Y − T(Y) = −T(0) + Y − [T(Y) − T(0)] = −T(0) + Y · (1 − τp)

Relevant for extensive margin labor supply responses

4. Break-even earnings point Y∗: point at which T(Y∗) = 0
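The four concepts above can be computed for any schedule T. The sketch below uses a hypothetical linear schedule T(Y) = rate · Y − grant; the function names and the numbers are illustrative assumptions, not taken from the lecture.

```python
def tax(y, grant=500.0, rate=0.4):
    """Hypothetical linear schedule T(Y) = rate * Y - grant (made-up numbers)."""
    return rate * y - grant


def marginal_tax_rate(T, y, h=1e-4):
    """Central finite-difference approximation of T'(Y)."""
    return (T(y + h) - T(y - h)) / (2.0 * h)


def participation_tax_rate(T, y):
    """tau_p = [T(Y) - T(0)] / Y, defined for Y > 0."""
    return (T(y) - T(0.0)) / y


def break_even(T, lo=0.0, hi=1e6):
    """Bisection for Y* with T(Y*) = 0, assuming T is increasing and T(lo) < 0 < T(hi)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if T(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


y = 2000.0
demogrant = -tax(0.0)                    # transfer at zero earnings, -T(0)
mtr = marginal_tax_rate(tax, y)          # T'(Y)
tau_p = participation_tax_rate(tax, y)   # [T(Y) - T(0)] / Y
y_star = break_even(tax)                 # T(Y*) = 0
net_income = y - tax(y)                  # equals -T(0) + Y * (1 - tau_p)
print(demogrant, mtr, tau_p, y_star, net_income)
```

With a linear schedule the marginal and participation tax rates coincide; they differ as soon as T is nonlinear.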

[Figure: function Y − T(Y) = R(Y), single woman, France 1999]

[Figure: Tukey graph of marginal tax rates, France 1999. Horizontal axis: household net wage bracket (thousands of francs); vertical axis: marginal tax rates (percent).]

First best: lump sum taxes

Lump sum taxes are taxes that depend on the citizens’ types: T (w).

The Lagrangian of the optimal tax problem is

$$\int \left[\Psi(U(Y(w) - T(w))) + \lambda T(w)\right] dF(w) - \lambda R.$$

The first order condition is, for all w

Ψ′(U(w))U ′(Y (w)− T (w)) = λ.

If Ψ ◦ U is strictly concave, it implies complete equality of after-tax incomes. Lump sum transfers implicitly assume that before-tax incomes Y(w) do not react to changes in taxes.

Mirrlees: the consumer chooses the labor supply that maximizes

U(wL− T (wL), L).

The two-type case: Stiglitz (1982)

Two types: w1 and w2, w1 < w2, in proportions n1 and n2.

Quasi-linear utility in consumption (no income effects):

U(C , L) = C − v(L),

with v convex increasing, v′(0) = 0, v′(+∞) = +∞.

We introduce a taste for redistribution in the government objective through a coefficient µ, 0 < µ < 1, multiplying type 2 utility, while keeping linearity:

n1U1 + n2µU2.

The government budget constraint is

n1T1 + n2T2 = R.

Information and incentives

The government does not observe productivity w, nor labor supply L, but only pre-tax income or earnings Y = wL.

It can choose a tax schedule T(Y), and let the agents decide on their labor supply, knowing the implications of their choices on their tax bill.

Equivalently, from the revelation principle, the government can use a direct revealing mechanism, announcing two pairs (T1, Y1) and (T2, Y2), making sure that type 1 prefers the first one while type 2 prefers the latter.

The incentive constraints

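$$Y_1 - T_1 - v\!\left(\frac{Y_1}{w_1}\right) \ge Y_2 - T_2 - v\!\left(\frac{Y_2}{w_1}\right) \qquad \text{(IC1)}$$

$$Y_2 - T_2 - v\!\left(\frac{Y_2}{w_2}\right) \ge Y_1 - T_1 - v\!\left(\frac{Y_1}{w_2}\right) \qquad \text{(IC2)}$$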

Screening problem: high productivity types get the highest utility, because they can mimic the low types.

Proof that Y2 > Y1

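The sum of the two incentive constraints is

$$v\!\left(\frac{Y_2}{w_1}\right) + v\!\left(\frac{Y_1}{w_2}\right) - v\!\left(\frac{Y_1}{w_1}\right) - v\!\left(\frac{Y_2}{w_2}\right) \ge 0. \qquad \text{(IC1+2)}$$

The function

$$f(x) = v\!\left(\frac{x}{w_1}\right) - v\!\left(\frac{x}{w_2}\right)$$

is increasing in $x$ since its derivative

$$\frac{1}{w_1} v'\!\left(\frac{x}{w_1}\right) - \frac{1}{w_2} v'\!\left(\frac{x}{w_2}\right)$$

is positive from $w_1 < w_2$ and the convexity of $v$. As a consequence (IC1+2), which reads $f(Y_2) \ge f(Y_1)$, implies $Y_2 \ge Y_1$, and

$$Y_2 > Y_1$$

whenever the two types are not pooled at the same income.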

Proof that IC2 binds

The two incentive constraints can be rewritten, denoting $C = Y - T$ as after-tax income:

$$v\!\left(\frac{Y_2}{w_1}\right) - v\!\left(\frac{Y_1}{w_1}\right) \ge C_2 - C_1 \ge v\!\left(\frac{Y_2}{w_2}\right) - v\!\left(\frac{Y_1}{w_2}\right) > 0,$$

with IC1 (resp. IC2) corresponding to the left (resp. right) inequality sign.

Given the social choice criterion, considering only the tax variables $(T_1, T_2)$, the government maximizes $-n_1 T_1 - \mu n_2 T_2$, subject to the budget constraint $n_1 T_1 + n_2 T_2 = R$ and the above inequalities. The redistributive objective has $\mu$ smaller than 1, pushing towards large $T_2$'s and small $T_1$'s.

Indeed, if IC2 were not binding, any positive $dT_2$, with associated $dT_1 = -n_2\, dT_2 / n_1$, would satisfy the budget constraint and increase the objective by

$$n_2 (1 - \mu)\, dT_2,$$

while reducing the slack in IC2.

The optimal tax schedule

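The Lagrangian simplifies to:

$$L = n_1\left[Y_1 - T_1 - v\!\left(\frac{Y_1}{w_1}\right)\right] + n_2 \mu \left[Y_2 - T_2 - v\!\left(\frac{Y_2}{w_2}\right)\right] + \lambda_{21}\left[Y_2 - T_2 - v\!\left(\frac{Y_2}{w_2}\right) - Y_1 + T_1 + v\!\left(\frac{Y_1}{w_2}\right)\right] + \pi\left[n_1 T_1 + n_2 T_2 - R\right]$$

The first order condition in $Y_2$ yields immediately the no distortion at the top result:

$$v'\!\left(\frac{Y_2}{w_2}\right) = w_2.$$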

The optimal tax schedule: concluded

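The Lagrangian is linear in the taxes $T_1$ and $T_2$, and the corresponding first order conditions give the multipliers

$$\pi = \frac{n_1 + n_2 \mu}{n_1 + n_2}, \qquad \lambda_{21} = (1 - \mu)\, \frac{n_1 n_2}{n_1 + n_2}.$$

Finally the FOC with respect to $Y_1$ gives

$$v'\!\left(\frac{Y_1}{w_1}\right) = w_1 \left[1 - (1 - \mu)\, \frac{n_2}{n_1 + n_2} \left(1 - \frac{1}{w_2}\, v'\!\left(\frac{Y_1}{w_2}\right)\right)\right].$$

Since $Y_2 > Y_1$ and $v'(Y_2/w_2) = w_2$, convexity of $v$ gives $v'(Y_1/w_2) < w_2$: the last term in parentheses is positive, and the labor supply of type 1 is distorted downwards.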
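As a numerical illustration of the two-type optimum, the sketch below assumes the quadratic cost v(L) = L²/2 (so v′(L) = L), a functional form chosen here for convenience and not given in the lecture. With it, the first order condition in Y1 can be solved in closed form, and the taxes follow from the binding IC2 and the budget constraint. The parameter values are made up.

```python
def v(L):
    """Assumed disutility of labor: v(L) = L^2 / 2, so v'(L) = L."""
    return 0.5 * L * L


def solve_two_types(w1, w2, n1, n2, mu, R):
    """Two-type optimum under the assumed quadratic v, with IC2 binding."""
    k = (1.0 - mu) * n2 / (n1 + n2)
    # No distortion at the top: v'(Y2 / w2) = w2  =>  Y2 = w2^2
    y2 = w2 ** 2
    # FOC in Y1: Y1 / w1 = w1 * [1 - k * (1 - Y1 / w2^2)], solved for Y1
    y1 = (1.0 - k) / (1.0 / w1 ** 2 - k / w2 ** 2)
    # Binding IC2: C2 - C1 = v(Y2/w2) - v(Y1/w2), with C = Y - T
    gap = (y2 - y1) - (v(y2 / w2) - v(y1 / w2))   # this is T2 - T1
    # Budget constraint: n1 * T1 + n2 * T2 = R
    t1 = (R - n2 * gap) / (n1 + n2)
    t2 = t1 + gap
    return y1, y2, t1, t2


w1, w2, n1, n2, mu, R = 1.0, 2.0, 0.5, 0.5, 0.5, 0.0
y1, y2, t1, t2 = solve_two_types(w1, w2, n1, n2, mu, R)
print(y1, y2, t1, t2)   # roughly 0.8, 4.0, -0.64, 0.64
```

In this example the undistorted income of type 1 would be w1² = 1, so Y1 ≈ 0.8 shows the downward distortion, while type 2 stays at its undistorted level w2² = 4.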

Continuum of types

Same setup as before except that there is an exogenous (discuss?) continuous distribution of abilities.

Separable utility function u(C) − v(Y, w) (typically v(Y/w)): u increasing concave, v increasing convex in Y, decreasing in w, vYw < 0 (single crossing).

The taxpayer maximizes $u(Y - T(Y)) - v(Y, w)$, which yields an indirect utility $U(w)$.

Envelope theorem: $U'(w) = -v_w$. First order condition: $1 - T' = v_Y / u'(C)$.

Revelation principle and mechanism design approach: $Y(w)$, $C(w) = Y(w) - T(w)$ must be feasible and incentive compatible.

Incentive compatibility: Rochet (1987), Chone-L (2010)

A necessary and sufficient condition for Y(w) to be associated with an incentive compatible allocation is that it be non decreasing in w. Also U is differentiable a.e. and both U and C are non decreasing.

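Incentive compatibility (IC): for all $w_0, w_1$

$$u(C(w_1)) - v(Y(w_1), w_1) \ge u(C(w_0)) - v(Y(w_0), w_1),$$

that is

$$U(w_1) \ge U(w_0) + v(Y(w_0), w_0) - v(Y(w_0), w_1),$$

and also, exchanging the roles of $w_0$ and $w_1$,

$$v(Y(w_1), w_0) - v(Y(w_1), w_1) \ge U(w_1) - U(w_0) \ge v(Y(w_0), w_0) - v(Y(w_0), w_1).$$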

Proof of IC lemma: necessity

$$v(Y(w_1), w_0) - v(Y(w_1), w_1) - v(Y(w_0), w_0) + v(Y(w_0), w_1) = -\int_{w_0}^{w_1} \int_{Y(w_0)}^{Y(w_1)} v_{Yw}\, dY\, dw.$$

From (IC), the left hand side must be non negative for all $(w_0, w_1)$. From the assumption $v_{Yw} < 0$, the right hand side has the sign of $(w_1 - w_0) \times (Y(w_1) - Y(w_0))$. Hence $Y$ must be non decreasing.

Proof of IC lemma: sufficiency

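Given $Y(w)$, define $[U(w), C(w)]$ through

$$U'(w) = -v_w(Y(w), w),$$

$$u(C(w)) = U(w) + v(Y(w), w),$$

the levels being given by the government budget constraint.

By construction, for $w_1 > w_0$, since $Y$ is non decreasing and $v_{Yw} < 0$,

$$U(w_1) - U(w_0) = -\int_{w_0}^{w_1} v_w(Y(x), x)\, dx \ge -\int_{w_0}^{w_1} v_w(Y(w_0), x)\, dx = -v(Y(w_0), w_1) + v(Y(w_0), w_0).$$

The case $w_1 < w_0$ is symmetric.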
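The lemma can be checked numerically on a grid. The sketch below assumes u(C) = C and v(Y, w) = (Y/w)²/2, builds U from the envelope condition by the trapezoid rule, and counts the pairs (w0, w1) that violate incentive compatibility. The functional forms, the grid, and the two candidate income schedules are illustrative assumptions.

```python
def v(y, w):
    """Assumed cost of earning y at ability w: (y / w)^2 / 2."""
    return 0.5 * (y / w) ** 2


def v_w(y, w):
    """Partial derivative of v with respect to w."""
    return -(y ** 2) / w ** 3


def ic_violations(Y, grid, tol=1e-9):
    """Count pairs (w0, w1) with U(w1) < C(w0) - v(Y(w0), w1), taking u(C) = C."""
    ys = [Y(w) for w in grid]
    # Envelope condition U'(w) = -v_w(Y(w), w), integrated by the trapezoid rule
    U = [0.0]
    for i in range(1, len(grid)):
        left = -v_w(ys[i - 1], grid[i - 1])
        right = -v_w(ys[i], grid[i])
        U.append(U[-1] + 0.5 * (left + right) * (grid[i] - grid[i - 1]))
    # Consumption from u(C(w)) = U(w) + v(Y(w), w)
    C = [U[i] + v(ys[i], grid[i]) for i in range(len(grid))]
    bad = 0
    for i, w1 in enumerate(grid):
        for j in range(len(grid)):
            if U[i] < C[j] - v(ys[j], w1) - tol:
                bad += 1
    return bad


grid = [1.0 + 0.01 * i for i in range(101)]                 # abilities in [1, 2]
increasing_count = ic_violations(lambda w: w ** 1.5, grid)   # non decreasing Y
decreasing_count = ic_violations(lambda w: 3.0 - w, grid)    # decreasing Y
print(increasing_count, decreasing_count)
```

The non decreasing schedule produces no violation, while the decreasing one does, in line with the necessity and sufficiency parts of the lemma.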

Deriving the Lagrangian as a function of indirect utility

First getting $Y$:

$$U'(w) = -v_w(Y(w), w) \quad \text{implies} \quad Y(w) = \eta(U'(w), w)$$

with

$$\eta_{U'} = -\frac{1}{v_{wY}(Y(w), w)}.$$

Second finding $C$:

$$U(w) = u(C(w)) - v(Y(w), w) \quad \text{implies} \quad C(w) = \gamma[U(w), U'(w), w]$$

with

$$\gamma_U = \frac{1}{u'(C(w))}, \qquad \gamma_{U'} = \frac{v_Y(Y(w), w)}{u'(C(w))}\, \eta_{U'}.$$

The Lagrangian becomes

$$L = \int_0^\infty \left\{\Psi(U(w)) + \lambda \left[\eta(U'(w), w) - \gamma[U(w), U'(w), w]\right]\right\} dF(w).$$

First order condition

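$\Delta U'(w) = \delta$ small on $[a, a + da]$, so that for all $w$ larger than $a + da$

$$\Delta U(w) = \int_a^{a+da} \Delta U'(x)\, dx = \delta\, da.$$

This variation is admissible if $Y$ stays non decreasing after the change (no bunching). It induces a change in the Lagrangian equal to

$$\Delta L = \delta\, da \int_a^\infty \left[\Psi'(U(w)) - \lambda \gamma_U\right] dF(w) + \lambda \left[\eta_{U'} - \gamma_{U'}\right] \delta f(a)\, da.$$

$$\frac{\Delta L}{\delta\, da} = \int_a^\infty \left[\Psi'(U(w)) - \frac{\lambda}{u'(C(w))}\right] dF(w) - \frac{\lambda}{v_{wY}(Y(a), a)} \left[1 - \frac{v_Y(Y(a), a)}{u'(C(a))}\right] f(a).$$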

The linear in consumption case

We first go back to the usual cost of work function

$$v(Y, w) = v(L) = v(Y/w)$$

$$v_Y = \frac{1}{w}\, v'\!\left(\frac{Y}{w}\right) = (1 - T')$$

$$v_{Yw} = -\frac{1}{w^2}\, v' \times \left[1 + \frac{Y}{w} \frac{v''}{v'}\right] = -(1 - T')\, \frac{1}{w} \left(1 + \frac{1}{\varepsilon_L}\right)$$

Second, we transform the first term, to ease interpretation. A uniform marginal transfer is incentive compatible and gives

$$\int_0^\infty \left[\Psi'(U(w))\, u'(C(w)) - \lambda\right] dF(w) = 0.$$

The multiplier $\lambda$ is equal to the average of the social weights of all the agents. The average social weight $S(a)$ of the agents of ability larger than $a$ is defined by

$$S(a) = \frac{1}{1 - F(a)} \int_a^\infty \Psi'(U(w))\, u'(C(w))\, dF(w),$$

so that $S(0) = \lambda$.

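The formula in the quasi-linear case: Diamond (1998)

Putting all together

$$\frac{\lambda}{v_{wY}(Y(a), a)}\, T'(Y(a))\, f(a) = \int_a^\infty \left[\Psi'(U(w)) - \lambda\right] dF(w)$$

divided through by $\lambda$, becomes

$$\frac{T'}{1 - T'} = \left(1 + \frac{1}{\varepsilon_L}\right) \frac{1 - F(a)}{a f(a)} \left(1 - \frac{S(a)}{S(0)}\right)$$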
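The formula above can be evaluated directly once its three ingredients are known. The sketch below is a plain transcription, with a hypothetical example in which abilities are exponentially distributed with mean 1 and the weight ratio S(a)/S(0) is set to 0.5; both choices are assumptions made for illustration.

```python
import math


def diamond_rate(eps_l, a, f_a, F_a, weight_ratio):
    """Marginal rate T' solving
    T'/(1 - T') = (1 + 1/eps_l) * (1 - F(a)) / (a * f(a)) * (1 - S(a)/S(0)).
    weight_ratio stands for S(a)/S(0)."""
    odds = (1.0 + 1.0 / eps_l) * (1.0 - F_a) / (a * f_a) * (1.0 - weight_ratio)
    return odds / (1.0 + odds)


# Hypothetical inputs: exponential abilities, F(a) = 1 - exp(-a), f(a) = exp(-a)
a = 2.0
F_a = 1.0 - math.exp(-a)
f_a = math.exp(-a)

rate_low_elasticity = diamond_rate(0.5, a, f_a, F_a, 0.5)
rate_high_elasticity = diamond_rate(1.0, a, f_a, F_a, 0.5)
print(rate_low_elasticity, rate_high_elasticity)
```

Here (1 − F(a))/(a f(a)) = 1/a = 0.5, so with an elasticity of 0.5 the right hand side is 3 × 0.5 × 0.5 = 0.75 and T′ = 0.75/1.75, about 0.43; raising the elasticity to 1 lowers the rate.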

Properties of optimal taxation in the Mirrlees model

- The higher the elasticity of labor supply, the lower the optimal tax rates.

- The optimal marginal tax rate is higher at lower points of the distribution of productivities ([1 − F(a)]/a decreases in a: the lower a, the more tax is collected on more productive individuals when one increases T′). It is inversely proportional to f(a), the number of individuals that bear the distortion.

- Social choices enter through S(a)/S(0), which is smaller than 1 and decreasing in a in standard redistributive setups.

Properties of Mirrlees model, concluded

There are few general properties of the optimal tax scheme:

- T′ is between zero and one: labor supply is always distorted downwards for all a such that F(a) < 1, except perhaps at the lowest ability a where F(a) = 0. Subsidies to low skilled workers (EITC) are ruled out.

- If the distribution of wages is bounded upwards, the labor supply of the richest agents is undistorted (no redistributive gain associated with the distortive effect of the tax).

Top tax rates

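The right tail of the distribution of wages looks like a Pareto distribution, $\alpha > 1$:

$$F(w) = 1 - \left(\frac{w_0}{w}\right)^{\alpha}, \qquad f(w) = \frac{\alpha}{w} \left(\frac{w_0}{w}\right)^{\alpha}.$$

The average wage above some $w$ is equal to

$$W_m(w) = \frac{\int_w^\infty x f(x)\, dx}{1 - F(w)} = \frac{\alpha}{\alpha - 1}\, w\, \frac{w_0^{\alpha} / w^{\alpha}}{w_0^{\alpha} / w^{\alpha}} = \frac{\alpha}{\alpha - 1}\, w.$$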

On US data, the ratio Wm(w)/w appears to be stable for yearly wages above $100,000, and approximately equal to 2, which yields α = 2.

The formula reduces to

$$\frac{T'}{1 - T'} = \left(1 + \frac{1}{\varepsilon_L}\right) \frac{1}{\alpha} \left(1 - \frac{S(a)}{S(0)}\right)$$

Top tax rates: numerical application

Soak the rich: $S(a) = 0$.

Low labor supply elasticities: $\varepsilon_L = 0.2$.

$$\frac{T'}{1 - T'} = 6 \times \frac{1}{2} = 3, \qquad T' = 0.75$$
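The numerical application can be reproduced in a few lines. The sketch below recovers α from the observed ratio Wm(w)/w = α/(α − 1), then applies the Pareto form of the formula; the function names are hypothetical, and the extra elasticity values in the sensitivity table are assumptions added for comparison.

```python
def pareto_alpha(mean_ratio):
    """Invert Wm(w)/w = alpha / (alpha - 1); requires mean_ratio > 1."""
    return mean_ratio / (mean_ratio - 1.0)


def top_rate(eps_l, alpha, weight_ratio=0.0):
    """T' from T'/(1 - T') = (1 + 1/eps_l) * (1/alpha) * (1 - S(a)/S(0)).
    weight_ratio stands for S(a)/S(0); 0 is the 'soak the rich' case."""
    odds = (1.0 + 1.0 / eps_l) / alpha * (1.0 - weight_ratio)
    return odds / (1.0 + odds)


alpha = pareto_alpha(2.0)     # ratio of about 2 on US data
rate = top_rate(0.2, alpha)   # case discussed in the text

# Sensitivity to the elasticity (values other than 0.2 are illustrative)
sensitivity = {eps: top_rate(eps, alpha) for eps in (0.2, 0.5, 1.0)}
print(alpha, rate, sensitivity)
```

With α = 2 the top rate falls from 0.75 at an elasticity of 0.2 to 0.6 at 0.5 and 0.5 at 1, which shows how strongly the result depends on the assumed elasticity.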

The extensive model

In the basic Mirrlees model, everybody works: participation is not a concern.

In the extensive model, the labor supply decision in {0, 1} is either not to work, or to work full time (simpler than the continuous decision of the intensive setup).

Two (instead of one) dimensions of heterogeneity: productivity w as before, opportunity cost of work δ.

Government: it observes w for the workers; it never sees δ, which is private information of the agents. Its instruments are R(w) = w − T(w), the non decreasing after-tax income of the workers, and s, the subsistence income of the unemployed.

Here pecuniary cost of work (no income effects); also the worker bears all the opportunity cost.

Utility for a consumption level equal to C: u(C − δ) if the person works, u(C) if she does not work.

Distributions F(w), G(δ|w). I shall assume that opportunity costs are non negative, i.e. G has support on R+.

Labor supply

The typical agent compares income when at work net of opportunity cost, R(w) − δ, with income out of work, s.

The unemployment rate of agents of productivity w is

1− G (R(w)− s|w).

The tax collected on agents of productivity w is

[w − R(w)]G (R(w)− s|w) = TG (w − T − s|w).

It is zero for T = 0 and for T = w − s. The Laffer tax is the value of T which maximizes government income. It satisfies the FOC (interpretation: variation of income from the workers = distortion):

G (w − T − s|w) = T g(w − T − s|w).
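To see the Laffer condition at work, the sketch below assumes that opportunity costs δ are exponentially distributed with mean theta, independently of w. This distribution and the numbers are assumptions for illustration. Under this assumption the derivative of revenue T · G(w − T − s) is decreasing in T, so the first order condition can be solved by bisection on [0, w − s].

```python
import math


def participation(x, theta):
    """G(x): share of agents with opportunity cost below x (exponential, mean theta)."""
    return 1.0 - math.exp(-x / theta) if x > 0 else 0.0


def density(x, theta):
    """g(x): density of the assumed exponential distribution."""
    return math.exp(-x / theta) / theta if x >= 0 else 0.0


def revenue(T, w, s, theta):
    """Tax collected on agents of productivity w: T * G(w - T - s)."""
    return T * participation(w - T - s, theta)


def foc(T, w, s, theta):
    """Derivative of revenue with respect to T: G - T * g."""
    x = w - T - s
    return participation(x, theta) - T * density(x, theta)


def laffer_tax(w, s, theta):
    """Bisection on the FOC over [0, w - s]; requires w > s."""
    lo, hi = 0.0, w - s
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if foc(mid, w, s, theta) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


w, s, theta = 10.0, 2.0, 3.0
T_star = laffer_tax(w, s, theta)
print(T_star, revenue(T_star, w, s, theta))
```

Any tax above T_star lowers both revenue and the utility of the workers, which is why the Laffer tax acts as an upper bound on T.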

The optimal taxation program

The government objective is:

$$\max \int \left[\int_0^{R(w) - s} u(R(w) - \delta)\, dG(\delta|w) + \int_{R(w) - s}^{\infty} u(s)\, dG(\delta|w)\right] dF(w).$$

The feasibility constraint is

$$\int \left[w - R(w) + s\right] G(R(w) - s|w)\, dF(w) = s.$$

The Lagrangian of the problem for persons of ability $w$ is

$$L(R, s, w) = \int_0^{R - s} u(R - \delta)\, dG(\delta|w) + \int_{R - s}^{\infty} u(s)\, dG(\delta|w) + \lambda \left[w - R + s\right] G(R - s|w) - \lambda s.$$

First order condition in R

The derivative of the Lagrangian with respect to $R$ at a no bunching point is:

$$\frac{\partial L(R, s, w)}{\partial R} = \int_0^{R - s} u'(R - \delta)\, dG(\delta|w) + \lambda \left[w - R + s\right] g(R - s|w) - \lambda G(R - s|w).$$

Average social weight of the workers of productivity $w$:

$$S_E(R, s, w) = \frac{1}{G(R - s|w)} \int_0^{R - s} u'(R - \delta)\, dG(\delta|w).$$

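The participation tax rate is $\tau = [w - (R - s)]/w$. Setting the derivative of the Lagrangian to zero gives

$$w - R + s = w\, \tau(w) = \frac{G(R - s|w)}{g(R - s|w)} \left[1 - \frac{S_E(R, s, w)}{\lambda}\right] = \frac{R}{\varepsilon_R} \left[1 - \frac{S_E(R, s, w)}{\lambda}\right],$$

where $\varepsilon_R = R\, g(R - s|w) / G(R - s|w)$ is the elasticity of the participation rate with respect to $R$.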
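The condition above pins down R once the elasticity and the relative social weight are known. The sketch below treats ε_R and the ratio S_E/λ as given constants, which is a simplifying assumption since both depend on R in general, and solves the resulting linear equation for R. All numbers are hypothetical.

```python
def optimal_net_wage(w, s, eps_r, weight_ratio):
    """Solve w - R + s = (R / eps_r) * (1 - weight_ratio) for R.
    weight_ratio stands for S_E / lambda; eps_r and weight_ratio are held
    fixed, and 1 + (1 - weight_ratio) / eps_r must be positive."""
    return (w + s) / (1.0 + (1.0 - weight_ratio) / eps_r)


def participation_tax(w, s, R):
    """tau = [w - (R - s)] / w."""
    return (w - R + s) / w


w, s = 100.0, 20.0

# Workers valued less than the marginal cost of public funds: positive tax
R_taxed = optimal_net_wage(w, s, eps_r=0.5, weight_ratio=0.8)
tau_taxed = participation_tax(w, s, R_taxed)

# Workers valued more than the marginal cost of public funds: work subsidy
R_subsidized = optimal_net_wage(w, s, eps_r=1.0, weight_ratio=1.1)
tau_subsidized = participation_tax(w, s, R_subsidized)

print(R_taxed, tau_taxed, R_subsidized, tau_subsidized)
```

The sign of the participation tax follows the sign of 1 − S_E/λ: it is positive in the first case (about 0.34) and negative in the second (about −0.13).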

Two well behaved tax schemes

[Figure: two well behaved tax schemes. Labels recovered from the figure: R − s, δ, ωδ, ωm, 45° line, work subsidy, downward distortion.]

Properties of the pure extensive model

- When at a productivity level w the average social weight of the workers is larger than the marginal cost of public funds, their labor supply is distorted upwards at the optimum.

- A lot of bunching (100% marginal tax rates).

- The disposable income function can take essentially any (non decreasing) shape above the Laffer bound, for a suitable choice of social weights.
