
Optimal Feedback Design∗

Alex Smolin†

November 15, 2015

JOB MARKET PAPER

Abstract

I study a problem of optimal feedback provision. An agent of uncertain ability repeatedly supplies effort and a principal provides feedback on his performance. Feedback is valuable but may discourage the agent and dampen his effort. I show that a policy, optimal for the principal among all persuasion mechanisms, is coarse yet tractable. In each period, the principal only informs the agent whether her current belief on his ability is above a chosen cutoff. The agent is expected to exert effort above the cutoff and is threatened with no feedback for deviating. Depending on the performance technology, optimal feedback can be provided through a tenure-track or performance-standard policy. With access to monetary incentives, even under moral hazard, the principal can extract full surplus by providing coarse feedback and backloading compensation into the future. These results alert to adverse effects of increasing performance transparency in organizations and may explain many features of professional contracts.

Keywords: principal-agent, experimentation, persuasion, general learning
JEL Codes: D82, D83, M52

∗I thank Dirk Bergemann, Johannes Hörner, and Larry Samuelson for constant support and guidance throughout this project. This paper has benefited from helpful comments and suggestions from Florian Ederer, Chiara Margaria, and Anne-Katrin Roesler. I am grateful to Pierre-André Chiappori, Juan Dubra, William Fuchs, Sergei Izmalkov, Emir Kamenica, Nicolas Lambert, Sofia Moroni, Aniko Öry, Marzena Rostek, and Áron Tóbiás for insightful conversations. This is an ongoing project. The latest version of the draft can be found on my webpage.
†Department of Economics, Yale University, 28 Hillhouse Ave., New Haven, CT 06520, USA, [email protected].


1 Introduction

Evaluation of individual performance is an important part of organizational life. Although much evaluation is informal, most organizations have formal feedback policies designed to collect and distribute performance information to employees.1 As communication and information technologies advance, many companies find it easier to provide more feedback. As a recent example, in August 2015 General Electric (GE) announced an ongoing shift from its legacy system of annual performance reviews to more frequent conversations between managers and employees via an online application.2 In this way, GE joined other high-profile companies such as Microsoft, Accenture, and Adobe in a move towards more frequent, exhaustive, and real-time feedback. However, whenever adopting new feedback policies, companies should ask: What is their effect on overall performance? Would other feedback policies do better? Ultimately, which feedback policy is best for the company?

Optimal feedback provision must balance two opposing forces—the employee’s need for feedback and the damage to his self-esteem. On one hand, feedback is a valuable input for the employee’s future decisions. It provides additional incentives and results in greater performance transparency. On the other hand, any feedback may turn out to be negative. The resulting discouragement and decrease in effort push towards less performance transparency. The organizational literature has long recognized the conflict between these two forces, and Beer (1987) termed it central to the appraisal process. Despite its importance, this trade-off has not been made precise and formally studied. The purpose of this paper is to fill this gap—to formalize this trade-off and to characterize optimal feedback policies in a broad performance framework.

Section 2 introduces the feedback model, building on the experimentation setting of Rothschild (1974). An agent of uncertain ability repeatedly decides whether to exert effort. High ability on average translates into high performance. Equivalently, consistently high performance suggests high ability. The agent would like to know his performance to make inferences about his ability and adjust future effort.

The novelty of my model is that the agent does not observe performance himself. He must rely on the data or expertise of a principal who pursues her own interests. At the beginning of the relationship, the principal designs a feedback policy that governs when and what information is revealed to the agent. The policy is modeled as a dynamic persuasion policy motivated by Kamenica and Gentzkow (2011) and is not restricted in any way. As such, the principal could

1According to Murphy and Cleveland (1995), between 74% and 89% of business organizations had formal performance appraisal policies by 1995.

2“GE’s Real-Time Performance Development”, Harvard Business Review, August 12, 2015, https://hbr.org/2015/08/ges-real-time-performance-development.


provide full feedback, delay feedback, partially inform, or make future feedback dependent on past findings.

If the players’ interests coincide, then full feedback is optimal. In this case, as shown in Section 3, under full feedback both players prefer the agent to exert effort only if their beliefs are above an individually optimal cutoff that is the same for both players. However, the players’ interests may not align perfectly, as each may value performance and effort differently. In this case, their individually optimal cutoffs differ and they generally disagree on the course of action under full feedback. Which feedback policy the principal should pursue is then not obvious, as it must trade off the agent’s need for feedback against the players’ disagreement whenever their beliefs lie between the individually optimal cutoffs.

Section 4 studies and solves the feedback problem. The problem is complex, with a rich space of possible policies and a multiplicity of dynamic incentive constraints. Nevertheless, I show that an optimal feedback policy is tractable and coarse. In every period, the principal informs the agent only whether her current belief about the agent’s ability is above some cutoff—that is, meets a standard. The agent is expected to exert effort only if above the cutoff, and his deviations are threatened with no feedback. Under this policy, the principal condenses rich performance signals into binary ones and maintains informational asymmetry between the players. Whenever meeting the standard, the agent deterministically becomes more optimistic, whereas the principal’s belief stochastically evolves.

Two features of the optimal feedback policy are general. First, the policy is a recommendation policy—it does not provide any information besides that required for the agent to make decisions. This feature follows from a dynamic version of the revelation principle, as providing more information could only complicate the agent’s incentives. Second, the principal presents a uniformly maximal threat—no feedback—for the agent’s deviations. This threat does not affect players’ payoffs but facilitates the provision of the agent’s incentives.

At the same time, the exact constant-cutoff structure of the policy relies on the stationarity of the environment. Specifically, I show that in this setting all Pareto efficient payoffs can be achieved by cutoff effort strategies. In fact, each can be found as a solution to a decision problem of a fictitious player with adjusted preferences. In particular, a cutoff strategy can achieve the payoff pair that maximizes the principal’s payoff subject to the agent’s individual rationality. This is the most the principal can possibly hope for in the relationship, and it can be achieved with the proposed feedback policy. Indeed, under this policy, the agent’s incentive in the very first period coincides with his individual rationality. Incentives in all other periods follow: whenever meeting a standard the agent becomes even more optimistic while facing the same cutoff; the first time failing the standard, his belief drops sufficiently


low to prevent any further effort.

Information control turns out to be a powerful tool. If players’ interests sufficiently

align, then the principal achieves her maximal feasible payoff and can implement the optimal feedback policy even without commitment power. In general, however, commitment power benefits the principal, and I provide conditions under which no informative communication can be sustained in the absence of commitment.

The optimality of a constant-cutoff feedback policy is robust to many extensions of the baseline model, as shown in Section 6. First, I consider the case of moral hazard, when the agent’s actions cannot be monitored. A cutoff policy remains optimal at least when performance comes as conclusive successes. In this case, a cutoff in belief corresponds to a single revision time at which the principal reveals past performance to the agent. The cutoff needs to be adjusted to induce effort at the beginning of the relationship, and the agent is left with procrastination rents. In contrast, and perhaps surprisingly, the principal can extract full surplus if she can jointly design compensation and feedback policies, even if the agent is subject to limited liability. The optimal contract combines a cutoff feedback policy with pay-per-performance compensation, backloaded into the future so as not to interfere with information provision. Procrastination incentives do not arise since the agent’s belief increases over the career path, translating into a decreasing bonus schedule.

Further, I allow the players to openly disagree about the chances of high ability, which may capture, for example, the agent’s overconfidence. A cutoff policy remains optimal with the cutoff adjusted to account for the players’ differing beliefs. Finally, I show that the model can be extended to account for advisory feedback. The advice guides the agent whenever he exerts effort and is purely allocative—it does not resolve any aggregate uncertainty. I show that if the players’ interests coincide along the advice dimension, then an optimal policy combines exhaustive advice with coarse feedback on performance.

My analysis has two major implications. First, it alerts to potentially adverse effects of increasing performance transparency in organizations. As discussed in the first paragraph, due to technological advances, more and more companies provide frequent feedback to their employees. However, as shown in Section 4, while optimal feedback is frequent, full feedback is almost never optimal. This suggests that managers should choose the contents of their feedback carefully so that its decision value is not outweighed by the discouragement of the workforce.

Second, it reveals that many observed features of professional contracts may facilitate optimal feedback provision. If performance comes as infrequent conclusive outcomes, common in academia and law firms, then an optimal policy can be implemented by a tenure-track contract with an appropriately chosen revision date. Similarly, if performance comes as frequent


but noisy outcomes, arguably common to the majority of organizations, then an optimal policy can be implemented by a standard on average performance. Finally, as discussed above, if the principal can use monetary incentives, then the compensation is optimally postponed. It comes as a “golden parachute,” a large bonus at career end, so as not to interfere with information provision while still providing incentives for effort.

Related Literature. This paper unites the experimentation and persuasion strands of the economic literature. The performance technology builds on the experimentation setting of Rothschild (1974), with the novelty that the agent does not observe performance himself. The concept of a feedback policy and the very design question are motivated by Kamenica and Gentzkow (2011). I enrich their static persuasion setting to allow for features essential to the study of feedback: a dynamic relationship, endogenous information flow, and a forward-looking agent.3 To the best of my knowledge, my paper is the first to combine these features and not restrict the class of information policies under consideration. Also, I do not restrict the learning patterns that occur under full feedback, while most of the dynamic persuasion literature has concentrated on specific learning technologies. In the next three paragraphs, I outline recent dynamic persuasion studies in greater detail.

Most of the dynamic persuasion literature has considered the case of a myopic agent or, equivalently, a stream of short-lived agents. Renault et al. (2014) and Ely (2015) studied dynamic settings with an exogenous information flow that does not depend on the actions of the agent. Renault et al. (2014) considered the case in which the principal’s payoff does not depend on her information and showed that sometimes a “greedy” policy that maximizes the principal’s payoff period by period is optimal. Ely (2015) studied a general payoff setting and characterized an optimal information policy.4 These authors built on the concavification techniques used by Kamenica and Gentzkow (2011), which do not directly apply in my setting.

Kremer et al. (2014) and Che and Hörner (2015) studied settings with endogenous information flow, motivated by the question of encouraging explorative consumption of a product of uncertain quality. Kremer et al. (2014) considered the case in which the uncertainty is rich, but learning is instantaneous. They showed that an optimal policy is characterized by a sequence of increasing quality cutoffs above which the principal recommends consuming the product. Che and Hörner (2015) studied the case in which the uncertainty is binary,

3Notable static generalizations include Bergemann and Morris (2013), who extended the analysis to static games with many agents, Alonso and Câmara (2015) to different prior beliefs, and Kolotilin et al. (2015) to a privately informed receiver. Ely et al. (2015) derived an optimal dynamic disclosure of static information over time when the agent has explicit preferences over the evolution of his beliefs.

4He also studied the case of a forward-looking agent when the principal’s information is binary in each period.


but learning is gradual. They showed that an optimal policy is characterized by a belief cutoff above which the principal randomly recommends consuming the product. A cutoff structure remains optimal in my setting, even though the payoff structure is different and the agent is forward-looking.

Hansen (2013) and Hörner and Lambert (2015) studied career-concern settings with additional restrictions on the class of information policies. Hansen (2013) studied a two-period model and concentrated on partition policies. Hörner and Lambert (2015) studied a continuous-time setting but concentrated on stationary Gaussian policies. Similar to my paper, these studies highlighted that providing too much feedback may dampen the agent’s efforts.

From an applied perspective, my paper contributes to the economic literature on feedback provision. The main concern of this literature, as in my paper, was the feedback effect on the agent’s self-confidence.5 Fang and Moscarini (2005) highlighted in a static model that wage differentiation can implicitly provide negative feedback to workers and may optimally be avoided. Ederer (2010) studied two-period tournaments and highlighted the dual role of interim feedback as informing about both ability and relative position in the tournament. Halac et al. (2015) studied the design of dynamic contests for innovation of uncertain feasibility and showed that full feedback is not optimal. In contrast to my paper, most previous studies in this literature restricted information policies to full or no disclosure per period.

Finally, my paper contributes to the growing literature on dynamic contracts without transfers. Guo (2015) studied dynamic delegation when the agent has private prior information, with a players’ disagreement structure similar to that in my setting. Hörner and Guo (2015) studied dynamic resource allocation when the principal needs to elicit evolving private information from the agent. My paper shows that, in the absence of monetary incentives, control over the agent’s information can be as effective as control over his actions that respects individual rationality.

5See, however, Lizzeri et al. (2002), Orlov (2015), and Goltsman and Mukherjee (2011), who studied how feedback interacts with monetary incentive provision to an agent of known ability.


2 Feedback Model

A principal (she) and an agent (he), i ∈ {P, A}, repeatedly interact at times t = 0, 1, 2, . . . . The agent is of uncertain ability, low or high, ω ∈ Ω ≜ {0, 1}, which is realized at the beginning and is fixed throughout the relationship. Prior information is symmetric and both players attach probability p0 to the agent being of high ability,

Pr (ω = 1) = p0.

Effort Strategy. In each period, the agent decides whether to exert effort, at ∈ A ≜ {0, 1}. His actions are observed by the principal. The agent’s effort strategy a = {at}∞t=0 is a mapping from all past messages and actions into a possibly random action:6

at : (mt, at−1) → Δ(A).

Denote by A the set of all possible effort strategies.7

Performance. The agent’s effort over time produces outcomes yt ∈ Y ⊆ R, which I will refer to as performance. If the agent does not exert effort, then performance does not depend on ability and its average is normalized to zero. If the agent exerts effort, performance is distributed independently and identically across periods with a distribution function depending on ability. It is distributed according to F0 (yt) if the ability is low, and according to F1 (yt)

if the ability is high. The corresponding distributions

(F0 (yt) , F1 (yt)) ,

constitute a performance technology. A high-ability agent on average performs better than a low-ability agent, and I can further normalize the performance so that

E [yt | ω = 0] = 0, (1)

E [yt | ω = 1] = 1.

Exerting effort is hence informative about ability—a consistently high performance over time suggests the ability is high. I do not restrict the performance technology in any other way. First,

6I adopt a convention that for any stochastic process x, its time-t realization is denoted by subscript x_t and the history up to time t is denoted by superscript x^t ≜ {x_s}_{s≤t}.

7The set A depends on the set of possible messages, which is chosen by the principal. I will omit the dependence for the sake of exposition.


the supports of F0 (yt) and F1 (yt) do not need to coincide, so some outcomes can perfectly reveal the agent’s ability. Second, the performance of a high-ability agent does not need to first-order stochastically dominate the performance of a low-ability one; thus, higher performance in a given period can suggest lower ability. In general, the learning pattern about ability whenever the agent exerts effort can be very rich and is determined by the performance technology (F0 (yt) , F1 (yt)).

Feedback Policy. The performance is observed by the principal, but not the agent. The principal can communicate performance to the agent through a feedback policy m = {mt}∞t=0, which she designs and commits to at the beginning of the relationship.8 This policy governs when and what information is revealed to the agent. Formally, the feedback policy maps the principal’s private histories, which include past performance, actions, and messages, into possibly random messages to be sent to the agent:

mt : (yt−1, at−1, mt−1) → Δ(M),   (2)

where the choice of m includes the choice of the message set M. I place no further restrictions on the class of feedback policies, so the principal can use any policy of the form (2). Denote the set of all possible feedback policies by M.

The concept of a feedback policy is an extension of Kamenica and Gentzkow’s (2011) static persuasion policy and admits two possible interpretations. First, it can be viewed as a disclosure policy. In this interpretation, the principal constantly monitors the performance but is bound to communicate according to the policy chosen at the beginning of the game. Second, it can be viewed as a sequence of public experiments. In this interpretation, the principal does not observe performance directly but commits to a sequential policy of public tests to inform both players on past performance. The choice of future experiments may depend on the past findings. These two interpretations are equivalent in our setting since the principal has no private use for information.

A given feedback policy determines the joint evolution of the players’ information. The key variables in the analysis are both players’ beliefs about the agent’s ability, or simply beliefs, ptP and ptA. I assume both players are rational and have perfect recall. Whenever they obtain new information—observe performance or receive feedback—they update their

8Throughout the paper, a “strategy” refers to an effort strategy chosen by the agent and a “policy” refers to a feedback policy chosen by the principal.


beliefs according to Bayes’ rule and past observations:

ptP(yt−1, at−1) ≜ Pr(ω = 1 | yt−1, at−1),

ptA(mt−1, at−1, m) ≜ Pr(ω = 1 | mt−1, at−1, m).

The interpretation of a performance realization yt−1 depends on past actions. Performance is informative about the ability only in periods in which the agent exerts effort. Similarly, the interpretation of a feedback realization mt−1 depends on both past actions and the feedback policy m. A message is informative only about performance that affects its probability. Overall, the evolution of beliefs can be very rich and depends jointly on the performance technology, the effort strategy, and the feedback policy. In what follows, I omit the arguments of beliefs.
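To make the belief updating concrete, here is a minimal sketch of Bayes’ rule applied to an effort period, written for a hypothetical two-outcome performance technology that satisfies normalization (1); the distributions and numbers are illustrative, not taken from the paper.

```python
from typing import Dict

def bayes_update(p: float, y: float, f0: Dict[float, float], f1: Dict[float, float]) -> float:
    """Posterior belief Pr(omega = 1) after observing outcome y in a period of effort.

    f0, f1 are probability mass functions of the outcome under low and high ability.
    In a period without effort the belief is simply left unchanged.
    """
    num = p * f1.get(y, 0.0)
    den = num + (1 - p) * f0.get(y, 0.0)
    return num / den if den > 0 else p

# Illustrative technology: outcomes in {-1, 2}, chosen so that E[y | omega=0] = 0
# and E[y | omega=1] = 1, as in normalization (1).
f0 = {-1.0: 2/3, 2.0: 1/3}
f1 = {-1.0: 1/3, 2.0: 2/3}

p = 0.5
for y in [2.0, 2.0, -1.0]:     # a sample performance path under effort
    p = bayes_update(p, y, f0, f1)
print(round(p, 3))             # two good outcomes raise the belief, the bad one lowers it
```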

Payoffs. Players value performance and effort. Whenever the agent does not exert effort, the players receive flow payoffs cA ≥ 0 and cP ≥ 0. These payoffs are fixed throughout the relationship and can be interpreted as opportunity costs of the agent’s effort. The players also value performance at rates hA ≥ 0 and hP ≥ 0. For a given action at ∈ {0, 1}, the flow payoff of player i is9

at hi yt + (1 − at) ci, i = P, A.

The players maximize their expected payoffs discounted at a common rate δ ∈ (0, 1). Given a feedback policy m ∈ M and an effort strategy a ∈ A, the players’ payoffs are

Ui (m, a) = (1 − δ) E[ Σ∞t=0 δt (at hi yt + (1 − at) ci) | m, a ], i = P, A,   (3)

where the expectations condition on the probability laws induced by policy m and strategy a. Note that the players’ preferences are determined by their value for performance relative to the opportunity costs,

vi ≜ hi / ci, i = P, A,

with the convention that vi = ∞ if ci = 0. A conflict of interests arises between the players whenever vP ≠ vA. The agent, for a given feedback policy m, chooses an effort strategy a (m) to maximize his total payoff UA (m, a). The principal anticipates the agent’s response and chooses the feedback policy to maximize her total payoff UP (m, a (m)).

9Because performance affects payoffs, it follows that the agent does not observe his payoff. This is a standard assumption in the communication literature and is discussed in detail in Section 7.
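The payoff functional (3) can also be evaluated by simulation. The sketch below does so for the same illustrative two-outcome technology as above and for an effort rule that ignores feedback entirely (a simplification: it is not a best response, only a way to see how (3) aggregates flow payoffs).

```python
import random

def simulate_payoff(h: float, c: float, delta: float, p0: float,
                    effort_rule, horizon: int = 400, reps: int = 2000) -> float:
    """Monte Carlo estimate of U_i(m, a) in (3) for a player with performance value h
    and opportunity cost c, when effort_rule(t) fixes the action in period t."""
    total = 0.0
    for _ in range(reps):
        omega = 1 if random.random() < p0 else 0            # ability drawn once and fixed
        payoff = 0.0
        for t in range(horizon):
            if effort_rule(t):
                # outcomes in {-1, 2} with E[y | omega=0] = 0 and E[y | omega=1] = 1
                y = 2.0 if random.random() < (2/3 if omega else 1/3) else -1.0
                payoff += (1 - delta) * delta**t * h * y
            else:
                payoff += (1 - delta) * delta**t * c
        total += payoff
    return total / reps

# Example: an agent who always works; the estimate is close to h * p0 * 1 = 0.5.
print(round(simulate_payoff(h=1.0, c=0.4, delta=0.9, p0=0.5, effort_rule=lambda t: 1), 2))
```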


3 Individually Optimal Strategies

Consider the players’ individually optimal strategies—what each player would optimally do if they had full control over actions and could perfectly observe performance—that is, effort strategies that solve

supa∈A Ui (m, a), i = P, A,   s.t. m ≡ y.   (4)

This analysis constitutes a useful benchmark, highlights the players’ disagreement, and provides intuition about the dynamic nature of the problem.

The problem (4) is a standard “bandit” problem. It is equivalent to a sequential choice between two arms of a bandit (slot) machine. The first arm is safe and gives a known payoff ci. The other arm is risky and gives a random payoff hi yt. The problem is Markov in the belief pti because it summarizes all payoff-relevant information available to the player. Define a cutoff strategy as follows.

Definition 1. For a belief p ∈ [0, 1] and a probability α ∈ [0, 1], an effort strategy is called a (p, α)-cutoff strategy if it prescribes exerting effort anywhere above p, exerting effort with probability α at p, and not exerting effort below p:

Pr(a^cut_t (p, α) = 1 | pti) = 1 if pti > p,  α if pti = p,  and 0 if pti < p.

The corresponding belief p is called a cutoff and α is called a probability at the cutoff. Whenever convenient, I refer to a (p, α)-cutoff strategy as a p-cutoff strategy or simply a cutoff strategy.

If a player were myopic or, equivalently, did not value the future at all, δ = 0, then the player would choose actions to maximize current flow payoffs. As a result, a pMi-cutoff strategy would be optimal, where

pMi ≜ ci / hi = 1 / vi, i = P, A,

is the myopic cutoff at which the expected flow payoff from exerting effort equals the opportunity cost.

However, if a player is forward-looking, δ > 0, then the player would generally like to exert more effort. The main idea is that exerting effort has an information value—the player learns about her ability and can use the knowledge to optimally adjust future actions. In fact, an optimal strategy still has a cutoff structure.

Figure 1: Evolution of belief under an individually optimal strategy. The player exerts effort if her current belief is above a cutoff p∗i and does not exert effort if it is below. Whenever she exerts effort, the belief stochastically evolves. Otherwise, the belief stands still.

Lemma 1. (Individually Optimal Strategy) Consider a player who values performance at hi ≥ 0, has opportunity costs ci ≥ 0, and for whom hi and ci are not both equal to zero. Then

1. There exists a unique belief p∗i ∈ [0, 1] such that any (p∗i , α)-cutoff strategy is optimal for any α and prior belief p0.

2. The cutoff p∗i = 1 iff hi ≤ ci, and p∗i = 0 iff ci = 0.

3. If p∗i ∈ (0, 1) then it strictly decreases in vi.

Figure 1 illustrates the optimal cutoff strategy and presents a sample path of the player’s beliefs. Denote by ui the individually optimal payoff of player i.

At the optimal cutoff, the player is indifferent to exerting effort. Consequently, any randomization at the optimal cutoff does not change the total payoff. However, randomization at a cutoff that is not optimal generally affects the player’s payoff. For example, if the players’ interests do not perfectly align, vP ≠ vA, then, as α is varied, a (p∗P , α)-cutoff strategy delivers the same payoff to the principal but different payoffs to the agent. This change of surplus will be important in Section 4 when I derive an optimal feedback policy.
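For a concrete illustration of Lemma 1, the cutoff p∗i can be computed numerically by value iteration once a specific technology is fixed. The sketch below uses a hypothetical discrete-time conclusive-success technology (a high-ability agent succeeds with probability λ per effort period, a low-ability agent never does) and illustrative parameters; it is not the paper’s general argument, only a numerical check that the forward-looking cutoff lies below the myopic one.

```python
import numpy as np

def optimal_cutoff(h: float, c: float, delta: float, lam: float,
                   grid_size: int = 2001, tol: float = 1e-10) -> float:
    """Individually optimal belief cutoff p*_i under a conclusive-success technology.

    Under effort, a high-ability agent succeeds with per-period probability lam and the
    outcome is normalized so that E[y | high] = 1 and E[y | low] = 0. Shirking freezes
    the belief and pays the flow c, so the shirk-forever value is c.
    """
    p = np.linspace(0.0, 1.0, grid_size)
    p_next = p * (1 - lam) / (1 - p * lam)        # Bayes' update after effort with no success
    v = np.full(grid_size, c)                     # start from the shirk-forever value
    v_success = h                                 # ability revealed high: work forever
    for _ in range(10_000):
        cont = np.interp(p_next, p, v)
        work = (1 - delta) * h * p + delta * (p * lam * v_success + (1 - p * lam) * cont)
        v_new = np.maximum(c, work)
        if np.max(np.abs(v_new - v)) < tol:
            v = v_new
            break
        v = v_new
    work = (1 - delta) * h * p + delta * (p * lam * v_success
                                          + (1 - p * lam) * np.interp(p_next, p, v))
    above = np.where(work > c)[0]
    return float(p[above[0]]) if above.size else 1.0

# Illustrative numbers: the forward-looking cutoff is well below the myopic cutoff c/h.
print(optimal_cutoff(h=1.0, c=0.4, delta=0.9, lam=0.3), "vs myopic", 0.4 / 1.0)
```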


Figure 2: Players’ disagreement on the effort strategies under full feedback. Players agree on the effort in the white areas and disagree in the red shaded areas. Left: p∗P < p∗A; in the disagreement region the principal would like the agent to exert effort but the agent would not. Right: p∗P > p∗A; in the disagreement region the principal would like the agent to not exert effort but the agent would.

The potential conflict between players stems from their different relative values for performance, since vP = hP/cP can differ from vA = hA/cA. The disagreement on the relative value of performance naturally translates into disagreement over the optimal cutoffs. Under full feedback, the players would disagree on the effort strategy at beliefs between p∗P and p∗A. If the principal values performance more than the agent does, vP > vA, then her cutoff is lower, p∗P < p∗A (Figure 2, left). In this case, the agent would exert less effort than is optimal for the principal. In turn, the principal would like to design feedback to motivate the agent. In contrast, if the principal values performance less than the agent does, vP < vA, then her belief cutoff is higher, p∗P > p∗A (Figure 2, right). In this case, under full feedback the agent would exert more effort than is optimal for the principal, so the principal would like to design feedback to demotivate the agent.

I allow for both kinds of disagreement. However, I assume that the principal does not value performance much less than the agent does. In particular, I assume that the principal’s optimal cutoff is below the agent’s myopic cutoff,

p∗P ≤ pMA . (5)

This condition allows the principal to prevent further effort whenever her beliefs are below p∗P by disclosing past performance and offering no further feedback. This condition is trivially satisfied whenever vP > vA since then p∗P ≤ p∗A ≤ pMA.


Finally, note that the disagreement matters only if the ability is not certain and the learning is gradual. The players would agree on the course of action if the ability were known: they both prefer exerting effort if the agent is of high ability and not exerting otherwise. A principal who knew the ability at the beginning of the relationship would immediately reveal it to the agent. Similarly, if the performance technology and the prior belief are such that under full feedback the players’ beliefs cannot enter the disagreement region, then full feedback is trivially optimal.10 However, if under full feedback the belief may enter the disagreement region, then the optimal feedback policy is not obvious.

4 Optimal Feedback

4.1 Optimal Feedback Problem

The principal cannot use monetary transfers as in mechanism design problems or directly control the agent’s actions as in delegation problems. Instead, the principal controls the information available to the agent through the feedback policy. She chooses the policy to maximize her expected total payoffs, taking into account the agent’s rationality and the misalignment of their interests. The optimal feedback problem can be stated as

supm∈M, a∈A UP (m, a)   (6)

s.t. a ∈ arg maxa∈A UA (m, a)   (IC). (7)

Note that the single incentive constraint (7) fully captures the sequential rationality of the agent because, for any given feedback policy m, the agent’s problem is time consistent.

I do not restrict the class of available feedback policies in any way, so its scope is very large. However, one can easily identify the “upper” and “lower” bounds of the policy in the sense of Blackwell’s (1953) informativeness: full feedback and no feedback, respectively,

• full feedback: m ≡ y,

• no feedback: m ≡ ∅.

At one extreme, the agent is fully informed about his performance under a full-feedback policy. This is clearly the most informative feedback policy the principal can provide, since it discloses the finest performance details. Formally, any other policy can be viewed as some garbling of a full-feedback policy. It maximizes the agent’s payoff but not necessarily the

10This could happen, for example, if the performance in each period were either not informative or fully revealing (cf. Section 5.1 when λ = γ).


principal’s payoff. Under full feedback, the agent’s belief stochastically evolves based on his effort strategy and the performance technology. The agent responds optimally and, as shown in Lemma 1, works whenever his belief is above p∗A. The principal’s losses under full feedback come from too little or too much effort in the disagreement region between p∗A and p∗P.

At the other extreme, the agent receives no performance information under a no-feedback policy. This is clearly the least informative feedback policy the principal can provide, since it discloses nothing. Formally, a no-feedback policy is some garbling of any other feedback policy. It minimizes the agent’s payoff but not necessarily the principal’s payoff. No feedback is also the hardest punishment available to the principal at any point in the relationship. The punishment is relatively mild because it leaves the agent with his outside option, max{hA ptA, cA}. Under the no-feedback policy, the agent’s belief stands still independently of his actions.

Between the two extremes lies a plethora of other feedback policies. Here are a few examples:

• delayed feedback: T ∈ N, mt = ∅ if t < T , mt = yt−T if t ≥ T ,

• random disclosure: ρ ∈ [0, 1], mt = yt−1 with probability ρ and ∅ with probability 1 − ρ,

• single revision: T ∈ N, mT = yT−1 and mt = ∅ for t ≠ T.
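These example policies are easy to express as message rules mapping the principal’s private history into a message. The sketch below encodes them schematically, with None standing for the empty message ∅; the encoding of histories as plain lists is my own and purely illustrative.

```python
import random
from typing import List, Optional

Outcome = float
Message = Optional[object]          # an outcome, a tuple of outcomes, or None for the empty message

def delayed_feedback(T: int, t: int, y_hist: List[Outcome]) -> Message:
    """m_t is empty for t < T and equals y_{t-T} for t >= T."""
    return None if t < T else y_hist[t - T]

def random_disclosure(rho: float, t: int, y_hist: List[Outcome]) -> Message:
    """m_t equals y_{t-1} with probability rho, and is empty otherwise."""
    if t == 0:
        return None
    return y_hist[t - 1] if random.random() < rho else None

def single_revision(T: int, t: int, y_hist: List[Outcome]) -> Message:
    """m_T reveals the whole past history; all other periods send the empty message."""
    return tuple(y_hist[:T]) if t == T else None

history = [1.0, -1.0, 0.0, 1.0]
print(delayed_feedback(2, 3, history), single_revision(3, 3, history))
```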

The optimal feedback problem is very complex. First, the set of all feedback policies, as noted above, is very rich. Messages can arbitrarily inform about past performance and, moreover, depend on past information revealed. Second, incentive constraints that need to be checked after each history depend on the potentially complicated value of future feedback. Third, dynamic programming techniques are not much help because the players’ beliefs in general do not suffice as the relevant state variables. For example, if the principal randomizes over several feedback policies at the beginning, then the agent’s belief over the outcome of the randomization is also relevant for incentives. Finally, an optimal policy might not even exist if the supremum in (6) is not achieved.

Nevertheless, I show that none of these concerns matter. An optimal policy exists. It is remarkably simple.

4.2 Optimal Feedback Policy

In the search for an optimal feedback policy, two observations substantially reduce the problem complexity.


First, the principal does not need to provide any information besides what is required for the agent to make decisions. That is, without loss of generality, an optimal policy is a recommendation policy—in every period, the principal’s message is a recommendation whether or not to exert effort and the agent is incentivized to follow the recommendation, M = A and a ≡ m. This observation is a consequence of the revelation principle of Myerson (1986) extended from static to dynamic games. For any other feedback policy and induced effort strategy, the principal can induce the same strategy by directly recommending the corresponding actions. Under the recommendation policy, the agent achieves the same payoffs if following the recommendation and weakly less if disobeying.

Second, the principal should present maximal threats for disobedience. Since the agent never disobeys, the threat does not affect the players’ payoffs. However, maximal threats facilitate the agent’s incentives to follow the recommendations. In my setting, the maximal threat after any history is no feedback. Hence, without loss of generality, an optimal policy provides no feedback if the agent ever disobeyed: mt = ∅ if at−1 ≠ mt−1.

Theorem 1. (Optimal Feedback Policy) An optimal feedback policy is a cutoff policy with the threat of no feedback. There is a cutoff p∗ ∈ [0, 1] and a probability at the cutoff α∗ ∈ [0, 1] such that the principal informs the agent only whether her current belief is above or below the cutoff:

m∗t = a^cut_t (p∗, α∗)   (8)

if at−1 = mt−1, and m∗t = ∅ otherwise.

Under the optimal policy, the agent obeys the recommendations and follows the effort strategy a^cut_t (p∗, α∗).

The result is proven by showing that under the optimal policy (8) the principal achieves her maximal feasible payoff among those that satisfy the agent’s individual rationality. First, I show that these payoffs, as well as any others on the Pareto frontier, can be obtained by a cutoff strategy. To this end, I show that any extreme point of the frontier can be obtained by solving a decision problem of a fictitious player with an adjusted relative performance value. Second, I show the agent obeys the recommendations and indeed follows a^cut_t (p∗, α∗). The agent’s incentive at time 0 coincides with individual rationality. His incentives to exert effort in other periods follow as he becomes increasingly optimistic yet faces the same future cutoff. Incentives to stop exerting effort are also satisfied, since in this case the agent learns that he is below the cutoff and hence is sufficiently pessimistic. As a result, either the principal achieves her maximal feasible payoff, in which case p∗ = p∗P, or the agent is left with his outside option cA, in which case p∗P ≤ p∗ ≤ p∗A (see Figure 3).
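The following sketch simulates the cutoff policy (8) for the hypothetical conclusive-success technology used in the earlier sketches, with an obedient agent and α∗ = 0: the principal tracks her belief, recommends effort while it exceeds the cutoff, and would switch to no feedback after any disobedience. Parameters are illustrative.

```python
import random

def run_cutoff_policy(p0: float, lam: float, cutoff: float,
                      horizon: int = 50, seed: int = 0) -> list:
    """Principal's belief and binary recommendation under a (p*, 0)-cutoff policy.

    Conclusive-success technology: a high-ability agent succeeds each effort period with
    probability lam; a low-ability agent never succeeds. The agent is assumed obedient,
    so the no-feedback punishment is never triggered.
    """
    random.seed(seed)
    omega = 1 if random.random() < p0 else 0
    p, path = p0, []
    for t in range(horizon):
        recommend_effort = p > cutoff             # the binary message of policy (8)
        path.append((t, round(p, 3), int(recommend_effort)))
        if not recommend_effort:
            break                                 # belief frozen; the recommendation stays 0
        success = (omega == 1) and (random.random() < lam)
        p = 1.0 if success else p * (1 - lam) / (1 - p * lam)   # Bayes' update
    return path

for row in run_cutoff_policy(p0=0.6, lam=0.3, cutoff=0.35):
    print(row)
```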


Figure 3: Efficient players’ payoffs and payoffs resulting under an optimal feedback policy. Left: the agent’s individual rationality is not binding, u∗A > cA. Right: the agent’s individual rationality is binding, u∗A = cA.

The principal optimally condenses rich performance signals into binary ones and maintains informational asymmetry between the players. I postpone detailed discussion of the results until Section 7.

4.3 Full Feedback

In the previous section, I derived one optimal feedback policy. Other feedback policies, in particular full feedback, may be optimal as well. In this section, I derive necessary and sufficient conditions for this to happen.

Proposition 1. (Full Feedback) The full-feedback policy, m ≡ y, is optimal if either

1. The prior belief is too low, p0 ≤ p∗A, or

2. Under full feedback, the agent always takes the same actions as the principal would—his belief enters the disagreement region between p∗A and p∗P with zero probability.

In all other cases, the full-feedback policy is not optimal.

If p0 ≤ p∗A, then there are two cases. If p0 < p∗A, then the agent never works under any feedback policy, so any policy—including full feedback—is optimal. If p0 = p∗A, then under full feedback the agent is indifferent to exerting effort in the first period. In this case, if p∗P > p∗A, then full feedback is optimal if the agent never starts exerting effort; if p∗P ≤ p∗A, then full feedback is optimal if the agent exerts effort.


If the belief cannot enter the disagreement region, the full-feedback policy simultaneously maximizes the payoffs of both players and hence is optimal. This happens trivially if the disagreement region is empty, vA = vP (i.e., when the players’ interests perfectly align). It can also happen if the region is not empty but the learning pattern is specific. For example, if performance is conclusive with the failure rate higher than the success rate, γ ≥ λ, as in Section 5.1, then the possible belief values are 0 and anything in [p0, 1]. If p0 is greater than both p∗A and p∗P, then under full feedback the belief does not enter the disagreement region.

In all other cases, the full-feedback policy is not optimal. To see this, suppose to the contrary that the full-feedback policy were optimal. It follows from Theorem 1 that it must maximize the principal’s payoff given the agent’s individual rationality. Since p0 > p∗A, under full feedback the agent’s expected payoff is strictly greater than cA, so the principal must achieve her maximal feasible payoff, u∗P. However, by Lemma 1, under full feedback the agent stops exerting effort at p∗A, not at p∗P. Because the agent’s belief enters the disagreement region with positive probability, the principal’s payoff differs from u∗P, a contradiction.

5 Applications

In this section, I consider two distinct performance technologies common in practice and in the literature. Under the first technology, performance comes as infrequent but conclusive outcomes. Under the second technology, performance comes as frequent but noisy outcomes.

For ease of exposition, in this section I concentrate on a continuous-time limit of the analysis.11 In particular, I assume the players interact on a time grid of size dt > 0, {0, dt, 2dt, . . .}. Accordingly, I scale the discount factor as δ = e−r dt, opportunity costs as ci dt, and the performance technology as (F0 (y, dt) , F1 (y, dt)). Then I study the limit as the interval length dt goes to 0. Also, I drop the normalization µ0 = 0, µ1 = 1 but maintain the assumption µ0 < µ1.

5.1 Conclusive Performance

Consider the case of conclusive performance, in which the outcome in each period is either a failure, a success, or “nothing.” Failures happen only if ability is low; hence, they indicate low ability. Successes happen only if ability is high; hence, they indicate high ability. Specifically, the possible performance outcomes are Y = {−1, 0, 1} and the performance

11The tractability of continuous-time bandit models is well known. See, for example, Bergemann and Välimäki (2008).


technology (F0, F1) is

            y = −1          y = 0        y = 1
  ω = 0     1 − e−γdt       e−γdt        0
  ω = 1     0               e−λdt        1 − e−λdt,

for some γ ≥ 0 and λ ≥ 0, with rows being conditional probabilities over outcomes. In the continuous-time limit, the conclusive outcomes arrive according to Poisson processes. If the low-ability agent exerts effort, then failures arrive at rate γ. If the high-ability agent exerts effort, then successes arrive at rate λ. If the agent does not exert effort, nothing happens. This corresponds to the continuous-time model of exponential bandits employed by Keller et al. (2005), Che and Hörner (2015), and many others.

This performance technology results in the following evolution of the principal’s beliefs if the agent exerts effort. Whenever a conclusive outcome arrives, yt = 1 or yt = −1, the principal updates her beliefs to ptP = 1 or to ptP = 0, respectively. If no conclusive outcomes arrive, yt = 0, which happens most of the time, the evolution of beliefs depends on the relative size of the arrival rates. If λ > γ, then the absence of conclusive outcomes is interpreted as “bad” news—the belief drifts down. If λ < γ, then the absence of conclusive outcomes is interpreted as “good” news—the belief drifts up. If the rates coincide, λ = γ, the absence of conclusive outcomes provides no new information—the belief stands still. In the limit, the belief along the inconclusive path continuously evolves according to the differential equation

dptP = −(λ − γ) ptP (1 − ptP) dt.
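This differential equation has a simple closed-form solution along the no-news path: the likelihood of seeing no conclusive outcome over t units of effort is e−λt for a high type and e−γt for a low type, so Bayes’ rule gives ptP = p0 e−λt / (p0 e−λt + (1 − p0) e−γt). The sketch below just evaluates it, with illustrative rates.

```python
import math

def no_news_belief(p0: float, lam: float, gamma: float, t: float) -> float:
    """Belief after t units of effort with no conclusive outcome observed.

    Closed-form solution of dp = -(lam - gamma) * p * (1 - p) dt with p(0) = p0.
    """
    num = p0 * math.exp(-lam * t)
    return num / (num + (1 - p0) * math.exp(-gamma * t))

# Illustrative rates with lam > gamma: no news is bad news, so the belief drifts down.
for t in [0.0, 1.0, 2.0, 5.0]:
    print(t, round(no_news_belief(p0=0.6, lam=0.5, gamma=0.1, t=t), 3))
```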

For ease of exposition in what follows, assume cP = 0 and cA > 0. The principal prefers the agent to exert effort all the time, whereas the agent would like to exert effort only if sufficiently optimistic, p∗A > p∗P = 0. Under this assumption, two cases are immediate. If the agent is initially very optimistic, p0 > pMA, then a no-feedback policy is optimal. It induces the agent to exert effort all the time, which is the principal’s individually optimal strategy. If p0 < p∗A, then under any feedback policy, even full feedback, the agent never exerts effort. In what follows, I concentrate on the case of intermediate prior beliefs, p0 ∈ (p∗A, pMA).

An optimal feedback policy is a cutoff policy, as shown in Theorem 1. The qualitative features of the optimal policy depend on the relative size of the success and failure rates. If λ < γ, then full feedback is optimal. That corresponds to p∗ = 0, α∗ = 0, and the principal achieves her maximal feasible payoff. Indeed, in this case, the belief never enters the interior of the disagreement region (p∗P, p∗A), so Proposition 1 applies. If λ > γ, then a cutoff policy is optimal that, in the limit, does not involve randomization at the cutoff, so α∗ = 0. Since in the absence of conclusive outcomes the principal’s belief drifts down, the belief cutoff


corresponds to a time cutoff. The cutoff feedback policy can hence be implemented by a revision policy that always discloses failures and provides feedback on successes at a single revision time. Formally:

Definition 2. For any time T ∈ {dt, 2dt, . . . ,∞}, a feedback policy is called a T-revision policy if it always discloses failures and reveals the past history of successes at a single time T:

m^rev_t = y^{t−dt} if t = T;  −1 if y_{t−dt} = −1;  ∅ otherwise.

The corresponding time T is called a revision time.

Define the T-revision strategy, a^rev (T), as a strategy that prescribes exerting effort before T if a failure has not been observed and continuing to exert effort after T only if a success has occurred. Denote the corresponding players’ payoffs from a T-revision effort strategy by

U^rev_i (T) ≜ Ui (y, a^rev (T)), i = P, A.   (9)

These payoffs are explicitly calculated in the appendix. The principal’s payoff U^rev_P (T) is increasing in T because a later revision results in more expected effort from the agent. The agent’s payoff U^rev_A (T), in contrast, is single-peaked, with the maximizer being the time by which the belief drops from p0 to the agent-optimal cutoff p∗A under the full-feedback policy in the absence of conclusive outcomes. Denote by T∗i the revision time that induces player i’s individually optimal strategy. By the arguments above, an optimal feedback policy is a revision policy.12

Proposition 2. (Conclusive Performance) If performance is conclusive, then an optimal feedback policy is a T∗-revision policy with a threat of no feedback. There is a revision time T∗ ∈ [0,∞) such that

m∗t = m^rev_t (T∗)

if at−1 ≡ 1 and m∗t = ∅ otherwise.

Under this feedback policy, the agent starts exerting effort and stops whenever he observes a failure or reaches T∗ with no successes. The optimal revision time T∗ is the latest time that satisfies the agent’s individual rationality, U^rev_A (T∗) = cA.

12Bimpikis and Drakopoulos (2014) studied a team of many agents working on the same project with performance coming as successes, λ > 0, γ = 0. They showed that a revision feedback policy improves upon full feedback. My findings show that, at least in the case of a single agent, a revision feedback policy is optimal among all possible policies.
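The payoff U^rev_A (T) can also be approximated numerically without the appendix formulas. The sketch below is a Monte Carlo estimate under a stylized simplification, in which only the arrival times of conclusive outcomes are simulated and flows are replaced by their conditional expectations, with purely illustrative parameters; the largest T at which the estimate stays above cA then approximates the optimal revision time of Proposition 2.

```python
import math
import random

def u_rev_agent(T, p0, lam, gamma, r, hA, cA, mu0, mu1, reps=50_000, seed=1):
    """Monte Carlo estimate of the agent's payoff U^rev_A(T) under a T-revision policy.

    Expected flow while working is hA * mu_omega; while shirking it is cA. A constant
    flow x received forever is normalized to a payoff of x.
    """
    random.seed(seed)
    disc = lambda a, b: math.exp(-r * a) - math.exp(-r * b)   # r * integral_a^b e^{-rt} dt
    total = 0.0
    for _ in range(reps):
        if random.random() < p0:                      # high ability: successes at rate lam
            s = random.expovariate(lam) if lam > 0 else float("inf")
            if s <= T:                                # a success is revealed at T: work forever
                total += hA * mu1
            else:                                     # no success by T: work until T, then shirk
                total += hA * mu1 * disc(0, T) + cA * math.exp(-r * T)
        else:                                         # low ability: failures at rate gamma
            f = random.expovariate(gamma) if gamma > 0 else float("inf")
            stop = min(f, T)                          # failures are disclosed immediately
            total += hA * mu0 * disc(0, stop) + cA * math.exp(-r * stop)
    return total / reps

# Illustrative parameters: the payoff is hump-shaped in T and eventually falls below cA.
for T in [0.1, 0.5, 1.0, 2.0, 4.0, 8.0]:
    print(T, round(u_rev_agent(T, p0=0.45, lam=0.8, gamma=0.1, r=0.5,
                               hA=1.0, cA=0.6, mu0=0.0, mu1=1.0), 3))
```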


Figure 4: Conclusive performance technology with a success rate greater than the failure rate, λ > γ > 0. Evolution of players’ beliefs under an optimal feedback policy along a performance path with no conclusive outcomes.

Figure 4 illustrates the evolution of players’ beliefs under the optimal feedback policy. It presents a sample belief path with no conclusive outcomes when the success rate is higher than the failure rate, λ > γ > 0. The principal’s belief drifts down since λ > γ. The agent’s belief, in contrast, drifts up before the time T∗ since γ > 0. The agent does not observe successes, so the only feedback received is that no failures occurred. Thus, the agent continues to exert effort and becomes increasingly more optimistic. At the revision time, however, when he observes that there were no successes, his belief plummets and he never again exerts effort.

5.2 Inconclusive Performance

Now, consider the case of inconclusive performance, in which no outcome perfectly reveals the agent’s ability. In particular, the outcomes are normally distributed with a mean that depends on the agent’s ability. The possible performance outcomes are Y = R and the performance technology (F0, F1) is

F0 ∼ Φ(µ0 dt, σ2 dt),

F1 ∼ Φ(µ1 dt, σ2 dt),

where µ0 < µ1, σ2 > 0, and Φ(µ, σ2) is a Gaussian distribution function with mean µ and variance σ2. This performance technology corresponds to the Brownian motion model used by Bolton and Harris (1999).

Under this performance technology, the sufficient statistic for the principal’s belief at time t


is the average of past performance,

ȳt ≜ (1/t) Σ_{s=0}^{t−1} ys.

All performance paths yt that lead to the same average performance ȳt result in the same belief. If the agent exerts effort in all periods before a calendar time t, this average is distributed according to a Gaussian distribution, ȳt ∼ N(µω, σ2/t). By Bayes’ rule, any belief cutoff p corresponds to a cutoff in the average performance ȳt,

ȳt (p) = µ + y0 (p) · (1/t),

where µ ≜ (µ1 + µ0)/2 and y0 (p) ≜ (σ2/(µ1 − µ0)) ln( ((1 − p0)/p0) · (p/(1 − p)) ). For a given belief cutoff, the performance cutoff converges to the average of the conditional means over time. Intuitively, in the long run the principal’s beliefs are very sensitive to small changes around the average. For a given time, however, the performance cutoff increases in the belief cutoff. In particular, limp→0 y0 (p) = −∞, limp→1 y0 (p) = +∞, and y0 (p0) = 0. In other words, the principal’s belief increases in the average performance, being equal to the prior whenever the average performance ȳt hits the average of the conditional means, µ. Denote the individually optimal performance standards corresponding to the individually optimal belief cutoffs by y∗0i,

y∗0i ≜ y0 (p∗i ), i = P, A.

Note that limt→0 ȳt · t = y0, so y0 corresponds to the stopping boundary on total performance at the very beginning of the relationship.
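A small sketch of this belief-to-standard mapping: it evaluates y0(p) and the time-varying standard ȳt(p) = µ + y0(p)/t for hypothetical parameter values, chosen only to show that a cutoff below the prior yields y0 < 0 and a standard that rises toward µ.

```python
import math

def y0(p: float, p0: float, mu0: float, mu1: float, sigma2: float) -> float:
    """Starting standard y_0(p) associated with a belief cutoff p."""
    return (sigma2 / (mu1 - mu0)) * math.log(((1 - p0) / p0) * (p / (1 - p)))

def standard(t: float, p: float, p0: float, mu0: float, mu1: float, sigma2: float) -> float:
    """Average-performance cutoff at time t: ybar_t(p) = mu + y_0(p)/t, with mu = (mu0+mu1)/2."""
    return (mu0 + mu1) / 2 + y0(p, p0, mu0, mu1, sigma2) / t

params = dict(p0=0.5, mu0=0.0, mu1=1.0, sigma2=1.0)
for t in [1, 5, 25, 100]:
    print(t, round(standard(t, p=0.3, **params), 3))   # rises from about -0.35 toward mu = 0.5
```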

Definition 3. For any performance outcome y0 ∈ R, a feedback policy is called a y0-standard policy if it informs the agent only whether or not his average performance is greater than the standard implied by y0:

m^sta_t (y0) = 1 if ȳt > µ + y0 · (1/t), and 0 if ȳt ≤ µ + y0 · (1/t).

The corresponding y0 is referred to as a starting standard.

The following proposition is a straightforward consequence of Theorem 1.


Figure 5: Inconclusive performance technology. Evolution of players’ beliefs under an optimal feedback policy along a sample performance path.

Proposition 3. (Inconclusive Performance) If performance is inconclusive, then an optimal feedback policy is a y∗0-standard policy with a threat of no feedback. There is a starting standard, y∗0 ∈ R, such that

m∗t = m^sta_t (y∗0),

if at−1 = mt−1 and m∗t = ∅ otherwise.

As before, under this policy the agent exerts effort if and only if he is told that his average performance is above the standard, µ + y∗0 · (1/t). If the agent ever exerts effort, p∗ < p0, then y∗0 < 0, so the standard increases over time, reflecting the increasing precision of the average performance. The optimal starting standard y∗0 maximizes the principal’s payoff given the agent’s individual rationality. Either the principal achieves her maximal feasible payoff, in which case y∗0 = y∗0P, or the agent is left with payoff cA, in which case y∗0P ≤ y∗0 ≤ y∗0A.

Figure 5 illustrates the evolution of the players’ beliefs along a sample path under the optimal feedback policy. The principal’s belief stochastically evolves in small steps due to the Gaussian noise. The agent’s belief, in contrast, drifts up whenever it is above the standard. He does not observe fine performance details, so his belief deterministically increases and he keeps exerting effort. Whenever the principal’s belief hits the standard, however, she informs the agent. His belief then plummets to the standard and he does not exert effort anymore.


6 Extensions

6.1 Moral Hazard

Until now, I have assumed that the principal can monitor the agent’s effort. This is a plausible assumption in many economic settings. It simplifies the agent’s incentive constraint and allows me to focus solely on the tradeoff between the need for feedback and the discouragement effect. In other economic settings, however, the agent’s efforts are difficult to monitor. In these cases, the feedback policy cannot condition on past actions. Thus, its original definition must be modified into13

mt : (yt−1, mt−1) → Δ(M).   (10)

The agent’s incentive constraints in this case are potentially much more complicated. The agent’s best response to a given feedback policy can be very rich, with potential gaps in effort. Whenever deviating from the expected equilibrium strategy, the agent’s belief can differ from what the principal expects. These issues, known to be challenging, have been extensively studied in the literature on dynamic contracts with learning and moral hazard.

In what follows I first consider the case without monetary transfers, as in the mainmodel. I show that under specific performance technology, the cutoff feedback policy remainsoptimal, although the cutoff must be adjusted. Second, I introduce monetary transfers intothe model. Perhaps surprisingly, I show in this case that the principal can extract fullsurplus—she can implement the efficient strategy and leave the agent with zero rents.

6.1.1 No Monetary Transfers

Consider the case without monetary transfers, as in the main model. To proceed, I narrow the performance technology to conclusive successes and concentrate on the continuous-time limit, equivalent to the setup in Section 5.1 with γ = 0. Under this technology, the outcome in each period is binary—with or without success—with success indicating a high ability. Specifically, the possible performance outcomes are Y = {0, 1} and the performance technology (F_0, F_1) is

              y = 0          y = 1
    ω = 0     1              0
    ω = 1     e^{−λ dt}      1 − e^{−λ dt}

13This definition does not allow screening the agent by offering a contingent menu of feedback policies. Since the agent might privately randomize his actions and thus accumulate private information, this is a priori with loss of generality. However, one can use the revelation principle to show that an optimal feedback policy does not require such screening.


Restriction to this technology reduces the set of possible performance paths and simplifies the incentive provision problem. A cutoff feedback policy remains optimal in this case.
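Before turning to the proposition, a minimal sketch of the principal's belief updating implied by this technology may be useful. It is standard good-news Bayesian updating (not a formula taken from the text), assuming the agent exerts effort throughout; the numbers are purely illustrative.

```python
import numpy as np

# Principal's belief that the agent's ability is high under the conclusive-success
# technology, given effort throughout and no success by time t.
# A single success would reveal high ability, jumping the belief to 1.
def belief_no_success(p0, lam, t):
    likelihood_high = np.exp(-lam * t)      # probability of no success if ability is high
    return p0 * likelihood_high / (p0 * likelihood_high + (1 - p0))

p0, lam = 0.6, 0.5                          # illustrative prior and arrival rate
for t in (0.0, 1.0, 3.0, 5.0):
    print(t, belief_no_success(p0, lam, t))
```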

Proposition 4. (Moral Hazard without Transfers)
An optimal feedback policy is a cutoff policy. There is a cutoff, p*_{MH} ∈ [0, 1], such that

$$m^*_t = \begin{cases} 1, & \text{if } p^t_P > p^*_{MH}, \\ 0, & \text{if } p^t_P \leq p^*_{MH}. \end{cases}$$

Under the optimal policy, the agent exerts effort only if told to be above the cutoff. As shown in Section 5.1, in this case the cutoff policy is equivalent to a deterministic revision policy.

Corollary 1. An optimal feedback policy is a revision policy. There is a revision time T*_{MH} ≥ 0 such that

$$m^*_t = \begin{cases} y^{t-dt}, & \text{if } t = T^*_{MH}, \\ \emptyset, & \text{if } t \neq T^*_{MH}. \end{cases}$$

The optimal belief cutoff p*_{MH} and the corresponding revision time T*_{MH} maximize the principal's payoff given that the agent does not want to one-shot deviate at time 0. That is, he does not want to shirk at time zero while exerting effort at all other times before the revision. This incentive constraint is conveniently expressed in terms of the revision time. It is equivalent to the revision time being less than T̄, which is the unique solution to the equation

$$U^{rev}_A(\bar T) = c_A - \frac{U^{rev\prime}_A(\bar T)}{r}, \qquad (11)$$

where U^{rev}_A(T) is defined in (9). The intuition behind equation (11) is simple. If the agent follows the expected strategy, then he obtains a payoff U^{rev}_A(T̄). If the agent one-shot deviates at time 0, then he obtains an immediate flow payoff of r c_A dt and faces a marginally closer revision time in the continuation game. At the maximal time T̄, the agent is indifferent between these two options, and equation (11) follows. The optimal revision time T*_{MH} can then be found as the minimum of T*_P and T̄.

Note how moral hazard constrains the principal. If the principal set the revision time greater than T̄, then she would face a procrastination problem. Instead of exerting effort all the time before the revision, the agent would procrastinate at the beginning and start exerting effort only T̄ before the revision. As a result, the agent would enjoy positive "procrastination" rents. In contrast, I show in the next section that the principal can extract full surplus if she can use monetary transfers.


6.1.2 Monetary Transfers

Until this point, I have considered a fixed-compensation scheme and concentrated on feedback policy as the only incentive tool available to the principal. This section considers an integral approach in which the principal can affect the agent through both feedback and bonus policies. The bonuses are one-to-one transfers from the principal to the agent, paid at the beginning of a period, and can depend arbitrarily on the principal's history. To avoid trivial solutions, I assume the agent is subject to limited liability and so I do not allow negative bonuses.14 Define the corresponding bonus scheme b as

$$b_t : \left(y^{t-1}, m^{t}\right) \to \mathbb{R}_+. \qquad (12)$$

Note that the bonuses themselves have informational content whenever linked to performance (cf. Fang and Moscarini (2005)). However, the principal has full commitment power and the players are risk neutral and discount the future at the same rate. Thus, bonuses can be backloaded so as not to interfere with information provision (cf. Fuchs (2007)). In this case, each bonus should be interpreted as an additional payment promised to the agent at the end of the relationship, such as when the principal asks the agent to permanently stop exerting effort or, under the corresponding interpretation of the discount factor, when the relationship exogenously breaks down. Denote the (random) ending time of the relationship by τ. The bonus policy (12) and the feedback policy (10) constitute a contract,

$$(b, m).$$

It is instrumental to analyze an efficient effort strategy—the strategy that maximizes the sum of the players' payoffs,

$$\max_{a \in \mathcal{A}} \; U_A(y, a) + U_P(y, a).$$

By the same arguments as in Section 4, this maximization problem can be viewed as a decision problem of a fictitious player with value for performance h_A + h_P and value for shirking c_A + c_P. Hence, by Lemma 1, an efficient strategy is a cutoff strategy. Denote the corresponding efficient belief cutoff by p̄, so an efficient strategy is

$$\bar a = a^{cut}(\bar p),$$

14Otherwise, the principal could extract full surplus simply by "selling the firm to the agent" with a promise of full feedback.


where randomization at the cutoff is omitted. For ease of exposition in what follows, I assume that the agent has no direct value for performance, h_A = 0. That is, all incentives to exert effort come from the contract offered by the principal.

Under full feedback, the principal cannot generally extract full surplus in the relationship. The agent must be left with some "procrastination" rents (see, for example, Bergemann and Hege (1998) and Moroni (2015)). In particular, after negative feedback, the agent must be promised increased compensation to induce effort. Consequently, the agent can "procrastinate" and not exert effort, prompting the negative feedback and hence increased compensation. In this case, the agent will be more optimistic than the principal expects and will earn positive rents. Nevertheless, and perhaps surprisingly, the principal can still extract full surplus if she controls the performance information, as the following theorem shows.

Theorem 2. (Moral Hazard with Transfers)
If the principal does not observe the agent's effort and can commit to a bonus policy, then an optimal contract is as follows. An optimal feedback policy is a cutoff policy. The principal informs the agent whether her beliefs are above or below the efficient cutoff belief:

$$m^*_t = \begin{cases} 1, & \text{if } p^t_P > \bar p, \\ 0, & \text{if } p^t_P \leq \bar p. \end{cases}$$

An optimal bonus policy is "pay-per-performance" bonuses backloaded to the end of the relationship. The agent is compensated as

$$b^*_t = \frac{c_A}{p^t_A}\, y_t,$$

coming as a total bonus $B_\tau = \sum_{t=0}^{\tau} b^*_t\, e^{r(\tau - t)}$ at the first time τ at which the agent fails the standard, m*_τ = 0.

Under this scheme, the agent exerts effort only if told to be above the cutoff. The principal extracts full surplus. The scheme implements the efficient-effort strategy and leaves the agent with zero rents (his payoff is c_A). The main idea is that under the optimal feedback policy the agent's belief always increases, which translates into a deterministically decreasing bonus schedule over time. The procrastination incentives do not arise. If the agent follows the expected-effort strategy, then he expects payoff c_A. If the agent ever deviated, then his beliefs would be lower and the bonus schedule would be insufficient to compensate for the effort. Consequently, the most profitable deviation is to never exert effort. That, however, delivers the agent payoff c_A, the same as under the expected play.
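The following is a minimal numerical sketch of this bonus scheme. The performance path, the agent's belief path, the stopping period, and all parameter values are hypothetical inputs chosen purely for illustration; in the model the belief path is the one induced by the optimal feedback policy.

```python
import numpy as np

# Hypothetical primitives.
r, dt, c_A = 0.1, 1.0, 1.0
tau = 5                                   # first period in which the agent fails the standard

# Hypothetical paths up to tau: observed performance and the agent's belief,
# which increases deterministically while he is told he is above the cutoff.
y   = np.array([0.8, 1.1, 0.9, 1.2, 1.0, 0.7])
p_A = np.array([0.50, 0.55, 0.60, 0.64, 0.68, 0.71])

# Per-period promised bonus b*_t = (c_A / p_A[t]) * y_t: decreasing in the belief.
b = (c_A / p_A) * y

# Backloaded payment: every promise accrues interest until it is paid at tau.
t = np.arange(tau + 1) * dt
B_tau = np.sum(b * np.exp(r * (tau * dt - t)))
print(f"total backloaded bonus B_tau = {B_tau:.3f}")
```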

This result provides insights into the common use of end-of-career bonuses, so-called "golden parachutes." These payments may capture all past bonuses, backloaded so as not to interfere with the optimal provision of incentives.

6.2 No Commitment

So far, I assumed the principal had full commitment power. In this section, I relax the commitment assumption and consider the opposite extreme. I assume the principal's messages are "cheap talk" and analyze perfect Bayesian equilibria of the dynamic game. The equilibrium feedback policy must be sequentially optimal for the principal and the equilibrium effort strategy must be sequentially optimal for the agent.

A no-feedback policy, babbling, is an equilibrium policy. Indeed, if the principal sends the same message irrespective of performance, then the agent should ignore the message and stick to his myopic best response. At the same time, if the agent ignores the principal's messages, then the principal may as well always send the same message.

The existence of noninformative equilibria is common for cheap-talk games. The challenge is to study informative equilibria. I do not attempt to characterize all equilibria of the game. Rather, in the spirit of the preceding analysis, I concentrate on equilibria that deliver maximal payoffs to the principal and show that they crucially depend on the misalignment of the players' preferences. I derive results for the cases in which the players' interests either sufficiently align or extremely misalign.

Proposition 5. (No Commitment)
If the principal cannot commit to her messages, then the following is true.

1. If the misalignment of the players' interests is so low that the principal achieves her maximal feasible payoffs in the game with commitment, p* = p*_P, then the optimal cutoff policy and the induced effort strategy constitute an equilibrium in the game with no commitment.

2. If the misalignment of the players' interests is extremely high, p*_A > p*_P = 0, and no outcome can indicate low ability (that is, the principal's belief cannot reach 0), then no informative communication can be sustained in equilibrium.

If the misalignment of interests is not too high, then the principal can achieve maximal feasible payoffs. Since no feedback—babbling—is still a valid equilibrium threat, the principal can implement the optimal feedback policy of the game with commitment.

If the misalignment of interests is severe, c_A > c_P = 0, or, equivalently, p*_A > p*_P = 0, then no informative equilibria exist and the unique outcome of the game is babbling. The driving force for this result is that the principal strictly prefers the agent to work after all possible histories. Consequently, to make communication credible, the agent must not exert

27

Page 28: Optimal Feedback Design - Kelley School of Business · concept of feedback policy and the very design question is motivated byKamenica and Gentzkow(2011). I enrich their static persuasion

effort with some probability whenever his belief increases. For this to occur, the principal must promise some information in the future. This promise would result in a further split in beliefs. I follow the path of an ever-increasing agent's belief and show that this sequence cannot go on forever. It leads to a contradiction and proves that no credible communication can be sustained in equilibrium.

6.3 Belief Disagreement

Until now, I have assumed the principal and the agent have common prior beliefs about the agent's ability, p_0. Hence, the only disagreement was on the relative value of performance. However, studies have routinely found apparent belief disagreement, usually overconfidence of the agent (see Benoît and Dubra (2011) and references therein). This section introduces open disagreement—players have different prior beliefs about the agent's ability, p_{0A} and p_{0P}. These priors are commonly known, so the players do not have any private information before the relationship begins. The belief disagreement simply captures potential over- or under-confidence of either player.15

Given their priors and a feedback policy, players continue to update their beliefs according to Bayes' rule. The immediate consequence is that the players' beliefs remain in a one-to-one correspondence (cf. Alonso and Câmara (2015)),

$$\frac{1 - p^t_A}{p^t_A} \cdot \frac{p_{0A}}{1 - p_{0A}} \;\equiv\; \frac{1 - p^t_P}{p^t_P} \cdot \frac{p_{0P}}{1 - p_{0P}}. \qquad (13)$$

Consequently, a cutoff in the principal’s belief corresponds to a cutoff in the agent’s be-lief. Not surprisingly, applying arguments similar to those in Section 4 shows that a cutofffeedback policy remains optimal.

Proposition 6. (Belief Disagreement)
If the players have different prior beliefs, then an optimal feedback policy is a cutoff policy with a threat of no feedback. There is a cutoff, p*_{BD} ∈ [0, 1], and a probability at the cutoff, α* ∈ [0, 1], such that

$$m^*_t = a^{cut}(p^*_{BD}, \alpha^*)$$

if a^{t−1} = m^{t−1} and m^*_t = ∅ otherwise.

Under this policy, the agent exerts effort only if told to be above the cutoff. The optimal cutoff p* and probability α* maximize the principal's (subjective) payoff given the agent's individual rationality. Either the principal achieves her maximal feasible payoffs, in which case p* = p*_P, or the agent's individual rationality binds, in which case p*_P ≤ p* ≤ p*_A.

15The true probability of high ability does not matter for the analysis. The principal designs a feedback policy according to her subjective belief. The agent interprets the feedback and chooses his effort strategy according to his subjective belief.

Consequently, the greater the agent’s overconfidence, the greater leverage the principalhas. Thus, the principal’s optimal payoffs increase in p0A as long as the principal can induceno effort when her beliefs are below p∗P , that is if the analog of (5) holds.

6.4 Advisory Feedback

Throughout the paper, I concentrated on evaluation—feedback through which the principal informs the agent about his performance. The corresponding allocation of effort is vertical—whether or not to exert effort. Another important kind of feedback is advice—feedback that guides the agent's actions. For example, a professor advises a student on how to improve a project and a manager guides an employee to optimally allocate time. The corresponding allocation of effort is horizontal—how to allocate the effort exerted. In this section, I extend the main model to allow for both kinds of feedback. I show that if the players' interests align in the horizontal dimension, which is arguably common in practice, then the principal optimally provides exhaustive advice and coarse evaluation.

In particular, I extend the action space to allow for a richer allocation of effort. The agent's action in each period now consists of two components,

$$(a_t, k_t),$$

where a_t ∈ A = {0, 1} is the decision of whether or not to exert effort, and k_t ∈ K is the allocation of effort if exerted. I introduce a process θ that captures allocative uncertainty, with states θ_t ∈ Θ identically distributed over time, independent of past history and of the agent's ability. The states do not change the performance technology, but the performance value is now a function of the effort allocation and the state,

$$h_i(k_t, \theta_t), \qquad i = P, A.$$

The agent does not observe the state, but the principal does and can communicate it, as well as performance, to the agent. The state θ_t is realized at the beginning of each period and can be immediately revealed to guide the agent's effort allocation. The definition of the feedback policy is appropriately modified as

$$m_t : \left(y^{t-1}, a^{t-1}, m^{t-1}, \theta^t\right) \to \mathcal{M}(M). \qquad (14)$$


For a given effort strategy (a, k) and a given feedback policy m, the payoff of player i is

$$U_i(m, a) = (1-\delta)\,\mathbb{E}\left[\sum_{t=0}^{\infty} \delta^t \left(a_t\, h_i(k_t, \theta_t)\, y_t + (1 - a_t)\, c_i\right) \,\middle|\, m, a, k\right], \qquad i = P, A.$$

Importantly, I make two assumptions about the players' preferences. First, I assume no disagreement between the players on the allocation of effort if exerted. A unique effort allocation maximizes the performance values of both players,

$$\arg\max_{k_t \in K} h_A(k_t, \theta_t) = \arg\max_{k_t \in K} h_P(k_t, \theta_t) \qquad \forall\, \theta_t \in \Theta.$$

Second, I assume no aggregate uncertainty in effort allocation. Maximal performance values do not depend on the state. They are normalized to the values in the main section,16

$$\max_{k \in K} h_A(k, \theta) = h_A, \qquad \max_{k \in K} h_P(k, \theta) = h_P. \qquad (15)$$

In this case, I show that an optimal feedback policy provides exhaustive advice to perfectly inform the agent about the allocative uncertainty θ_t. Intuitively, since the players' interests align along the horizontal dimension, it is in the principal's best interest to guide the agent. Formally, the Pareto frontier is obtained with exhaustive advice, and the problem reduces to the one in Section 4. The same arguments then apply. The following theorem summarizes the findings.

Theorem 3. (Advisory Feedback)
An optimal feedback policy consists of two components, m* = (m*_θ, m*_y). The first component provides exhaustive advice on the allocative uncertainty and the second component provides optimal evaluation. There is a cutoff, p* ∈ [0, 1], and a probability at the cutoff, α* ∈ [0, 1], such that

$$m^*_{\theta t} = \theta_t, \qquad m^*_{y t} = a^{cut}_t(p^*, \alpha^*),$$

if a^{t−1} = m^{*\,t−1} and m^*_t = ∅ otherwise.

Given the normalization (15), the optimal cutoff p* is the same as in Theorem 1.

16An extreme example of the payoff structure is one in which performance is valuable only if the effort matches the state, K = Θ, h_A(k_t, θ_t) = h_A × I(k_t = θ_t), h_P(k_t, θ_t) = h_P × I(k_t = θ_t). At the same time, the payoff structure allows for a richer structure of losses from a mismatch between effort allocation and states.


7 Discussion

7.1 Discussion of the Model

Payoff Structure. I assume the preferences of a player i, i = P, A, have a linear and stationary structure and that effort and ability complement each other. The player values a unit of performance at h_i and shirking at c_i. This simple structure, coupled with the binary ability, gives the model its tractability. To understand the suitability of this payoff structure in a particular application, it is useful to see what tradeoffs it entails. At belief p^t_i, the player faces the flow trade-off between exerting effort or not as

$$\underbrace{h_i}_{\substack{\text{performance} \\ \text{value}}} \times \underbrace{p^t_i}_{\text{belief}} \qquad \text{vs.} \qquad \underbrace{c_i}_{\substack{\text{opportunity} \\ \text{cost}}}.$$

The first term, h_i, is the performance value and can be seen as player engagement. It is fixed throughout the game and can be interpreted in at least two ways. In a contractual environment such as professional firms, this engagement may come from a promise of future pay-for-performance bonuses, postponed and properly undiscounted. In a noncontractual environment such as education, it may capture the agent's internal motivation to perform. The second term, p^t_i, is a private belief of the player and can be seen as the player's confidence in the agent's ability. It is complementary to engagement and is updated based on the incoming information. If the information is encouraging, then the belief increases and exerting effort becomes more valuable for the player. If the information is discouraging, then the belief decreases and exerting effort becomes less valuable for the player. The third term, c_i, is the opportunity cost of effort. It captures the player's outside option and provides a benchmark to which she compares her prospects whenever deciding to exert effort.

Exerting effort also has an informational value, reflected in the difference between the myopic and the individually optimal cutoffs. However, for any given belief, the flow-payoff trade-off remains the same, as illustrated by the fact that payoffs can be equivalently written with beliefs instead of performance (cf. (3)),

$$U_i(m, a) = (1-\delta)\,\mathbb{E}\left[\sum_{t=0}^{\infty} \delta^t \left(a_t h_i p^t_i + (1 - a_t) c_i\right) \,\middle|\, m, a\right]. \qquad (16)$$

While appropriate in many circumstances, this payoff structure does not capture nonlinear environments, such as many effort levels with convex costs or performance that is rewarded only when a target is reached. Also, it does not include nonstationary settings, such as a finite game horizon or performance that brings at most one success. Finally, it is not suited


for settings in which effort and ability are substitutes for each other—that is, when greater optimism translates into fewer incentives to exert effort. Even though my analysis may provide some insights in these settings, further investigation is required to understand optimal feedback provision there.

Monitoring Structure. The novelty of this paper is that the performance can be readily observed by the principal but not the agent. This informational asymmetry is essential for the study of feedback; otherwise the principal would have no new information to share. The asymmetry can be caused by many factors, including the principal's superior expertise or data in evaluating the agent's performance. For ease of exposition, this paper brings this asymmetry to the extreme in which the agent does not observe the performance at all. This assumption may work reasonably well in environments in which the agent lacks expertise, such as internships, or in which the principal has superior data, such as law firms. It may be less applicable in environments with objective and transparent performance measures, such as in stock trading companies.

Another assumption maintained in the main section is that the principal perfectly observes the agent's actions. This assumption allowed me to simplify the incentive constraints and to focus on the question of optimal feedback provision. It is reasonable in many cases. Moreover, as some have argued, technological advances have made such monitoring particularly easy.17 However, I recognize that there are many cases in which observability of actions is not realistic, and I provide a detailed analysis of the moral hazard case in Section 6.1.

Feedback Structure. I model feedback as an information policy (2) to which the principal commits at the beginning. The commitment assumption can be maintained in at least two ways. First, the commitment can be enforced by means of a third party, either internal, like a human-resources department, or external, like an auditing agency. The feedback policy in this case can be interpreted as a formal appraisal system used by the organization. Second, the assumption may be an outcome of a game in which the principal faces a sequence of relationships with many agents (e.g., an advisor and graduate students). The feedback policy in this case can be interpreted as an established way in which the principal approaches the evaluation process. The commitment in this case can be maintained by the threat of ruining the agents' trust if she deviates from the announced feedback policy.

This specification provides an upper bound on what the principal can achieve in the relationship. Any restrictions, strategic, legal, or technological, as well as potential costs of

17“The End of Asymmetric Information?”, Cato Unbound, April 6, 2015, http://www.cato-unbound.org/2015/04/06/alex-tabarrok-tyler-cowen/end-asymmetric-information.


providing the feedback, could only worsen the principal's payoff. For example, the specification does not allow the principal to lie; all her messages are verifiable and interpreted according to the chosen feedback policy. One might think this is restrictive, and that the principal could do better if she could lie, unbound by commitment. However, if the principal could lie, then the agent would anticipate it and adjust his inferences accordingly. In fact, every equilibrium communication without commitment can be replicated with a committed feedback policy, but not vice versa. I study the case with no commitment in Section 6.2. Similarly, the specification (2) can replicate randomization over several feedback policies unknown to the agent.

7.2 Discussion of the Results

Performance Transparency. The impact of feedback on overall performance has been studied empirically in organizational psychology and, recently, in experimental economics. For most of the 20th century, psychology studies routinely concluded that feedback interventions are unambiguously good. Murphy and Cleveland (1995) were among the first to alert to potentially adverse effects of feedback. Soon after, Kluger and DeNisi (1996) supported their concerns in a broad meta-analysis. They revealed many inconsistencies in previous studies and showed that while on average feedback interventions increased overall performance, in more than a third of cases they decreased it. Recently, Fu et al. (2015) confirmed in a controlled experiment that performance feedback has adverse effects in tournaments.

My analysis confirms the empirical findings of potentially adverse effects of increasing performance transparency in organizations. I show that the adverse effects may come from negative feedback that discourages employees and leads them to decrease the effort they put into their jobs. In fact, full feedback is generically not optimal in my setting. These findings become even more relevant now since, as discussed in the introduction, advances in information technologies make it easier to provide feedback, and many companies use this opportunity to provide more feedback to their employees. Note that increasing feedback frequency does not mean increasing performance transparency per se. After all, most feedback can be uninformative. In fact, the optimal feedback policy is frequent; it provides some information to the agent in every period. However, my results suggest the content of feedback must be chosen carefully.

Optimal Feedback Practices. My analysis reveals that many observed features of professional contracts may facilitate optimal feedback provision. When performance comes as infrequent conclusive outcomes, an optimal policy can be implemented through a tenure-track contract with an appropriately chosen revision date. This performance technology


is common in academia, where a single brilliant idea can indicate a strong researcher. Indeed, Siow (1998) observed that up-or-out, or tenure, contracts dominate in academia. When performance comes as frequent but noisy outcomes, an optimal policy can be implemented by a standard on average performance. This performance technology is arguably common to the majority of organizations. Indeed, performance standards that increase over the career span are a common practice.

In general, however, my analysis highlights the principal's belief as a necessary and sufficient statistic for the implementation of the optimal feedback policy. The particular shape of the performance path that led to a given belief and the calendar time are irrelevant. This simplifies policy implementation under a general performance technology, as the principal needs only to track her belief and correctly update it upon observing new performance. Then, in every period, the principal needs only to inform the agent whether or not this belief is above a cutoff that is commonly known and fixed throughout the relationship.

8 Concluding Remarks

All feedback comes at the cost of possible discouragement. In this paper, I formally studied the tradeoff between discouragement and the individual's need for feedback. I showed that optimal feedback condenses a rich performance history into coarse signals. Under this optimal policy, the employee exerts effort and becomes increasingly optimistic over his career span until his belief drops, after which he never exerts effort. This analysis alerts to the implicit costs of increasing performance transparency in organizations and reveals that many features of professional contracts may facilitate optimal feedback provision.

This paper is a small step towards a systematic economic analysis of feedback policies in organizations. I abstracted away from many aspects of feedback that are relevant in practice and can be found in the organizational behavior and education literatures. These can and should be formally studied within a game-theoretic framework. There is also a need for comprehensive empirical studies of feedback. Systematic studies of existing feedback practices and their effects on performance are essential to inform further research in this area. An integral approach that includes both theoretical and empirical studies would help bolster our understanding of feedback provision and guide feedback practices in organizations.


References

Alonso, R. and O. Câmara (2015): "Persuading voters," Discussion Paper.

Beer, M. (1987): "Performance appraisal," in Handbook of Organizational Behavior, ed. by J. W. Lorsch, Prentice Hall.

Benoît, J.-P. and J. Dubra (2011): "Apparent overconfidence," Econometrica, 79, 1591–1625.

Bergemann, D. and U. Hege (1998): "Venture capital financing, moral hazard, and learning," Journal of Banking & Finance, 22, 703–735.

Bergemann, D. and S. Morris (2013): "Robust predictions in games with incomplete information," Econometrica, 81, 1251–1308.

Bergemann, D. and J. Välimäki (2008): "Bandit problems," in New Palgrave Dictionary of Economics, ed. by S. N. Durlauf and L. E. Blume, Basingstoke, UK: Palgrave Macmillan.

Bimpikis, K. and K. Drakopoulos (2014): "Disclosing information in strategic experimentation," Discussion Paper.

Blackwell, D. (1953): "Equivalent comparisons of experiments," Annals of Mathematical Statistics, 24, 265–272.

Bolton, P. and C. Harris (1999): "Strategic experimentation," Econometrica, 67, 349–374.

Che, Y.-K. and J. Hörner (2015): "Optimal design for social learning," Cowles Foundation Discussion Paper 2000.

Ederer, F. (2010): "Feedback and motivation in dynamic tournaments," Journal of Economics & Management Strategy, 19, 733–769.

Ely, J. (2015): "Beeps," Discussion Paper.

Ely, J., A. Frankel, and E. Kamenica (2015): "Suspense and surprise," Journal of Political Economy, 123, 215–260.

Fang, H. and G. Moscarini (2005): "Morale hazard," Journal of Monetary Economics, 52, 749–777.

Fu, Q., C. Ke, and F. Tan (2015): "'Success breeds success' or 'Pride goes before a fall'?: Teams and individuals in multi-contest tournaments," Games and Economic Behavior, 94, 57–79.

Fuchs, W. (2007): "Contracting with repeated moral hazard and private evaluations," American Economic Review, 97, 1432–1448.

Gittins, J. C. and D. M. Jones (1974): "A dynamic allocation index for the sequential design of experiments," in Progress in Statistics, ed. by I. Vincze, J. Gani, and K. Sarkadi, Amsterdam: North-Holland Pub. Co., 241–266.

Goltsman, M. and A. Mukherjee (2011): "Interim performance feedback in multistage tournaments: The optimality of partial disclosure," Journal of Labor Economics, 29, 229–265.

Guo, Y. (2015): "Dynamic delegation of experimentation," Discussion Paper.

Halac, M., N. Kartik, and Q. Liu (2015): "Contests for experimentation," Discussion Paper.

Hansen, S. E. (2013): "Performance feedback with career concerns," Journal of Law, Economics, and Organization, 29, 1279–1316.

Hörner, J. and Y. Guo (2015): "Dynamic mechanisms without money."

Hörner, J. and N. Lambert (2015): "Motivational ratings," Discussion Paper.

Kamenica, E. and M. Gentzkow (2011): "Bayesian persuasion," American Economic Review, 101, 2590–2615.

Keller, G., S. Rady, and M. Cripps (2005): "Strategic experimentation with exponential bandits," Econometrica, 73, 39–68.

Kluger, A. N. and A. DeNisi (1996): "The effects of feedback interventions on performance: A historical review, a meta-analysis, and a preliminary feedback intervention theory," Psychological Bulletin, 119, 254–284.

Kolotilin, A., M. Li, T. Mylovanov, and A. Zapechelnyuk (2015): "Persuasion of a privately informed receiver," Discussion Paper.

Kremer, I., Y. Mansour, and M. Perry (2014): "Implementing the 'Wisdom of the Crowd'," Journal of Political Economy, 122, 988–1012.

Lizzeri, A., M. A. Meyer, and N. Persico (2002): "The incentive effects of interim performance evaluations," Discussion Paper.

Moroni, S. (2015): "Experimentation in organizations," Discussion Paper.

Murphy, K. R. and J. N. Cleveland (1995): Understanding Performance Appraisal: Social, Organizational, and Goal-Based Perspectives, Thousand Oaks, CA: SAGE Publications.

Myerson, R. B. (1986): "Multistage games with communication," Econometrica, 54, 323–358.

Orlov, D. (2015): "Optimal design of internal disclosure," Simon Business School Working Paper No. FR 15-06.

Renault, J., E. Solan, and N. Vieille (2014): "Optimal dynamic information provision," Discussion Paper.

Rothschild, M. (1974): "A two-armed bandit theory of market pricing," Journal of Economic Theory, 9, 185–202.

Siow, A. (1998): "Tenure and other unusual personnel practices in academia," Journal of Law, Economics, & Organization, 14, 152–173.

9 Appendix

A Omitted Proofs

Proof of Lemma 1. The player's problem is a standard bandit problem. This is a Markov decision problem with the belief p^t_i as the relevant state. Its solution is characterized by Gittins and Jones (1974) to be an index policy: at every belief p^t_i the player chooses an action with the highest index ξ_a(p^t_i), and the player is indifferent between actions with equal indices. The indices are defined as solutions to reduced maximization problems that maximize expected normalized payoffs over stopping times τ measurable with respect to past performance. In our setting the indices can be calculated as

$$\xi_0(p^t_i) \triangleq \sup_{\tau} \frac{\mathbb{E}\left[\sum_{t=0}^{\tau} \delta^t c_i \mid p^t_i\right]}{\mathbb{E}\left[\sum_{t=0}^{\tau} \delta^t \mid p^t_i\right]} = c_i,$$

$$\begin{aligned}
\xi_1(p^t_i) &\triangleq \sup_{\tau} \frac{\mathbb{E}\left[\sum_{t=0}^{\tau} \delta^t h_i y_t \mid p^t_i\right]}{\mathbb{E}\left[\sum_{t=0}^{\tau} \delta^t \mid p^t_i\right]} \\
&= h_i \sup_{\tau} \frac{p^t_i\, \mathbb{E}\left[\sum_{t=0}^{\tau} \delta^t y_t \mid \omega = 1\right] + (1 - p^t_i)\, \mathbb{E}\left[\sum_{t=0}^{\tau} \delta^t y_t \mid \omega = 0\right]}{p^t_i\, \mathbb{E}\left[\sum_{t=0}^{\tau} \delta^t \mid \omega = 1\right] + (1 - p^t_i)\, \mathbb{E}\left[\sum_{t=0}^{\tau} \delta^t \mid \omega = 0\right]} \\
&= h_i \sup_{\tau} \frac{p^t_i}{p^t_i + (1 - p^t_i)\, \frac{\mathbb{E}\left[\sum_{t=0}^{\tau} \delta^t \mid \omega = 0\right]}{\mathbb{E}\left[\sum_{t=0}^{\tau} \delta^t \mid \omega = 1\right]}} \\
&= \frac{h_i\, p^t_i}{p^t_i + (1 - p^t_i)\, \kappa},
\end{aligned}$$

where $\kappa \triangleq \inf_{\tau} \mathbb{E}\left[\sum_{t=0}^{\tau} \delta^t \mid \omega = 0\right] / \mathbb{E}\left[\sum_{t=0}^{\tau} \delta^t \mid \omega = 1\right] > 0$ and the fourth line follows from the optional stopping theorem and the normalization (1). Note that κ depends only on the performance technology and not on h_i, c_i, or p^t_i. Consequently, ξ_1(p^t_i) does not depend on c_i and is strictly increasing in p^t_i and in h_i whenever p^t_i > 0. At the same time, ξ_0(p^t_i) does not depend on p^t_i or h_i and is strictly increasing in c_i. The optimality of a cutoff strategy and the comparative statics with respect to v_i immediately follow.
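A minimal numerical sketch of this index comparison, with illustrative values of h_i, c_i, and κ: the individually optimal cutoff is the belief at which the effort index ξ_1 equals ξ_0 = c_i. Note that setting κ = 1 in the formula recovers the myopic cutoff c_i/h_i, while κ < 1 pushes the cutoff below it, consistent with the informational value of effort.

```python
# Index comparison from the proof of Lemma 1; h, c, kappa are illustrative.
def xi_1(p, h, kappa):
    """Index of exerting effort at belief p."""
    return h * p / (p + (1 - p) * kappa)

def cutoff(h, c, kappa):
    """Belief at which xi_1(p) = c, i.e. the individually optimal cutoff."""
    return c * kappa / (h - c + c * kappa)

h, c, kappa = 2.0, 1.0, 0.6
p_star = cutoff(h, c, kappa)
print(p_star, xi_1(p_star, h, kappa))   # xi_1 equals c = 1.0 at the cutoff
print(cutoff(h, c, 1.0))                # with kappa = 1 this is the myopic cutoff c/h = 0.5
```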

Proof of Theorem 1. To prove the theorem, I first show that any payoffs on the Pareto frontier can be obtained with cutoff feedback policies and the agent following the recommendations. As standard, define the set of feasible payoffs and the set of (strongly) efficient payoffs as

$$V \triangleq \left\{ (u_A, u_P) \in \mathbb{R}^2 \;\middle|\; \exists\, a \in \mathcal{A},\, m \in \mathcal{M} : u_A = U_A(m, a),\; u_P = U_P(m, a) \right\},$$

$$\bar V \triangleq \left\{ (u_A, u_P) \in V \;\middle|\; \nexists\, (u'_A, u'_P) \in V : (u'_A, u'_P) > (u_A, u_P) \right\},$$

where the vector inequality means both players are better off and at least one player is strictly better off. By standard arguments, the set V is closed and convex.

Lemma 2. (Efficient Payoffs)
For any (u_A, u_P) ∈ V̄ there exist p ∈ [0, 1] and α ∈ [0, 1] such that

$$u_A = U_A\left(y, a^{cut}(p, \alpha)\right), \qquad u_P = U_P\left(y, a^{cut}(p, \alpha)\right).$$

Proof. Since V is closed and convex, by the supporting hyperplane theorem each point on its boundary can be supported by a hyperplane. In particular, any (u_A, u_P) ∈ V̄ can be obtained as a solution to the maximization problem

$$\max_{a \in \mathcal{A},\, m \in \mathcal{M}} \; \gamma_A U_A(m, a) + \gamma_P U_P(m, a),$$

or, equivalently, to

$$\max_{a \in \mathcal{A}} \; \gamma_A U_A(y, a) + \gamma_P U_P(y, a), \qquad (17)$$

with Pareto weights (γ_A, γ_P) > 0 (see Figure 6). Due to the linearity of payoffs, this problem can be viewed as a decision problem of a fictitious player with value for performance γ_A h_A + γ_P h_P and value for shirking γ_A c_A + γ_P c_P. If (u_A, u_P) is an extreme point of V, then any solution to (17) delivers (u_A, u_P) to the players. In particular, by Lemma 1 the payoff (u_A, u_P) can be obtained by some cutoff strategy a^{cut}(p, α). If the pair (u_A, u_P) is not extreme, then there are two extreme points such that (u_A, u_P) is their linear combination. Moreover, those extreme points can be obtained by cutoff strategies a^{cut}(p, 0) and a^{cut}(p, 1) for some p ∈ [0, 1]. Since the payoffs change continuously with α, by the intermediate value theorem there exists α ∈ [0, 1] such that (u_A, u_P) can be obtained by a^{cut}(p, α). The result follows.

I proceed to the main step of proving the theorem. Note that the agent can guarantee payoff c_A by never exerting effort, a ≡ 0. Incentive compatibility (7) then requires that the agent achieves at least c_A under an optimal feedback policy. Denote by (u*_A, u*_P) the feasible payoff pair that maximizes the principal's payoff given that the agent obtains at least c_A (see Figure 3).


Figure 6: Pareto weights and the corresponding efficient payoffs. The weights (γ_A, γ_P) correspond to an extreme point on the efficient frontier. The weights (γ'_A, γ'_P) correspond to a facet of the efficient frontier. (Axes: u_A and u_P.)

I show there is a feedback policy and a corresponding induced effort strategy that obtain (u*_A, u*_P). By construction, (u*_A, u*_P) is in V̄. By Lemma 2, there is an effort strategy a^{cut}(p*, α*) that obtains it. Consider a recommendation feedback policy m* that recommends a^{cut}(p*, α*) and threatens with no feedback for disobeying. I show that the agent is incentivized to follow the recommendations.

Consider the agent's incentives at time 0. If p_0 ≤ p*_P, then the principal can achieve her maximal feasible payoffs by setting p* = p*_P, since I assumed that p*_P < p^M_A. Otherwise, the principal at time 0 recommends the agent to exert effort. If the agent disobeys, then the principal punishes him with no feedback. Incentive compatibility in this case is equivalent to the agent's payoff being greater than c_A and is satisfied by construction. Consider the agent's incentives after other histories. If the agent has always been recommended to exert effort, then his belief is, by the martingale property, greater than the prior, p^t_A ≥ p_0. Incentive compatibility after such a history follows from incentive compatibility at time 0, since the cutoff stays the same. If the agent is recommended to stop exerting effort, then his belief drops below p* and hence below p^M_A, and he is again willing to follow the recommendation.

This shows that m* obtains (u*_A, u*_P). Since by construction the principal cannot achieve payoffs higher than u*_P, the result follows.


Proof of Proposition 4. An optimal feedback policy must induce the agent to exert effort at time 0 with probability 1. Otherwise, the policy can be improved by inducing the continuation play following a_0 = 0 earlier by one period. This change would benefit both players due to discounting.

Consider the agent’s incentive at time 0. Denote by uωyi the player i’s expected continu-ation payoff induced by the optimal feedback policy conditional on ability being ω and theoutcome at time 0 being y. If the agent does not exert effort then a success at time 0 cannotoccur so his expected payoff is

(1− δ) cAdt+ δ(p0u

10A + (1− p0)u00

A

).

If he exerts effort, then a success happens with probability λ dt if the ability is high, so his expected payoff is

$$(1-\delta)\, p_0 \lambda h_A\, dt + \delta \left( p_0 \left( \lambda u^{11}_A\, dt + (1 - \lambda\, dt)\, u^{10}_A \right) + (1 - p_0)\, u^{00}_A \right).$$

The agent exerts effort at time 0 only if the former is less than the latter or, rearranging terms,

$$u^{11}_A - u^{10}_A \geq \frac{1-\delta}{\delta}\, \frac{c_A - p_0 \lambda h_A}{p_0 \lambda}. \qquad (18)$$

Consider the maximization problem of the principal, ignoring all incentive constraints other than (18),

$$\max_{a \in \mathcal{A}} \; (1-\delta)\, p_0 \lambda h_P + \delta \left( p_0 \left( \lambda u^{11}_P + (1 - \lambda)\, u^{10}_P \right) + (1 - p_0)\, u^{00}_P \right), \qquad (19)$$

$$\text{s.t.} \quad u^{11}_A - u^{10}_A \geq \frac{1-\delta}{\delta}\, \frac{c_A - p_0 \lambda h_A}{p_0 \lambda}.$$

Observe that u^{11}_P and u^{11}_A are jointly maximized by exerting effort all the time after a success in the first period, delivering payoffs λh_P and λh_A respectively. Further, note that

$$u^{\omega y}_i = (\omega h_i - c_i)\, \mathcal{A}^{\omega y} + c_i,$$

where $\mathcal{A}^{\omega y} \triangleq (1-\delta)\, \mathbb{E}\left[\sum_{t=1}^{\infty} \delta^{t-1} a_t \mid m, a, \omega, y\right]$ is the continuation discounted amount of effort the agent of ability ω is expected to put into his work after outcome y (cf. Guo (2015)).


Consequently, the problem (19) can be restated as

$$\max_{a \in \mathcal{A}} \; p_0 (1 - \lambda)(h_P - c_P)\, \mathcal{A}^{10} - (1 - p_0)\, c_P\, \mathcal{A}^{00},$$

$$\text{s.t.} \quad (h_A - c_A)\, \mathcal{A}^{10} \leq \lambda h_A - c_A - \frac{1-\delta}{\delta}\, \frac{c_A - p_0 \lambda h_A}{p_0 \lambda}.$$

By arguments similar to those in the proof of Theorem 1, the south-east boundary of the set of feasible pairs (𝒜^{10}, 𝒜^{00}) can be obtained by cutoff strategies, so a strategy a^{cut}(p*_{MH}, α*_{MH}) solves the problem for some p*_{MH}, α*_{MH}. In the continuous-time limit, the probability at the cutoff does not matter and can be set to α*_{MH} = 1. Furthermore, this strategy induces exerting effort all the time after a success, so a^{cut}(p*_{MH}, 1) solves (19) as well.

Finally, observe that the cutoff effort strategy can be induced by a revision policy m^{rev}(T*). Under this policy, the agent's belief stands still before the revision. At the revision time, it randomly splits into 1 or p*_{MH} < p*_A. Incentive compatibility at times t ≥ T*, after the revision, is clearly satisfied. Incentive compatibility at times t < T* is equivalent to the nonprofitability of a one-shot deviation at time 0. The latter can be verified as follows. Recall that U^{rev}_A(T) is the agent's payoff if he exerts effort until time T and continues exerting effort only if a success occurred; see (9) and (20). If the agent follows the expected strategy, then he obtains U^{rev}_A(T). If the agent one-shot deviates, then he obtains an opportunity flow payoff of r c_A dt and faces a revision policy with revision time T − dt in the continuation game. Consequently, incentive compatibility is equivalent to

$$U^{rev}_A(T) \geq r c_A\, dt + (1 - r\, dt)\left(U^{rev}_A(T) - U^{rev\prime}_A(T)\, dt\right),$$

$$U^{rev}_A(T) \geq c_A - \frac{U^{rev\prime}_A(T)}{r}.$$

At the maximal incentive-compatible revision time T̄, the constraint binds; hence the result.

Proof of Proposition 5. If p* = p*_P, then the result is straightforward. The agent's incentives are the same as in the game with commitment, since babbling is an equilibrium threat. The principal's incentives are satisfied, since she achieves her maximal feasible payoff.

If p*_A > p*_P = 0, then the argument is more elaborate and goes as follows. For feedback to be informative, the agent must exert effort for at least one period. The agent is initially pessimistic, p_0 < p^M_A; otherwise the optimal policy would be no feedback, p* = p*_P = 0. Hence, for the agent to exert effort, he must be promised some information. In particular, a history at which the agent's belief goes strictly above p^M_A must occur with positive probability;


otherwise his expected payoff would be less than his outside option c_A. Consider any such history just before the crossing, so that p^{t−1}_A ≤ p^M_A and p^t_A > p^M_A after some message m_t. Since beliefs form a martingale, there must be an alternative message m'_t such that p^t_A < p^M_A after m'_t.

I show that the principal has strict incentives to send m_t over m'_t, making the message uninformative and contradicting the purported split of beliefs. Indeed, the agent must not exert effort with positive probability following m'_t for his continuation payoff to be above c_A. Consequently, for the message to be informative, the agent must not exert effort following m_t with positive probability as well: at any belief p^{t−1}_P ∈ (0, 1), the principal strictly prefers a continuation path at which the agent exerts effort all the time. But the agent myopically prefers to exert effort at belief p^t_A > p^M_A. For the agent not to exert effort after m_t, the principal must promise further informative feedback. But once again, for that feedback to be informative, the agent must, with positive probability, not exert effort whenever his belief goes up. In what follows, I show this sequence cannot go on forever. At some point, the agent would be sufficiently certain to ignore the principal's threat of no feedback and exert effort forever; that leads to a contradiction.

Assume it is known that the agent's belief does not jump with positive probability above some bound p̄_A, a natural starting candidate being p̄_A = 1. In this case, the maximal possible feedback is the one under which the agent's belief in the next period splits from p_A into p̄_A and 0 with respective probabilities p_A/p̄_A and 1 − p_A/p̄_A.18 The minimal feedback is no feedback. Consequently, the following condition must be satisfied for the agent to be willing not to exert effort for at least one period,

$$(1-\delta)\, c_A + \delta \left( \frac{p_A}{\bar p_A}\, h_A \bar p_A + \left(1 - \frac{p_A}{\bar p_A}\right) c_A \right) \geq p_A h_A,$$

which is equivalent to

$$p_A \leq \psi(\bar p_A) \triangleq \frac{c_A}{\delta c_A + (1-\delta)\, h_A \bar p_A}\, \bar p_A.$$

Note that ψ(p̄_A) ≤ p̄_A whenever p̄_A ≥ p^M_A. But then ψ(p̄_A) is a new upper bound on the agent's beliefs. Upon iteration, this process converges to the fixed point of ψ, which is p^M_A: ψ(p^M_A) = p^M_A. Consequently, the agent's belief can never go above p^M_A, so he never starts exerting effort, and no informative communication can be sustained in equilibrium.

18This upper bound may not be achieved for a given performance technology.
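To see the iteration at work, here is a minimal numerical sketch with illustrative parameter values. Under the flow trade-off of Section 7.1, the myopic cutoff equals c_A/h_A, which is the fixed point of ψ.

```python
# Iterating the bound psi from the proof of Proposition 5.  Starting from the
# trivial bound 1, the upper bound on the agent's belief shrinks to the myopic
# cutoff p^M_A = c_A / h_A.  Parameter values are illustrative.
delta, h_A, c_A = 0.9, 2.0, 1.0

def psi(p_bar):
    return c_A * p_bar / (delta * c_A + (1 - delta) * h_A * p_bar)

p_bar = 1.0
for _ in range(500):
    p_bar = psi(p_bar)

print(p_bar, c_A / h_A)   # both approximately 0.5
```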


Proof of Proposition 6. The proof follows the proof of Theorem 1. As there, denote the set of feasible payoffs and the set of (strongly) efficient payoffs respectively by V and V̄.

Lemma 3. (Efficient Payoffs with Belief Disagreement)
For any (u_A, u_P) ∈ V̄, there exist a cutoff in the agent's belief p_A ∈ [0, 1] and a probability at the cutoff α ∈ [0, 1] such that

$$u_A = U_A\left(y, a^{cut}(p_A, \alpha)\right), \qquad u_P = U_P\left(y, a^{cut}(p_A, \alpha)\right).$$

Proof. Consider any payoffs on the Pareto frontier, (u_A, u_P) ∈ V̄. As before, these payoffs can be obtained by an effort strategy that solves the maximization problem

$$\max_{a \in \mathcal{A}} \; \gamma_A U_A(y, a) + \gamma_P U_P(y, a),$$

for some Pareto weights (γ_A, γ_P) > 0. This problem can be written as

$$\max_{a \in \mathcal{A}} \; \gamma_A \left( p_{0A} (h_A - c_A)\, \mathcal{A}^1 - (1 - p_{0A})\, c_A\, \mathcal{A}^0 \right) + \gamma_P \left( p_{0P} (h_P - c_P)\, \mathcal{A}^1 - (1 - p_{0P})\, c_P\, \mathcal{A}^0 \right),$$

where $\mathcal{A}^{\omega} \triangleq (1-\delta)\, \mathbb{E}\left[\sum_{t=0}^{\infty} \delta^t a_t \mid m, a, \omega\right]$. Consequently, this problem can be viewed as a decision problem of a fictitious player with a prior belief p̃_0, value for performance h̃, and opportunity cost c̃:

$$\max_{a \in \mathcal{A}} \; \tilde p_0 \left( \tilde h - \tilde c \right) \mathcal{A}^1 - (1 - \tilde p_0)\, \tilde c\, \mathcal{A}^0,$$

$$\tilde p_0 = p_{0A}, \qquad \tilde h = \gamma_A h_A + \gamma_P \left( \frac{1 - p_{0P}}{1 - p_{0A}}\, c_P + \frac{p_{0P}}{p_{0A}}\, (h_P - c_P) \right), \qquad \tilde c = \gamma_A c_A + \gamma_P\, \frac{1 - p_{0P}}{1 - p_{0A}}\, c_P.$$

By Lemma 1, the problem has a cutoff strategy a^{cut}(p_A, α) as its solution. If (u_A, u_P) is an extreme point of V, then it can be obtained by some cutoff strategy a^{cut}(p_A, α). If (u_A, u_P) is not extreme, then there are two extreme points such that (u_A, u_P) is their linear combination. Moreover, those extreme points can be obtained by cutoff strategies a^{cut}(p_A, 0) and a^{cut}(p_A, 1) for some p_A ∈ [0, 1]. Since the payoffs change continuously with α, by the intermediate value theorem there exists α ∈ [0, 1] such that (u_A, u_P) can be obtained by a^{cut}(p_A, α). The result follows.

The agent can guarantee payoff c_A by never exerting effort, a ≡ 0. Incentive compatibility (7) then requires that the agent achieves at least c_A under an optimal feedback policy. Denote by (u*_A, u*_P) the feasible payoff pair that maximizes the principal's payoff given that the agent obtains at least c_A.

By construction, (u*_A, u*_P) is in V̄. By Lemma 3, there is an effort strategy a^{cut}(p*_A, α*) that obtains it. Denote by p*_P the principal's belief that corresponds to p*_A according to (13). It is straightforward to check that a (p*_P, α*)-cutoff feedback policy with a threat of no feedback achieves (u*_A, u*_P); the arguments are the same as in the case with no disagreement. The result follows.

B Calculations for Section 5

Conclusive Performance. The player's payoff under the revision policy can be calculated as follows. If the ability is high, then the agent exerts effort until time T and continues exerting effort only if a success occurred. The corresponding payoff is

$$U^{rev}_{i1}(T) = \left(1 - e^{-rT}\right) \lambda h_i + e^{-rT} \left( \left(1 - e^{-\lambda T}\right) \lambda h_i + e^{-\lambda T} c_i \right).$$

If the ability is low, then the agent stops exerting effort at rate γ before time T and never exerts effort after T, since successes cannot occur. The corresponding payoff is

$$U^{rev}_{i0}(T) = \int_0^T \gamma e^{-\gamma t} e^{-r t} \left(-h_i + c_i\right) dt + e^{-rT} e^{-\gamma T} c_i = \frac{\gamma}{\gamma + r} \left( c_i \left(1 + \frac{r}{\gamma}\, e^{-(\gamma + r)T}\right) - h_i \left(1 - e^{-(\gamma + r)T}\right) \right).$$

The total expected payoff is the average of these two terms weighted by the prior probability,

$$U^{rev}_i(T) = p_0\, U^{rev}_{i1}(T) + (1 - p_0)\, U^{rev}_{i0}(T). \qquad (20)$$

Inconclusive Performance. By Bayes' rule, the principal's belief given the average performance ȳ_t can be calculated as

$$p^t_P(\bar y_t) = \frac{p_0\, \phi_{t1}(\bar y_t)}{p_0\, \phi_{t1}(\bar y_t) + (1 - p_0)\, \phi_{t0}(\bar y_t)} = \frac{1}{1 + \frac{1 - p_0}{p_0} \frac{\phi_{t0}(\bar y_t)}{\phi_{t1}(\bar y_t)}},$$


where φ_{tω} is the probability density of a normally distributed variable with mean µ_ω and variance σ²/t, so

$$\frac{\phi_{t0}(\bar y_t)}{\phi_{t1}(\bar y_t)} = \frac{\exp\left(-\frac{(\bar y_t - \mu_0)^2}{2\sigma^2}\, t\right)}{\exp\left(-\frac{(\bar y_t - \mu_1)^2}{2\sigma^2}\, t\right)} = \exp\left(-\frac{\mu_1 - \mu_0}{\sigma^2}\left(\bar y_t - \frac{\mu_0 + \mu_1}{2}\right) t\right).$$

Hence, any belief cutoff p corresponds to a cutoff in the average performance ȳ_t according to

$$\frac{1 - p}{p} = \frac{1 - p_0}{p_0}\, \exp\left(-\frac{\mu_1 - \mu_0}{\sigma^2}\left(\bar y_t - \frac{\mu_0 + \mu_1}{2}\right) t\right),$$

or, rearranging,

$$\bar y_t(p) = \bar\mu + y_0(p)\, \frac{1}{t},$$

where $y_0(p) = \frac{\sigma^2}{\mu_1 - \mu_0} \ln\left(\frac{1 - p_0}{p_0}\, \frac{p}{1 - p}\right)$ and $\bar\mu \triangleq \frac{\mu_1 + \mu_0}{2}$.
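As a numerical sanity check (with illustrative parameters), the posterior and the performance cutoff above are inverse to each other:

```python
import numpy as np

# Posterior belief p^t_P(y_bar) and the performance cutoff y_t(p) from the
# formulas above; the prior and Gaussian parameters are illustrative.
p0, mu0, mu1, sigma = 0.5, 0.0, 1.0, 2.0
mu_bar = (mu0 + mu1) / 2

def posterior(y_bar, t):
    lik_ratio = np.exp(-(mu1 - mu0) / sigma**2 * (y_bar - mu_bar) * t)
    return 1 / (1 + (1 - p0) / p0 * lik_ratio)

def standard(p, t):
    y0 = sigma**2 / (mu1 - mu0) * np.log((1 - p0) / p0 * p / (1 - p))
    return mu_bar + y0 / t

p, t = 0.3, 4.0
print(posterior(standard(p, t), t))   # recovers p = 0.3 up to rounding
```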
