Sequential Procurement with Contractual and Experimental Learning

Yonatan Gur, Stanford University

Gregory Macnamara, Stanford University

Daniela Saban∗, Stanford University

June 5, 2019

Abstract

We study the design of sequential procurement strategies that integrate stochastic and strategic information. We consider a buyer who repeatedly demands a certain good and is unable to commit to long-term contracts. In each time period, the buyer makes a price offer to a seller who has private, persistent information regarding his cost and quality of provision. If the offer is accepted, the seller provides the good with a stochastic quality that is not contractible; otherwise, the buyer sources the good from a known outside option. The buyer can therefore learn from the (strategic) acceptance decisions taken by the seller, and from evaluations of the (stochastic) quality delivered whenever a purchase occurs. Hence, the buyer not only faces a tradeoff between exploration and exploitation, but also needs to decide how to explore: by facilitating quality observations, or by strategically separating seller types. We characterize the Perfect Bayesian Equilibria of this sequential interaction and show that the buyer’s equilibrium strategy consists of a dynamic sequence of thresholds on her belief on the seller’s type. In equilibrium, the buyer offers high prices that incentivize trade and quality experimentation at early stages of the interaction and, after gathering enough information (and if her belief is sufficiently low), she advances to offering low prices that may partially separate seller types. Contrasting the buyer’s strategy with two benchmark strategies designed to learn from each form of information in isolation, we identify the effect that strategic sellers may have on the buyer’s optimal strategy relative to more traditional learning dynamics, and establish that, paradoxically, when sellers are strategic, the ability to observe delivered quality is not always beneficial for the buyer.

Keywords: Incomplete information, quality experimentation, learning, dynamic contracts, dynamic games, procurement, perfect Bayesian equilibrium, Gittins index

1 Introduction

When making sourcing decisions, buyers’ primary concerns are typically the quality of the good

that they purchase and the price that they pay. A prominent challenge in evaluating potential

suppliers is that only limited information may be available to the buyer; important characteristics

of a candidate seller, including ones that are relevant to his value proposition, are typically the

seller’s private information. These characteristics include the cost of provision, which impacts the

price, as well as institutional attributes such as technology, knowledge, and logistics, which impact

the seller’s ability to consistently deliver high quality.

∗ Correspondence: [email protected], [email protected], [email protected].

However, each time that the buyer interacts with a seller, information about these characteristics

may be revealed to the buyer in multiple ways. In particular, before a purchase, the buyer and seller

must agree on a price, which may be informative about the seller’s cost of provision. In addition,

after a purchase, the delivered quality may be informative about the quality that the seller will

be able to deliver in the future. When the buyer and seller interact repeatedly, the information

acquired by the buyer can, in turn, be essential for improving her future decisions.

To illustrate, consider hiring a freelancer through an online labor platform such as Upwork, which

often hosts repeated interactions between providers and clients (Edelman Intelligence 2017). The

client can set up a job (which includes a description and a price) and reach out to a freelancer.

Relative to the client, the freelancer is likely to be better informed of his ability to perform the task

and the costs associated with that task (e.g., the value of his time or his cost of logistics). Therefore,

whether or not the freelancer accepts the price may be informative about his opportunity costs.

Moreover, if the freelancer chooses to accept that price, observing the quality of the completed

job may be informative about his skills and ability to perform similar tasks in the future. Based

on what she learned, should the client invite the freelancer again the next time she needs to hire

someone to complete a similar task? If so, what price should she offer?

One may observe that the forms of information mentioned in the above example tend to be

fundamentally different in nature. When quality is driven by attributes that are not easily modifiable

within the scope of the interaction with the buyer (for instance, when the task is based primarily

on the provider’s education, technology, or logistics), information from quality observations is

stochastic (and non-strategic) in nature. On the other hand, information that arises from agreed

upon transaction prices is the result of strategic decisions made by the provider. When the cost of

provision and the ability to provide quality are both the seller’s private information, stochastic and

strategic information can potentially be integrated together to design effective sourcing strategies.

The objective of this paper is to study the design of dynamic sourcing strategies that effectively

integrate these forms of information.

1.1 Main Contributions

Our main contributions are three-fold. First, we introduce a stylized model to study dynamic

sourcing strategies that effectively integrate stochastic and strategic information. Second, we

characterize the equilibria in a manner that allows us to analyze the main tradeoffs faced by a buyer.

Finally, we showcase the impact that the interplay between these different forms of information

has on the sourcing strategies, thus highlighting the importance of integrating both sources of

information when modeling dynamic procurement settings. We provide more details next.

Modeling. We develop an integrated learning model of contracting in a repeated, strategic

environment where the seller has private, persistent information on his cost and (stochastic) quality

of provision. The seller’s cost and quality are perfectly correlated, so the seller’s type is either

high-cost, high-quality or low-cost, low-quality. In each period, the buyer offers a price to the seller

for providing one unit. The seller either accepts or rejects the offer. If the seller accepts, he then

delivers a unit with stochastic quality. If the seller rejects, then the buyer relies on an outside

option (e.g., in-house production team) with known cost and quality distribution. The buyer cannot

commit to long-term contracts and realized quality is not contractible, in the sense that a price

is offered at the beginning of each period and is only valid for that period, and payments cannot

depend on present (or future) quality (but price offers can depend on past quality realizations).

The buyer can learn about the seller’s cost and quality through two sources of information. First,

the seller’s strategic acceptance decision is informative whenever the buyer makes a price offer that

partially separates seller types; since the ability to learn from the seller’s decisions depends on

the contract (price) offered by the buyer, we refer to this form of learning as contractual learning.

Second, if the seller accepts the price offer, the quality that he delivers is informative of his type;

we refer to this form of learning as experimental learning.

While our model is stylized and is not designed to fully describe the often complex interactions

between providers and clients, it introduces the integration of strategic and stochastic information

in a contracting setting and, by doing so, it captures key elements that are relevant to a broad

array of applications in which both forms of information are available. Studying general mechanisms

that incorporate these information forms is interesting, but not necessarily tractable. The class of

mechanisms we focus on (take-it-or-leave-it price offers) is common in practice (e.g., in Upwork, as

mentioned above), has been previously studied in the contracting literature (e.g., Hart and Tirole

(1988)), and allows for the tractability that is essential for characterizing learning dynamics.

Equilibrium Analysis. We characterize the perfect Bayesian equilibria of the above game. We

show that these equilibria can be simply described by a sequence of time-increasing thresholds on

the buyer’s belief that the seller is a high type. When her belief is above the threshold, the buyer

offers a high price, the seller accepts (regardless of his type) and provides the good, and then the

buyer updates her belief based on the realized quality. Thus, when the buyer’s belief is above the

equilibrium threshold, she engages in experimental learning but does not learn from the seller’s

decisions. On the other hand, when the buyer’s belief is below the threshold, both the buyer’s

offered price and the seller’s acceptance decision depend on the relative value of the buyer’s outside

option. If the outside option is more efficient than a low-type seller, the buyer offers a low price

that is rejected by the seller regardless of his type. In such cases, neither contractual learning nor

experimental learning occurs at low price offers, as these offers are always rejected and trade never

happens. In contrast, if the buyer’s outside option is less efficient than a low-type seller, she offers

a price that a high-type seller rejects and a low-type seller accepts with positive probability. Thus,

the buyer engages in contractual learning and updates her belief based on the seller’s acceptance

decision. Figure 1 demonstrates trajectories of the buyer’s belief over time.

[Figure 1: two panels plotting the buyer’s belief (vertical axis, 0 to 1) together with the equilibrium thresholds against periods to go (horizontal axis, 15 down to 0).]

(a) Bad Outside Option. Parameters: qH = .6, qL = .3, cH = .2, cL = 0, λ = .2

(b) Moderate Outside Option. Parameters: qH = .6, qL = .3, cH = .2, cL = 0, λ = .35

Figure 1: Examples of belief trajectories when the outside option is less efficient than both seller types (Figure 1a), and when it is more efficient than a low-type seller but less efficient than a high-type seller (Figure 1b). In both cases, at beliefs above the threshold, the buyer offers a high price, the seller accepts, and the buyer uses quality realizations to update her belief. At beliefs below the thresholds the buyer offers a low price. In the case depicted in Figure 1a, this price is rejected until it is accepted with five periods to go; from that point on the buyer knows the seller is a low type and her belief is updated to zero. In the case depicted in Figure 1b, the low price is always rejected, trade never happens, and the belief does not update.

Thus, when the buyer’s outside option is more efficient than a low-type seller, in equilibrium, the

buyer’s price offer in each period induces the same response by both seller types. Therefore, the

buyer’s strategy in the resulting equilibrium reflects an exploration/exploitation tradeoff that is

common in models of sequential decision making under uncertainty. When the outside option is less

efficient than the low-type seller, however, the buyer not only faces a tradeoff between exploration

and exploitation, but also needs to decide how to explore, as information can be gathered by

observing quality (through high price offers) as well as by separating seller types (through low price

offers). By balancing this new tradeoff, we show that the buyer’s equilibrium thresholds have two

roles as they not only capture how the value of information that is derived from experimentation

changes throughout the problem horizon, but also determine the incentives of a low-type seller to

accept low price offers. We find that, in equilibrium, the buyer explores through high price offers

that facilitate quality experimentation at early stages of the interaction with the seller, and after

gathering enough information (and if her belief is sufficiently low), she advances to offering low

prices that may partially separate seller types at later stages of the interaction.
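As an illustration of the threshold structure described above, the following Python sketch traces a belief path in the moderate-outside-option case, where low offers are always rejected and the belief then stops moving. The mean qualities are the Figure 1 parameters; the threshold values, the function names, and the treatment of a belief exactly equal to the threshold are assumptions made only for this example and are not computed from the model.

```python
import random

Q_HIGH, Q_LOW = 0.6, 0.3  # mean qualities, taken from the Figure 1 parameters


def update_on_quality(belief, success):
    """Bayes update of P(type = H) after one Bernoulli quality observation."""
    like_high = Q_HIGH if success else 1.0 - Q_HIGH
    like_low = Q_LOW if success else 1.0 - Q_LOW
    return belief * like_high / (belief * like_high + (1.0 - belief) * like_low)


def belief_path(prior, thresholds, true_quality, rng):
    """Return the belief at the start of each period and after the last one.

    thresholds[k] is a HYPOTHETICAL threshold for the k-th period (an
    assumption of this sketch, not an equilibrium value). Above it, the
    high price is offered, both types accept, and quality is observed.
    Otherwise the low price is rejected and the belief is unchanged.
    """
    path = [prior]
    belief = prior
    for threshold in thresholds:
        if belief > threshold:
            success = rng.random() < true_quality
            belief = update_on_quality(belief, success)
        path.append(belief)
    return path


if __name__ == "__main__":
    rng = random.Random(0)
    increasing_thresholds = [0.30, 0.35, 0.40, 0.45, 0.50]  # hypothetical
    print(belief_path(0.5, increasing_thresholds, Q_LOW, rng))
```

Once the belief falls to or below the current threshold it stays there for the rest of the horizon whenever thresholds are nondecreasing in time, which mirrors the flat segment of the trajectory described for Figure 1b.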

Interplay between Stochastic and Strategic Information. To elucidate how the interplay

between stochastic and strategic information impacts the buyer’s strategy and payoff, we analyze

benchmark models in which the buyer has access to only one of the information forms that are

present in the integrated learning model. First, we introduce an experimental-only benchmark,

where all seller types follow the same strategy, and therefore the seller’s decisions themselves are

not informative and the buyer can only learn from quality observations. In this benchmark model,

the buyer’s optimal strategy balances a well-studied exploration/exploitation tradeoff, and corresponds

to a Gittins Index policy. Contrasting the equilibrium thresholds of our integrated learning

model with those of the experimental-only benchmark captures how the ability to extract strategic

information from sellers may impact the buyer’s strategy. When the buyer’s outside option is more

efficient than a low-type seller, the equilibrium thresholds under the two models coincide (as seller

types are pooled at equilibrium price offers). When the buyer’s outside option is less efficient than

a low-type seller, equilibrium thresholds no longer coincide, and in that regime, we identify two

opposite effects that impact the buyer’s equilibrium strategy in the presence of strategic seller

behavior. On the one hand, the value of offering low prices increases because a low-type seller may

accept such offers with positive probability; this effect puts an “upward pressure” on the thresholds.

On the other hand, the value of offering high prices may also increase because observing a failure

reduces the buyer’s belief and makes a low-type seller more likely to accept low price offers in future

periods. This effect, which becomes more dominant as the number of future periods increases, puts

a “downward pressure” on the thresholds. The impact of these effects is illustrated in Figure 2.

[Figure 2: equilibrium threshold (vertical axis, 0 to 0.6) against periods to go (horizontal axis, 15 down to 0) for Integrated Learning and Experimental Only (Gittins Index), with regions marked “Downward Effect Dominates” and “Upward Effect Dominates”. Parameters: qH = .6, qL = .3, cH = .2, cL = 0, λ = .2]

Figure 2: Impact of strategic information on equilibrium thresholds. In the depicted case, with access to strategic information the buyer’s equilibrium thresholds are higher when there are four or fewer periods to go, and lower when there are five or more periods to go.

Second, we introduce a contractual-only benchmark, where the seller’s strategic decisions are informative

but the buyer cannot observe quality realizations. We characterize the equilibria in this

benchmark model and show that, again, the buyer’s strategy consists of a sequence of thresholds

on her belief. Interestingly, comparing the buyer’s payoffs under both models we establish that

access to stochastic quality observations can be detrimental for the buyer. When her belief is sufficiently

low the buyer may prefer to remove her ability to observe quality realizations in subsequent

periods. By doing so, her equilibrium thresholds in subsequent periods increase because offering a

high price becomes less valuable as the buyer no longer benefits from additional information. The

increase in future thresholds impacts the seller’s incentives in the current period, and may increase

the probability that a low-type seller accepts a low price offer, which would reveal his type and

increase the buyer’s cumulative payoff.

1.2 Related Literature

Our objective in this paper is to study how access to two different types of information affects

a buyer’s purchasing strategy. In doing so, our work relates to several strands of literature in

operations and economics.

First, with the inclusion of stochastic quality realizations as a source of information, our work

relates to models of experimentation-based learning in which a decision maker confronts a tradeoff

between new information acquisition (exploration), and maximizing instantaneous payoffs based

on available information (exploitation). This tradeoff has been studied in various settings in the

operations literature, including pricing (e.g., Harrison et al. (2012), den Boer and Zwart (2014)),

retail assortment selection (e.g., Caro and Gallien (2007), Sauré and Zeevi (2013)) and inventory

management (e.g., Huh and Rusmevichientong (2009), Besbes and Muharremoğlu (2013), Besbes

et al. (2017), Papanastasiou (2019)). The ‘supply-learning’ model in Tomlin (2009) uses similar

ideas to characterize the optimal inventory and sourcing policy of a buyer who, in each period, can

source from two suppliers, where the reliability of one of these suppliers is unknown. That model

differs from ours as the price schedule that the buyer must pay is exogenous, while we incorporate

the screening process by which the buyer and seller agree to a price each period.

Second, by considering a buyer who repeatedly makes price offers to a seller and can learn from his

strategic acceptance decisions, our work relates to the literature on dynamic contracting. Within

this stream of work, our model is closest to the literature on dynamic screening without commitment,

as we also assume that the buyer cannot commit to long-term contracts. A common theme in

this literature is that an agent is reluctant to reveal information because the principal will update

the terms of a contract once he does so; this is the so-called ‘ratchet effect’ which has been studied

in many settings including Hart and Tirole (1988) and Laffont and Tirole (1990). Moreover, two

recent papers, Acharya and Ortner (2017) and Ortner (2017), study how exogenous shocks to the

contracting environment help alleviate the ratchet effect. The key difference between our work

and this stream of literature is that, in our setting, the buyer observes (noisy) quality realizations

each period that she purchases, which is an additional source of information about the seller’s

private type. In fact, in §4.1, we introduce a benchmark model where quality realizations are not

observable, which is similar to the non-commitment, rental model of Hart and Tirole (1988).

In our model, we assume that the seller has private information regarding both his cost and quality

of provision (although it remains one-dimensional as cost and mean quality are perfectly correlated).

Previous work has looked at the design of optimal mechanisms for a buyer to purchase from a seller

who has cost and quality (or reliability) that are unknown to her, albeit in static settings (see, e.g.,

Che (1993), Wan and Beil (2009), Yang et al. (2012), Chaturvedi and Martínez-de-Albéniz (2011)).

Our model differs, as well, from the dynamic mechanism design literature (see, e.g., Kakade et al.

(2013) and Pavan et al. (2014)), in several ways. In their models, the authors assume that the

principal can commit to a dynamic mechanism, while we assume that she cannot. Moreover, in our

model, the buyer observes a noisy signal of the seller’s type after making a purchase whereas in the

dynamic mechanism design literature the principal only learns from the messages that agents send.

Our work also relates to literature on relational contracting, which is a general framework described

in Levin (2003) for studying repeated interactions where the prospect of future payoffs provides

incentives in situations where contracts cannot be perfectly enforced. This framework has been

adopted in Taylor and Plambeck (2007a) and Taylor and Plambeck (2007b) to study how future

interactions can incentivize a supplier to invest in capacity. Recently, Bondareva and Pinker (2018)

study how relational contracts can be used to induce costly effort from a supplier when quality

realizations need not be contractible; however, in their setting, they assume that the buyer can

perfectly commit to contracts and does so at time zero. Our framework differs from these models

as these do not involve learning (there is either no private information or types are drawn independently

in each period) and focus on using relational incentives to alleviate a moral hazard problem.

In contrast, we focus on learning when private information is persistent and quality is stochastic.

One model which also studies experimentation-based learning in a strategic setting is Bergemann

and Välimäki (1996); in their model, however, all parties have symmetric information and the sellers’

strategic pricing decisions do not reveal new information to the buyer. Thus, the dynamics of our

model differ substantially from theirs. Of note, equilibria of their model are efficient (i.e., maximize

social welfare) and the buyer purchases from the seller whose product’s quality is associated with

the largest Gittins Index. While we identify a similar outcome in one parametric region, in general,

the equilibria of our model do not maximize social welfare and the buyer’s policy does not coincide

with a Gittins Index policy due to the seller’s private information.

Broadly speaking, the above studies consider either (1) learning agents that operate in non-strategic

environments (or, in the case of Bergemann and Välimäki (1996), an environment where strategic

actions do not reveal information) in which other players follow some myopic decision rules; or

(2) a decision maker, or principal, who learns about the seller’s private information only through

their strategic actions. In our framework, the buyer can learn from both the stochastic quality

observations and the strategic actions taken by the seller in response to price offers.

Many of the above studies also consider the risk of over-exploration that is typically driven by

characteristics of the reward (or cost) structure at hand. Our study identifies a new driver of over-

exploration in a strategic principal-agent setting, in the form of strategic reactions of agents to

the prospect of future experimentation conducted by a principal. In particular, our work identifies

cases in which the ability to learn from observing quality realizations can lower the buyer’s expected

payoff. This also connects our study to previous research that has found that limiting one’s

information can be beneficial in strategic settings, for example in matching markets (e.g., Ostrovsky

and Schwarz (2010), Kanoria and Saban (2017)) or other contracting environments (e.g., Kessler

(1998), Roesler and Szentes (2017)). We are unaware of any other work showing that access to

information can make a principal worse off in a setting close to ours.

2 Model of Integrated Learning

We formulate a repeated game of incomplete information between a principal (“buyer” or “she”)

and an agent (“seller” or “he”). Some aspects of the model are discussed in more detail in §2.2.

Let t = 1, ..., T be a finite sequence of discrete time periods. In each period, the buyer demands

one indivisible unit of a product or service that can be sourced from either a strategic seller or an

outside option. Before the first period, the seller privately draws a type θ ∈ {H,L}, which we refer

to as high and low, according to the distribution:

\[
\theta =
\begin{cases}
H, & \text{w.p. } \gamma \\
L, & \text{w.p. } 1 - \gamma,
\end{cases}
\]

where 0 ≤ γ ≤ 1 is commonly known. The seller’s type is fixed throughout the horizon and

determines both the seller’s cost of provision, denoted by cθ, and mean quality, denoted by qθ.

Within-period dynamics. We model the interaction between the buyer and the seller using the

following class of mechanisms, which is depicted in Figure 3. In each period t, the buyer first updates

her beliefs (publicly) about the seller’s type based on the history of actions and observations. Based

on her belief, the buyer makes the seller a (possibly random) take-it-or-leave-it price offer, pt ∈ R,

to provide the good. The seller observes the price offer and decides whether to accept or reject it.

We denote the (possibly random) binary rejection/acceptance decision of the seller by at ∈ {0, 1}.

If the seller rejects the offer, no payment is made, the seller receives the value of his outside option,

which we normalize to zero, and the buyer uses her outside option for which she incurs a cost of

c0 and which has quality yt ∼ Ber(q0); the parameters c0 and q0 are common knowledge. If the

seller accepts the offer, he receives the payment pt from the buyer, incurs the cost cθ, and delivers a

unit with random quality yt ∼ Ber(qθ). Finally, both the buyer and the seller observe the realized

quality of the unit. These dynamics are summarized in Figure 3.

[Figure 3: timeline of period t. Period t begins; the buyer updates her belief µt; the buyer offers price pt; the seller selects response at; both observe quality yt; period t + 1 begins.]

Figure 3: Dynamics of each period

By making a price offer pt, the buyer commits to purchasing at that price in the current period

(in case this offer is accepted by the seller), but cannot credibly commit to future price offers or

purchase decisions; this is formalized in the equilibrium definition that will be advanced shortly.
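The within-period dynamics above can be sketched as a single simulated period. This is a minimal illustration, not the authors' implementation: the names `Outcome` and `play_period` are hypothetical, the seller's response is summarized by an acceptance probability supplied by the caller (an assumption standing in for the seller's strategy), and the numeric values for c0 and q0 in the usage lines are made up for the example.

```python
import random
from typing import NamedTuple


class Outcome(NamedTuple):
    accepted: int      # a_t in {0, 1}
    quality: int       # y_t in {0, 1}
    payment: float     # transfer from buyer to seller (0 after a rejection)
    seller_net: float  # payment minus cost of provision, or 0 after a rejection
    buyer_cost: float  # price paid, or c0 when the outside option is used


def play_period(price, accept_prob, seller_cost, seller_quality,
                outside_cost, outside_quality, rng):
    """Simulate one period: offer, accept/reject, then a Bernoulli quality draw."""
    accepted = 1 if rng.random() < accept_prob else 0
    if accepted == 1:
        # Seller is paid the price, incurs his cost, and delivers y_t ~ Ber(q_theta).
        quality = 1 if rng.random() < seller_quality else 0
        return Outcome(1, quality, price, price - seller_cost, price)
    # Rejection: no payment, seller gets his outside option (normalized to zero),
    # buyer pays c0 and receives y_t ~ Ber(q0).
    quality = 1 if rng.random() < outside_quality else 0
    return Outcome(0, quality, 0.0, 0.0, outside_cost)


if __name__ == "__main__":
    rng = random.Random(0)
    # Low-type seller (cL = 0, qL = .3) facing a price of .2; c0 and q0 are assumed.
    print(play_period(0.2, 1.0, 0.0, 0.3, outside_cost=0.3,
                      outside_quality=0.5, rng=rng))
```

Because the offer binds only within the period, nothing from one call carries over to the next except what the caller chooses to pass in, which is the code-level counterpart of the no-commitment assumption.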

Payoff functions. Both the buyer and the seller are expected value maximizers. The buyer values

quality according to a function u : R→ R, which is common knowledge. In each period, for a given

price offer p ∈ R, response a ∈ {0, 1}, and realized quality y ∈ {0, 1}, the buyer’s payoff is given by

\[
u(p, a, y) =
\begin{cases}
u(y) - p, & \text{if } a = 1 \\
u(y) - c_0, & \text{if } a = 0,
\end{cases}
\]

and in addition, for each θ ∈ {H,L} the payoff of a seller with type θ is given by

\[
v_\theta(p, a) =
\begin{cases}
p - c_\theta, & \text{if } a = 1 \\
0, & \text{if } a = 0.
\end{cases}
\]

We use the parameter λ := q0u(1) + (1 − q0)u(0) − c0 to denote the buyer’s expected value from

using her outside option, and for simplicity we set u(y) = y without loss.¹ We further assume

cH > cL without loss, and, for concreteness, focus on the case where the high-type seller is more

efficient than the low-type seller, that is, qH − cH > qL − cL.
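To make the payoff definitions concrete, the sketch below evaluates them with u(y) = y and classifies the outside option using the Figure 1 parameters. The function names are hypothetical, and comparing λ with qθ − cθ to decide which source is "more efficient" is an assumed reading based on the efficiency condition above and the Figure 1 caption; ties are not addressed in the text and are resolved arbitrarily here.

```python
def buyer_payoff(price, accepted, quality, outside_cost):
    """u(p, a, y) with u(y) = y: quality minus the price, or minus c0 if rejected."""
    return quality - (price if accepted == 1 else outside_cost)


def seller_payoff(price, accepted, cost):
    """v_theta(p, a): margin over the cost of provision, or zero if rejected."""
    return price - cost if accepted == 1 else 0.0


def outside_value(q0, c0):
    """lambda = q0*u(1) + (1 - q0)*u(0) - c0, which equals q0 - c0 when u(y) = y."""
    return q0 * 1.0 + (1.0 - q0) * 0.0 - c0


def outside_option_regime(q_high, c_high, q_low, c_low, lam):
    """Compare lambda with each type's expected surplus (assumed reading)."""
    if lam < q_low - c_low:
        return "less efficient than both seller types"
    if lam < q_high - c_high:
        return "between the two seller types"
    return "more efficient than both seller types"


if __name__ == "__main__":
    for lam in (0.2, 0.35):  # the two values of lambda used in Figure 1
        print(lam, outside_option_regime(0.6, 0.2, 0.3, 0.0, lam))
```

With these parameters the high type's surplus is 0.4 and the low type's is 0.3, so λ = .2 falls below both and λ = .35 falls between them, matching the two panels of Figure 1.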

Histories and strategies. The information that is revealed in each period includes the offered

price, the seller’s response to it, and the realized quality, that is, the triple (pt, at, yt). Note that

these are revealed sequentially: the distribution of the quality yt depends on the acceptance decision

at, which can depend on the offered price pt. We denote the history of publicly available information

at the beginning of period t by:

\[
h_1 = \langle \gamma \rangle, \quad \text{and} \quad h_t = \left\langle \gamma, \, (p_{t'}, a_{t'}, y_{t'})_{t'=1}^{t-1} \right\rangle, \quad \text{for } t > 1.
\]

¹ Setting u(y) = y is without loss as the shape of the utility will always be linear because y can take on only two values, and any changes in the intercept and/or slope can be implemented by adjusting the value of λ.

We denote by $\{\mathcal{H}_t = \sigma(h_t),\ t = 1, \ldots, T\}$ the filtration associated with the process $\{h_t\}_{t=1}^{T}$, and we

denote the set of possible histories at the beginning of period t as $H_t = \{\gamma\} \times (\mathbb{R} \times \{0, 1\} \times \{0, 1\})^{t-1}$.

Given a space $X$ we denote by $\Delta(X)$ the set of probability measures on $X$. A buyer strategy is

a sequence of mappings from histories to probability distributions over price offers. We denote a

buyer strategy by $\rho = \{\rho_t\}_{t=1}^{T}$, where $\rho_t : H_t \to \Delta(\mathbb{R})$ for each $t = 1, \ldots, T$. We denote by $\mathcal{P}$ the

set of buyer strategies that are measurable with respect to $\{\mathcal{H}_t,\ t = 1, \ldots, T\}$. For a given period t

and history $h \in H_t$, the price offer $p_t$ is distributed according to $\rho_t(h)$.

The seller is assumed to know his own type θ. In addition, the seller’s information at the time of his

action in period t consists of the public history ht and the current price offer pt. We denote the seller

history by $h^s_t = \langle \theta, h_t, p_t \rangle$, and denote by $\{\mathcal{H}^s_t = \sigma(h^s_t),\ t = 1, \ldots, T\}$ the filtration associated with the

process $\{h^s_t\}_{t=1}^{T}$. We denote the set of possible seller histories in period t as $H^s_t = \{L, H\} \times H_t \times \mathbb{R}$.

A seller strategy is a sequence of mappings from seller histories to probability distributions over

acceptance decisions. We denote the seller strategy by $\alpha = \{\alpha_t\}_{t=1}^{T}$ where $\alpha_t : H^s_t \to \Delta(\{0, 1\})$

for each $t = 1, \ldots, T$. We denote by $\mathcal{A}$ the set of seller strategies that are measurable with respect

to $\{\mathcal{H}^s_t,\ t = 1, \ldots, T\}$. For a given period t and seller history $h^s \in H^s_t$, the acceptance decision $a_t$ is distributed

according to $\alpha_t(h^s)$. In addition, for any $\theta \in \{H, L\}$, for each $t = 1, 2, \ldots$ the quality $y_t$ is an

independent Bernoulli random variable such that $\mathbb{E}(y_t \mid a_t = 0) = q_0$ and $\mathbb{E}(y_t \mid a_t = 1) = q_\theta$.

We define a belief system µ as a sequence of mappings from histories to a belief about the seller’s

type. We denote $\mu = \{\mu_t\}_{t=1}^{T}$ where $\mu_t : H_t \to [0, 1]$ and $\mu_t(h)$ is the probability that the buyer

assigns to the event {θ = H} given a history h ∈ Ht; in our equilibrium concept discussion below

we advance a concrete structure for these mappings. For a period t, a history h ∈ Ht, strategies

ρ ∈ P, α ∈ A and a belief system µ, we define the buyer’s expected continuation payoff at the

beginning of period t as:

\[
U_t^{\rho,\alpha,\mu}(h) = \mathbb{E}^b\left( \sum_{t'=t}^{T} u(p_{t'}, a_{t'}, y_{t'}) \,\middle|\, h_t = h \right),
\]

where $\mathbb{E}^b$, here and throughout the paper, is taken with respect to $\{p_{t'}\}_{t'=t}^{T}$, $\{a_{t'}\}_{t'=t}^{T}$, $\{y_{t'}\}_{t'=t}^{T}$,

and $\theta$. In addition, for a given seller history $h^s \in H^s_t$, we define the seller’s expected continuation

payoff at the time of his acceptance decision as:

\[
V_t^{\rho,\alpha,\mu}(h^s) = \mathbb{E}^s\left( \sum_{t'=t}^{T} v_\theta(p_{t'}, a_{t'}) \,\middle|\, h^s_t = h^s \right),
\]

where $\mathbb{E}^s$, here and throughout the paper, indicates that expectation is taken with respect to

$\{p_{t'}\}_{t'=t+1}^{T}$, $\{a_{t'}\}_{t'=t}^{T}$, and $\{y_{t'}\}_{t'=t}^{T}$.

Equilibrium concept. The solution concept we use in this paper is Perfect Bayesian Equilibrium

(PBE); throughout the paper we use the term equilibrium to refer to PBE unless specified otherwise.

This equilibrium concept refines the concept of subgame perfection to environments of incomplete

information by requiring a belief system and imposing requirements on that belief system. A triple

(ρ,α,µ) is said to be a Perfect Bayesian Equilibrium if:

1. The buyer strategy ρ is sequentially rational; that is, for each period t and history h ∈ Ht:

\[
U_t^{\rho,\alpha,\mu}(h) \geq U_t^{\rho',\alpha,\mu}(h), \quad \forall \rho' \in \mathcal{P}. \tag{1}
\]

2. The seller strategy α is sequentially rational; that is, for each type θ ∈ {H,L}, period t, and

seller history hs ∈ Hst :

\[
V_t^{\rho,\alpha,\mu}(h^s) \geq V_t^{\rho,\alpha',\mu}(h^s), \quad \forall \alpha' \in \mathcal{A}. \tag{2}
\]

3. At the beginning of each period t the buyer’s belief µt is updated according to Bayes’ rule;

that is, for all t and for any h ∈ Ht such that P(ht = h) > 0 given strategies ρ,α:

\[
\mu_t(h) = \frac{\gamma \, \mathbb{P}(h_t = h \mid \theta = H)}{\gamma \, \mathbb{P}(h_t = h \mid \theta = H) + (1 - \gamma) \, \mathbb{P}(h_t = h \mid \theta = L)}. \tag{3}
\]

One may observe that this includes µ1(h1) = γ.
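As a worked instance of condition (3), the sketch below computes the posterior after a history of quality observations and checks that it matches period-by-period updating. It assumes, purely for illustration, that every past offer was a pooling offer accepted by both types, so that the price and acceptance terms are identical across types and cancel, leaving only the Bernoulli quality likelihoods; the function name is hypothetical.

```python
from math import prod


def posterior_from_history(prior, qualities, q_high, q_low):
    """P(theta = H | observed qualities) under pooling offers (an assumption).

    qualities is a list of 0/1 outcomes delivered by the seller. With pooling
    offers, P(h | theta) is proportional to the product of quality likelihoods.
    """
    like_high = prod(q_high if y == 1 else 1.0 - q_high for y in qualities)
    like_low = prod(q_low if y == 1 else 1.0 - q_low for y in qualities)
    return prior * like_high / (prior * like_high + (1.0 - prior) * like_low)


if __name__ == "__main__":
    gamma = 0.4
    history = [1, 0, 0, 1]

    # One-shot application of Bayes' rule to the whole history.
    batch = posterior_from_history(gamma, history, 0.6, 0.3)

    # Sequential application, one observation at a time.
    belief = gamma
    for y in history:
        belief = posterior_from_history(belief, [y], 0.6, 0.3)

    print(batch, belief)  # the two values agree up to floating-point error
```

With an empty history both products equal one and the function returns the prior, in line with the observation that (3) includes µ1(h1) = γ.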

We further make the following three refinements for this equilibrium concept. The first two are

based on the notion of PBE from Fudenberg and Tirole (1991) and refine how to calculate the

buyer’s belief for some histories that occur with probability zero. First, for h ∈ Ht, we require

that Bayes’ rule is used to update her belief from µt(h) to µt+1(〈h, (p, a, y)〉) whenever possible;

notably this includes histories where P(ht = h) = 0. Second, we specify a belief update rule

(“no-signaling-what-you-don’t-know condition,” see discussion in §8.2.3 of Fudenberg and Tirole 1991)

after unilateral deviations by the buyer that induce histories that occur with probability zero in

equilibrium.² Third, we focus on equilibria in which the seller accepts a price offer of cH with

probability one when indifferent between accepting and rejecting it.³

2.1 Forms of Learning

We distinguish between two forms of information through which the buyer can learn about the

seller’s type. The first one consists of the strategic acceptance decisions taken by the seller. We

say that the buyer engages in contractual learning in period t if the realization of the acceptance

decision at reveals information about the seller’s type. These acceptance decisions are informative

whenever, conditioned on the public history and the realized price offer (which are already known

to the seller), the distribution over acceptance decisions employed by the seller depends on his type.

That is, when for given p ∈ R and h ∈ Ht one has:

$$P(a_t = 1 \mid \theta = L, h_t = h, p_t = p) \neq P(a_t = 1 \mid \theta = H, h_t = h, p_t = p). \qquad (4)$$

In this case we say that p is a partially separating price offer; otherwise, p is a pooling price offer.

The second form of information consists of quality realizations, which may be informative in periods

where the seller accepts the price offer and provides the good. We say that the buyer engages in

experimental learning in period t if at = 1 and the observation of yt reveals information about the

seller type. That is, if for a given history h ∈ Ht and price offer p ∈ R:⁴

$$\mu_{t+1}(\langle h_t = h, (p_t = p, a_t = 1, y_t = 1)\rangle) \neq \mu_{t+1}(\langle h_t = h, (p_t = p, a_t = 1, y_t = 0)\rangle). \qquad (5)$$

² Note that these two refinements impose additional requirements over (3) because together, they provide a way to update the belief even after histories h with P(ht = h) = 0 in equilibrium and, in particular, after a price offer p ∈ R that occurs with probability 0. Formally we require, for a, y ∈ {0, 1},

$$\mu_{t+1}(\langle h, (p, a, y)\rangle) = \frac{\mu_t(h)\, P(a_t = a \mid \theta = H, h_t = h, p_t = p)\, P(y_t = y \mid a_t = a, \theta = H)}{\mu_t(h)\, P(a_t = a \mid \theta = H, h_t = h, p_t = p)\, P(y_t = y \mid a_t = a, \theta = H) + (1 - \mu_t(h))\, P(a_t = a \mid \theta = L, h_t = h, p_t = p)\, P(y_t = y \mid a_t = a, \theta = L)}.$$

For further details see related discussion in Fudenberg and Tirole (1991), §8.2.3.

³ While equilibria are tractable even without it, this refinement simplifies analysis and leads to uniqueness of the buyer’s equilibrium expected payoff as a function of her belief in each period (see Proposition 5), which allows us to further study the impact of the different sources of information on the design of effective learning strategies. Moreover, this refinement is equivalent to the introduction of a ‘commitment type’ in Schmidt (1993). Equilibria eliminated by this assumption are substantially the same as the ones we characterize but have slight differences in the last periods.

⁴ Note that experimental learning is defined through the buyer’s belief system. While it is defined for any history, we focus on cases where inequality (5) takes place on the equilibrium path.
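The one-period belief update used in the refinements above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: quality is taken to be Bernoulli with success probabilities `q_H` and `q_L`, a rejection is taken to carry no quality signal, and the function name `update_belief` and its arguments are hypothetical labels that do not appear in the paper.

```python
def update_belief(mu, a, y, accept_H, accept_L, q_H, q_L):
    """One-period Bayes update of the belief that the seller is a high type.

    mu        -- current belief mu_t(h)
    a         -- acceptance decision (1 = accept, 0 = reject)
    y         -- observed quality (1 or 0); ignored when a == 0
    accept_H  -- P(a_t = 1 | theta = H, h_t = h, p_t = p)
    accept_L  -- P(a_t = 1 | theta = L, h_t = h, p_t = p)
    q_H, q_L  -- ASSUMED Bernoulli success probabilities P(y_t = 1 | a_t = 1, theta)
    """
    def likelihood(accept, q):
        if a == 0:
            # Assumption: no purchase means no quality observation.
            return 1.0 - accept
        return accept * (q if y == 1 else 1.0 - q)

    num = mu * likelihood(accept_H, q_H)
    den = num + (1.0 - mu) * likelihood(accept_L, q_L)
    if den == 0.0:
        raise ValueError("observation has probability zero under both types")
    return num / den
```

When `accept_H == accept_L` the acceptance decision drops out of the ratio (a pooling price offer, so only experimental learning can occur); when they differ, the acceptance decision itself moves the belief (a partially separating price offer).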

2.2 Discussion of Model Assumptions

Exogenous quality. In our model, we focus on dynamics that are induced for a given prior γ.

Notably, this prior can be endogenized to reflect a private quality/cost decision by the seller at the

beginning of the horizon. Then, the seller’s resulting mixed strategy in equilibrium induces the

buyer’s prior γ; for more details see analysis and discussion in Appendix A.1.

PBE and Commitment. Our focus in this work is on studying the buyer’s strategy when she

integrates multiple sources of information. With this goal, we focus on a setting where the buyer

and the seller are sequentially rational; that is, in each period, they use all of the information that

they have gathered so far in order to take actions that maximize their expected payoff throughout

the present and future periods. We formalize this by using PBE as our equilibrium concept.

Note that because of the sequential rationality, in equilibrium, the buyer cannot credibly commit

to future actions as she will exploit all information about the seller’s type revealed so far; see

the related discussion on the ratchet effect in Laffont and Tirole (1993). With full commitment to

prices and/or purchase decisions as a function of past quality, the buyer’s optimal contract could be

determined as the solution to a static mechanism design problem along the lines of Myerson (1981).

This mechanism would, broadly speaking, separate types more effectively and would increase the

surplus the buyer obtains from the relationship.

Quality not contractible. We focus on a class of mechanisms where, in each period, the payment

cannot depend on the realized quality in that period. This corresponds to many practical settings,

such as the provision of services and other complex products where verification of quality by a third

party is impossible or costly. One may observe that when quality is contractible, the buyer can

capture the first-best surplus for any number of periods by using quality-contingent payments.

Stochastic quality. In assuming that quality is stochastic and stationary our framework is better

suited to capture settings where quality is driven by intrinsic attributes such as technology,

knowledge, infrastructure, and supply chain, which are not modifiable within the scope of the interaction

between the buyer and the seller (as opposed to, say, effort in settings with moral hazard).

Two, one-dimensional seller types. Our focus on two types enables us to simplify the analysis

and study the impact of different forms of learning in equilibrium more effectively. Nevertheless, our

approach can be adapted to settings with more than two types; see discussion in §5.1. Moreover,

by assuming that cost and quality are perfectly correlated as we do, the seller’s private information

reduces to one dimension. This simplifies the analysis as contracting with multi-dimensional private

information is notoriously difficult; see §5 for a discussion of generalizing to two-dimensional types.

High type is more efficient. We focus on the case where a high-cost seller is more efficient than

a low-cost one (qH − cH > qL − cL), but our analysis easily extends to other scenarios. When a

low-cost seller is more efficient (qL−cL > qH−cH), there are three cases to consider. If λ > qL−cL,

the buyer always uses her outside option. If qL− cL > λ > qH − cH , then the equilibrium is trivial:

the buyer offers cL in every period, a low-type (high-type) seller always accepts (rejects) this offer.

If qH − cH > λ the dynamics are very similar to those characterized in Theorem 2.

Buyer makes price offers. We focus on the class of mechanisms where the buyer learns through

making price offers. As demonstrated in our freelance economy example, there are several markets

where buyers propose prices. Moreover, this approach, in which the principal (who has no private

information) makes a price offer each period and the agent (who has private information) responds

to these offers, is a standard modeling approach in the contracting literature; see, for example,

Hart and Tirole (1988) and Acharya and Ortner (2017). This approach is in contrast to the

signaling literature that dates back to Spence (1973), where the player holding private information

typically takes the first action, which would result in a different equilibrium analysis.

3 Equilibrium Analysis and the Impact of Strategic Information

One of the objectives of this paper is to study the effect that the presence of strategic sellers may

have on the buyer’s optimal strategy relative to more traditional learning dynamics that only leverage

stochastic observations. Therefore, before analyzing equilibria in our integrated learning model,

we consider a simpler benchmark model where the buyer can only learn from quality observations

(the seller’s decisions themselves are not informative). In this experimental-only benchmark the

design of her optimal policy is driven by the well-known exploration-exploitation tradeoff.

3.1 An Experimental-Only Benchmark

A benchmark in which only experimental learning occurs requires that the seller’s acceptance

decisions do not carry new information to the buyer, so equation (4) always holds as equality. That

is, for any period t, history h ∈ Ht, and price p ∈ R, one has P(at = 1|θ = L, ht = h, pt = p) =

P(at = 1|θ = H, ht = h, pt = p), and the mappings α1, . . . , αT are independent of the type θ. While

there are various assumptions one could make so that the seller’s actions confer no information to

the buyer, for simplicity, we assume that the seller accepts price offers greater than cH and rejects

those less than cH.⁵ One may observe that this strategy is followed in equilibrium in a special case

where cL = cH so that the seller’s cost is publicly known but quality remains private.

One may observe that in this case the buyer never offers a price greater than cH and every price

offer less than cH leads to a rejection. Notice that this implies our assumption on the seller strategy

results in the same outcome as simply assuming that there is an exogenously set price (similar to

Tomlin (2009)) equal to cH . Therefore, the buyer’s problem of maximizing the total expected payoff

over the horizon is equivalent to a one-armed bandit problem with a known retirement option, where

in each period pulling the unknown arm corresponds to offering a price equal to cH , and using the

retirement option generates λ, the value of the buyer’s outside option. In this setting, it has been

shown that the decision-maker’s optimal policy can be implemented using Gittins Indices (see, e.g.,

Berry and Fristedt (1985)). The policy specifies the action which has the largest Gittins Index,

where the index is defined as (see, e.g., Corollary 5.3.2 in Berry and Fristedt (1985)):

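$$\Lambda_t(\mu_t) = \max_{t+1 \le \tau \le T+1} \frac{E_y\left(\sum_{t'=t}^{\tau-1} (y_{t'} - c_H) \,\middle|\, \mu_t\right)}{E_y\left(\tau - t \mid \mu_t\right)},$$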

and where τ is a stopping time measurable with respect to quality realizations in periods where

cH is offered. Moreover, by Corollary 5.3.4 of Berry and Fristedt (1985), one may observe that

Λt is increasing in µt and therefore the buyer’s optimal policy can be expressed as a sequence of

thresholds on her belief. We denote this optimal sequence of thresholds by {µEt : t = 1, . . . , T}.
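As a rough numerical companion to the threshold characterization above, the following Python sketch computes belief thresholds for the one-armed bandit with a retirement option by backward induction rather than by evaluating the index directly. It assumes Bernoulli quality with success probabilities `q_H > q_L` strictly inside (0, 1), no discounting, and that the gain from offering cH crosses zero only once in the belief (which is what the threshold structure suggests); the function and variable names are hypothetical.

```python
from functools import lru_cache


def experimental_thresholds(T, q_H, q_L, c_H, lam, tol=1e-10):
    """Return [mu^E_1, ..., mu^E_T] for the bandit with retirement value lam.

    ASSUMPTIONS: Bernoulli quality with success probabilities q_H > q_L in
    (0, 1), no discounting, per-period payoff y - c_H when c_H is offered.
    """

    def gap(mu, n):
        """Value of offering c_H minus value of retiring now, n periods left."""

        @lru_cache(maxsize=None)
        def belief(s, f):
            # Posterior after s successes and f failures, starting from mu.
            like_H = mu * q_H**s * (1 - q_H)**f
            like_L = (1 - mu) * q_L**s * (1 - q_L)**f
            return like_H / (like_H + like_L)

        @lru_cache(maxsize=None)
        def offer(s, f, k):
            # Expected payoff of offering c_H now, then acting optimally.
            m = belief(s, f)
            p_success = m * q_H + (1 - m) * q_L
            return (p_success - c_H
                    + p_success * value(s + 1, f, k - 1)
                    + (1 - p_success) * value(s, f + 1, k - 1))

        @lru_cache(maxsize=None)
        def value(s, f, k):
            # Optimal continuation payoff with k periods left.
            if k == 0:
                return 0.0
            return max(lam + value(s, f, k - 1), offer(s, f, k))

        return offer(0, 0, n) - (lam + value(0, 0, n - 1))

    thresholds = []
    for t in range(1, T + 1):
        n = T - t + 1                      # periods remaining, including t
        if gap(0.0, n) >= 0:               # offering c_H is optimal at any belief
            thresholds.append(0.0)
            continue
        if gap(1.0, n) <= 0:               # retiring is optimal at any belief
            thresholds.append(1.0)
            continue
        lo, hi = 0.0, 1.0
        while hi - lo > tol:               # bisection on the indifference belief
            mid = 0.5 * (lo + hi)
            if gap(mid, n) < 0:
                lo = mid
            else:
                hi = mid
        thresholds.append(0.5 * (lo + hi))
    return thresholds
```

For example, `experimental_thresholds(5, 0.9, 0.5, 0.5, 0.2)` returns five thresholds, the last of which is the myopic indifference belief (cH + λ − qL)/(qH − qL). The dynamic program keeps the retire-or-offer choice open in every period, so it does not presuppose that retirement is absorbing.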

3.2 Integrated Learning Equilibria

Having introduced the experimental-only benchmark, we next characterize the equilibrium strategies

in the integrated learning framework that was advanced in §2 and compare these strategies

to those obtained in the experimental-only benchmark. Based on the value of the buyer’s outside

option, λ, one may separate the parameter space into different regions as depicted in Figure 4.

⁵ This modeling choice is sensible from the seller’s standpoint, and grants the tractability that allows one to identify the buyer’s optimal strategy as a Gittins index policy. This helps elucidate how optimal thresholds in this benchmark balance exploration and exploitation in a manner that relates to many learning problems that have been studied in the literature on sequential decision making.

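[Figure 4: Parametric Regions. The λ axis is marked at 0, qL − cH, qL − cL, and qH − cH; the range λ < qL − cL is labeled “Bad Outside Option” and the range qL − cL < λ < qH − cH is labeled “Moderate Outside Option.”]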

When λ > qH − cH , the buyer uses her outside option in every period. Thus, we have two regions

of interest characterized by the buyer’s outside option, which we refer to as “moderate” or “bad.”
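The partition of the parameter space can be written as a small classifier. The sketch below is illustrative only: the function name and the returned labels are hypothetical, the "very bad" label borrows the wording used later for λ < qL − cH, and the knife-edge cases λ = qH − cH and λ = qL − cL are reported separately because the text does not assign them to a region.

```python
def classify_outside_option(lam, q_H, q_L, c_H, c_L):
    """Classify the outside option value lam into the regions of Figure 4.

    Assumes the paper's maintained case q_H - c_H > q_L - c_L with c_H > c_L.
    Labels are illustrative, not the paper's notation.
    """
    if not (q_H - c_H > q_L - c_L and c_H > c_L):
        raise ValueError("requires q_H - c_H > q_L - c_L and c_H > c_L")
    if lam > q_H - c_H:
        return "outside option always used"
    if lam == q_H - c_H or lam == q_L - c_L:
        return "boundary (not assigned to a region)"
    if lam > q_L - c_L:
        return "moderate"
    if lam < q_L - c_H:
        # Sub-region of "bad": buying at c_H beats the outside option
        # even from a known low type.
        return "very bad"
    return "bad"
```

For instance, with qH = 0.9, qL = 0.5, cH = 0.5, and cL = 0.2, the cutoffs are qL − cH = 0, qL − cL = 0.3, and qH − cH = 0.4.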

3.2.1 Equilibria with a Moderate Outside Option

When qH − cH > λ > qL − cL, the buyer’s outside option is less efficient than a high-type seller

but more efficient than a low-type seller; therefore, the buyer is only interested in purchasing from

the seller if he is a high type. Since the seller’s type is unknown, the buyer aims to learn about it

in order to determine whether to purchase from the seller in future periods. The following result

characterizes the equilibria with a moderate outside option.

Theorem 1 (Equilibria with a Moderate Outside Option). Suppose that qH − cH > λ > qL − cL.

The set of equilibria is non-empty. Moreover, there exists a sequence of time-dependent thresholds

0 < µ∗1 < · · · < µ∗T < 1 such that in any equilibrium (ρ,α,µ), for any time period t = 1, ..., T and

history h ∈ Ht:

1. If µt(h) > µ∗t , then with probability 1, pt = cH and the seller accepts.

2. If µt(h) < µ∗t , then with probability 1, the buyer offers a price pt < cL⁶ that is rejected by both

types with probability 1. Furthermore, at each period t′ > t, pt′ < cL with probability 1 and

the seller rejects with probability 1.

The equilibria characterized by Theorem 1 are unique⁷ up to the presence of two minor phenomena

which cause a multiplicity of equilibria. First, at histories where µt(h) = µ∗t , the buyer is indifferent

between offering cH and a price that is always rejected, so the buyer’s equilibrium strategy can mix

between these prices. Second, the price that the buyer offers to induce a rejection by the seller is

not unique; the buyer’s strategy can specify offering any price less than cL and achieve the same

equilibrium outcome. Note that the former phenomenon can lead to different equilibrium outcomes

but these occur with probability zero for almost all γ, and that the latter phenomenon has no

effect on equilibrium outcomes. Moreover, we show in Proposition 5 that the buyer’s expected

continuation payoff is uniquely determined by the periods remaining and the buyer’s belief.

⁶ Since a low-type seller is indifferent between accepting and rejecting p = cL, there are also equilibria in which the buyer offers p = cL and the seller rejects with probability one. We omitted these equilibria for clarity.

⁷ More precisely, essentially unique. That is, unique up to differences that occur with probability zero.

The resulting equilibria dynamics with a moderate outside option can be summarized as follows.

(See Figure 1b for an example of a belief trajectory.) As long as the buyer assigns sufficiently high

probability to the seller being a high type, the buyer offers a high price, the seller accepts, the good

is delivered, and the quality is realized; thus only experimental learning takes place. When this

probability falls below the threshold, the buyer offers a low price that is always rejected by the

seller (thus, no contractual learning occurs), so there is no purchase (thus, no experimental learning

occurs); the buyer’s belief is not updated and, as thresholds are time-increasing, her belief will be

below the threshold in the next period as well and the same dynamic will take place. Therefore,

once her belief falls below the threshold, the buyer “retires” to use her outside option.
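The dynamics just described can be traced with a short simulation. The sketch below is a hypothetical illustration, not the paper's procedure: the thresholds are supplied by the caller (they are not computed here), quality is assumed Bernoulli with success probabilities `q_H` and `q_L`, and the indifference case µt = µ∗t is arbitrarily resolved in favor of the rejected offer.

```python
import random


def simulate_moderate(gamma, thresholds, theta, q_H, q_L, seed=0):
    """Simulate one belief path under the threshold policy of Theorem 1.

    thresholds -- caller-supplied increasing sequence mu*_1 < ... < mu*_T
    theta      -- true seller type, "H" or "L"
    Returns a list of (belief, offer, accepted, quality) tuples, one per period.
    """
    rng = random.Random(seed)
    q_true = q_H if theta == "H" else q_L
    mu, path = gamma, []
    for mu_star in thresholds:
        if mu > mu_star:
            # Pooling offer c_H: both types accept, quality is observed.
            y = 1 if rng.random() < q_true else 0
            path.append((mu, "c_H", 1, y))
            like_H = q_H if y == 1 else 1 - q_H
            like_L = q_L if y == 1 else 1 - q_L
            mu = mu * like_H / (mu * like_H + (1 - mu) * like_L)
        else:
            # Rejected low offer: no purchase, no signal, belief unchanged.
            path.append((mu, "rejected offer", 0, None))
    return path
```

Because the supplied thresholds increase over time and a rejected offer leaves the belief unchanged, the first rejected offer on any simulated path is followed only by rejected offers, which mirrors the "retirement" behavior described above.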

The proof of Theorem 1 is deferred to Appendix B (together with the proofs for all subsequent

results in the paper). The key idea in the proof lies in leveraging the sequential rationality that is

required from both the buyer and the seller in an equilibrium in order to prove the theorem through

backward induction over time periods (and within periods). In each period, we first characterize

the seller’s best response for every history and every price offer.⁸ Then, fixing the seller’s mapping

from histories to distributions over acceptance decisions, we characterize the buyer’s best response

in that period. We establish that, in equilibrium, the buyer offers either cH (which is accepted),

or a price p ≤ cL (which is rejected). While the buyer’s payoff from offering a rejected price is

λ, we establish that in equilibrium, in any period t and history h ∈ Ht, the expected payoff from

offering cH is increasing in the buyer’s current belief µt(h) and is equal across all history sample

paths that are consistent with this belief. This establishes that a threshold policy is optimal in

any period t. Finally, we show that µ∗t < µ∗t+1 by establishing that if offering cH is optimal for the

buyer in period t+ 1 at belief µ ∈ [0, 1], then it is also optimal in period t with the same belief µ.

Theorem 1 describes a fully pooling equilibrium in which both seller types take the same action at

every price offer made in equilibrium. Moreover, one may observe that at every equilibrium price

offer, this strategy coincides with the seller decision process that is described in the experimental-

only benchmark in §3.1. Based on this observation, the next result formalizes that, with a moderate

outside option, the buyer’s equilibrium strategy coincides with the Gittins Index policy that is

optimal in the experimental-only benchmark.

⁸ This reasoning is possible due to our equilibrium refinements that allow us to reason about histories and price offers that occur with probability zero.

Proposition 1 (Buyer Equilibrium Strategy as Gittins Index Policy). Suppose that λ ≥ qL − cL.

In any integrated learning equilibrium (ρ, α, µ), if Λt(µt(ht)) > λ then with probability one the

buyer offers cH and the seller accepts, and if Λt(µt(ht)) < λ then with probability one the buyer

offers a price that the seller rejects.

Proposition 1 establishes that, with a moderate outside option, Λt(µ∗t ) = λ. That is, when the

buyer’s belief is equal to the equilibrium threshold, the Gittins Index of the unknown arm is

exactly equal to the known retirement option, so in an integrated learning equilibrium the buyer is

indifferent between offering cH and a price that is rejected with certainty. The buyer can therefore

use Gittins indices to determine her equilibrium strategy.

Note that the result established in Proposition 1 is not clear a priori. The buyer’s strategy space

in the integrated learning model is much larger compared to the one in a single-armed bandit

problem with retirement, where in each period there are only two possible actions. Thus, proving

Proposition 1 requires first showing, as we do in Theorem 1, that the buyer’s equilibrium price

offers take, effectively, only two values and that the seller’s acceptance decision at these prices is

independent of his type. Having established these, we prove Proposition 1 by inductively showing

the equivalence of the buyer’s equilibrium value function and her value function in the respective

bandit problem, and that the optimal action is the same at every belief and period.

So far we have seen that when the outside option is more efficient than a low-type seller but less

efficient than a high-type seller, seller types are pooled in equilibrium, only experimental learning

takes place, and the buyer’s equilibrium strategy coincides with a Gittins Index policy. As we will

see next, this outcome does not extend to cases where the outside option is less efficient than a

low-type seller, where contractual learning can occur in equilibrium and the retirement value (of

offering cL) is not fixed but rather depends on the buyer’s belief.

3.2.2 Equilibria with a Bad Outside Option

When λ < qL − cL < qH − cH , it is common knowledge that the seller is more efficient than the

outside option. This region can be separated into two sub-regions differentiated by the buyer’s

incentive to purchase from the seller. If λ < qL − cH , the buyer would be better off paying cH and

purchasing from the seller (regardless of the seller’s type), rather than using her outside option.

However, if qL − cH < λ < qL − cL, then purchasing from a low-type seller is preferable to the

buyer’s outside option only for a sufficiently low price. While this distinction will play a role later

on, the next theorem establishes that the equilibrium structure is the same across these sub-regions.

Theorem 2 (Equilibria with Bad Outside Option). Suppose that qH − cH > qL − cL > λ. The set

of equilibria is non-empty. There exists a sequence of time-dependent thresholds 0 < µ∗1 < · · · <

µ∗T < 1 such that in any equilibrium (ρ,α,µ), for any time period t = 1, ..., T and history h ∈ Ht:

• If µt(h) > µ∗t , then with probability 1, pt = cH and the seller accepts.

• If µt(h) < µ∗t , then with probability 1, pt = cL, which a high-type seller rejects with probability 1 and a low-type seller accepts with probability $\frac{\mu^*_{t+1} - \mu_t(h)}{(1 - \mu_t(h))\,\mu^*_{t+1}}$, where $\mu^*_{T+1} = 1$. If µt(h) = µ∗t and pt−1 = cL, then pt = cL with probability 1.
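The acceptance probability in the theorem can be recovered from Bayes’ rule by requiring that a rejection of cL moves the buyer’s belief exactly to the next threshold. The symbol β_t below is shorthand introduced here for the low-type acceptance probability; it is not the paper’s notation. The derivation uses only that a high-type seller rejects cL with probability 1.

```latex
\begin{align*}
\mu^*_{t+1} &= \frac{\mu_t(h)}{\mu_t(h) + (1-\mu_t(h))\,(1-\beta_t)} \\
\Longrightarrow\quad 1-\beta_t &= \frac{\mu_t(h)\,(1-\mu^*_{t+1})}{(1-\mu_t(h))\,\mu^*_{t+1}} \\
\Longrightarrow\quad \beta_t &= \frac{\mu^*_{t+1}-\mu_t(h)}{(1-\mu_t(h))\,\mu^*_{t+1}} .
\end{align*}
```

With the convention µ∗T+1 = 1, the last line gives β_T = 1, so a low-type seller accepts cL with certainty in the final period.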

Again, the equilibria characterized by Theorem 2 are essentially unique. In this case, there exists

a multiplicity of equilibria for only one reason. At histories where µt(h) = µ∗t and pt−1 = cH , the

buyer is indifferent between offering cH and offering cL, so the buyer’s equilibrium strategy can

specify mixing between these prices with any probability. While we do not have uniqueness of the

buyer’s strategy in equilibrium, we show in Proposition 5 that the buyer’s expected continuation

payoff is uniquely determined by the periods remaining and the buyer’s belief.

The dynamics of the equilibria with a bad outside option can be summarized as follows. (See

Figure 1a for an example of a belief trajectory.) As long as the buyer assigns sufficiently high

probability to the seller being a high type (i.e., her belief is above the threshold), the dynamics are

identical to those in the moderate outside option regime. That is, the buyer offers a high price, the

seller accepts regardless of his type, and the buyer engages only in experimental learning based on

the realized quality. However, once the buyer assigns sufficiently low probability to the seller being

a high type (i.e. her belief falls below the threshold), the dynamics are fundamentally different.

In this regime, the buyer offers p = cL, which a low-type seller accepts with positive probability.

Therefore, by offering cL, the buyer engages in contractual learning and her belief is updated based

on the seller’s response. If the seller accepts this offer, the buyer’s belief updates to µt+1 = 0

reflecting that the seller must be a low-type, and if the seller rejects, by applying Bayes’ rule, the

buyer’s belief updates to µ∗t+1. Even though the buyer is indifferent at this belief, an equilibrium

only exists if she continues to offer cL; if future price offers may be greater, a low-type seller would

deviate from his strategy and reject low price offers with certainty to have the chance of receiving

a higher offer later on. Therefore, once the buyer offers a low price, she never again offers a high

price. Moreover, one may observe based on the acceptance probability that once a low-type seller

accepts an offer of cL, he accepts such offers in all subsequent periods with certainty, as he has no

incentive to conceal his type anymore.
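The belief updates described in this paragraph can be checked numerically. The following sketch assumes only the acceptance probability stated in Theorem 2 and that a high-type seller rejects cL with probability one; the function names are hypothetical.

```python
def low_type_accept_prob(mu, mu_next):
    """Probability that a low-type seller accepts c_L (Theorem 2).

    mu      -- current belief mu_t(h), with mu <= mu_next
    mu_next -- next-period threshold mu*_{t+1} (equal to 1 in period T)
    """
    if not (0.0 <= mu < 1.0 and 0.0 < mu_next <= 1.0 and mu <= mu_next):
        raise ValueError("requires 0 <= mu <= mu_next <= 1, mu < 1, mu_next > 0")
    return (mu_next - mu) / ((1.0 - mu) * mu_next)


def belief_after_offer_cL(mu, mu_next, accepted):
    """Buyer's updated belief after offering c_L."""
    if accepted:
        return 0.0                      # only a low type ever accepts c_L
    beta = low_type_accept_prob(mu, mu_next)
    den = mu + (1.0 - mu) * (1.0 - beta)
    if den == 0.0:
        raise ValueError("rejection has probability zero at this belief")
    return mu / den                     # high type rejects with probability 1
```

For example, at belief 0.3 with next threshold 0.5, a rejection moves the belief to 0.5, and at belief 0 the acceptance probability equals 1, consistent with a revealed low type accepting cL in every later period.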

The proof of Theorem 2 is similar to the proof of Theorem 1. The main differences arise from the

fact that, since qL − cL > λ, the buyer prefers to purchase from a low-type seller at a price that is

low enough rather than use her outside option. Therefore, the proof requires showing that a low-

type seller accepts low price offers with positive probability, and characterizing these probabilities.

We first characterize the low-type seller response to a price offer p ∈ (cL, cH) and show that, at

low beliefs, a low-type seller must use a mixed strategy such that, after a rejection, the buyer

is indifferent between offering cL and cH in the next period. This implies that the acceptance

probability must be such that the buyer’s belief updates to µ∗t+1 upon rejection, and thus is the

same for any price p ∈ (cL, cH). As a low-type seller is indifferent between accepting and rejecting

cL, through a Bertrand-style argument we establish that, in equilibrium, the low-type seller accepts

offers of cL with the same positive probability with which he accepts p ∈ (cL, cH). Thus, lower price offers

that are rejected with probability one are dominated by cL and greater prices that are accepted

with the same probability are also dominated by cL. Hence, the buyer either offers cL or cH . We

note that rejecting cL with probability one is not a profitable deviation for a low-type seller; this

rejection does not improve payoff in the current period and does not sufficiently increase the buyer’s

belief to induce her to offer a high price in any future period.

Recall that Figure 2 depicts the integrated learning equilibrium thresholds along with those obtained

in the experimental-only benchmark for a bad outside option. One may observe that the buyer’s

integrated learning equilibrium strategy no longer coincides with a Gittins Index policy. We note

that the dynamics illustrated in Figure 2 are consistent across all combinations of parametric values

in the respective regions (for summary statistics of simulations see Appendix §A.2). Comparing

the integrated learning equilibrium thresholds with the optimal thresholds in the experimental-only

benchmark, we identify two potential effects that strategic seller decisions may have on the learning

dynamics of the buyer, and particularly, on her incentive to offer cH .

Upward Effect: In contrast with the experimental-only benchmark, in the integrated

learning model the value of offering cL may be greater than λ because a low-type seller

accepts this offer with positive probability, which generates a greater expected payoff

whenever qL − cL > λ. This effect puts “upward pressure” on the thresholds.

Downward Effect: Relative to the experimental-only benchmark, the value of offering

cH may be greater because observing a failure reduces the buyer’s belief and makes a

low-type seller more likely to accept a low price offer in future periods. This effect puts

“downward pressure” on the thresholds.

From Figure 2, it is clear that the impact of these effects on equilibrium thresholds is time dependent.

With only a few periods remaining, the upward effect dominates (integrated learning thresholds

are higher than experimental-only thresholds). With many periods to go, however, the downward

effect dominates: observing a (costly) failure in the current period increases the buyer’s

expected payoff in future periods, because the belief that the seller is a high type decreases and,

conditional on being a low type, the seller is more likely to accept a low price offer. For the

downward effect to outweigh the upward effect the increase in potential payoff due to the increased

acceptance probability must make up for the cost of observing a failure in the current period. Thus,

there must be sufficiently many periods left for this to be possible. The relative strength of these

two effects also depends on the value of λ. For fixed qH , qL, cH , cL, as λ decreases to qL − cH , the

upward effect dominates for more periods.

Moreover, when λ < qL − cH (the outside option is very bad), one has µ∗t > µEt for all t. Recall

that when qL − cH < λ purchasing from a low-type seller is desirable only for a sufficiently low

price, but when λ < qL − cH , the buyer is better off paying cH and purchasing from the seller

regardless of the seller’s type. From Theorem 2, one has that µ∗t > 0 for all t. However, as the seller

never accepts prices below cH in the experimental-only benchmark, µEt = 0 for all t. That is, when

λ < qL − cH , in the experimental-only benchmark the buyer offers cH in every period regardless of

her belief because even if the buyer knows the seller is a low type (i.e. µt = 0), the buyer prefers to

purchase from the seller at price cH and obtain expected payoff qL− cH rather than use her outside

option and obtain a payoff λ.

3.3 Discussion

Figure 5 summarizes similarities and distinctions (emphasized) between equilibria across regions.

[Figure 5: Comparing Equilibria Structure. The λ axis is marked at 0, qL − cH, qL − cL, and qH − cH; the contents of the two panels are listed below.]

Bad Outside Option (λ < qL − cL):

(A) Equilibria characterized through a time-increasing sequence of thresholds on the buyer’s belief. At each period:

– If the buyer’s belief is above the threshold, she offers cH and both seller types accept.

– If the buyer’s belief is below the threshold, she offers cL and a low type accepts w.p. > 0.

(B) If the buyer offers pt = cL, she offers pt′ = cL for all t′ > t.

(C) Both experimental and contractual learning are possible.

Moderate Outside Option (qL − cL < λ < qH − cH):

(A) Equilibria characterized through a time-increasing sequence of thresholds on the buyer’s belief. At each period:

– If the buyer’s belief is above the threshold, she offers cH and both seller types accept.

– If the buyer’s belief is below the threshold, she offers a price that is rejected by the seller w.p. one.

(B) If the buyer offers a price that is rejected w.p. one at t, she offers prices that are rejected w.p. one for all t′ > t.

(C) Only experimental learning is possible.

The buyer’s equilibrium strategy in both regions can be summarized as a sequence of thresholds on

the buyer’s belief. The dynamics in the two regions coincide when the buyer’s belief is above

the threshold: the buyer offers a high price, the seller accepts and delivers the good, and the buyer

engages in experimental learning based on realized quality. When the buyer’s belief falls below the

threshold, she offers a low price in both regions and, moreover, she never again offers a high price,

but critical aspects of the dynamics remain distinct. With a moderate outside option, the buyer

offers a price that is rejected with probability one, and learning stops forever as she retires to her

outside option. With a bad outside option, the buyer offers cL, which a low-type seller accepts with

positive probability and the buyer engages in contractual learning. Therefore, with a moderate

outside option the buyer engages in experimental learning only; by contrast, with a bad outside

option both types of learning occur.

A notable similarity across the regions is that the buyer’s equilibrium thresholds are increasing in

t. The most natural reason for this, which applies to both regions, is that the value of information

decreases with fewer periods remaining. This reason is why this feature is typical in problems of

sequential decision making with an exploration/exploitation tradeoff. When the outside option is

bad, however, thresholds also reflect the strategic nature of our problem, as equilibrium thresholds

also determine the incentives that a low-type seller has to accept low price offers. In fact, an

important property of the low-type seller’s acceptance probability in response to cL is that, because

a rejection causes the buyer’s belief to update to µ∗t+1, the acceptance probability is decreasing in

the current belief and increasing in the next period’s threshold (see Theorem 2). Therefore, an

increasing sequence of thresholds also sets incentives for a low-type seller to accept a low price offer.
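The monotonicity claims about the acceptance probability follow from differentiating the expression in Theorem 2. In the sketch below, β denotes the low-type acceptance probability as a function of the current belief µ and the next threshold µ∗t+1; this notation is introduced here for illustration only.

```latex
\begin{align*}
\beta(\mu, \mu^*_{t+1}) &= \frac{\mu^*_{t+1} - \mu}{(1-\mu)\,\mu^*_{t+1}}, \\
\frac{\partial \beta}{\partial \mu} &= -\frac{1 - \mu^*_{t+1}}{\mu^*_{t+1}\,(1-\mu)^2} \le 0, \\
\frac{\partial \beta}{\partial \mu^*_{t+1}} &= \frac{\mu}{(1-\mu)\,(\mu^*_{t+1})^2} \ge 0 .
\end{align*}
```

Both inequalities hold for µ ∈ [0, 1) and µ∗t+1 ∈ (0, 1], with equality in the first only when µ∗t+1 = 1 and in the second only when µ = 0.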

Overall, in the presence of both stochastic and strategic information, the buyer not only faces a

tradeoff between exploration and exploitation but also needs to decide how to explore as information

can be gathered from observing quality as well as from separating seller types, and the structure

of equilibrium thresholds reflects this broad set of considerations.

4 The Impact of (Present and Future) Stochastic Information

In the previous section, we studied how the equilibrium dynamics that emerge in the integrated

learning model are affected by the interplay between experimental learning and contractual learning.

More specifically, we focused on the effects of introducing strategic sellers in the learning dynamics

of a buyer. In this section, we continue to study the interactions between stochastic and strategic

information from a different viewpoint that focuses on the effect that access to quality observations

may have on strategic interactions between the buyer and the seller. In order to do so, we introduce

a contractual-only benchmark model in which the seller’s strategic decisions might be informative,

but the buyer does not have real-time access to quality realizations. The main result of this section

establishes that access to quality observations may in fact be detrimental to the buyer, and that

when the buyer’s belief is sufficiently low, she may prefer to remove her ability to observe quality

realizations in subsequent periods.

4.1 A Contractual-Only Benchmark

To develop a natural benchmark in which the buyer can only use contractual learning, we adjust the

model of §2 so that both the buyer and the seller do not observe the realized qualities. That is, while

quality still impacts the buyer’s payoff in the same way as defined in §2, the observations $\{y_{t'}\}_{t'=1}^{t-1}$

are no longer included in the public history, and strategies cannot be based on them. Formally,

we define the set of histories without the quality observations as $H^C_t = \{\gamma\} \times (\mathbb{R} \times \{0, 1\})^{t-1}$, and

define the buyer’s belief system in an equilibrium as $\mu = \{\mu_t\}_{t=1}^{T}$ where $\mu_t : H^C_t \to [0, 1]$.


In the contractual-only benchmark, the buyer’s belief is not affected by quality realizations and she

cannot engage in experimental learning (in the sense formalized by equation (5)). This benchmark

captures instances where it is impossible or too costly for the buyer to evaluate quality in each

period, or where it takes a long time for quality to be realized (and the buyer receives her payoff

only at the end of the horizon). The dynamics are similar to the ones in the non-commitment,

rental model of Hart and Tirole (1988). Similarly to what we established in §3 for the integrated

learning model, equilibria are characterized by a sequence of time-dependent thresholds on the

buyer’s belief.

Proposition 2 (Contractual-Only Equilibria with Moderate Outside Option: No Learning). Suppose that $q_H - c_H > \lambda > q_L - c_L$. The set of equilibria is non-empty. There exists a single threshold $\mu^C = \frac{c_H + \lambda - q_L}{q_H - q_L}$ such that in any contractual-only equilibrium (ρ,α,µ):

1. If γ > µC , then for all t, with probability one, the buyer offers pt = cH and both types accept.

2. If γ < µC , then for all t, with probability one, the buyer offers a price that both types reject.

Proposition 3 (Contractual-Only Equilibria with Bad Outside Option). Suppose that $q_L - c_L > \lambda$. The set of equilibria is non-empty. There exists a sequence of time-dependent thresholds

$$\left(\frac{c_H + \lambda - q_L}{q_H - q_L}\right)^{+} < \mu_1^C < \cdots < \mu_T^C = \frac{c_H - c_L}{q_H - \lambda - c_L}$$

such that in any contractual-only equilibrium (ρ,α,µ):

1. (No Learning) If $\mu_t(h_t) > \mu_t^C$, then with probability one the buyer offers $p_t = c_H$ and both types accept.

2. (Contractual Learning) If $\mu_t(h_t) < \mu_t^C$, then with probability one the buyer offers $p_t = c_L$, which the high type rejects with probability one and the low type accepts with probability $\frac{\mu_{t+1}^C - \mu_t}{(1-\mu_t)\,\mu_{t+1}^C}$, where $\mu_{T+1}^C := 1$. Moreover, if $\mu_t(h_t) = \mu_t^C$ and $p_{t-1} = c_L$, the buyer offers $p_t = c_L$.

When the buyer has a moderate outside option, she either offers cH , which the seller accepts with

probability one, or a low price that the seller rejects with probability one. In equilibrium, the buyer

does not learn and actions do not change across periods.9

9 If γ = µC , then the buyer can make either offer in each period, and therefore there is not a unique equilibrium.

When the buyer has a bad outside option, she can benefit from offering a price that only a low-type seller accepts with positive probability. Therefore, the buyer balances the expected payoff from offering cH , which is (qH − qL)µt + qL − cH , with the expected payoff from offering cL, which is determined by the acceptance probability of a low-type seller. Notably, as in the integrated learning model, this probability also depends on the current belief and next period’s threshold, which further showcases the importance of the thresholds for the buyer as tools to incentivize a low-type to accept cL. The buyer never offers cH at a belief less than $\left(\frac{c_H + \lambda - q_L}{q_H - q_L}\right)^{+}$ because the expected payoff in the current period is less than λ, and there is no additional value from experimentation since quality is not observable. Equilibrium thresholds are decreasing in the number of periods to go and asymptotically approach this limit threshold as the number of periods to go increases (for more details see Lemma 7, Appendix B).
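The quantities in Proposition 3 can be evaluated numerically. The following is a minimal sketch, not the authors' code: it computes the limit threshold, the last-period threshold, and the low type's acceptance probability, using the Figure 6(a) parameters. The function names are hypothetical. The last function applies Bayes' rule after a rejection of cL, assuming (as in Proposition 3) that the high type rejects with probability one; under that assumption the posterior after a rejection equals the next-period threshold.

```python
def limit_threshold(qH, qL, cH, lam):
    """Lower limit ((cH + lam - qL) / (qH - qL))^+ from Proposition 3."""
    return max((cH + lam - qL) / (qH - qL), 0.0)


def terminal_threshold(qH, cH, cL, lam):
    """Last-period threshold mu_T^C = (cH - cL) / (qH - lam - cL)."""
    return (cH - cL) / (qH - lam - cL)


def low_type_accept_prob(mu, mu_next):
    """Probability that a low type accepts cL at belief mu < mu_next."""
    return (mu_next - mu) / ((1.0 - mu) * mu_next)


def posterior_after_rejection(mu, accept_prob):
    """Bayes' rule after cL is rejected (assumption: high type always rejects)."""
    return mu / (mu + (1.0 - mu) * (1.0 - accept_prob))


# Parameters of Figure 6(a): bad outside option (qL - cL > lam).
qH, qL, cH, cL, lam = 0.6, 0.3, 0.2, 0.0, 0.2
lo = limit_threshold(qH, qL, cH, lam)
hi = terminal_threshold(qH, cH, cL, lam)

# Illustrative belief 0.2 in period T - 1, with next-period threshold mu_T^C.
a = low_type_accept_prob(0.2, hi)
post = posterior_after_rejection(0.2, a)
print(lo, hi, a, post)
```

In this sketch the belief 0.2 is an arbitrary illustrative value below the last-period threshold; any belief below the relevant threshold could be used instead.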

Figure 6: Comparison of Thresholds. Both panels plot the threshold against periods to go for integrated learning, contractual only, and experimental only. (a) Bad Outside Option: qH = .6, qL = .3, cH = .2, cL = 0, λ = .2. (b) Moderate Outside Option: qH = .6, qL = .3, cH = .2, cL = 0, λ = .35 (integrated learning ≡ experimental only).

Figure 6 depicts the integrated learning equilibrium thresholds, together with those of the contractual-

only and experimental-only benchmarks for a moderate (λ > qL−cL) and bad (λ < qL−cL) outside

option. As Proposition 2 states, with a moderate outside option a contractual-only equilibrium uses

the optimal static threshold T times. There is no learning in the resulting equilibrium; by definition

there is no experimental learning, but also equation (4) always holds at equality. One may observe

that for both moderate and bad outside options the integrated learning thresholds are lower than

or equal to the contractual-only thresholds. This observation is formalized by the next result.

Proposition 4. Fix qH , qL, cH , cL, and T . For all λ, one has $\mu_T^* = \mu_T^C$ and $\mu_t^* \le \mu_t^C$ for all t < T .

The main intuition behind this result is that the ability to observe quality realizations increases the buyer’s value from offering cH , since each observation allows her to update her belief; she therefore offers cH on a larger interval of beliefs. With zero periods remaining, the thresholds are equal because experimental learning is no longer valuable.


4.2 Payoff Comparison

We next compare the buyer’s payoff in an integrated learning equilibrium with her payoff in the

experimental-only and contractual-only benchmarks. Recall from §3 that, for a fixed set of primi-

tives, an equilibrium always exists; however, it need not be unique. Therefore, in principle, different

equilibria might yield different payoffs for the buyer. The next proposition establishes that this is

not the case and that, moreover, payoffs can be expressed as a function of only the buyer’s belief.

Proposition 5 (Unique Payoffs in Equilibrium). Fix qH , qL, cH , cL, λ, and T . There exists a sequence of functions $\{U_t^* : t = 1, \ldots, T\}$ such that for every integrated learning equilibrium (ρ,α,µ), and for every period t and history $h \in H_t$, one has $U_t^* : [0,1] \to \mathbb{R}$ and $U_t^{\rho,\alpha,\mu}(h) = U_t^*(\mu_t(h))$. Similarly, there exists a sequence of functions $\{U_t^C : t = 1, \ldots, T\}$ such that for every contractual-only equilibrium (ρ,α,µ), and for every period t and history $h \in H_t^C$, one has $U_t^C : [0,1] \to \mathbb{R}$ and $U_t^{\rho,\alpha,\mu}(h) = U_t^C(\mu_t(h))$.

Figure 7: Comparison of Buyer Payoffs (T = 20). Both panels plot the expected per-period buyer payoff against the prior γ for experimental only, integrated learning, and contractual only, with the period-1 thresholds µ∗1, µE1, and µC1 marked. (a) With Bad Outside Option and λ > qL − cH: qH = .6, qL = .3, cH = .2, cL = 0, λ = .2. (b) With Moderate Outside Option: qH = .6, qL = .3, cH = .2, cL = 0, λ = .35 (integrated learning ≡ experimental only, µ∗1 = µE1).

The functions $U_t^*$ and $U_t^C$ are formally defined in Appendix B.8, where an extended version of this proposition is stated and proved. Figure 7b compares the buyer’s per-period payoffs as a function of γ for a moderate outside option (λ > qL − cL). First, the figure demonstrates that strategic information adds no value to the buyer beyond the value that is already captured by experimental learning alone. In the contractual-only benchmark, the buyer’s per-period equilibrium payoff for a given prior γ equals the buyer’s payoff in a one-period static model: the pointwise maximum between the outside option λ and (qH − qL)γ + qL − cH , the payoff obtained by buying from the seller at price cH . The kink in the payoff occurs at µC , which is characterized by Proposition 2. Figure 7b also illustrates that the buyer’s integrated learning equilibrium payoff is strictly greater than the contractual-only equilibrium payoff for an intermediate interval of beliefs. When $\gamma \le \mu_1^*$, offering a price that is rejected with certainty is optimal in all three models, and when γ ≈ 1 offering cH with certainty in all remaining periods is optimal in all three models (regardless of $\{y_{t'}\}_{t'=t}^{T}$). In that sense, in this region, access to stochastic information can only increase the payoff of the buyer when she interacts with a strategic seller.
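The static payoff described above for the contractual-only benchmark with a moderate outside option can be written down directly. The sketch below is illustrative only (the function names are hypothetical): it evaluates the threshold of Proposition 2 and the pointwise maximum between λ and the payoff from buying at cH, using the Figure 7(b) parameters.

```python
def static_threshold(qH, qL, cH, lam):
    """Threshold mu^C = (cH + lam - qL) / (qH - qL) from Proposition 2."""
    return (cH + lam - qL) / (qH - qL)


def static_payoff(gamma, qH, qL, cH, lam):
    """Per-period payoff: max of the outside option and buying at price cH."""
    return max(lam, (qH - qL) * gamma + qL - cH)


# Parameters of Figure 7(b): moderate outside option (lam > qL - cL, cL = 0).
qH, qL, cH, lam = 0.6, 0.3, 0.2, 0.35
mu_C = static_threshold(qH, qL, cH, lam)

for gamma in (0.5, mu_C, 1.0):
    print(gamma, static_payoff(gamma, qH, qL, cH, lam))
```

Below mu_C the payoff is flat at λ, and above it the payoff rises linearly in γ, which is the kink referred to in the text.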

Figure 7a depicts a comparison of the buyer’s per-period payoffs with a bad outside option. Com-

pared to the experimental-only benchmark, integrated learning leads to a higher payoff at all beliefs

(this holds in all periods). Indeed, access to strategic information does not harm the buyer’s ability

to engage in experimental learning as the buyer can always achieve the experimental-only payoff

by limiting herself to offering either cH or prices that are rejected. Interestingly, Figure 7a reveals

that with an inefficient outside option, at an interval of low beliefs, access to stochastic information

can decrease the payoff of a buyer that interacts with a strategic seller. This implies that, when the

buyer’s belief is low enough, she would be better off giving up her access to quality realizations (so

she is unable to use experimental learning). We next formalize this observation.

Theorem 3 (The Benefit of Future Ignorance).

1. Suppose that the outside option is moderate, that is, λ > qL − cL. Then, for all µ ∈ [0, 1] and t = 1, ..., T , $U_t^*(\mu) \ge U_t^C(\mu)$.

2. Suppose that the outside option is bad, that is, qL − cL > λ. Then, if

$$\frac{q_H}{q_L} > \frac{q_L - \lambda - c_L}{c_H - c_L}, \tag{6}$$

then there exists $\underline{T} \in \mathbb{N}$ such that for any T and t such that $1 \le t < T - \underline{T}$ one has $\mu_t^* < \mu_t^C$, and there exists $\bar{\mu}_t > \mu_t^*$ such that $U_t^*(\mu) < U_t^C(\mu)$ for all $\mu \in (0, \bar{\mu}_t)$. However, when

$$\frac{q_H}{q_L} \le \frac{q_L - \lambda - c_L}{c_H - c_L}, \tag{7}$$

then $\mu_t^C = \mu_t^*$ for all t = 1, . . . , T , and for all µ ∈ [0, 1], one has $U_t^*(\mu) \ge U_t^C(\mu)$.

Theorem 3 characterizes the relationship between the buyer’s expected payoff in the integrated

learning model and in the contractual-only benchmark. With a moderate outside option, observing

stochastic quality is always valuable, which is consistent with our result that the buyer relies entirely


on experimental learning in this region. However, with a bad outside option, the ordering of payoffs

depends on the number of periods to go and the buyer’s belief. If the condition in (6) is satisfied,

then with sufficiently many periods to go, there always exists a region of low beliefs for which the

buyer’s expected payoff is greater if she cannot make quality observations. The required number of periods to go $\underline{T}$ is typically small, and often equals one (see Appendix A.2 for more details). If, on

the other hand, the condition in (7) is satisfied, then the buyer’s expected payoff in the integrated

learning model is always greater.

We can interpret condition (6) as follows. The left-hand side equals the ratio of the probability

that a high-type seller provides high-quality to the probability that a low-type seller does; hence,

it can be viewed as a measure of how informative a quality observation can be, and therefore how

valuable experimental learning is. In turn, the right-hand side is the value captured by the buyer

when a low-type seller accepts cL (which happens only under contractual learning), normalized by

the price difference. Intuitively, contractual learning is effective when the buyer has less incentive

to purchase from a low-type seller at a high price (which occurs when cH is substantially larger

than cL), or when the low-type seller provides little additional value relative to the outside option

(that is, when qL− cL−λ is small). Thus, the condition in (7) holds when the value of contractual

learning is high relative to the value of experimental learning (at the threshold belief).
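Conditions (6) and (7) are simple to check for a given set of primitives. The sketch below is a hedged illustration: the function name is hypothetical, the first parameter set is taken from Figure 6(a), and the second is a made-up example chosen only so that (7) holds while qH − cH > qL − cL > λ.

```python
def condition_6_holds(qH, qL, cH, cL, lam):
    """True if (6) holds, i.e. qH/qL > (qL - lam - cL)/(cH - cL); False means (7)."""
    informativeness = qH / qL                     # left-hand side of (6)
    low_type_value = (qL - lam - cL) / (cH - cL)  # right-hand side of (6)
    return informativeness > low_type_value


# Figure 6(a) parameters: 0.6/0.3 = 2 exceeds (0.3 - 0.2 - 0)/(0.2 - 0) = 0.5.
fig6a = condition_6_holds(0.6, 0.3, 0.2, 0.0, 0.2)

# Hypothetical parameters (not from the paper): 0.9/0.6 = 1.5 is below 0.6/0.2 = 3.
hypothetical = condition_6_holds(0.9, 0.6, 0.2, 0.0, 0.0)

print(fig6a, hypothetical)
```

The two conditions are complementary, so a single boolean is enough to tell which case of Theorem 3 applies under a bad outside option.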

The intuition for how access to quality outcomes decreases the buyer’s expected payoff is subtle

because in any period that the seller delivers the good, the buyer’s expected payoff increases with

access to realized quality. That is, the stochastic information is, in general, valuable in every

period and at every belief. However, the value that is created by experimental learning in a future

period t decreases the buyer’s payoff in preceding periods through its effect on contractual learning

in those periods. Recall from §3.3 that the threshold µ∗t also determines the incentives for a low-

type seller to accept low price offers. Therefore, the value that experimental learning provides to

the buyer in a future period t lowers the threshold relative to the contractual-only threshold (recall

Figure 6 and Proposition 4), which reduces the likelihood that a low-type seller accepts an offer

of cL in preceding periods and, consequently, the buyer’s expected payoff in those periods as well.

Thus, the additional value of the contractual-only benchmark stems from the buyer’s inability to

access quality observations in future periods, which increases her expected continuation value in

the present period by strengthening her ability to engage in contractual learning.10
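The mechanism in the paragraph above can be made explicit with a short derivation. This is a sketch under a stated assumption: that the low type's acceptance probability in the integrated learning model has the same functional form as the one given in Proposition 3 for the contractual-only benchmark. The text itself only states that this probability is increasing in the next period's threshold (Theorem 2). The notation $a(\mu, \mu')$ is hypothetical: it denotes the probability that a low-type seller accepts $c_L$ at current belief $\mu$ when next period's threshold is $\mu'$.

```latex
% Assumption: same functional form as in Proposition 3.
a(\mu,\mu') \;=\; \frac{\mu' - \mu}{(1-\mu)\,\mu'}
            \;=\; \frac{1}{1-\mu}\left(1 - \frac{\mu}{\mu'}\right),
\qquad
\frac{\partial a}{\partial \mu'} \;=\; \frac{\mu}{(1-\mu)\,\mu'^{2}} \;>\; 0
\quad \text{for } 0 < \mu < \mu' \le 1 .
% Consequently, \mu^*_{t+1} < \mu^C_{t+1} implies
% a(\mu, \mu^*_{t+1}) < a(\mu, \mu^C_{t+1}) for any belief \mu below both thresholds.
```

Under this assumption, a lower next-period threshold mechanically lowers the probability that a low price is accepted today, which is the channel through which future quality observations can hurt the buyer.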

10 Moreover, in Appendix A.3 we establish that committing to not observe the quality in the current period does not affect equilibrium outcomes; the buyer can only gain by committing to not observe quality in future periods.


Interestingly, when condition (6) holds, not observing quality outcomes is preferable on an interval of beliefs below and above $\mu_t^*$ for all $t < T - \underline{T}$. For beliefs $\mu \in (0, \mu_t^*)$, as long as $\mu_{t+1}^* < \mu_{t+1}^C$ the buyer offers a low price with or without access to quality observations, and this offer is accepted with higher probability when the buyer cannot access quality outcomes, which generates higher expected payoffs for the buyer. For beliefs $\mu \in (\mu_t^*, \bar{\mu}_t)$, the buyer offers a high price in the integrated learning equilibrium and her payoff increases without access to quality observations. In most of these situations she would offer cL without access to quality observations, but in some rare instances, $\bar{\mu}_t > \mu_t^C$ and the buyer offers cH in both models.11

5 Concluding Remarks and Extensions

In this paper, we propose a model of buyer learning throughout a repeated relationship with a

seller when quality realizations are stochastic and not contractible, and the buyer is unable to

commit to long-term contracts. Our model integrates two forms of learning: by observing the

strategic actions taken by the seller, and by observing the stochastic quality delivered by the seller

whenever a purchase occurs. We find that, in equilibrium, the buyer explores through high price

offers that facilitate quality experimentation at early stages of the interaction, and after gathering

enough information (and if her belief is sufficiently low), she advances to offering low prices that

may partially separate seller types. We compare the buyer’s equilibrium strategy to benchmark

models designed to learn from only one form of information. We observe that the ability to separate

types through price offers may impact the buyer’s optimal strategy in two opposite ways, but never

decreases the buyer’s expected payoff and strictly increases it when the buyer has a bad outside

option. On the other hand, we observe that access to stochastic information can decrease the payoff

of a buyer that interacts with a strategic seller, and characterize when this happens.

In our model, we assumed that the prior γ is given exogenously. Our results can be easily extended to the case where γ is determined strategically by the seller at time t = 0; this allows us to model a setting in which a seller may choose to, for example, invest in a better technology prior to interacting with the buyer. In this extension, period t = 0 comprises two stages. First the seller selects γ, which is then observed publicly. Second, the type of the seller is drawn according to a Bernoulli distribution with parameter γ, where the outcome of this draw is private information of the seller and fixed throughout the horizon. Subsequent periods take place as described in §2. We show that an equilibrium γ exists for this model, and that subsequent dynamics are identical to those described in §3. (See Appendix A.1 for details.)

11 In 96% of over 100,000 simulations that we performed over a broad parametric region we observe that in every period, at beliefs where the buyer offers cH in both integrated learning and contractual-only equilibria, her expected payoff is higher in the integrated learning equilibrium; for more details see summary statistics in Appendix A.2.

Much of the equilibrium structure and insight from the two-type model generalizes to richer type

spaces. Our extension to three types (which is detailed below) partially illustrates this; with

more periods/types a similar equilibrium structure arises as the buyer offers high prices and uses

experimental learning until her belief falls below a threshold at which point she offers a lower price

and engages in contractual learning. Though experimental learning may occur after contractual

learning (and within the same period), this does not detract from the insights of the two-type

model. In addition, similar insights arise when cost and quality are independent so types are multi-

dimensional: the buyer offers one of two prices at each period based on a threshold structure, though

the thresholds would be described by curves through a two-dimensional space. While equilibria in

these extensions still exhibit a threshold-type structure, the impact of integrating stochastic and

strategic information relative to learning from one form of information in isolation is not as clean,

and insights could not be communicated as clearly as in the more stylized model we adopt here.

5.1 Three Seller Types

Figure 8 depicts the buyer’s strategy as a function of her (two-dimensional) belief for a two-period,

three-type example, where M denotes a “middle” type with cost cM such that cM ∈ (cL, cH) and

quality qM such that qM ∈ (qL, qH). With three types, the buyer’s belief takes the vector form

µ := (µM , µH) where µM denotes the probability that the buyer assigns to the seller being a

medium-type seller and µH denotes the probability that the buyer assigns to the seller being a

high-type seller. As is the case with two types, the buyer’s strategy is determined by her belief

except at certain thresholds when she is indifferent between two (or three) offers.

In period T , it is straightforward to extend our analysis and show that, on the equilibrium path,

when qH − cH > qM − cM > qL− cL > λ the seller accepts every price offer greater than or equal to

his cost. Therefore the buyer only offers cL, cM , or cH , and can select between them, for any fixed

belief, by solving a simple maximization problem.
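The last-period maximization mentioned above can be sketched in a few lines. The code is an illustration under stated assumptions rather than the authors' procedure: each type accepts any price at or above his cost (as the text states for period T), an accepted offer yields expected quality minus price, and a rejected offer yields the outside option λ. The function name and the dictionary layout are hypothetical; the parameters are those of Figure 8.

```python
def last_period_offer(mu_M, mu_H, q, c, lam):
    """Return the payoff-maximizing offer among cL, cM, cH and all three payoffs."""
    belief = {"L": 1.0 - mu_M - mu_H, "M": mu_M, "H": mu_H}
    payoffs = {}
    for offer in ("L", "M", "H"):
        price = c[offer]
        value = 0.0
        for theta in ("L", "M", "H"):
            if c[theta] <= price:
                # Assumption: a seller accepts any price at or above his cost.
                value += belief[theta] * (q[theta] - price)
            else:
                # Rejected offers give the buyer the outside option.
                value += belief[theta] * lam
        payoffs["c" + offer] = value
    # Ties are broken in favor of the lowest price (first key with the max value).
    best = max(payoffs, key=payoffs.get)
    return best, payoffs


# Parameters of Figure 8.
q = {"L": 0.3, "M": 0.6, "H": 0.9}
c = {"L": 0.0, "M": 0.1, "H": 0.2}
lam = 0.0

for mu_M, mu_H in [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]:
    print((mu_M, mu_H), last_period_offer(mu_M, mu_H, q, c, lam)[0])
```

At the three degenerate beliefs used here the buyer simply offers the cost of the type she is certain to face; interior beliefs trade off the lower price against the probability of rejection.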

In period T − 1, we observe several similarities to the two-type case. First, the seller rejects any price less than his cost. Second, he rejects any price less than the buyer’s price offer in period T under the same belief. Third, for a given belief µT−1, the probability that the seller accepts a price offer p ∈ (cM , cH) is equal to the probability he accepts p = cM in equilibrium. The same holds for p ∈ (cL, cH) and cL. Therefore, the buyer only offers p ∈ {cL, cM , cH}. Moreover, the set of beliefs for which the buyer offers cH increases from period T to T − 1 while the set of beliefs for which cL is offered decreases. On the equilibrium path, if cM is offered, then a low-type seller accepts with probability one and a medium-type seller accepts with positive probability such that he is indifferent between accepting and rejecting. At beliefs where cL is offered, a low-type seller mixes between accepting and rejecting.

Figure 8: Buyer Strategy: Three-Types with parameters: qH = .9, qM = .6, qL = .3, cH = .2, cM = .1, cL = 0, λ = 0. Two panels (Period T − 1 and Period T ) show the regions of the belief space, with axes µM , µH , and 1 − µH − µM , in which the buyer offers cL, cM , or cH .

This example also allows one to observe more complicated equilibrium outcomes. For example,

with two seller types, contractual and experimental learning never occur in the same period in

equilibrium. With three seller types, however, this is not the case anymore. Consider the region

of beliefs where the buyer offers cM in period T − 1. Recall that a medium-type seller accepts cM with positive probability and a low-type seller accepts cM with probability one. Therefore, if the

seller accepts, the buyer learns (through contractual learning) that the seller is not a high-type, but

the seller may still be either a medium-type seller or a low-type seller. Therefore, the buyer learns

(through experimental learning) by observing the realization yT−1 as well; that is, µT is different

if yT−1 = 0 and if yT−1 = 1. This implies that the “separability” of the two forms of information

that is described in our equilibrium characterization in §3 is not a broad feature of the integrated

learning model but rather an artifact of considering two seller types.


References

Acharya, A. and J. Ortner (2017). Progressive Learning. Econometrica 85 (6), 1965–1990.

Bergemann, D. and J. Valimaki (1996). Learning and Strategic Pricing. Econometrica 64 (5), 1125–1149.

Berry, D. A. and B. Fristedt (1985). Bandit Problems: Sequential Allocation of Experiments (Monographs on Statistics and Applied Probability). London: Chapman and Hall.

Besbes, O., J. Chaneton, and C. C. Moallemi (2017). The exploration-exploitation trade-off in the newsvendor problem. Columbia Business School Research Paper (14-61).

Besbes, O. and A. Muharremoglu (2013). On Implications of Demand Censoring in the Newsvendor Problem. Management Science 59 (6), 1407–1424.

Bondareva, M. and E. Pinker (2018). Dynamic Relational Contracts for Quality Enforcement in Supply Chains. Forthcoming in Management Science.

Caro, F. and J. Gallien (2007). Dynamic Assortment with Demand Learning for Seasonal Consumer Goods. Management Science 53 (2), 276–292.

Chaturvedi, A. and V. Martínez-de Albéniz (2011). Optimal Procurement Design in the Presence of Supply Risk. Manufacturing & Service Operations Management 13 (2), 227–243.

Che, Y.-K. (1993). Design competition through multidimensional auctions. The RAND Journal of Economics 24 (4), 668–680.

den Boer, A. V. and B. Zwart (2014). Simultaneously Learning and Optimizing Using Controlled Variance Pricing. Management Science 60 (3), 770–783.

Edelman Intelligence (2017). Freelancing in America: 2017. Technical report. Retrieved from https://www.slideshare.net/upwork/freelancing-in-america-2017.

Fudenberg, D. and J. Tirole (1991). Game Theory. Cambridge, Massachusetts: MIT Press.

Harrison, J. M., N. B. Keskin, and A. Zeevi (2012). Bayesian dynamic pricing policies: Learning and earning under a binary prior distribution. Management Science 58 (3), 570–586.

Hart, O. D. and J. Tirole (1988). Contract Renegotiation and Coasian Dynamics. The Review of Economic Studies 55 (4), 509–540.

Huh, W. T. and P. Rusmevichientong (2009). A nonparametric asymptotic analysis of inventory planning with censored demand. Mathematics of Operations Research 34 (1), 103–123.

Kakade, S., I. Lobel, and H. Nazerzadeh (2013). Optimal Dynamic Mechanism Design and the Virtual-Pivot Mechanism. Operations Research 61 (4), 837–854.

Kanoria, Y. and D. Saban (2017). Facilitating the search for partners on matching platforms: Restricting agent actions. Working Paper.

Kessler, A. (1998). The Value of Ignorance. The RAND Journal of Economics 29 (2), 339–354.

Laffont, J.-J. and J. Tirole (1990). Adverse Selection and Renegotiation in Procurement. Review of Economic Studies 57 (4), 597.

Laffont, J.-J. and J. Tirole (1993). A Theory of Incentives in Procurement and Regulation. Cambridge, Massachusetts: MIT Press.

Levin, J. (2003). Relational incentive contracts. American Economic Review 93 (3), 835–857.

Myerson, R. B. (1981). Optimal auction design. Mathematics of Operations Research 6 (1), 58–73.

Ortner, J. (2017). Durable goods monopoly with stochastic costs. Theoretical Economics 12 (2), 817–861.

Ostrovsky, M. and M. Schwarz (2010). Information Disclosure and Unraveling in Matching Markets. American Economic Journal: Microeconomics 2, 34–63.

Papanastasiou, Y. (2019). Newsvendor decisions with two-sided learning. Working Paper.

Pavan, A., I. Segal, and J. Toikka (2014). Dynamic Mechanism Design: A Myersonian Approach. Econometrica 82 (2), 601–653.

Roesler, A. K. and B. Szentes (2017). Buyer-optimal learning and monopoly pricing. American Economic Review 107 (7), 2072–2080.

Saure, D. and A. Zeevi (2013). Optimal dynamic assortment planning with demand learning. Manufacturing & Service Operations Management 15 (3), 387–404.

Schmidt, K. M. (1993). Commitment through Incomplete Information in a Simple Repeated Bargaining Game. Journal of Economic Theory 60 (1), 114–139.

Spence, M. (1973). Job Market Signaling. The Quarterly Journal of Economics 87 (3), 355–374.

Taylor, T. A. and E. L. Plambeck (2007a). Simple relational contracts to motivate capacity investment: Price only vs. price and quantity. Manufacturing & Service Operations Management 9 (1), 94–113.

Taylor, T. A. and E. L. Plambeck (2007b). Supply chain relationships and contracts: The impact of repeated interaction on capacity investment and procurement. Management Science 53 (10), 1577–1593.

Tomlin, B. (2009). Impact of Supply Learning When Suppliers Are Unreliable. Manufacturing & Service Operations Management 11 (2), 192–209.

Wan, Z. and D. R. Beil (2009). RFQ Auctions with Supplier Qualification Screening. Operations Research 57 (4), 934–949.

Yang, Z. B., G. Aydin, V. Babich, and D. R. Beil (2012). Using a Dual-Sourcing Option in the Presence of Asymmetric Information About Supplier Reliability: Competition vs. Diversification. Manufacturing & Service Operations Management 14 (2), 202–217.

A Extensions

A.1 Endogenizing the Prior

We now describe a model extension in which γ is determined strategically by the seller (as opposed to exogenously) at time t = 0. In this extension, period t = 0 comprises two stages. First the seller selects γ, which is then observed publicly. Second, the type of the seller is drawn according to a Bernoulli distribution with parameter γ, where the outcome of this draw is private information of the seller and fixed throughout the horizon. Subsequent periods take place as described in §2. In period t = 1, the buyer forms her belief µ1 based on the γ that the seller selected in period 0. Thus, at time t = 0 the seller maximizes his expected payoff by solving:

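$$\max_{\gamma \in [0,1]} \; \gamma\, \mathbb{E}_{p_1}\!\left( V_1^{\rho,\alpha,\mu}(\langle \gamma, p_1 \rangle) \,\middle|\, \theta = H \right) + (1-\gamma)\, \mathbb{E}_{p_1}\!\left( V_1^{\rho,\alpha,\mu}(\langle \gamma, p_1 \rangle) \,\middle|\, \theta = L \right). \tag{8}$$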

If the seller selects γ ∈ {0, 1}, then his expected payoff equals 0 by Theorems 1 and 2, as the buyer

will always offer the seller’s cost. However, selecting γ ∈ (µ∗1, 1) would generate a positive payoff

for the seller whenever he is a low type. The seller aims to maximize that (expected) payoff.

Characterizing the probability γ∗ which maximizes (8) is difficult. We show that it exists (and therefore an equilibrium exists), but its existence imposes additional restrictions on ρ. By Theorems 1 and 2, the buyer’s strategy is a threshold policy, so $\mathbb{E}_{p_1}\left( V_1^{\rho,\alpha,\mu}(\langle \gamma, p_1 \rangle) \mid \theta = L \right)$ is piecewise constant in γ with positive “jumps.” Thus, the seller’s objective is piecewise linear with positive “jumps.” These discontinuities occur at the subset of beliefs:

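$$G := \Big\{ \gamma \in [0,1] : \exists\, t \le T,\ \{y_{t'}\}_{t'=1}^{t-1} \in \{0,1\}^{t-1} \ \text{s.t.}\ \forall \tau = 1, \ldots, t-1 :\ \mu_\tau\big( \langle \gamma, (p_{t'} = c_H, a_{t'} = 1, y_{t'} = y_{t'})_{t'=1}^{\tau-1} \rangle \big) \ge \mu_\tau^*,\ \ \mu_t\big( \langle \gamma, (p_{t'} = c_H, a_{t'} = 1, y_{t'} = y_{t'})_{t'=1}^{t-1} \rangle \big) = \mu_t^* \Big\}.$$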

That is, if given µ1 = γ, there exists a sequence of quality realizations that leads to the buyer having belief $\mu_t^*$ in period t (where she is indifferent between offering cH and 0) after offering cH in all previous periods, then there is a discontinuity in the seller’s objective function at γ. This discontinuity arises because for a fixed sequence of observations $\{y_{t'}\}_{t'=1}^{t-1} \in \{0,1\}^{t-1}$, $\mu_t\big( \langle \gamma, (p_{t'} = c_H, a_{t'} = 1, y_{t'} = y_{t'})_{t'=1}^{t-1} \rangle \big)$ is continuous in γ, and at belief $\mu_t = \mu_t^* + \varepsilon$, the buyer offers $p_t = c_H$ and at belief $\mu_t = \mu_t^* - \varepsilon$, the buyer offers $p_{t'} = c_L$ for all t′ ≥ t. Note that for any period t, there are t different combinations of t − 1 past successes and failures (conditional on $\mu_\tau \ge \mu_\tau^*$, ∀τ , the order of successes and failures does not affect µt). Thus, there are at most t different priors γ ∈ [0, 1] for which there exists a sequence of observations $\{y_{t'}\}_{t'=1}^{t-1}$ resulting in $\mu_t(h) = \mu_t^*$, and since T is finite, the set G is finite as well.
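The counting argument above rests on the fact that a Bayesian posterior built from binary quality observations depends only on the number of successes, not on their order. The sketch below illustrates this under two assumptions that are labeled here because they simplify the model: quality is treated as a Bernoulli draw with success probability qH for a high type and qL for a low type, and both types are assumed to accept cH in every period, so acceptance itself is uninformative. The function names and the prior 0.5 are hypothetical.

```python
from itertools import product


def update(mu, y, qH, qL):
    """One Bayes step after observing quality y in {0, 1}."""
    like_H = qH if y == 1 else 1.0 - qH
    like_L = qL if y == 1 else 1.0 - qL
    return mu * like_H / (mu * like_H + (1.0 - mu) * like_L)


def posterior(gamma, ys, qH, qL):
    """Belief after observing the sequence ys, starting from prior gamma."""
    mu = gamma
    for y in ys:
        mu = update(mu, y, qH, qL)
    return mu


qH, qL, gamma = 0.6, 0.3, 0.5
t = 4  # period t, so there are t - 1 past observations

# Group the posteriors of all 2^(t-1) sequences by their number of successes.
by_successes = {}
for ys in product((0, 1), repeat=t - 1):
    by_successes.setdefault(sum(ys), []).append(posterior(gamma, ys, qH, qL))

# Largest gap between posteriors that share the same number of successes.
spread = max(max(v) - min(v) for v in by_successes.values())
print(len(by_successes), spread)
```

The number of groups equals t, which matches the statement that there are t different combinations of t − 1 past successes and failures, and the spread within each group is zero up to floating-point error.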

Consider the supremum of the seller’s optimization problem:

$$V^* = \sup_{\gamma \in [0,1]} \mathbb{E}_{p_1}\left( V_1^{\rho,\alpha,\mu}(\langle \gamma, p_1 \rangle) \,\middle|\, \theta = L \right)$$

We note that if there exists γ∗ ∈ [0, 1] which achieves V ∗, then γ∗ ∈ G.12 To ensure the existence of

γ∗, ρ must satisfy the following condition. For any (on-path) history h ∈ H1 such that µ1(h) = µ∗1

one has p1 = cH with probability one, and for any t > 1 and (on-path) history h ∈ Ht such that

µt(h) = µ∗t and pt−1 = cH , one has pt = cH with probability one. When γ is exogenous, the

buyer offers cH and/or cL with positive probability at these histories, so endogenizing γ further

restricts the set of equilibria. When ρ satisfies the above condition, there is a local maximum at

every discontinuity γ ∈ G. Thus, the seller has a best response (the maximum of a finite set),

and an equilibrium exists. Letting γ∗ denote a best response that achieves V ∗ when ρ satisfies the

above condition, we observe that a strategy profile ρ that does not satisfy this condition is not an

equilibrium because the seller could profit from deviating to γ∗ + ε for some ε > 0.

12 The seller’s objective function decreases on each interval between the discontinuities because $\mathbb{E}_{p_1}\left( V_1^{\rho,\alpha,\mu}(\langle \gamma, p_1 \rangle) \mid \theta = L \right)$ is independent of γ on this interval. Therefore, if the seller selects some $\gamma_0 \notin G$, there will exist ε > 0 such that his payoff strictly increases if he selects $\gamma_0 - \varepsilon$.

A.2 Equilibria Calculations

We calculate the buyer’s equilibrium thresholds and expected payoff in an integrated learning equilibrium as well as in the experimental-only and contractual-only benchmark models.

For this purpose we generated a grid of parameters spanning the region where the buyer has a bad outside option: we consider 40 values of qH equally spaced between 0.1 and 0.9; 30 values of qL equally spaced between 0.02 and 0.8; 20 values of cH equally spaced between 0.02 and 0.5; and 20 values of λ equally spaced between 0 and 0.3. We normalized cL to 0 in all instances. Out of the resulting 480,000 combinations, we focus the analysis on those that satisfy qH − cH > qL − cL > λ, which leaves 101,074 instances. Of these, 51% are cases where qL − cH > λ, and 49% are cases where qL − cH < λ. For each of these combinations we calculate the equilibrium for T = 10 and compare the buyer’s thresholds and payoffs across models.
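The parameter grid described above can be reconstructed as follows. This is a minimal sketch, not the authors' code: the helper `linspace` and the variable names are hypothetical, and the endpoints are assumed to be included in each equally spaced range. Because of floating-point rounding at boundary cases and possible differences in how the grid was defined, the number of retained instances in this sketch need not match the reported count exactly.

```python
from itertools import product


def linspace(lo, hi, n):
    """n equally spaced values from lo to hi, endpoints included (assumption)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


qH_vals = linspace(0.1, 0.9, 40)
qL_vals = linspace(0.02, 0.8, 30)
cH_vals = linspace(0.02, 0.5, 20)
lam_vals = linspace(0.0, 0.3, 20)
cL = 0.0  # normalized to zero, as in the text

grid = list(product(qH_vals, qL_vals, cH_vals, lam_vals))

# Keep the instances with a bad outside option: qH - cH > qL - cL > lam.
kept = [(qH, qL, cH, lam) for qH, qL, cH, lam in grid
        if qH - cH > qL - cL > lam]

# Split the retained instances by whether qL - cH exceeds lam.
n_above = sum(1 for qH, qL, cH, lam in kept if qL - cH > lam)
n_below = sum(1 for qH, qL, cH, lam in kept if qL - cH < lam)

print(len(grid), len(kept), n_above, n_below)
```

Each retained tuple would then be passed to an equilibrium solver for T = 10; that solver is not sketched here because the recursion it requires is defined elsewhere in the paper.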

                 Average Thresholds
Region           µ∗1       µE1       µC1
qL − cH < λ      0.125     0.155     0.352
qL − cH > λ      0.002     0.000     0.004

Table 1: Threshold Comparison

In Table 1, we report the average threshold in period 1 grouped by whether the outside option is

better or worse than buying from the low-type seller at a high price. When qL−cH > λ, in all three

models, the buyer offers cH at nearly every belief. When qL− cH < λ, in period 1 (with 10 periods

to go), the value of experimental learning drives the buyer’s threshold much lower. We observe that

both the integrated learning model and the experimental-only benchmark have, on average, a much

lower threshold than the contractual-only benchmark. Moreover, we see an additional reduction in

the threshold due to the presence of contractual learning in the integrated learning model.

                 Theorem 3                                 $\max_\mu \frac{U_1^C(\mu) - U_1^*(\mu)}{U_1^*(\mu)}$
Region           How Often (6) Holds   Avg. $\underline{T}$   25th Percentile   Median    75th Percentile
qL − cH < λ      100.00%               1.04                   5.10%             10.05%    17.18%
qL − cH > λ      33.93%                2.11                   1.56%             3.89%     7.06%

Table 2: Integrated Learning vs. Contractual Only

Table 2 compares the integrated learning model with the contractual-only benchmark. From the

first column, one can observe that when qL − cH < λ, (6) holds in all cases (thus, there is always

a region of beliefs under which access to quality observations reduces the buyer’s average payoff).

On the other hand, for 66% of the cases such that qL − cH > λ, the buyer’s expected payoff in

the integrated learning model is greater than the payoff in the contractual-only benchmark at all

beliefs. Of the instances where (6) holds, we compare several more characteristics. We observe

that, on average, the required number of periods to go T , which is characterized in Theorem 3, is

small. As T ≥ 1 by definition, one can observe that in nearly all cases with qL − cH < λ, in any

period t < T − 2 (with two or more future periods), there is an interval of beliefs for which the

buyer’s payoff is greater if she cannot observe quality realizations. When λ is very small, T is on average a bit larger.

Finally, the last three columns indicate how much can be lost by having access to quality observations with a horizon of 10 periods. When qL − cH < λ, we observe that in over half of the instances, the buyer’s payoff could be 10% larger (at some belief) if she could not observe quality realizations. Moreover, for a quarter of cases, the buyer’s payoff could be at least 17% greater without access to quality observations. When qL − cH > λ the loss is not as large, but in over half of the instances, the buyer’s payoff is nearly 4% lower than it would be without access to quality observations.

| Region | Avg. µ1 | Avg. (µ1 − µ∗1) / (µC1 − µ∗1) | How often UC1(µC1) > U∗1(µC1) |
|---|---|---|---|
| qL − cH < λ | 0.186 | 31.76% | 0.0465% |
| qL − cH > λ | 0.011 | 90.25% | 6.5585% |

Table 3: Analysis of µ

One can also observe that with 10 periods to go, there is a relatively large interval of beliefs for which not observing quality realizations is beneficial. In Table 3, we use µ1 to refer to the largest belief that satisfies U∗t (µ) < UCt (µ) for all µ ∈ (0, µ1). On average, therefore, the buyer is better off not observing quality realizations for nearly 19% of the belief space (the relative size of the interval (0, µ1)). Moreover, the second column shows that on average µ1 falls fairly close to the middle

of the interval (µ∗1, µC1 ). Indeed, almost all of the situations in which a buyer would be better

off without access to quality observations are such that she offers cL without access to quality

observations, but with integrated learning, in some cases she would offer cL and in some cases

cH . In rare cases (0.05% of the instances qL − cH < λ) one has that µ1 > µC1 , which means that

there exists an interval of beliefs where the buyer offers cH in both the integrated learning and

contractual only models, and she would still prefer to not be able to observe quality realizations.

When qL − cH > λ this phenomenon happens more often but is still relatively rare. Moreover, it

only occurs when there are several periods to go so experimental learning is valuable; we observe

that with five or fewer periods remaining, µ1 < µC1 in all cases.

Finally, we also compared the integrated learning thresholds to the experimental-only thresholds. We observed in §4 that the payoff for the buyer is always weakly larger in an integrated learning equilibrium. However, we also observed in §3.2.2 that the presence of contractual learning introduced two effects to the integrated learning thresholds which cause the sequences of thresholds to cross. We compare these two effects in Table 4 by calculating the last period t in which the downward effect dominates (so that µ∗t < µEt). One can observe that in general the crossing of the sequences of thresholds happens with only a few periods remaining. In about two-thirds of the instances, integrated learning thresholds were greater than the experimental-only thresholds for only a few (the last four or fewer) periods. In less than five percent of the instances the upward effect dominated for 10 or more periods.

Last Period with µ∗t < µEt

| Period (t) | # | % |
|---|---|---|
| 10 | 0 | 0.00% |
| 9 | 0 | 0.00% |
| 8 | 19 | 0.04% |
| 7 | 20,581 | 41.62% |
| 6 | 12,328 | 24.93% |
| 5 | 6,211 | 12.56% |
| 4 | 3,497 | 7.07% |
| 3 | 2,153 | 4.35% |
| 2 | 1,381 | 2.79% |
| 1 | 928 | 1.88% |
| N/A | 2,348 | 4.75% |

Table 4: Comparison of Upward/Downward Effect
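The shares quoted in this paragraph can be recovered from the counts in Table 4. The sketch below is only an arithmetic check. It assumes that a "last period" of t means the upward effect dominates in the remaining 10 − t periods, so that "the last four or fewer periods" corresponds to rows 6 through 10; the variable names are illustrative.

```python
# Counts from Table 4, keyed by the last period t with mu*_t < mu^E_t.
# The key None stands for the "N/A" row.
counts = {
    10: 0, 9: 0, 8: 19, 7: 20581, 6: 12328,
    5: 6211, 4: 3497, 3: 2153, 2: 1381, 1: 928, None: 2348,
}

total = sum(counts.values())


def share(keys):
    """Fraction of all tabulated instances falling in the given rows."""
    return sum(counts[k] for k in keys) / total


# Upward effect dominates only in the last four or fewer periods.
late_cross = share([10, 9, 8, 7, 6])
# Instances in the N/A row.
never = share([None])

print(total, round(100 * late_cross, 2), round(100 * never, 2))
```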

A.3 Within-Period Commitment to Not Observe

Suppose the buyer has the choice of observing or not at the time she receives the quality (that is,

after the acceptance decision). Because the buyer’s value function is convex in each setting, it is

always weakly dominant to observe (and will typically be strict). It is weak, for example, after an

acceptance of cL in which case there is no more learning. Based on this intuition, we now outline

the argument for why allowing the buyer to commit to not observe quality in the current period

does not affect the integrated learning equilibrium.

Suppose at the time of the price offer pt, the buyer also selects whether or not to observe, ot. Intuitively, if

the buyer offers cH , she weakly prefers to observe the quality and if she offers cL, she is indifferent

between observing and not because observing the quality provides no new information after the

acceptance. Inductively, we can show that while there may be equilibria in which the buyer offers

p = cH , ot = 0, the buyer’s purchase decisions, price offers, and payoffs are the same regardless of

whether she has the observe decision.


We focus on the region with a bad outside option. Consider period T. The buyer is completely indifferent between observing the quality realization or not because the game ends and the information provides no value. Therefore, the buyer’s threshold for offering cH is the same in the integrated learning model and this slightly altered model.

Inductively, assume that the thresholds for offering cH are the same in periods t′ = t + 1, ..., T. Consider the buyer’s decision in period t. A low-type seller accepts p ∈ [cL, cH) with a probability such that the buyer is indifferent between offering cL and cH in the next period. Because the thresholds are the same in subsequent periods, the acceptance probability is the same as in the integrated learning model. Thus, the buyer’s expected payoff is the same. Even if, in future periods, the buyer may offer p = cH, o = 0, she can only do so when she is indifferent between observing and not observing, so this has no effect.

Ultimately, the buyer’s decision of whether to observe in the current period does not affect the seller’s acceptance decision. The buyer can only benefit by committing to not observe in future periods because that means she offers low prices for a larger interval of beliefs in future periods.


B Proofs

B.1 Proof of Theorem 1

This section is organized as follows. In §B.1.1 we provide preliminaries and state the induction hypotheses that are used in the proof; in §B.1.2 we state auxiliary lemmas that are used in the proof; in §B.1.3 we prove the theorem; in §B.1.4 we provide proofs for the auxiliary lemmas and for other auxiliary results.

B.1.1 Preliminaries

Notation. Our notation is in line with the formulation laid out in §2. To simplify notation we sometimes write $\mu_t$ (without the argument $h_t$) to refer to the buyer’s beliefs. Similarly, we denote by $\rho_t$ and $\alpha_t$ (without arguments) the probability distributions over the buyer’s and seller’s actions, and by $p_t$ and $a_t$ the respective random variables associated with these distributions. We use lowercase letters without subscripts (e.g., $a$, $p$) to denote realizations of random variables. For a given buyer strategy $\rho$, a time period $t$, and a history $h \in H_t$, we define by $F^{\rho}_t(h)$ the cumulative distribution function over price offers associated with distribution $\rho_t(h)$. We define:

\[
\sigma(z) := \frac{q_H z}{q_H z + q_L (1 - z)}, \qquad
\phi(z) := \frac{(1 - q_H) z}{(1 - q_H) z + (1 - q_L)(1 - z)}, \qquad
\eta(z) := q_H z + q_L (1 - z). \tag{9}
\]

The functions $\sigma$ and $\phi$ capture how a belief $z$ updates according to Bayes’ rule after a success and a failure, respectively, and $\eta$ captures the probability of a success given belief $z$.
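As an illustration of the three functions in (9), the following minimal sketch implements them directly and checks the martingale property of beliefs, namely that the expected posterior equals the prior. The function names mirror the symbols in (9); the parameter values in the example are arbitrary and chosen only for illustration.

```python
def sigma(z, q_h, q_l):
    """Posterior belief that the seller is high type after a success."""
    return q_h * z / (q_h * z + q_l * (1 - z))


def phi(z, q_h, q_l):
    """Posterior belief that the seller is high type after a failure."""
    return (1 - q_h) * z / ((1 - q_h) * z + (1 - q_l) * (1 - z))


def eta(z, q_h, q_l):
    """Probability of a success given belief z."""
    return q_h * z + q_l * (1 - z)


# Illustrative (hypothetical) parameter values.
q_h, q_l, z = 0.8, 0.4, 0.5

up = sigma(z, q_h, q_l)
down = phi(z, q_h, q_l)
p_success = eta(z, q_h, q_l)

# Beliefs form a martingale: the expected posterior equals the prior.
expected_posterior = p_success * up + (1 - p_success) * down
print(up, down, p_success, expected_posterior)
```

A success moves the belief up and a failure moves it down whenever $q_H > q_L$, which is the property used repeatedly in the induction step.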

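Proof Technique. Our proof is based on backward induction over time periods. Moreover, within each period $t$, we characterize the necessary conditions for the buyer’s and seller’s equilibrium strategies, leveraging the agents’ sequential rationality at time $t$ for every history. We show that there exists a strictly increasing sequence of thresholds $\{\mu^*_t : t = 1, \ldots, T\}$ with $0 < \mu^*_1 < \mu^*_2 < \cdots < \mu^*_T < 1$, such that equilibrium strategies must satisfy for each $t$, $h \in H_t$, and $p \in \mathbb{R}$:

\[
\mathbb{P}(a_t = 1 \mid \theta = H, h_t = h, p_t = p) =
\begin{cases}
1, & \text{if } p \ge c_H \\
0, & \text{if } p < c_H,
\end{cases} \tag{10}
\]

\[
\mathbb{P}(a_t = 1 \mid \theta = L, h_t = h, p_t = p) =
\begin{cases}
1, & \text{if } p \ge c_H \\
\left( \dfrac{\mu^*_{t+1} - \mu_t}{(1 - \mu_t)\,\mu^*_{t+1}} \right)^{+}, & \text{if } c_L < p < c_H \\
\psi_t \in \left[ 0, \left( \dfrac{\mu^*_{t+1} - \mu_t}{(1 - \mu_t)\,\mu^*_{t+1}} \right)^{+} \right], & \text{if } p = c_L \\
0, & \text{if } p < c_L,
\end{cases} \tag{11}
\]

\[
p_t =
\begin{cases}
c_H \text{ w.p. } 1, & \text{if } \mu_t > \mu^*_t \\
c_H \text{ w.p. } \xi_t \in [0, 1], & \text{if } \mu_t = \mu^*_t \\
p \in P_t \text{ w.p. } 1 - \xi_t, & \text{if } \mu_t = \mu^*_t \\
p \in P_t \text{ w.p. } 1, & \text{if } \mu_t < \mu^*_t,
\end{cases} \tag{12}
\]

where we define $\mu^*_{T+1} := 1$, and $P_t$ denotes the set of prices that are rejected with probability 1:

\[
P_t :=
\begin{cases}
\{p : p < c_L\}, & \text{if } \psi_t > 0 \\
\{p : p \le c_L\}, & \text{if } \psi_t = 0.
\end{cases} \tag{13}
\]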

Induction hypotheses. Our first induction hypothesis is as follows:

(IH1) Fix $t < T$. There exists a sequence of thresholds $0 < \mu^*_{t+1} < \mu^*_{t+2} < \cdots < \mu^*_T < 1$ such that, in any equilibrium, for each period $t' = t+1, t+2, \ldots, T$, the price offer and acceptance decision, $p_{t'}$, $a_{t'}$, satisfy the properties in equations (12), (10), (11), respectively.$^{13}$ Moreover, offering $c_H$ at beliefs $\mu_{t'} > \mu^*_{t'}$ and offering a price that is rejected with probability one by all seller types at beliefs $\mu_{t'} < \mu^*_{t'}$ is a strictly dominant action for the buyer.

Our proof technique is effective due to our refinements, as summarized by the following remark.

Remark 1. Under our refinements on the buyer’s off-path belief (see discussion after equation (3)), the seller’s strategy can be determined without considering whether the buyer has made a price offer $p_t = p$ that occurs with positive probability in equilibrium and regardless of whether the history $h$ occurs with positive probability; otherwise, it would have been required to specify how the buyer’s belief is updated after each $(p, a, y)$ in cases where $p$ is off-path.

13 We note that these are necessary conditions for equilibrium strategies but not sufficient; in particular, $\xi_{t'}$ will be pinned down at some (off-path) histories.


Our second induction hypothesis is used for establishing that the set of equilibria is non-empty.

(IH2) Fix t < T . The set of equilibria for the game with T − t periods is non-empty.

While (IH1) and (IH2) form the basis of the theorem, the third induction hypothesis (which follows) characterizes auxiliary properties that are used in the induction step. Let $\{\mu^*_t : t = 1, \ldots, T\}$ be the sequence of thresholds in (IH1), and let $\eta$, $\sigma$, and $\phi$ be the functions defined in (9). Then, for a fixed $t \le T$, we define, recursively, the functions $U^*_t : [0, 1] \to \mathbb{R}$ as follows:

\[
U^*_t(\mu) =
\begin{cases}
\eta(\mu) - c_H + \eta(\mu)\, U^*_{t+1}(\sigma(\mu)) + (1 - \eta(\mu))\, U^*_{t+1}(\phi(\mu)), & \text{if } \mu \ge \mu^*_t \\
(T - t + 1)\lambda, & \text{otherwise,}
\end{cases} \tag{14}
\]

with terminal function $U^*_{T+1}(\mu) = 0$. Then, our third induction hypothesis can be stated as:

(IH3) Fix $t < T$. Suppose that for each period $t' \in \{t+1, t+2, \ldots, T\}$, (IH1) holds. Then, for every $t' \in \{t+1, t+2, \ldots, T\}$ and history $h \in H_{t'}$, $U^{\rho,\alpha,\mu}_{t'}(h) = U^*_{t'}(\mu_{t'}(h))$.
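The recursion in (14) can be evaluated directly once a threshold sequence is available. The sketch below is a minimal illustration, assuming the thresholds are supplied as an input list; computing them is the subject of the proof and is not attempted here. The helper name `make_value_function` and the numerical values are hypothetical, and the plain recursion visits up to 2^T belief paths, which is adequate for short horizons such as T = 10.

```python
def sigma(z, q_h, q_l):
    """Posterior after a success, as in (9)."""
    return q_h * z / (q_h * z + q_l * (1 - z))


def phi(z, q_h, q_l):
    """Posterior after a failure, as in (9)."""
    return (1 - q_h) * z / ((1 - q_h) * z + (1 - q_l) * (1 - z))


def eta(z, q_h, q_l):
    """Success probability given belief z, as in (9)."""
    return q_h * z + q_l * (1 - z)


def make_value_function(q_h, q_l, c_h, lam, thresholds):
    """Return U(t, mu) following recursion (14).

    thresholds[t - 1] plays the role of mu*_t for t = 1..T and is taken
    as given (a hypothetical input, not derived here).
    """
    horizon = len(thresholds)

    def value(t, mu):
        if t == horizon + 1:
            return 0.0  # terminal condition U*_{T+1} = 0
        if mu >= thresholds[t - 1]:
            e = eta(mu, q_h, q_l)
            return (e - c_h
                    + e * value(t + 1, sigma(mu, q_h, q_l))
                    + (1 - e) * value(t + 1, phi(mu, q_h, q_l)))
        # Below the threshold: outside option in all remaining periods.
        return (horizon - t + 1) * lam

    return value


# Hypothetical two-period example; 0.2 is an arbitrary first-period threshold.
U = make_value_function(q_h=0.8, q_l=0.4, c_h=0.3, lam=0.2,
                        thresholds=[0.2, 0.25])
print(U(1, 0.5), U(1, 0.1))
```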

B.1.2 Auxiliary results

We next detail auxiliary results that are used when proving the theorem in §B.1.3 (the proofs of these auxiliary results appear in §B.1.4). The first result characterizes the seller’s action in equilibrium at period $T$.

Lemma 1. (Seller’s action at period $T$) Fix a history $h \in H_T$ and a price offer $p$. Any equilibrium strategy of the seller must satisfy:

\[
\mathbb{P}(a_T = 1 \mid \theta = H, h_T = h, p_T = p) =
\begin{cases}
1, & \text{if } p \ge c_H \\
0, & \text{if } p < c_H,
\end{cases} \tag{15}
\]

\[
\mathbb{P}(a_T = 1 \mid \theta = L, h_T = h, p_T = p) =
\begin{cases}
1, & \text{if } p > c_L \\
0, & \text{if } p < c_L \\
\psi_T \in [0, 1], & \text{if } p = c_L.
\end{cases} \tag{16}
\]

The second auxiliary result characterizes the high-type seller’s action in equilibrium in time $t < T$, conditioned on our three induction hypotheses holding for every future period $t' > t$.

Lemma 2. (High-type seller’s action at period $t < T$) Fix a history $h \in H_t$ and a price offer $p \in \mathbb{R}$, and suppose that (IH1), (IH2), and (IH3) hold for every $t' > t$. Then, any equilibrium strategy of a high-type seller at time $t$ must satisfy:

\[
\mathbb{P}(a_t = 1 \mid \theta = H, h_t = h, p_t = p) =
\begin{cases}
1, & \text{if } p \ge c_H \\
0, & \text{if } p < c_H.
\end{cases} \tag{17}
\]

The third auxiliary result characterizes properties that the low-type seller equilibrium strategy must satisfy at time $t < T$, assuming our induction hypotheses hold at every future period $t' > t$.

Lemma 3. (Low-type seller’s action at period $t < T$) Fix a public history $h \in H_t$, a price offer $p \in \mathbb{R}$, and suppose that (IH1), (IH2), and (IH3) hold for every $t' > t$. Then, the equilibrium strategy of a low-type seller at time $t$ must satisfy:

\[
\mathbb{P}(a_t = 1 \mid \theta = L, h_t = h, p_t = p) =
\begin{cases}
1, & \text{if } p \ge c_H \\
\left( \dfrac{\mu^*_{t+1} - \mu_t}{(1 - \mu_t)\,\mu^*_{t+1}} \right)^{+}, & \text{if } c_L < p < c_H \\
\psi_t \in \left[ 0, \left( \dfrac{\mu^*_{t+1} - \mu_t}{(1 - \mu_t)\,\mu^*_{t+1}} \right)^{+} \right], & \text{if } p = c_L \\
0, & \text{if } p < c_L.
\end{cases} \tag{18}
\]

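The next result characterizes the buyer’s action at period $t < T$, assuming our induction hypotheses hold for every future period $t' > t$, and based on the characterization of the seller’s action.

Lemma 4. (Buyer’s action at period $t < T$) Fix a public history $h \in H_t$ and suppose that (IH1), (IH2), and (IH3) hold for every $t' > t$. The period-$t$ buyer’s strategy must satisfy:

\[
p_t =
\begin{cases}
c_H \text{ w.p. } 1, & \text{if } \mu_t > \mu^*_t \\
c_H \text{ w.p. } \xi_t \in [0, 1], & \text{if } \mu_t = \mu^*_t \\
p \in P_t \text{ w.p. } 1 - \xi_t, & \text{if } \mu_t = \mu^*_t \\
p \in P_t \text{ w.p. } 1, & \text{if } \mu_t < \mu^*_t,
\end{cases} \tag{19}
\]

for some $\mu^*_t < \mu^*_{t+1}$.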

B.1.3 Proving the theorem

In Part (I) of the proof we analyze period $T$, and in Part (II) of the proof we analyze the induction step for a general period $t < T$ and establish the result. Fix parameters $q_H, c_H, \lambda, q_L, c_L$ such that $q_H - c_H > \lambda > q_L - c_L$. Fix $T \in \mathbb{N}$.

Part I: Base Case (Period T). Using sequential rationality, we work backwards beginning with the seller’s action, and then, fixing the seller’s action, we characterize the buyer’s action. We finally prove that the set of equilibria is non-empty, and that (IH3) is satisfied.

The seller’s action at period $T$ is characterized by Lemma 1, which states that for a fixed history $h \in H_T$ and a price offer $p$, any equilibrium strategy of the seller must satisfy:

\[
\mathbb{P}(a_T = 1 \mid \theta = H, h_T = h, p_T = p) =
\begin{cases}
1, & \text{if } p \ge c_H \\
0, & \text{if } p < c_H,
\end{cases}
\]

\[
\mathbb{P}(a_T = 1 \mid \theta = L, h_T = h, p_T = p) =
\begin{cases}
1, & \text{if } p > c_L \\
0, & \text{if } p < c_L \\
\psi_T \in [0, 1], & \text{if } p = c_L.
\end{cases}
\]

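We turn to characterize the buyer’s strategy in period $T$. Given history $h_T = h$, the buyer has belief $\mu_T := \mu_T(h)$. The buyer’s expected payoff, given strategy $\rho \in P$, is:

\[
\begin{aligned}
U^{\rho,\alpha,\mu}_T(h)
&= \mu_T \int_{-\infty}^{\infty} \Big( \mathbb{E}_{y_T}\big( y_T - p \,\big|\, \theta = H, a_T = 1 \big)\, \mathbb{P}(a_T = 1 \mid \theta = H, h_T = h, p_T = p) \\
&\qquad\qquad + \lambda\, \mathbb{P}(a_T = 0 \mid \theta = H, h_T = h, p_T = p) \Big)\, dF^{\rho}_T(h; p) \\
&\quad + (1 - \mu_T) \int_{-\infty}^{\infty} \Big( \mathbb{E}_{y_T}\big( y_T - p \,\big|\, \theta = L, a_T = 1 \big)\, \mathbb{P}(a_T = 1 \mid \theta = L, h_T = h, p_T = p) \\
&\qquad\qquad + \lambda\, \mathbb{P}(a_T = 0 \mid \theta = L, h_T = h, p_T = p) \Big)\, dF^{\rho}_T(h; p) \\
&\overset{(a)}{=} \mu_T \cdot \mathbb{E}_{p_T}\Big( (q_H - p_T)\mathbf{1}\{p_T \ge c_H\} + \lambda \mathbf{1}\{p_T < c_H\} \,\Big|\, h_T = h \Big) \\
&\quad + (1 - \mu_T) \cdot \mathbb{E}_{p_T}\Big( (q_L - p_T)\mathbf{1}\{p_T > c_L\} + \lambda \mathbf{1}\{p_T < c_L\} \\
&\qquad\qquad + \big( \psi_T (q_L - c_L) + (1 - \psi_T)\lambda \big)\mathbf{1}\{p_T = c_L\} \,\Big|\, h_T = h \Big) \\
&= \mathbb{E}_{p_T}\Big( \mu_T (q_H - q_L) + q_L - p_T \,\Big|\, p_T \ge c_H, h_T = h \Big)\, \mathbb{P}(p_T \ge c_H \mid h_T = h) \\
&\quad + \mathbb{E}_{p_T}\Big( (1 - \mu_T)(q_L - p_T) + \mu_T \lambda \,\Big|\, c_L < p_T < c_H, h_T = h \Big)\, \mathbb{P}(c_L < p_T < c_H \mid h_T = h) \\
&\quad + \mathbb{E}_{p_T}\Big( (1 - \mu_T)\psi_T (q_L - p_T) + \big( \mu_T + (1 - \mu_T)(1 - \psi_T) \big)\lambda \,\Big|\, p_T = c_L, h_T = h \Big)\, \mathbb{P}(p_T = c_L \mid h_T = h) \\
&\quad + \lambda\, \mathbb{P}(p_T < c_L \mid h_T = h),
\end{aligned} \tag{20}
\]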

where (a) follows from using the seller strategy and taking expectation with respect to $y_T$. By sequential rationality, for any given belief, the buyer’s equilibrium strategy maximizes her expected payoff. Define

\[
\mu^*_T := \frac{\lambda + c_H - q_L}{q_H - q_L}.
\]

One may observe that (i) if $\mu_T > \mu^*_T$, then (20) is maximized by $p_T = c_H$ with probability one; (ii) if $\mu_T < \mu^*_T$, then the maximum value of (20) is $\lambda$, which can be achieved by any offer that is rejected with probability 1; and (iii) if $\mu_T = \mu^*_T$, then (20) is maximized by $p_T \in \{c_H\} \cup \{p : p \le c_L\}$. Denoting by $P_T$ the set of price offers that are rejected by all seller types with probability 1 at period $T$, $p_T$ must satisfy:

\[
p_T =
\begin{cases}
c_H \text{ w.p. } 1, & \text{if } \mu_T > \mu^*_T \\
c_H \text{ w.p. } \xi_T \in [0, 1], & \text{if } \mu_T = \mu^*_T \\
p \in P_T \text{ w.p. } 1 - \xi_T, & \text{if } \mu_T = \mu^*_T \\
p \in P_T \text{ w.p. } 1, & \text{if } \mu_T < \mu^*_T.
\end{cases} \tag{21}
\]

From our characterization of the seller’s strategy, we know that $P_T$ includes, at least, any price offer strictly less than $c_L$ (and also $c_L$ if $\psi_T = 0$ in (16)). At $\mu_T = \mu^*_T$, the buyer is indifferent between offering a price that is rejected and offering $c_H$, so she can mix between these two types of offers. (In the next section, we will show that the mixing probability, $\xi_T$, will be pinned down at some (off-path) histories.) Defining $\mu^*_{T+1} = 1$, this establishes that (IH1) holds in period $T$.

(IH2): Equilibrium existence. To show the set of equilibria is non-empty for $T = 1$, we construct an equilibrium as follows. Let $\rho_T$ be as described in (21) with $\xi_T$ equal to 0 and let $p \in P_T$ be equal to $c_L - 1$. Let $\alpha_T$ be defined as in (15) and (16) with $\psi_T$ equal to 0. Let $\mu_T$ be calculated according to Bayes’ rule whenever possible and equal to 0 at any other histories. From the previous analysis one obtains that $(\rho_T, \alpha_T, \mu_T)$ is an equilibrium for $T = 1$; thus, (IH2) holds.

(IH3): For every history $h \in H_T$, we have that $U^{\rho,\alpha,\mu}_T(h) = U^*_T(\mu_T(h))$ whenever $(\rho, \alpha, \mu)$ satisfies (IH1). Suppose that $(\rho, \alpha, \mu)$ satisfies (IH1). Fix a history $h \in H_T$. Define the buyer’s period $T$ value function as

\[
U^{(\rho,\alpha,\mu)}_T(h) :=
\begin{cases}
\lambda, & \text{if } \mu_T(h) \le \mu^*_T \\
(q_H - q_L)\,\mu_T(h) + q_L - c_H, & \text{if } \mu_T(h) > \mu^*_T.
\end{cases} \tag{22}
\]

Note that this can be expressed as a function of $\mu_T(h)$, and corresponds with the definition of $U^*_T(\cdot)$. Moreover, $U^*_T(\mu_T)$ is convex in $\mu_T$ since $q_H - q_L > 0$. Therefore, (IH3) holds.
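To make the base case concrete, the following minimal sketch computes the last-period threshold $\mu^*_T = (\lambda + c_H - q_L)/(q_H - q_L)$ and the value function in (22), and checks numerically that the two branches of (22) agree at the threshold. The function names and parameter values are illustrative assumptions, not taken from the paper.

```python
def last_period_threshold(q_h, q_l, c_h, lam):
    """Belief at which offering c_H yields exactly the outside option."""
    return (lam + c_h - q_l) / (q_h - q_l)


def last_period_value(mu, q_h, q_l, c_h, lam):
    """Period-T value as in (22): outside option below the threshold,
    expected quality minus c_H above it."""
    if mu <= last_period_threshold(q_h, q_l, c_h, lam):
        return lam
    return (q_h - q_l) * mu + q_l - c_h


# Hypothetical parameters with q_L - c_H < lambda, so the threshold is positive.
params = dict(q_h=0.8, q_l=0.4, c_h=0.3, lam=0.2)

mu_star = last_period_threshold(**params)
just_above = last_period_value(mu_star + 1e-9, **params)
print(mu_star, last_period_value(0.1, **params),
      last_period_value(0.5, **params), just_above)
```

Since (22) is the maximum of a constant and an increasing linear function of the belief, it is convex, in line with the remark above.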

Part II: Inductive Step (Period $t < T$). Fix $1 \le t < T$. Suppose that the Induction Hypotheses (IH1), (IH2), and (IH3) hold for every $t' > t$. As before, we start by characterizing the seller’s period-$t$ strategy for every history $h \in H_t$ and every price offer $p_t = p$. By sequential rationality and taking the seller’s period-$t$ strategy as given, we characterize the buyer’s period-$t$ strategy, which completes the proof of (IH1). We then proceed to show (IH2) and (IH3).

Seller’s action at period $t < T$. The high-type seller’s action at time $t < T$ is characterized by Lemma 2, stating that for any history $h \in H_t$, and price offer $p \in \mathbb{R}$, if (IH1), (IH2), and (IH3) hold for every $t' > t$, then any equilibrium strategy of a high-type seller at time $t$ is:

\[
\mathbb{P}(a_t = 1 \mid \theta = H, h_t = h, p_t = p) =
\begin{cases}
1, & \text{if } p \ge c_H \\
0, & \text{if } p < c_H.
\end{cases}
\]

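Lemma 3 characterizes properties that the low-type seller equilibrium strategy must satisfy at period $t < T$, assuming that our induction hypotheses hold in every future period $t' > t$. It states that for any history $h \in H_t$ and a price offer $p \in \mathbb{R}$, if (IH1), (IH2), and (IH3) hold for every $t' > t$, then the equilibrium strategy of a low-type seller at time $t$ must satisfy:

\[
\mathbb{P}(a_t = 1 \mid \theta = L, h_t = h, p_t = p) =
\begin{cases}
1, & \text{if } p \ge c_H \\
\left( \dfrac{\mu^*_{t+1} - \mu_t}{(1 - \mu_t)\,\mu^*_{t+1}} \right)^{+}, & \text{if } c_L < p < c_H \\
\psi_t \in \left[ 0, \left( \dfrac{\mu^*_{t+1} - \mu_t}{(1 - \mu_t)\,\mu^*_{t+1}} \right)^{+} \right], & \text{if } p = c_L \\
0, & \text{if } p < c_L.
\end{cases}
\]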

Buyer’s action at period $t < T$. We now use the properties that the seller’s action at period $t$ must satisfy in equilibrium to characterize the buyer’s action at period $t$. Lemma 4 states that for any public history $h \in H_t$, if (IH1), (IH2), and (IH3) hold for every $t' > t$, then in equilibrium the buyer’s action at time $t$ must satisfy

\[
p_t =
\begin{cases}
c_H \text{ w.p. } 1, & \text{if } \mu_t > \mu^*_t \\
c_H \text{ w.p. } \xi_t \in [0, 1], & \text{if } \mu_t = \mu^*_t \\
p \in P_t \text{ w.p. } 1 - \xi_t, & \text{if } \mu_t = \mu^*_t \\
p \in P_t \text{ w.p. } 1, & \text{if } \mu_t < \mu^*_t,
\end{cases}
\]

for some $\mu^*_t < \mu^*_{t+1}$. Together, this establishes that in an equilibrium, the buyer and seller strategies must satisfy the properties stated in the theorem, and shows that (IH1) holds in period $t$. To complete the proof of the theorem it remains to establish (IH2) and (IH3).

(IH2): Equilibrium existence for $T - t + 1$ periods. We now construct an equilibrium to show the set of equilibria is non-empty for $T - t + 1$ periods and that (IH2) holds in period $t$. By (IH2), there exists a continuation equilibrium for periods $t + 1$ to $T$ that satisfies the other induction hypotheses. For $t' > t$, let $\rho_{t'}, \alpha_{t'}, \mu_{t'}$ be as described in this continuation equilibrium. However, for $p_{t+1}$, set $\xi_{t+1}$ according to (24) in the proof of Lemma 3 for (off-path) histories where $p_t = p \in (c_L, c_H)$. For period $t$, set $p_t$ according to (19) with $p \in P_t$ equal to $c_L - 1$. Define $\alpha_t$ as in (17) and (18) with $\psi_t$ equal to 0 for all histories; and let $\mu_t$ be calculated as required by our equilibrium concept where possible and equal to 0 at any other histories. It is straightforward from the previous analysis that $(\rho, \alpha, \mu)$ describes an equilibrium for $T - t + 1$ periods. Thus, (IH2) holds.

(IH3): For history $h_t \in H_t$, $U^{\rho,\alpha,\mu}_t(h_t) = U^*_t(\mu_t(h_t))$ whenever (IH1) holds for all periods $t' \ge t$. Fix $h_t \in H_t$ and suppose that $(\rho, \alpha, \mu)$ satisfy (IH1). By our previous analysis the buyer’s period-$t$ value function, $U^{(\rho,\alpha,\mu)}_t(h)$, is given (see the proof of Lemma 4) by the expression in (26) if $\mu_t(h_t) \ge \mu^*_t$ and by the expression in (27) if $\mu_t(h_t) < \mu^*_t$ (these expressions are equal at $\mu^*_t$). Note that this can be expressed as a function of $\mu_t(h)$, corresponding with the definition of $U^*_t(\cdot)$. Moreover, $U^*_t$ is convex because there are a finite number of possible continuation policies (given that the optimal policy will be either to offer $c_H$ or a price that is rejected with probability 1), each of which generates an expected payoff that is linear in $\mu_t$. $U^*_t(\mu)$ is the pointwise maximum of these policies and so is convex. This concludes the proof of the theorem.

B.1.4 Proofs of auxiliary results

Proof of Lemma 1. Consider the seller’s action at history $h^s = \langle \theta, h_T = h, p_T = p \rangle$. The seller’s payoff from a decision $a \in \{0, 1\}$ is $a(p - c_\theta)$, as there are no future expected payoffs in the last period. By sequential rationality, a seller accepts any price strictly greater than his cost, rejects prices strictly less than his cost, and is indifferent between accepting and rejecting when the price is exactly equal to his cost. Therefore, the low-type seller strategy must satisfy (16). If $p = c_L$, a low-type seller is indifferent between accepting and rejecting so $\psi_T$ can take any value in $[0, 1]$. The high-type seller strategy is similar but, by our equilibrium refinement,$^{14}$ with probability 1, a high-type seller accepts if $p_T = c_H$ and therefore $\alpha$ must satisfy (15).

Proof of Lemma 2. Fix a public history $h \in H_t$ and a price offer $p \in \mathbb{R}$. By (IH1), in any equilibrium $(\rho, \alpha, \mu)$ we have that $V^{\rho,\alpha,\mu}_{t+1}(\langle \theta = H, h_t = h, p_t = p \rangle) = 0$ (i.e., the high-type seller’s expected continuation payoff in period $t + 1$ equals 0) because the buyer does not offer a price greater than $c_H$ with positive probability in periods $t' > t$. Therefore, a high-type seller rejects all prices less than $c_H$ and accepts all prices greater than $c_H$. At $p_t = c_H$, a high-type seller is indifferent between accepting and rejecting; therefore, by our equilibrium refinement, he accepts with probability one.$^{15}$ Thus, the equilibrium strategy of a high-type seller must satisfy (17).

Proof of Lemma 3. The proof is divided into claims, covering different ranges of price offers.

Claim 1. Fix a history $h \in H_t$, and suppose that $p_t = p \ge c_H$. Then, $\mathbb{P}(a_t = 1 \mid \theta = L, h_t = h, p_t = p) = 1$.

Proof. By (17), a high-type seller accepts a price $p \ge c_H$ with probability one. Therefore, if the low-type seller rejects this price with positive probability, then after a rejection, by Bayes’ rule $\mu_{t'} = 0$ for all $t' > t$.$^{16}$ Therefore, in equilibrium, with probability one a low-type seller’s continuation value equals 0 after a rejection as, by (IH1), the buyer offers $p \in P_{t'}$ in every subsequent period. Therefore, by rejecting, a low-type seller’s expected payoff equals 0. By accepting, however, his expected payoff is at least $c_H - c_L > 0$. Therefore, a low-type seller accepts $p_t \ge c_H$ with probability 1.

Claim 2. Fix a history $h \in H_t$, and suppose that $p_t = p < c_L$. Then, $\mathbb{P}(a_t = 1 \mid \theta = L, h_t = h, p_t = p) = 0$.

Proof. By (17), a high-type seller rejects a price offer $p < c_L$ with probability one. Therefore, if the low-type seller accepts $p < c_L$ with positive probability, then after an acceptance by Bayes’ rule one has $\mu_{t'} = 0$ for all $t' > t$ and a low-type seller’s period $t + 1$ continuation value equals 0 as, by (IH1), the buyer offers $p \in P_{t'}$ in every subsequent period. Therefore, by accepting, the seller’s expected payoff equals $p - c_L < 0$. On the other hand, by rejecting, the seller’s expected payoff is greater than or equal to zero because, by (IH1), the seller does not accept an offer less than $c_L$ in future periods. Hence, a low-type seller rejects $p < c_L$ with probability 1.

14 Without the equilibrium refinement, we can establish the slightly weaker restriction that a high-type seller accepts $p_T = c_H$ with probability 1 at histories where the buyer offers $c_H$ with positive probability. This can be established through a Bertrand-style argument.

15 By a Bertrand-style argument, we can establish that, in equilibrium, the seller accepts $p_t = c_H$ with probability one at histories $h^s = \langle \theta = H, h_T = h, p_t = c_H \rangle$ where $\mathbb{P}(p_t = c_H \mid h_t = h) > 0$. This later serves as justification for imposing the refinement in other regions, even though there are other equilibria in those regions in which a high-type seller does not accept with probability one.

16 We do not need to account for whether the realized price offer, $p_t = p$, occurs with positive probability in equilibrium or not because the buyer’s beliefs in the next period are pinned down in the same way due to the “no signaling what you do not know” refinement. See Remark 1.

Claim 3. Fix a history $h \in H_t$ such that $\mu_t > \mu^*_{t+1}$ and suppose that $p_t = p \in (c_L, c_H)$. Then, $\mathbb{P}(a_t = 1 \mid \theta = L, h_t = h, p_t = p) = 0$.

Proof. If $\mu_t > \mu^*_{t+1}$, a low-type seller strictly prefers to reject the offer. By rejecting, he will receive an offer of $c_H$ (with probability one) in the next period because, if he rejects, then $\mu_{t+1} \ge \mu_t > \mu^*_{t+1}$, where the first inequality follows because the high type rejects with probability 1. By accepting, a low-type seller would reveal his type (i.e., $\mu_{t+1} = 0$) and, by (IH1), the buyer would offer $p \in P_{t'}$ in every subsequent period because $\mu_{t'} = 0$ with probability one in equilibrium for $t' > t$. Therefore, rejecting the offer $p$ and accepting an offer $c_H$ in the next period dominates accepting since $p < c_H$.

Before stating the next claim, we present the following definition:

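Definition 6. With a slight abuse of notation, we define the seller’s expected payoff as a function of his type and the public history in period $t'$, $h \in H_{t'}$, as

\[
V^{\rho,\alpha,\mu}_{t'}(\langle \theta, h \rangle) = \mathbb{E}_{p_{t'}}\Big( V^{\rho,\alpha,\mu}_{t'}(\langle \theta, h_{t'}, p_{t'} \rangle) \,\Big|\, h_{t'} = h \Big). \tag{23}
\]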

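Claim 4. Fix a history $h \in H_t$ such that $\mu_t \le \mu^*_{t+1}$ and suppose $p_t = p \in (c_L, c_H)$. Then, every continuation equilibrium is one where the low-type seller’s period-$t$ response is given by $\mathbb{P}(a_t = 1 \mid \theta = L, h_t = h, p_t = p) = \frac{\mu^*_{t+1} - \mu_t}{(1 - \mu_t)\,\mu^*_{t+1}}$, a high-type seller rejects according to (17), and the buyer’s price offer in period $t + 1$ satisfies (19) where $\xi_{t+1} = \xi^{\rho,\alpha,\mu}_{t+1}$ is defined as:

\[
\xi^{\rho,\alpha,\mu}_{t+1} =
\begin{cases}
\dfrac{p - c_L}{c_H - c_L}, & \text{if } t = T - 1, \\
\dfrac{p - c_L}{c_H - c_L + q_L V^{\rho,\alpha,\mu}_{t+2}(h^1_{t+2})}, & \text{otherwise,}
\end{cases} \tag{24}
\]

where $h^1_{t+2} = \langle \theta = L, h_t \cup \{p_t = p, a_t = 0, y_t = y;\ p_{t+1} = c_H, a_{t+1} = 1, y_{t+1} = 1\} \rangle$.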

Proof of Claim 4. We first show that, when $\mu_t < \mu^*_{t+1}$, a low-type seller must use a mixed strategy when offered a price in $(c_L, c_H)$. We do so by showing that pure strategies cannot occur in equilibrium.

For a contradiction, suppose that the seller’s equilibrium strategy, $\alpha$, dictates that a low type accepts with probability 1, i.e., $\mathbb{P}(a_t = 1 \mid \theta = L, h_t = h, p_t = p \in (c_L, c_H)) = 1$. Then, upon a rejection, $\mu_{t+1} = 1$ and, by (IH1), the buyer will offer $p_{t'} = c_H$ with probability one for $t' > t$. Thus, if the equilibrium strategy is to accept with probability 1, rejecting is a profitable deviation for the low-type seller because $(T - t)(c_H - c_L) > p - c_L$. Next, suppose that the equilibrium strategy, $\alpha$, dictates that the low type rejects with probability 1 (i.e., $\mathbb{P}(a_t = 1 \mid \theta = L, h_t = h, p_t = p \in (c_L, c_H)) = 0$). Then, after a rejection, the buyer’s belief does not change as, by Lemma 2, the high-type strategy is to reject with probability 1. Therefore, by (IH1), the buyer offers $p_{t'} \in P_{t'}$ with probability 1 for $t' > t$ because $\mu_{t'} = \mu_t$ (as both types reject $p$ with probability 1) and $\mu_t < \mu^*_{t'}$ for $t' \ge t + 1$. Thus, if his equilibrium strategy is to reject with probability 1, accepting and receiving a payoff of $p - c_L > 0$ is a profitable deviation. Hence, a low-type seller must mix between accepting and rejecting.

Moreover, at an equilibrium, the mixing between accepting and rejecting must be such that the low-type seller is indifferent between these actions. His expected value from accepting is $p - c_L$ because, upon acceptance, the buyer’s belief will update to $\mu_{t+1} = 0$ so, by (IH1), the buyer will offer $p_{t'} = p \in P_{t'}$ for all $t' > t$. Therefore, his expected value from rejecting must also equal $p - c_L$.

For his expected value from rejecting to equal p− cL for a given p ∈ (cL, cH), the belief at

ht+1 := 〈ht = h, (pt = p, at = 0, yt)〉, (25)

must satisfy µt+1

((ht+1

)= µ∗t+1. This is because, at any other belief, a low-type seller’s expected value

upon a rejection will either be 0, or at least cH − cL. That is, if µt+1

((ht+1

)< µ∗t+1, by (IH1), the buyer

offers pt′ ∈ Pt′ with probability 1 for t′ ≥ t+1 and the seller rejects with probability one all such price offers.On the other hand, if µt+1

((ht+1

)> µ∗t+1, the seller’s continuation payoff is at least cH − cL > p − cL as,

by (IH1), we have that pt+1 = cH and the seller accepts the offer.

Moreover, following a rejection, the probability that the buyer offers cH given µt+1 = µ∗t+1, ξt+1, must besuch that a low-type seller’s expected payoff from rejecting pt = p equals p− cL. That is, in an equilibrium,we must have V ρ,α,µt+1 (〈θ = L, ht+1 = ht+1〉) = p − cL where ht+1 is as in (25). Therefore, in what follows,we write ξρ,α,µt′ for t′ > t to make the dependence on (ρ,α,µ) explicit.

If t = T − 1, using (IH1) we have that V ρ,α,µT (〈θ = L, hT = hT 〉) = ξρ,α,µT (cH − cL) as, in period T , thebuyer will mix between offering cH with probability ξρ,α,µT , in which case the seller accepts with probability1 and gets a payoff of cH − cL, and offering a price in PT , which the seller rejects with probability 1 thusobtaining a payoff of zero. Therefore, if t = T − 1, we must have ξρ,α,µT = p−cL

cH−cL, which is well defined as

p ∈ (cL, cH).

Consider the case t + 1 < T . Let h1t+2 := 〈θ = L, ht+1 ∪ (pt+1 = cH , at+1 = 1, yt+1 = 1)〉 denote the

seller type and public history in period t + 2 when the public history at period t + 1 is given by ht+1 in(25), the buyer offers cH , the seller accepts, and the quality realization is equal to 1. Similarly, defineh0t+2 := 〈θ = L, ht+1 ∪ (pt+1 = cH , at+1 = 1, yt+1 = 0)〉 denote the seller type and the public history in

period t + 2 when ht+1 = ht+1, the buyer offers pt+1 = cH , the seller accepts, and the quality realizationis equal to 0. Finally, let ht+2 := 〈θ = L, ht+1 ∪ (pt+1 ∈ Pt+1, at+1 = 0, yt+1 = y)〉, be the seller type andpublic history in period t+ 2 when ht+1 = ht+1, the buyer offers a price in Pt+1 and the seller rejects.

51

Using the induction hypotheses, we know that V ρ,α,µt+1 (〈θ = L, ht+1 = ht+1〉) can be calculated as

V ρ,α,µt+1 (〈θ = L, ht+1 = ht+1〉) = ξρ,α,µt+1((cH − cL) + qLV

ρ,α,µt+2 (h1

t+2) + (1− qL)V ρ,α,µt+2 (h0t+2)

)+(1− ξρ,α,µt+1 )V ρ,α,µt+2 (ht+2)

= ξρ,α,µt+1((cH − cL) + qLV

ρ,α,µt+2 (h1

t+2))

The first equality follows from the fact that, at ht+1 the buyer’s belief is µt+1 = µ∗t+1 and, by (IH1), the buyeroffers cH with probability ξρ,α,µt+1 , in which case the seller accepts with probability 1 and a quality observationis realized giving rise to histories h1

t+2 and h0t+2, and offers a price in Pt+1 with probability (1 − ξρ,α,µt+1 ),

in which case the seller rejects with probability 1, and the resulting history is ht+2. In addition, we havethat V ρ,α,µt+2 (h0

t+2) = 0; this follows from the fact that, after yt+1 = 0, we have that µt+2 = µt+2(h0t+2) <

µt+1 = µ∗t+1 < µ∗t+2, where the last inequality follows from (IH1). Therefore, in periods t′ ≥ t+ 2 onwards,the buyer will offer a price in Pt′ and, since both types reject with probability one, the buyer’s belief willremain at µt′ = µt+2 < µ∗t′ for all t′ ≥ t+ 2. Similarly, V ρ,α,µt+2 (ht+2) = 0; as both types would have rejectedthe price offer in period t+ 1 with probability 1, we have µt+2(ht+2) = µt+1 = µ∗t+1 < µ∗t+2 and, in periodst′ ≥ t + 2 onwards, the buyer will offer a price in Pt′ , which both types reject with probability one, so thebuyer’s belief will remain at µt′ = µt+2 < µ∗t′ for all t′ ≥ t+ 2.

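Therefore, if t + 1 < T, we must have

$$ p - c_L = \xi^{\rho,\alpha,\mu}_{t+1}\Big((c_H - c_L) + q_L V^{\rho,\alpha,\mu}_{t+2}(h^1_{t+2})\Big) $$

or, equivalently,

$$ \xi^{\rho,\alpha,\mu}_{t+1} = \frac{p - c_L}{(c_H - c_L) + q_L V^{\rho,\alpha,\mu}_{t+2}(h^1_{t+2})}, $$

which is well defined as p ∈ (cL, cH) and $V^{\rho,\alpha,\mu}_{t+2}(h^1_{t+2}) \geq 0$.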
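The indifference condition above can be checked numerically. The following is a minimal sketch, not part of the original proof: the function name and all parameter values are hypothetical, and the continuation value $V^{\rho,\alpha,\mu}_{t+2}(h^1_{t+2})$ is simply passed in as a non-negative number.

```python
def mixing_probability(p, c_low, c_high, q_low, continuation_value):
    """Probability of offering c_H in period t+1 that leaves a low-type seller
    indifferent between accepting p today and rejecting (hypothetical helper)."""
    if not (c_low < p < c_high):
        raise ValueError("p must lie strictly between c_L and c_H")
    if continuation_value < 0:
        raise ValueError("the continuation value is assumed non-negative")
    return (p - c_low) / ((c_high - c_low) + q_low * continuation_value)


# Illustrative numbers only (assumptions, not taken from the paper).
P, C_LOW, C_HIGH, Q_LOW, V_NEXT = 0.25, 0.1, 0.4, 0.3, 0.2

xi = mixing_probability(P, C_LOW, C_HIGH, Q_LOW, V_NEXT)

# Accepting yields p - c_L today and a zero continuation payoff.
accept_payoff = P - C_LOW
# Rejecting yields xi * ((c_H - c_L) + q_L * V) in expectation.
reject_payoff = xi * ((C_HIGH - C_LOW) + Q_LOW * V_NEXT)
```

Because p − cL < cH − cL and the continuation value is non-negative, the ratio always lies strictly between 0 and 1, so it is a valid mixing probability.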

If at ht+1 the buyer’s strategy ρ is such that $\xi^{\rho,\alpha,\mu}_{t+1}$ does not satisfy (24), then there is no acceptance probability in response to p that would make the low-type seller indifferent between accepting and rejecting (and we have already ruled out pure strategies), so there cannot be an equilibrium.

Now, given that the buyer’s belief must equal µ∗t+1 after a rejection, Bayes’ rule pins down the low-type seller’s acceptance probability, which must solve:¹⁷

$$ \mu^*_{t+1} = \frac{\mu_t}{\mu_t + (1-\mu_t)\big(1 - P(a_t = 1 \mid \theta = L, h_t = h, p_t \in (c_L, c_H))\big)}. $$
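As a sanity check on this Bayes' rule condition, the sketch below computes the low-type acceptance probability $\big(\frac{\mu^*_{t+1}-\mu_t}{(1-\mu_t)\mu^*_{t+1}}\big)^+$ and confirms that the posterior after a rejection equals the target threshold. The function names and the numerical beliefs are illustrative assumptions; the only structural assumption is the one used in the text, namely that a high-type seller rejects any price below cH with probability one.

```python
def acceptance_probability(mu_t, mu_star_next):
    """Low-type acceptance probability ((mu* - mu) / ((1 - mu) * mu*))^+."""
    if not (0.0 <= mu_t < 1.0 and 0.0 < mu_star_next <= 1.0):
        raise ValueError("need 0 <= mu_t < 1 and 0 < mu_star_next <= 1")
    return max(0.0, (mu_star_next - mu_t) / ((1.0 - mu_t) * mu_star_next))


def belief_after_rejection(mu_t, accept_prob_low):
    """Posterior probability of a high type after a rejection, assuming the
    high type rejects with probability one."""
    return mu_t / (mu_t + (1.0 - mu_t) * (1.0 - accept_prob_low))


# Hypothetical belief and threshold, chosen only for illustration.
MU_T, MU_STAR_NEXT = 0.3, 0.6

psi = acceptance_probability(MU_T, MU_STAR_NEXT)
posterior = belief_after_rejection(MU_T, psi)
```

When the current belief is already at or above the threshold, the positive part makes the acceptance probability zero and the belief is unchanged by a rejection.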

Finally, consider the case µt = µ∗t+1. In this case, in every continuation equilibrium after an offer pt = p ∈ (cL, cH), the low-type seller rejects with probability 1 and the buyer uses (21) with $\xi^{\rho,\alpha,\mu}_t$ at least as large as described above in (24), so that it is a best response to reject. If α dictates that a low-type seller accept with positive probability, then upon a rejection, pt+1 = cH is offered with probability one because the buyer’s belief updates to µt+1 > µt = µ∗t+1, so the low-type seller would prefer to reject. There is no equilibrium where ρ dictates that $\xi^{\rho,\alpha,\mu}_t$ be strictly less than described in (24) after pt = p ∈ (cL, cH) and at = 0 because a low-type seller will, again, always want to deviate from any given acceptance probability.

¹⁷ This is again possible based on our refinements. See Remark 1.

Claim 5. Fix a history h ∈ Ht, and suppose pt = cL. Then, P(at = 1|θ = L, ht = h, pt = cL) = ψt for some

$$ \psi_t \in \left[0, \left(\frac{\mu^*_{t+1}-\mu_t}{(1-\mu_t)\mu^*_{t+1}}\right)^+\right]. $$

Proof. The proof of this claim largely mimics that of Claim 4. The only difference in the analysis from the case where pt ∈ (cL, cH) is that if µt < µ∗t+1, in equilibrium, the low-type seller strategy can be to accept with any probability less than or equal to $\frac{\mu^*_{t+1}-\mu_t}{(1-\mu_t)\mu^*_{t+1}}$. This is due to the fact that, given any of these acceptance probabilities, the buyer offers pt′ ∈ Pt′ following a rejection or an acceptance, so there is no profitable deviation for a low-type seller. As above though, the low-type seller strategy cannot be to accept pt = cL with probability greater than $\frac{\mu^*_{t+1}-\mu_t}{(1-\mu_t)\mu^*_{t+1}}$; otherwise, deviating to a rejection would be optimal.

We omit the formal proof for the sake of brevity.

This concludes the proof of the lemma.

Proof of Lemma 4. We first show, in Claim 6, that the only price offers that can arise with positive probability in equilibrium are those in {cH} ∪ Pt.

Claim 6. Fix a history h ∈ Ht. Then, P(pt ∈ {cH} ∪ Pt) = 1; that is, with probability 1, the buyer offers either cH or a price offer that is rejected with probability 1.

Proof. We first establish that, in equilibrium, one must have P(pt > cH) = 0 because otherwise the buyer could deviate to offer cH with probability 1 and strictly increase her expected payoff. This follows because, by Lemmas 2 and 3, all offers of p ≥ cH are accepted with probability 1. Therefore, the belief in t + 1 will be updated to σ(µt) if yt = 1 and to φ(µt) if yt = 0, and thus the buyer’s expected period t + 1 equilibrium continuation payoff equals (by (IH3)):

η(µt)U∗t+1(σ(µt)) + (1− η(µt))U∗t+1(φ(µt)),

which is independent of p.
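The belief updates entering this continuation payoff can be illustrated as follows. The functions η, σ, and φ are defined in (9), which is not reproduced in this section; the forms below are an assumption (binary quality with success probabilities qH and qL, updated by Bayes' rule), and the numerical values are hypothetical. The sketch shows that the expected posterior equals the prior, which is the property that later lets convexity of U∗t+1 be combined with Jensen's inequality.

```python
# Hypothetical success probabilities (assumptions for illustration only).
Q_H, Q_L = 0.9, 0.3


def eta(mu):
    """Assumed form of eta: probability of a high-quality realization."""
    return mu * Q_H + (1.0 - mu) * Q_L


def sigma(mu):
    """Assumed form of sigma: posterior after observing y = 1."""
    return mu * Q_H / eta(mu)


def phi(mu):
    """Assumed form of phi: posterior after observing y = 0."""
    return mu * (1.0 - Q_H) / (1.0 - eta(mu))


def expected_posterior(mu):
    """Average posterior over the two quality realizations."""
    return eta(mu) * sigma(mu) + (1.0 - eta(mu)) * phi(mu)
```

Under these assumed forms, a good realization moves the belief up and a bad one moves it down, while the average posterior stays at the prior; none of these quantities depends on the accepted price.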

Therefore, by Lemmas 2 and 3, the only remaining price offers that could be made and accepted with positive probability are those with p ∈ [cL, cH); prices p < cL are rejected with probability 1 by both types. However, by Lemmas 2 and 3, the seller accepts any offer of pt = p ∈ (cL, cH) with the same probability, so none of these prices can occur in equilibrium. To see why, note that if there existed a history h ∈ Ht such that, for some δ > 0, P(pt ∈ (cL + δ, cH), at = 1|ht = h) > 0, then the buyer could deviate to a strategy in which she shifts the probability that p ∈ (cL + δ, cH) to p = cL + δ/2 and strictly improve her payoff. This deviation increases her expected payoff by reducing the price she pays in the current period conditional on an acceptance, but does not alter her expected continuation payoff in period t + 1 after an acceptance or rejection, as the acceptance probability is independent of price. Thus, in equilibrium, the buyer strategy must satisfy P(pt ∈ (cL, cH), at = 1) = 0 because the above argument holds for all δ > 0.

Moreover, as in period T, the buyer only offers pt = cL with positive probability if it is accepted with probability 0. If µt < 1 and ψt > 0 in (18), then offering pt < cL strictly dominates offering cL because the buyer’s period t + 1 expected payoff is the same after an acceptance or rejection, but her expected payoff in the current period is strictly lower from an offer of cL because qL − cL < λ.

We can now proceed to the proof of Lemma 4. Fix h ∈ Ht. By Claim 6, the buyer offers either cH or a price offer in Pt and, by Lemmas 2 and 3, pt = cH is accepted with probability 1 by both types, and pt ∈ Pt is rejected with probability one by both types. Therefore, using (IH3), we can express the expected payoff from offering pt = cH in period t given belief µt as (with a slight abuse of notation):

Ut(µt|pt = cH) := Uρ,α,µt (h|pt = cH) = η(µt)− cH + η(µt)U∗t+1(σ(µt)) + (1− η(µt))U∗t+1(φ(µt)) (26)

and the expected payoff from offering pt ∈ Pt as

Ut(µt|pt ∈ Pt) := Uρ,α,µt (h|pt ∈ Pt) = λ+ U∗t+1(µt) (27)

Therefore, to find the buyer’s period-t strategy we must determine whether Ut(µt|pt = cH) > Ut(µt|pt ∈ Pt) or vice versa. First, we prove that at beliefs µt ≥ µ∗t+1, offering pt = cH is a strictly dominant action. To prove this, we first have:

Ut(µt|pt ∈ Pt) = λ+ U∗t+1(µt) = λ+ η(µt)− cH + η(µt)U∗t+2(σ(µt)) + (1− η(µt))U∗t+2(φ(µt))

where the last equality follows from the fact that µt ≥ µ∗t+1 and thus, in period t + 1, the buyer offers cH and the seller accepts. (If µt = µ∗t+1 the buyer need not offer cH in period t + 1, but we have established that she obtains the same utility as she is indifferent between offering cH and offering a price in Pt+1, so the analysis above remains valid.) Therefore, substituting in, we want to show:

$$ \eta(\mu_t)U^*_{t+1}(\sigma(\mu_t)) + (1-\eta(\mu_t))U^*_{t+1}(\phi(\mu_t)) > \lambda + \eta(\mu_t)U^*_{t+2}(\sigma(\mu_t)) + (1-\eta(\mu_t))U^*_{t+2}(\phi(\mu_t)) $$

or, equivalently,

$$ \eta(\mu_t)\big(U^*_{t+1}(\sigma(\mu_t)) - U^*_{t+2}(\sigma(\mu_t))\big) + (1-\eta(\mu_t))\big(U^*_{t+1}(\phi(\mu_t)) - U^*_{t+2}(\phi(\mu_t))\big) > \lambda. $$

We can lower bound U∗t+1(φ(µt)) − U∗t+2(φ(µt)) with λ as, in period t + 1, the buyer can always offer a price in Pt+1 and receive λ + U∗t+2(φ(µt)). Moreover, we have that U∗t+1(σ(µt)) > λ + U∗t+2(σ(µt)) as σ(µt) > µt ≥ µ∗t+1 and, by (IH1), at this belief, offering pt+1 = cH is strictly preferred to offering a price that is rejected.

Second, when µt = 0, offering a price pt ∈ Pt is strictly dominant since λ > qL − cL.

Finally, at histories h ∈ Ht such that µt(h) ∈ (0, µ∗t+1), there exists a unique threshold µ∗t at which the buyer is indifferent between offering cH and offering a price in Pt, and where offering pt = cH with probability 1 is strictly dominant for µt > µ∗t and offering a price in Pt with probability 1 is strictly dominant for µt < µ∗t.

To prove this, it is sufficient to show that the buyer’s expected payoff from offering cH is continuous and strictly increasing on [0, µ∗t+1] because, by (IH1) and (IH3), the value of offering p ∈ Pt on this interval is constant (and therefore continuous on [0, µ∗t+1] as well) at (T − t + 1)λ (as µt < µ∗t+1 and thus, in equilibrium, the buyer will offer a price in Pt′ in all remaining periods t + 1 ≤ t′ ≤ T).

The buyer’s expected payoff from offering cH for µt < µ∗t+1 equals:

Ut(µt|pt = cH , µt < µ∗t+1) = η(µt)− cH + η(µt)U∗t+1(σ(µt)) + (1− η(µt))U∗t+1(φ(µt))

= η(µt)− cH + η(µt)U∗t+1(σ(µt)) + (1− η(µt))(T − t)λ

= (T − t)λ− cH + η(µt) + η(µt)(U∗t+1(σ(µt))− (T − t)λ) (28)

where in the first line we used that pt = cH is accepted with probability 1, and in the second line we used that φ(µt) < µt < µ∗t+1 and, by (IH1), the buyer offers p ∈ Pt′ with probability one for each t′ > t. U∗t+1 is convex by (IH3), so it is continuous on the interior (0, 1). Moreover, U∗t+1 is continuous at µt = 0, as U∗t+1(µt) = (T − t)λ for µt < µ∗t+1. Since σ(µ) and η(µ) are continuous as well, (28) is continuous on [0, µ∗t+1].

The first terms in (28) are constant. Moreover, η(µt) is strictly increasing in µt since qH − qL > 0. Therefore, it suffices to show that the last term is weakly increasing in µt. By (IH3), U∗t+1 is convex and, by (IH1), equal to (T − t)λ for µt+1 < µ∗t+1 because the buyer’s price offer will be rejected with probability one in each period t′ > t. Therefore, U∗t+1(σ(µt)) − (T − t)λ is weakly increasing in µt and non-negative. Moreover, η(µt) is positive and increasing, so their product is (weakly) increasing. Therefore, the value of offering cH is strictly increasing in µt relative to the value of offering pt ∈ Pt on this range, so a threshold policy is optimal.

Let µ∗t be the belief at which the buyer is indifferent between offering cH and p ∈ Pt. That is, at µt = µ∗t we must have

η(µt)− cH + η(µt)U∗t+1(σ(µt)) + (1− η(µt))U∗t+1(φ(µt)) = (T − t+ 1)λ.

This concludes the proof of the lemma.
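To make the threshold construction concrete, the following sketch locates the period T − 1 threshold by bisection on the indifference condition, using the monotonicity established above. It is a sketch under stated assumptions rather than the paper's procedure: the forms of η, σ, and φ are assumed (binary quality with Bayesian updating), the period-T value is taken to be max{η(µ) − cH, λ}, which is what (26) and (27) give when U∗T+1 = 0, and all parameter values are hypothetical, chosen so that qH − cH > λ and λ > qL − cL.

```python
# Hypothetical parameters with Q_H - C_H > LAM > Q_L - C_L (here C_L = 0.1).
Q_H, Q_L, C_H, LAM = 0.9, 0.3, 0.4, 0.35


def eta(mu):
    """Assumed probability of a high-quality realization."""
    return mu * Q_H + (1.0 - mu) * Q_L


def sigma(mu):
    """Assumed posterior after y = 1."""
    return mu * Q_H / eta(mu)


def phi(mu):
    """Assumed posterior after y = 0."""
    return mu * (1.0 - Q_H) / (1.0 - eta(mu))


def value_last_period(mu):
    """Period-T value: the better of offering c_H and a rejected offer."""
    return max(eta(mu) - C_H, LAM)


# Period-T threshold solves eta(mu) - c_H = lambda.
MU_STAR_T = (LAM + C_H - Q_L) / (Q_H - Q_L)


def gain_from_high_offer(mu):
    """Payoff from offering c_H in period T-1 minus 2*lambda, the payoff
    from rejected offers in the last two periods."""
    e = eta(mu)
    continuation = e * value_last_period(sigma(mu)) + (1.0 - e) * value_last_period(phi(mu))
    return e - C_H + continuation - 2.0 * LAM


def bisect(f, lo, hi, tol=1e-12):
    """Root of f on [lo, hi], assuming f(lo) < 0 < f(hi)."""
    if not (f(lo) < 0.0 < f(hi)):
        raise ValueError("f must change sign on the interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


MU_STAR_T_MINUS_1 = bisect(gain_from_high_offer, 0.0, MU_STAR_T)
```

With these illustrative numbers the gain is negative at µ = 0 and positive at µ∗T, so the bisection returns a threshold strictly between 0 and µ∗T, in line with the increasing sequence of thresholds.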


The proof follows the ideas in the proof of Theorem 1.

B.2.1 Preliminaries

Notation. See Appendix B.1 for discussion on notation.

Proof Technique. Our proof is based on backward induction over time periods. Moreover, within each period t, we characterize the necessary conditions for the buyer’s and seller’s equilibrium strategies, leveraging the agents’ sequential rationality at time t for every history. We show that there exists a strictly increasing sequence of thresholds {µ∗t : t = 1, . . . , T} with 0 < µ∗1 < µ∗2 < · · · < µ∗T < 1, such that equilibrium strategies must satisfy, for each t, h ∈ Ht, and p ∈ R:

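$$ P(a_t = 1 \mid \theta = H, h_t = h, p_t = p) = \begin{cases} 1, & \text{if } p \geq c_H \\ 0, & \text{if } p < c_H, \end{cases} \tag{29} $$

$$ P(a_t = 1 \mid \theta = L, h_t = h, p_t = p) = \begin{cases} 1, & \text{if } p \geq c_H \\ \left(\frac{\mu^*_{t+1}-\mu_t}{(1-\mu_t)\mu^*_{t+1}}\right)^+, & \text{if } c_L < p < c_H \\ \psi_t \in \left[0, \left(\frac{\mu^*_{t+1}-\mu_t}{(1-\mu_t)\mu^*_{t+1}}\right)^+\right], & \text{if } p = c_L \\ 0, & \text{if } p < c_L, \end{cases} \tag{30} $$

$$ p_t = \begin{cases} c_H \text{ w.p. } 1, & \text{if } \mu_t > \mu^*_t \\ c_H \text{ w.p. } \xi_t \in [0, 1], & \text{if } \mu_t = \mu^*_t \\ c_L \text{ w.p. } 1 - \xi_t, & \text{if } \mu_t = \mu^*_t \\ c_L \text{ w.p. } 1, & \text{if } \mu_t < \mu^*_t, \end{cases} \tag{31} $$

where we define µ∗T+1 := 1.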

Induction hypotheses. Our first induction hypothesis is as follows:

(IH1) Fix t < T. For periods t′ = t + 1, t + 2, ..., T, there exists a sequence of thresholds 0 < µ∗t+1 < µ∗t+2 < · · · < µ∗T < 1 such that, in any equilibrium, the price offer pt′ and the acceptance decision at′ satisfy the properties in equation (31) and in equations (29) and (30), respectively.¹⁸ Moreover, offering cH with probability one at beliefs µt′ > µ∗t′ and offering cL with probability one at beliefs µt′ < µ∗t′ are strictly dominant actions for the buyer.

Our proof technique is effective due to our refinements; see Remark 1 in the proof of Theorem 1.

Our second induction hypothesis is used for establishing that the set of equilibria is non-empty.

(IH2) Fix t < T. The set of equilibria for the game with T − t periods is non-empty.

¹⁸ We highlight that these are necessary conditions for equilibrium strategies but not sufficient; in particular, ξt′ and ψt′ will be pinned down at some histories.

The proof deviates from the proof of Theorem 1 in subtle but important ways. First, (IH1) and (IH2) form the basis of much of the theorem, but we also need to prove that the acceptance probability of a low-type seller, ψt (see (11)), is pinned down after an offer of pt = cL. We can only establish this when considering the buyer’s action in period t. Additionally, we need to establish that ξt+1 = 0 if pt = cL, which again can only be done based on the analysis of period t. Therefore, working backwards, we characterize necessary conditions that the equilibrium strategies must satisfy as in the proof of Theorem 1, but in the inductive step, for some histories, we further pin down ψt′ and ξt′ by working forward in time.

Let {µ∗t : t = 1, . . . , T} be the sequence of thresholds in (IH1), and let η, σ, and φ be the functions defined in (9). Then, for each period t′ ≤ T, we define, recursively, the functions U∗t′ : [0, 1] → R as follows:

$$ U^*_{t'}(\mu) = \begin{cases} \eta(\mu) - c_H + \eta(\mu)U^*_{t'+1}(\sigma(\mu)) + (1-\eta(\mu))U^*_{t'+1}(\phi(\mu)), & \text{if } \mu \geq \mu^*_{t'} \\ \left(1 - \frac{\mu}{\mu^*_{t'+1}}\right)\big(q_L - c_L + U^*_{t'+1}(0)\big) + \frac{\mu}{\mu^*_{t'+1}}\big(\lambda + U^*_{t'+1}(\mu^*_{t'+1})\big), & \text{otherwise,} \end{cases} \tag{32} $$

with terminal function U∗T+1(µ) = 0. Then, our third induction hypothesis can be stated as:

(IH3) Fix t < T. Suppose that for periods t′ = t + 1, t + 2, ..., T, (ρ,α,µ) satisfies (IH1). Then, for every history h ∈ Ht′, $U^{\rho,\alpha,\mu}_{t'}(h) = U^*_{t'}(\mu_{t'}(h))$.

B.2.2 Auxiliary results

Lemmas 1-3 in §B.1.2 are used when proving the theorem in §B.2.3. For brevity, we do not repeat the proofs as they do not change when we replace the induction hypotheses in Theorem 1 with the current induction hypotheses.

Lemma 4 is replaced by the following lemma, whose proof appears in §B.2.4. This result characterizes the buyer’s action at period t < T, assuming our induction hypotheses hold for every future period t′ > t, and based on the characterization of the seller’s action.

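Lemma 5. (Buyer’s action at period t < T) Fix a public history h ∈ Ht and suppose that (IH1), (IH2), and (IH3) hold for every t′ > t. The period-t buyer’s strategy must satisfy:

$$ p_t = \begin{cases} c_H \text{ w.p. } 1, & \text{if } \mu_t > \mu^*_t \\ c_H \text{ w.p. } \xi_t \in [0, 1], & \text{if } \mu_t = \mu^*_t \\ c_L \text{ w.p. } 1 - \xi_t, & \text{if } \mu_t = \mu^*_t \\ c_L \text{ w.p. } 1, & \text{if } \mu_t < \mu^*_t \end{cases} \tag{33} $$

for some µ∗t < µ∗t+1. Moreover, if pt = cL, then for all t′ > t, P(pt′ = cL) = 1 and:

$$ P(a_{t'} = 1 \mid \theta = L, h_{t'} = h, p_{t'} = c_L) = \frac{\mu^*_{t'+1} - \mu_{t'}}{(1-\mu_{t'})\mu^*_{t'+1}}. $$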

B.2.3 Proving the theorem

In Part (I) of the proof we analyze period T, and in Part (II) we analyze the induction step for a general period t < T and establish the result. Fix parameters qH, cH, λ, qL, cL such that qH − cH > qL − cL > λ. Fix T ∈ N.

Part I: Base Case (Period T). Using sequential rationality, we work backwards beginning with the seller’s action. Then, taking the seller’s action as given, we characterize the buyer’s action. We finally prove that the set of equilibria is non-empty, and that (IH3) is satisfied.

The seller’s action at period T is characterized by Lemma 1, which states that, for a fixed history h ∈ HT and a price offer p, any equilibrium strategy of the seller must satisfy:

$$ P(a_T = 1 \mid \theta = H, h_T = h, p_T = p) = \begin{cases} 1, & \text{if } p \geq c_H \\ 0, & \text{if } p < c_H, \end{cases} $$

$$ P(a_T = 1 \mid \theta = L, h_T = h, p_T = p) = \begin{cases} 1, & \text{if } p > c_L \\ 0, & \text{if } p < c_L \\ \psi_T \in [0, 1], & \text{if } p = c_L. \end{cases} $$

So far, the proof is identical to the proof of Theorem 1. The first difference in this region (which we prove in Claim 9 below) is that ψT is pinned down at some on-path histories in equilibrium. Specifically, ψT = 1 after any history h ∈ HT where P(pT = cL|hT = h) > 0 and µT(hT) < 1; that is, at any history h ∈ HT where the buyer assigns positive probability to the seller being a low type and offers cL with positive probability, a low-type seller accepts cL with probability one.

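We turn to characterize the buyer’s strategy in period T. Given history hT = h, the buyer has belief µT := µT(h), so the buyer’s expected payoff, given strategy ρ ∈ P, is:

$$
\begin{aligned}
U^{\rho,\alpha,\mu}_T(h) ={}& \mu_T \int_{-\infty}^{\infty} \Big( E_{y_T}\big(y_T - p \mid \theta = H, a_T = 1\big) P(a_T = 1 \mid \theta = H, h_T = h, p_T = p) \\
& \qquad\qquad + \lambda P(a_T = 0 \mid \theta = H, h_T = h, p_T = p) \Big)\, dF^{\rho}_T(h; p) \\
& + (1-\mu_T) \int_{-\infty}^{\infty} \Big( E_{y_T}\big(y_T - p \mid \theta = L, a_T = 1\big) P(a_T = 1 \mid \theta = L, h_T = h, p_T = p) \\
& \qquad\qquad + \lambda P(a_T = 0 \mid \theta = L, h_T = h, p_T = p) \Big)\, dF^{\rho}_T(h; p) \\
\overset{(a)}{=}{}& \mu_T \cdot E_{p_T}\Big( (q_H - p_T)\mathbb{1}\{p_T \geq c_H\} + \lambda \mathbb{1}\{p_T < c_H\} \,\Big|\, h_T = h \Big) \\
& + (1-\mu_T) \cdot E_{p_T}\Big( (q_L - p_T)\mathbb{1}\{p_T > c_L\} + \lambda \mathbb{1}\{p_T < c_L\} \\
& \qquad\qquad + \big(\psi_T(q_L - c_L) + (1-\psi_T)\lambda\big)\mathbb{1}\{p_T = c_L\} \,\Big|\, h_T = h \Big) \\
={}& E_{p_T}\Big( \mu_T(q_H - q_L) + q_L - p_T \,\Big|\, p_T \geq c_H, h_T = h \Big) P(p_T \geq c_H \mid h_T = h) \\
& + E_{p_T}\Big( (1-\mu_T)(q_L - p_T) + \mu_T\lambda \,\Big|\, c_L < p_T < c_H, h_T = h \Big) P(c_L < p_T < c_H \mid h_T = h) \\
& + E_{p_T}\Big( (1-\mu_T)\psi_T(q_L - p_T) + \big(\mu_T + (1-\mu_T)(1-\psi_T)\big)\lambda \,\Big|\, p_T = c_L, h_T = h \Big) P(p_T = c_L \mid h_T = h) \\
& + \lambda P(p_T < c_L \mid h_T = h),
\end{aligned}
\tag{34}
$$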

where (a) follows from using the seller strategy and taking expectation with respect to yT. By sequential rationality, for any given belief, the buyer’s strategy maximizes her expected payoff. One may observe that price offers p > cH cannot occur with positive probability in equilibrium as they are dominated by an offer of pT = cH, which is also accepted with probability one.

In the next two claims we show that price offers p < cL and price offers p ∈ (cL, cH) cannot occur with positive probability in equilibrium in period T.

Claim 7. For any h ∈ HT , P(pT < cL|hT = h) = 0.

Proof. We show that offering pT < cL with positive probability is dominated by either pT = cH with probability one or pT = p with probability one for some p ∈ (cL, cH).

Offering p < cL results in a payoff of λ as it is rejected with probability one. If µT := µT(hT) = 1, then offering cH with probability one is better as qH − cH > λ. If µT < 1, then offering p = cL + ε for 0 < ε < qL − cL − λ generates payoff µTλ + (1 − µT)(qL − cL − ε) > λ since qL − cL − ε > λ. Therefore, offering pT < cL with positive probability is a dominated strategy and cannot occur in equilibrium.

Claim 8. For any h ∈ HT , P(pT ∈ (cL, cH)|hT = h) = 0.

Proof. Fix history h ∈ HT such that µT(h) = 1. In this case, offering cH with probability one generates higher expected payoff than offering p ∈ (cL, cH) because qH − cH > λ.

Fix history h ∈ HT such that µT(h) < 1 and fix any δ > 0. If P(pT ∈ (cL + δ, cH)|hT = h) > 0, then the buyer could deviate to a strategy in which she shifts the probability that p ∈ (cL + δ, cH) to p = cL + δ/2 and strictly improve her payoff because the offer is accepted with the same probability but she pays a lower price upon an acceptance. Thus, in equilibrium, the buyer strategy must satisfy P(pT ∈ (cL, cH)) = 0 because the above argument holds for all δ > 0.

In the next claim, we establish that, with probability one, a low-type seller accepts offers of cL that are made in equilibrium.

Claim 9. Fix history h ∈ HT such that µT (h) < 1. If P(pT = cL|hT = h) > 0, then

P (aT = 1|θ = L, hT = h, pT = cL) =: ψT = 1.

Proof. Fix h ∈ HT such that µT(h) < 1. Suppose, by contradiction, that the buyer offers cL and ψT < 1. The buyer’s expected payoff from offering cL is:

$$ \mu_T\lambda + (1-\mu_T)(1-\psi_T)\lambda + (1-\mu_T)\psi_T(q_L - c_L). $$

In this case, the buyer can deviate to an offer of pT = cL + ε for 0 < ε < qL − cL − λ. This offer generates payoff:

$$ \mu_T\lambda + (1-\mu_T)(q_L - c_L - \varepsilon). $$

Since qL − cL − ε > λ, deviating and offering pT = cL + ε with probability one is profitable, so there cannot be an equilibrium where ψT < 1 if µT(h) < 1 and P(pT = cL|hT = h) > 0.

Therefore, in combination with the characterization of the low-type seller strategy in (34), we have proven the claim.

Define $\mu^*_T := \frac{c_H - c_L}{q_H - \lambda - c_L}$. With the above claims, one can observe that, if µT > µ∗T, (34) is maximized when pT = cH with probability one; if µT < µ∗T, (34) is maximized when pT = cL with probability one; if µT = µ∗T, (34) is maximized by pT ∈ {cL, cH}. Therefore, pT must satisfy:

$$ p_T = \begin{cases} c_H \text{ w.p. } 1, & \text{if } \mu_T > \mu^*_T \\ c_H \text{ w.p. } \xi_T \in [0, 1], & \text{if } \mu_T = \mu^*_T \\ c_L \text{ w.p. } 1 - \xi_T, & \text{if } \mu_T = \mu^*_T \\ c_L \text{ w.p. } 1, & \text{if } \mu_T < \mu^*_T \end{cases} \tag{35} $$
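The period-T threshold can be verified numerically. The sketch below uses hypothetical parameter values satisfying qH − cH > qL − cL > λ and compares the buyer's payoff from offering cH (accepted by both types) with her payoff from offering cL (accepted by the low type with probability one, as in Claim 9); the function names are illustrative.

```python
# Hypothetical parameters with Q_H - C_H > Q_L - C_L > LAM.
Q_H, C_H, Q_L, C_L, LAM = 0.9, 0.4, 0.5, 0.2, 0.1

# Period-T threshold from the text: (c_H - c_L) / (q_H - lambda - c_L).
MU_STAR_T = (C_H - C_L) / (Q_H - LAM - C_L)


def payoff_offer_high(mu):
    """Payoff from offering c_H, which both types accept."""
    return mu * (Q_H - Q_L) + Q_L - C_H


def payoff_offer_low(mu):
    """Payoff from offering c_L: the low type accepts (psi_T = 1),
    the high type rejects and the buyer takes the outside option."""
    return (1.0 - mu) * (Q_L - C_L) + mu * LAM


def best_offer(mu):
    """Label of the payoff-maximizing offer; ties are reported as 'either'."""
    high, low = payoff_offer_high(mu), payoff_offer_low(mu)
    if abs(high - low) < 1e-12:
        return "either"
    return "c_H" if high > low else "c_L"
```

The two payoff lines cross exactly at µ∗T, with the cH line steeper, which is also what makes the period-T value function convex.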

At µT = µ∗T, the buyer is indifferent between offering cL and offering cH, so she can mix between these two types of offers. When considering the seller’s strategy in the induction step, we will show that this mixing probability, ξT, is pinned down at some on-path histories just as ψT was pinned down at certain histories. In particular, if T > 1, ξT = 0 if pT−1 = cL. Note that if T = 1, then the condition that ξT = 0 if pT−1 = cL does not apply.

We have proven that any equilibrium strategies must satisfy the properties stated in the theorem in period T. Moreover, defining µ∗T+1 = 1, we have shown that (IH1) holds in period T.

(IH2): Equilibrium existence. To show the set of equilibria is non-empty for T = 1, we construct an equilibrium as follows. Let ρT be as described in (35) with ξT equal to 0; let αT be defined as in the period-T seller strategy characterized above (Lemma 1), with ψT equal to 1 at all histories; and let µT be calculated as required by our equilibrium concept whenever possible and equal to 0 at any other histories. It is straightforward from the previous analysis that (ρT, αT, µT) describes an equilibrium for T = 1; thus, (IH2) holds.

(IH3): For every history h ∈ HT, we have that $U^{\rho,\alpha,\mu}_T(h) = U^*_T(\mu_T(h))$ whenever (ρ,α,µ) satisfies (IH1). Suppose that (ρ,α,µ) satisfies (IH1). Fix a history h ∈ HT. By our previous discussion, we can define the buyer’s period T value function as

$$ U^{(\rho,\alpha,\mu)}_T(h) := \begin{cases} (1-\mu_T)(q_L - c_L) + \mu_T\lambda, & \text{if } \mu_T(h) \leq \mu^*_T \\ (q_H - q_L)\mu_T + q_L - c_H, & \text{if } \mu_T(h) > \mu^*_T. \end{cases} \tag{36} $$

Note that this can be expressed as a function of µT(h), and it agrees with the definition of U∗T(·). Moreover, U∗T(µT) is convex in µT since qH − qL > λ + cL − qL and the two expressions in (36) are equal at µ∗T. Thus, (IH3) holds.

Part II: Inductive Step (Period t < T). Fix 1 ≤ t < T. Suppose that the Induction Hypotheses (IH1), (IH2), and (IH3) hold for every t′ > t. As before, we start by characterizing the seller’s period-t strategy for every history h ∈ Ht and every price offer pt = p. By sequential rationality and taking the seller’s period-t strategy as given, we characterize the buyer’s period-t strategy, which completes the proof of (IH1). We then proceed to show (IH2) and (IH3).

Seller’s action at period t. The proof of the seller strategy is identical to the proof in Theorem 1.¹⁹ The high-type seller’s action at time t < T is characterized by Lemma 2, stating that for any history h ∈ Ht and price offer p ∈ R, if (IH1), (IH2), and (IH3) hold for every t′ > t, then any equilibrium strategy of a high-type seller at time t is:

$$ P(a_t = 1 \mid \theta = H, h_t = h, p_t = p) = \begin{cases} 1, & \text{if } p \geq c_H \\ 0, & \text{if } p < c_H. \end{cases} \tag{37} $$

¹⁹ In this region, for some combinations of parameters, there are equilibria where a high-type seller does not accept an offer of cH with probability one. Our refinement rules them out.

Lemma 3 characterizes properties that the low-type seller equilibrium strategy must satisfy at period t < T, assuming that our induction hypotheses hold in every future period t′ > t. It states that for any history h ∈ Ht and a price offer p ∈ R, if (IH1), (IH2), and (IH3) hold for every t′ > t, then the equilibrium strategy of a low-type seller at period t must satisfy:

$$ P(a_t = 1 \mid \theta = L, h_t = h, p_t = p) = \begin{cases} 1, & \text{if } p \geq c_H \\ \left(\frac{\mu^*_{t+1}-\mu_t}{(1-\mu_t)\mu^*_{t+1}}\right)^+, & \text{if } c_L < p < c_H \\ \psi_t \in \left[0, \left(\frac{\mu^*_{t+1}-\mu_t}{(1-\mu_t)\mu^*_{t+1}}\right)^+\right], & \text{if } p = c_L \\ 0, & \text{if } p < c_L. \end{cases} \tag{38} $$

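Buyer’s period t strategy. So far we have characterized properties that must be satisfied by the period t seller’s equilibrium strategies. We now use them to characterize the buyer’s period t strategy. By Lemma 5, we know that the period-t buyer’s strategy must satisfy:

$$ p_t = \begin{cases} c_H \text{ w.p. } 1, & \text{if } \mu_t > \mu^*_t \\ c_H \text{ w.p. } \xi_t \in [0, 1], & \text{if } \mu_t = \mu^*_t \\ c_L \text{ w.p. } 1 - \xi_t, & \text{if } \mu_t = \mu^*_t \\ c_L \text{ w.p. } 1, & \text{if } \mu_t < \mu^*_t \end{cases} \tag{39} $$

for some µ∗t < µ∗t+1. Moreover, if pt = cL, then for all t′ > t, P(pt′ = cL) = 1 and:

$$ P(a_{t'} = 1 \mid \theta = L, h_{t'} = h, p_{t'} = c_L) = \frac{\mu^*_{t'+1} - \mu_{t'}}{(1-\mu_{t'})\mu^*_{t'+1}}. $$

This result completes (IH1). Moreover, it completes the proof of Theorem 2 apart from equilibrium existence, which we establish now.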

(IH2): Equilibrium existence for T − t + 1 periods. We now construct an equilibrium to show the set of equilibria is non-empty for T − t + 1 periods and that (IH2) holds in period t. By (IH2), there exists a continuation equilibrium for periods t + 1 to T that satisfies the other induction hypotheses. For t′ > t, let ρt′, αt′, µt′ be as described in this continuation equilibrium. However, for pt+1, set ξt+1 as in (24) for (off-path) histories where pt = p ∈ (cL, cH) and equal to 0 at histories h ∈ Ht+1 where pt = cL. Define pt according to (33). Define αt as in (37) and (38) with ψt equal to $\frac{\mu^*_{t+1}-\mu_t}{(1-\mu_t)\mu^*_{t+1}}$ for all histories; and let µt be calculated as required by our equilibrium concept where possible and equal to 0 at any other histories. It is straightforward from the previous analysis that (ρ, α, µ) describes an equilibrium for T − t + 1 periods. Thus, (IH2) holds.

(IH3): For every history ht ∈ Ht, we have that $U^{\rho,\alpha,\mu}_t(h_t) = U^*_t(\mu_t(h_t))$ whenever (ρ,α,µ) satisfies (IH1) for all periods t′ ≥ t. Fix a history ht ∈ Ht and suppose that (ρ,α,µ) satisfies (IH1). By our previous discussion, the buyer’s period-t value function, $U^{(\rho,\alpha,\mu)}_t(h_t)$, is given by the expression in (40) if µt(ht) ≥ µ∗t and by the expression in (42) if µt(ht) < µ∗t (these expressions are equal at µ∗t). Note that this can be expressed as a function of µt(ht), and it agrees with the definition of U∗t(·) in (32). Moreover, U∗t is convex because there are a finite number of possible continuation policies (given that the optimal policy will be to offer either cH or cL), each of which generates an expected payoff that is linear in µt. U∗t(µ) is the pointwise maximum of these policies, so it is convex.

B.2.4 Proofs of auxiliary results

Proof of Lemma 5.

Proof. As an initial step to proving Lemma 5, we first show, in the following claim, that the only price offers that can arise with positive probability in equilibrium are {p : p ≤ cH}.

Claim 10. Fix a history h ∈ Ht. Then, P(pt ≤ cH |ht = h) = 1.

Proof. Suppose, for a contradiction, that P(pt > cH |ht = h) > 0. The buyer could deviate from her strategy and shift the probability of an offer greater than cH to cH and strictly increase her expected payoff. This follows because, by (37) and (38), all offers of p ≥ cH are accepted with probability 1. Therefore, the belief in t + 1 will be updated to σ(µt) if yt = 1 and to φ(µt) if yt = 0, and thus the buyer’s expected period t + 1 equilibrium continuation payoff equals (by (IH3)):

η(µt)U∗t+1(σ(µt)) + (1− η(µt))U∗t+1(φ(µt)),

which is independent of p.

Having shown that the buyer offers pt ≤ cH with probability one, consider the expected payoff from offering each of the remaining prices. Fix h ∈ Ht. By (37) and (38), we have that pt = cH is accepted with probability 1 by both types, pt = p ∈ (cL, cH) is rejected with probability 1 by a high-type seller and accepted with probability $\left(\frac{\mu^*_{t+1}-\mu_t}{(1-\mu_t)\mu^*_{t+1}}\right)^+$ by a low-type seller, pt = cL is rejected with probability 1 by a high-type seller and accepted with probability ψt by a low-type seller, and pt < cL is rejected with probability one by both types. Using (IH3), we can express the expected payoff from offering pt = cH in period t given belief µt as (with a slight abuse of notation):

Ut(µt|pt = cH) := Uρ,α,µt (h|pt = cH) = η(µt)− cH + η(µt)U∗t+1(σ(µt)) + (1− η(µt))U∗t+1(φ(µt)), (40)

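the expected payoff from offering pt = p ∈ (cL, cH) as

$$ U_t(\mu_t \mid p_t \in (c_L, c_H)) := U^{\rho,\alpha,\mu}_t(h \mid p_t \in (c_L, c_H)) = \left(1 - \frac{\mu_t}{\mu^*_{t+1}}\right)^+ \big(q_L - p_t + U^*_{t+1}(0)\big) + \left(\frac{\mu_t}{\mu^*_{t+1}}\right)\big(\lambda + U^*_{t+1}(\mu^*_{t+1})\big), \tag{41} $$

the expected payoff from offering pt = cL as

$$ U_t(\mu_t \mid p_t = c_L) := U^{\rho,\alpha,\mu}_t(h \mid p_t = c_L) = (1-\mu_t)\psi_t\big(q_L - c_L + U^*_{t+1}(0)\big) + \big(\mu_t + (1-\mu_t)(1-\psi_t)\big)\left(\lambda + U^*_{t+1}\left(\frac{\mu_t}{\mu_t + (1-\mu_t)(1-\psi_t)}\right)\right), \tag{42} $$

and the expected payoff from offering pt < cL as

$$ U_t(\mu_t \mid p_t < c_L) := U^{\rho,\alpha,\mu}_t(h \mid p_t < c_L) = \lambda + U^*_{t+1}(\mu_t). \tag{43} $$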

To complete the proof, we consider two sets of histories separately. First, we consider the buyer’s action at histories where µt(h) ≥ µ∗t+1 and then consider the buyer’s action at histories where µt(h) < µ∗t+1. The following claim follows the proof of Lemma 4 very closely.

Claim 11. Fix a history h ∈ Ht such that µt(h) ≥ µ∗t+1. Offering pt = cH with probability 1 is a strictly dominant strategy, so P(pt = cH |ht = h) = 1.

Proof. Fix a history h ∈ Ht such that µt(h) ≥ µ∗t+1. We now show that offering pt = cH is a strictly dominant strategy. First, note that:

Ut(µt|pt = cL) = Ut(µt|pt < cL) = Ut(µt|pt ∈ (cL, cH))

= λ+ U∗t+1(µt)

= λ+ η(µt)− cH + η(µt)U∗t+2(σ(µt)) + (1− η(µt))U∗t+2(φ(µt))

where the last equality follows from the fact that µt ≥ µ∗t+1 and thus, in period t + 1, the buyer offers cH and the seller accepts. We consider pt = cL, pt ∈ (cL, cH), and pt < cL together because a low-type seller rejects all these offers with probability 1 if µt ≥ µ∗t+1; see equation (38). (If µt = µ∗t+1 the buyer need not offer cH in period t + 1, but we have established that she obtains the same utility as she is indifferent between offering cH and offering cL, so the analysis above remains valid.) Therefore, substituting in, we want to show:

$$ \eta(\mu_t)U^*_{t+1}(\sigma(\mu_t)) + (1-\eta(\mu_t))U^*_{t+1}(\phi(\mu_t)) > \lambda + \eta(\mu_t)U^*_{t+2}(\sigma(\mu_t)) + (1-\eta(\mu_t))U^*_{t+2}(\phi(\mu_t)) $$

or, equivalently,

$$ \eta(\mu_t)\big(U^*_{t+1}(\sigma(\mu_t)) - U^*_{t+2}(\sigma(\mu_t))\big) + (1-\eta(\mu_t))\big(U^*_{t+1}(\phi(\mu_t)) - U^*_{t+2}(\phi(\mu_t))\big) > \lambda. $$

We can lower bound U∗t+1(φ(µt)) − U∗t+2(φ(µt)) with λ as, in period t + 1, the buyer can always offer a price p < cL and receive λ + U∗t+2(φ(µt)). Moreover, we have that U∗t+1(σ(µt)) > λ + U∗t+2(σ(µt)) as σ(µt) > µt ≥ µ∗t+1 and, by (IH1), at this belief, offering pt+1 = cH is strictly preferred to offering a price that is rejected.

We now turn to the second set of histories. Namely, we consider histories where µt(h) < µ∗t+1 and first show that we can disregard more prices.

Claim 12. P(pt < cL|ht = h) = P(pt ∈ (cL, cH)|ht = h) = 0.

Proof. Fix history h ∈ Ht. If µt(h) ≥ µ∗t+1, then the Claim follows from Claim 11.

Alternatively, suppose µt(h) < µ∗t+1. First, since the seller accepts any offer of pt = p ∈ (cL, cH) with the same probability, none of these prices can occur in equilibrium. To see why, note that if there existed a history h ∈ Ht such that, for some δ > 0, P(pt ∈ (cL + δ, cH), at = 1|ht = h) > 0, then the buyer could deviate to a strategy in which she shifts the probability that p ∈ (cL + δ, cH) to p = cL + δ/2 and strictly improve her payoff. This deviation increases her expected payoff by reducing the price she pays in the current period conditional on an acceptance, but does not alter her expected continuation payoff in period t + 1 after an acceptance or rejection, as the acceptance probability is independent of price; see equations (37) and (38). Thus, in equilibrium, the buyer strategy must satisfy P(pt ∈ (cL, cH)) = 0 because the above argument holds for all δ > 0.

Moreover, for p ∈ (cL, min{qL − λ, cH}):

$$ \left(1 - \frac{\mu_t}{\mu^*_{t+1}}\right)^+ \big(q_L - p + U^*_{t+1}(0)\big) + \left(\frac{\mu_t}{\mu^*_{t+1}}\right)\big(\lambda + U^*_{t+1}(\mu^*_{t+1})\big) > U_t(\mu_t \mid p_t < c_L) $$

because

$$ \left(1 - \frac{\mu_t}{\mu^*_{t+1}}\right)^+ U^*_{t+1}(0) + \left(\frac{\mu_t}{\mu^*_{t+1}}\right) U^*_{t+1}(\mu^*_{t+1}) \geq U^*_{t+1}(\mu_t) $$

(which follows by Jensen’s inequality because U∗t+1 is convex by (IH3)) and qL − p > λ. Therefore, P(pt < cL) = 0.
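The two ingredients of this comparison can be spelled out as follows. This is a worked restatement of the step above for the case µt < µ∗t+1, where the positive part is inactive; it introduces no new assumptions beyond convexity of U∗t+1 from (IH3) and qL − p > λ.

```latex
% Step 1: write the current belief as a convex combination of 0 and \mu^*_{t+1}.
\mu_t = \Big(1 - \tfrac{\mu_t}{\mu^*_{t+1}}\Big)\cdot 0 + \tfrac{\mu_t}{\mu^*_{t+1}}\cdot \mu^*_{t+1}
\quad\Longrightarrow\quad
U^*_{t+1}(\mu_t) \le \Big(1 - \tfrac{\mu_t}{\mu^*_{t+1}}\Big) U^*_{t+1}(0)
  + \tfrac{\mu_t}{\mu^*_{t+1}}\, U^*_{t+1}(\mu^*_{t+1}).

% Step 2: compare the current-period terms, using q_L - p > \lambda and \mu_t < \mu^*_{t+1}.
\Big(1 - \tfrac{\mu_t}{\mu^*_{t+1}}\Big)(q_L - p) + \tfrac{\mu_t}{\mu^*_{t+1}}\,\lambda
  > \Big(1 - \tfrac{\mu_t}{\mu^*_{t+1}}\Big)\lambda + \tfrac{\mu_t}{\mu^*_{t+1}}\,\lambda = \lambda.

% Adding the two inequalities gives U_t(\mu_t \mid p_t = p) > \lambda + U^*_{t+1}(\mu_t) = U_t(\mu_t \mid p_t < c_L).
```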

We have ruled out the possibility of any price offer being made with positive probability except for cL and cH. We can now pin down ψt′ and ξt′ for all t′ > t when the buyer’s strategy consists of offering cL with positive probability in period t.

Claim 13. Fix a history h ∈ Ht such that µt(h) < µ∗t+1. If P(pt = cL|ht = h) > 0, then

$$ P(a_t = 1 \mid \theta = L, h_t = h, p_t = c_L) = \frac{\mu^*_{t+1} - \mu_t}{(1-\mu_t)\mu^*_{t+1}}. $$

Moreover, if pt = cL, then for all t′ > t, P(pt′ = cL) = 1 and:

$$ P(a_{t'} = 1 \mid \theta = L, h_{t'} = h, p_{t'} = c_L) = \frac{\mu^*_{t'+1} - \mu_{t'}}{(1-\mu_{t'})\mu^*_{t'+1}}. $$

Proof. Fix h ∈ Ht such that µt(h) < µ∗t+1. Suppose, by contradiction, that the buyer offers cL with positive probability and $\psi_t < \frac{\mu^*_{t+1}-\mu_t}{(1-\mu_t)\mu^*_{t+1}} =: A_t$. The buyer’s expected payoff from offering cL is:

$$ \mu_t\lambda + (1-\mu_t)(1-\psi_t)\lambda + (1-\mu_t)\psi_t(q_L - c_L) + U^*_{t+1}(\mu_t). $$

In this expression, we have used that U∗t+1 is linear on the range (0, µ∗t+1) and that the buyer’s belief after a rejection satisfies $\frac{\mu_t}{\mu_t + (1-\mu_t)(1-\psi_t)} \leq \mu^*_{t+1}$, so:

$$ E_{a_t}\big(U^*_{t+1}(\mu_{t+1}(h_{t+1})) \mid h_t = h, p_t = c_L\big) = U^*_{t+1}\big(E_{a_t}(\mu_{t+1}(h_{t+1}) \mid h_t = h, p_t = c_L)\big) = U^*_{t+1}(\mu_t). \tag{44} $$

In this case, the buyer can shift the probability of offering pt = cL to pt = cL + ε for $0 < \varepsilon < \left(\frac{A_t - \psi_t}{A_t}\right)(q_L - c_L - \lambda)$. This price offer generates payoff:

$$ \left(1 - \frac{\mu_t}{\mu^*_{t+1}}\right)(q_L - c_L - \varepsilon) + \frac{\mu_t}{\mu^*_{t+1}}\lambda + U^*_{t+1}(\mu_t). $$

Since At(qL − cL − ε) + (1 − At)λ > ψt(qL − cL) + (1 − ψt)λ (where the left-hand side is the expected payoff from an offer of cL + ε conditional on the seller being a low type and the right-hand side is the expected payoff from an offer of cL conditional on the seller being a low type), deviating and offering pt = cL + ε with probability one is profitable for the buyer, so there cannot be an equilibrium where $\psi_t < \frac{\mu^*_{t+1}-\mu_t}{(1-\mu_t)\mu^*_{t+1}}$ if µt(h) < µ∗t+1 and P(pt = cL|ht = h) > 0.
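The bound on ε is exactly what makes this deviation profitable. The short derivation below rearranges the inequality between the two conditional payoffs; it uses only the quantities already defined in the proof.

```latex
\begin{aligned}
& A_t(q_L - c_L - \varepsilon) + (1 - A_t)\lambda - \big[\psi_t(q_L - c_L) + (1 - \psi_t)\lambda\big] \\
&\qquad = (A_t - \psi_t)(q_L - c_L) - (A_t - \psi_t)\lambda - A_t\,\varepsilon \\
&\qquad = (A_t - \psi_t)(q_L - c_L - \lambda) - A_t\,\varepsilon ,
\end{aligned}
\qquad\text{which is positive if and only if}\qquad
\varepsilon < \frac{A_t - \psi_t}{A_t}\,(q_L - c_L - \lambda).
```

Since ψt < At and qL − cL > λ, the upper bound on ε is strictly positive, so such an ε exists.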

Therefore, in combination with the characterization of the low-type seller strategy in (38), we have proventhat after public history h, in equilibrium, ψt = At if P(pt = cL|ht = h) > 0. Moreover, this requirement onψt pins down ξt+1 at h = 〈ht = h, (pt = cL, at = 0, yt = y). Specifically, after a rejection of an offer of cL,the buyer has belief µt+1(h) = µ∗t+1. At this belief, (IH1) allows the buyer to mix between cL and cH withany probability ξt+1 ∈ [0, 1], but we can now show that equilibrium requires ξt+1 = 0 at this history.

Since the low-type seller strategy is mixed (that is, 0 < ψt < 1), a low-type seller must be indifferent between accepting and rejecting. By accepting, µt+1(〈ht = h, (pt = cL, at = 1, yt = y)〉) = 0 by Bayes rule, so by (IH1) the buyer offers cL in every subsequent period. Therefore, a low-type seller has 0 continuation payoff from accepting. This implies that rejecting must also lead to a payoff of 0 or there cannot be an equilibrium. By (IH1), the buyer offers cH or cL when she has belief equal to µ∗t+1 in period t + 1. For a low-type seller to have zero continuation payoff from a rejection, the buyer must offer cL with probability 1 following a rejection of cL. Therefore, in period t + 1, after a rejection of cL, the buyer offers cL again with probability 1, that is, ξt+1 = 0 at h.

Now, we can replicate the first half of the above argument to prove that ψt+1 is pinned down after the history h. Namely, ψt+1 = At+1. And then replicate the second half to show that ξt+2 = 0 after a rejection of cL in period t + 1. Continuing forward like this, our equilibrium requires that for each t′ > t, the buyer offers cL with probability 1 and ψt′ = At′.

Claim 14. Fix a history h ∈ Ht such that µt(h) = 0. Offering cL with probability 1 is a strictly dominant strategy, so P(pt = cL|ht = h) = 1.

Proof. By Claim 13 and (42), offering cL generates expected payoff

(qL − cL) + U∗t+1(0)

By (40), offering cH generates expected payoff

(qL − cH) + U∗t+1(0)

The claim follows as cH > cL.

Finally, consider histories h ∈ Ht such that µt(h) ∈ (0, µ∗t+1). We have shown in Claims 10 and 12 that the buyer offers either cL or cH. We now show that there exists a unique threshold at which the buyer is indifferent between offering cH and offering cL, and where offering pt = cH with probability 1 is strictly dominant for µt > µ∗t and offering cL with probability 1 is strictly dominant for µt < µ∗t.

Because we have already shown that cL is strictly dominant at µt = 0 in Claim 14 and cH is strictly dominant at µt = µ∗t+1 in Claim 11, it is sufficient to show that on this interval, the buyer’s expected payoff from offering cH is continuous, the expected payoff from offering cL is continuous, and Ut(µt|pt = cH) − Ut(µt|pt = cL) is strictly increasing in µt.

We first show that the expected payoffs of offering cL and cH are continuous on [0, µ∗t+1). In (40), Ut+1 is continuous on (0, µ∗t+1) because it is convex by (IH3). η(µ), σ(µ), and φ(µ) are clearly continuous on [0, 1], see (9). Therefore, the buyer’s expected payoff from offering cH is continuous. The buyer’s expected payoff from offering cL (see (42)) is also continuous because U∗t+1 is convex by (IH3) and ψt is equal to the continuous function $\frac{\mu^*_{t+1} - \mu_t}{(1-\mu_t)\mu^*_{t+1}}$ by Claim 13. Finally, the buyer’s expected payoffs are also continuous at 0 as the expected payoffs from both offers are linear in µt for µt near 0. To see this, first consider the expected payoff from offering cL at µ < µ∗t+1. It equals:

$$(T - t + 1)\,q_L + \mu_t(\lambda - q_L)\sum_{t'=t}^{T}\frac{1}{\mu^*_{t'+1}}, \tag{45}$$

Here, we have written the expected payoff without U∗t+1 by observing that for µt < µ∗t, the buyer offers cL in each subsequent period. Using this observation in conjunction with the observation in (44) allows us to simplify the expression. Similarly, for $\mu < \sigma^{-1}(\mu^*_{t+1}) = \frac{q_L\,\mu^*_{t+1}}{q_H(1-\mu^*_{t+1}) + q_L\,\mu^*_{t+1}}$, even if the buyer offers cH and observes a success, she will offer cL in each subsequent period by (IH1). Therefore, the value from offering cH at these beliefs is:

$$\eta(\mu) - c_H + (T - t)\,q_L + \mu_t(\lambda - q_L)\sum_{t'=t+1}^{T}\frac{1}{\mu^*_{t'+1}}.$$

Here, to simplify the expression, we have again used the fact that the expected payoff is linear in µt on this range and the buyer’s belief is a martingale.

Therefore, the expected payoffs are continuous on [0, µ∗t+1], and it remains to prove that the difference in expected payoff is strictly increasing. To do so, define the difference in expected payoffs from these two offers as a function of the buyer’s belief:

$$\begin{aligned}
\Pi(\mu_t) &= \eta(\mu_t) - c_H + \eta(\mu_t)U^*_{t+1}(\sigma(\mu_t)) + (1-\eta(\mu_t))U^*_{t+1}(\phi(\mu_t)) \\
&\quad - \left(1 - \frac{\mu_t}{\mu^*_{t+1}}\right)\left(q_L - c_L + U^*_{t+1}(0)\right) - \frac{\mu_t}{\mu^*_{t+1}}\left(\lambda + U^*_{t+1}(\mu^*_{t+1})\right) \\
&\overset{(a)}{=} \eta(\mu_t) - c_H + \eta(\mu_t)U^*_{t+1}(\sigma(\mu_t)) + (1-\eta(\mu_t))U^*_{t+1}(\phi(\mu_t)) \\
&\quad - \left(1 - \frac{\mu_t}{\mu^*_{t+1}}\right)(q_L - c_L) - \frac{\mu_t}{\mu^*_{t+1}}\lambda - U^*_{t+1}(\mu_t)
\end{aligned}$$

(a) follows, again, because for µ ∈ (0, µ∗t+1), U∗t+1(µ) is linear (see (44)) and the buyer’s belief is a martingale. Note that U∗t+1 is convex so differentiable almost everywhere. It is differentiable for µ and φ(µ) as U∗t+1 is linear on the range (0, µ∗t+1). For points where it is not differentiable, we use derivative to refer to the smallest (i.e. most negative) sub-gradient, which exists for all µ ∈ (0, 1) as Ut+1 is convex. First, we show that:

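$$\begin{aligned}
&\frac{d}{d\mu_t}\left(\eta(\mu_t)U^*_{t+1}(\sigma(\mu_t)) + (1-\eta(\mu_t))U^*_{t+1}(\phi(\mu_t)) - U^*_{t+1}(\mu_t)\right) \ge 0 \\
&\Leftrightarrow\ (q_H - q_L)U^*_{t+1}(\sigma(\mu_t)) + \eta(\mu_t)\sigma'(\mu_t)U^{*\prime}_{t+1}(\sigma(\mu_t)) - (q_H - q_L)U^*_{t+1}(\phi(\mu_t)) + (1-\eta(\mu_t))\phi'(\mu_t)U^{*\prime}_{t+1}(\phi(\mu_t)) \ge U^{*\prime}_{t+1}(\mu_t) \\
&\overset{(a)}{\Leftrightarrow}\ (q_H - q_L)\left(U^*_{t+1}(\sigma(\mu_t)) - U^*_{t+1}(\phi(\mu_t))\right) + \frac{q_H q_L}{\eta(\mu_t)}U^{*\prime}_{t+1}(\sigma(\mu_t)) + \frac{(1-q_H)(1-q_L)}{1-\eta(\mu_t)}U^{*\prime}_{t+1}(\phi(\mu_t)) \ge U^{*\prime}_{t+1}(\mu_t) \\
&\overset{(b)}{\Leftarrow}\ (q_H - q_L)U^{*\prime}_{t+1}(\phi(\mu_t))\left(\sigma(\mu_t) - \phi(\mu_t)\right) + \frac{q_H q_L}{\eta(\mu_t)}U^{*\prime}_{t+1}(\phi(\mu_t)) + \frac{(1-q_H)(1-q_L)}{1-\eta(\mu_t)}U^{*\prime}_{t+1}(\phi(\mu_t)) \ge U^{*\prime}_{t+1}(\mu_t) \\
&\overset{(c)}{\Leftarrow}\ U^{*\prime}_{t+1}(\phi(\mu_t)) \ge U^{*\prime}_{t+1}(\mu_t)
\end{aligned}$$

Working up, the last expression holds because U∗t+1 is linear on the range (0, µ∗t+1). (c) holds by algebra. (b) holds because U∗t+1 is convex, so we lower bound U∗t+1(σ(µt)) and U∗′t+1(σ(µt)). (a) holds by algebra.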
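The algebra behind step (c) is that the three coefficients multiplying U∗′t+1(φ(µt)) sum to one, which is the derivative of the martingale identity η(µ)σ(µ) + (1 − η(µ))φ(µ) = µ. The sketch below checks this numerically. It assumes the standard Bayesian-update forms η(µ) = qHµ + qL(1 − µ), σ(µ) = qHµ/η(µ) and φ(µ) = (1 − qH)µ/(1 − η(µ)), which are consistent with the derivative terms above but are defined in (9), outside this section; the parameter values are hypothetical.

```python
def eta(mu, q_h, q_l):
    """Probability of a success at belief mu (assumed form of eta in (9))."""
    return q_h * mu + q_l * (1.0 - mu)


def sigma(mu, q_h, q_l):
    """Posterior after a success (assumed form of sigma in (9))."""
    return q_h * mu / eta(mu, q_h, q_l)


def phi(mu, q_h, q_l):
    """Posterior after a failure (assumed form of phi in (9))."""
    return (1.0 - q_h) * mu / (1.0 - eta(mu, q_h, q_l))


def step_c_coefficient(mu, q_h, q_l):
    """Sum of the coefficients on U*'_{t+1}(phi(mu_t)) in step (b)."""
    e = eta(mu, q_h, q_l)
    return ((q_h - q_l) * (sigma(mu, q_h, q_l) - phi(mu, q_h, q_l))
            + q_h * q_l / e
            + (1.0 - q_h) * (1.0 - q_l) / (1.0 - e))


# Hypothetical qualities with 0 < q_L < q_H < 1.
q_h, q_l = 0.8, 0.4
coefficients = [step_c_coefficient(m, q_h, q_l) for m in (0.1, 0.3, 0.5, 0.9)]
print(coefficients)
```

Each printed value should equal 1 up to floating-point error, so the left-hand side of (b) collapses to U∗′t+1(φ(µt)).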

Thus, we can lower bound the difference in the rates of change of these terms by 0. Therefore, taking the derivative of the remaining terms (which are all differentiable):

$$\frac{d}{d\mu_t}\Pi(\mu_t) \ge \frac{d}{d\mu_t}\left(\eta(\mu_t) - c_H - \left(1 - \frac{\mu_t}{\mu^*_{t+1}}\right)(q_L - c_L) - \frac{\mu_t}{\mu^*_{t+1}}\lambda\right) = q_H - q_L + \frac{1}{\mu^*_{t+1}}(q_L - c_L - \lambda) > 0$$

Therefore, Π is strictly increasing.

Let µ∗t be the belief at which the buyer is indifferent between offering cH and cL. That is, at µ∗t we must have

$$\Pi(\mu^*_t) = 0$$

Moreover, since Π is strictly increasing, for all µ > µ∗t, offering cH is strictly dominant and for all µ < µ∗t, offering cL is strictly dominant.

We have proven that, in an equilibrium, the buyer and seller strategies must satisfy the properties stated in the theorem and have shown that (IH1) holds in period t.


B.3.1 Preliminaries

Notation. See Appendix B.1 for discussion on notation.

Proof Technique. We directly prove the result in a series of lemmas which we state in the next section.

We use the functions U∗t : [0, 1] → R and UCt : [0, 1] → R which are characterized by Proposition 5 as equal to the Buyer’s continuation value in the integrated learning model and contractual only benchmark, respectively. We reproduce them below for convenience. In the proofs, the functions are also shown to be convex, which we use in this proof.

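Define, recursively, the functions U∗t′ : [0, 1] → R as:

$$U^*_{t'}(\mu) = \begin{cases}
\eta(\mu) - c_H + \eta(\mu)U^*_{t'+1}(\sigma(\mu)) + (1-\eta(\mu))U^*_{t'+1}(\phi(\mu)), & \text{if } \mu \ge \mu^*_{t'} \\[4pt]
\left(1 - \dfrac{\mu}{\mu^*_{t'+1}}\right)\left(q_L - c_L + U^*_{t'+1}(0)\right) + \left(\dfrac{\mu}{\mu^*_{t'+1}}\right)\left(\lambda + U^*_{t'+1}(\mu^*_{t'+1})\right), & \text{otherwise}
\end{cases} \tag{46}$$

with terminal function U∗T+1(µ) = 0, where µ∗t′ are the sequence of thresholds in Theorem 2, and η, σ and φ are defined in (9).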

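Define, recursively, the functions UCt′ : [0, 1] → R as:

$$U^C_{t'}(\mu) = \begin{cases}
\eta(\mu) - c_H + U^C_{t'+1}(\mu), & \text{if } \mu \ge \mu^C_{t'} \\[4pt]
\left(1 - \dfrac{\mu}{\mu^C_{t'+1}}\right)\left(q_L - c_L + U^C_{t'+1}(0)\right) + \left(\dfrac{\mu}{\mu^C_{t'+1}}\right)\left(\lambda + U^C_{t'+1}(\mu^C_{t'+1})\right), & \text{otherwise}
\end{cases} \tag{47}$$

with terminal function UCT+1(µ) = 0, where µCt′ are the sequence of thresholds in Proposition 3.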

B.3.2 Results

The Theorem follows directly from these Lemmas.

Lemma 6. Suppose qL − cL > λ, and there exists $\bar{T}$ such that $\mu^*_{T-\bar{T}} < \mu^C_{T-\bar{T}}$. Then µ∗t < µCt for all $1 \le t < T - \bar{T}$.

Lemma 7. Suppose qH − cH > qL − cL > λ. If $\frac{q_H}{q_L} > \frac{q_L - \lambda - c_L}{c_H - c_L}$, then there exists $\bar{T} \in \mathbb{N}$ such that $\mu^*_{T-\bar{T}} < \mu^C_{T-\bar{T}}$. Conversely, if $\frac{q_H}{q_L} \le \frac{q_L - \lambda - c_L}{c_H - c_L}$, then µ∗t = µCt, ∀t.

Lemma 8. If qH − cH > qL − cL > λ and µ∗t+1 < µCt+1, then there exists $\bar{\mu}_t > \mu^*_t$ such that U∗t(µ) < UCt(µ), ∀µ ∈ (0, $\bar{\mu}_t$).

Lemma 9. If λ > qL − cL, then U∗t(µ) ≥ UCt(µ) for all t and all µ ∈ [0, 1].

Lemma 10. If qL − cL > λ and $\frac{q_H}{q_L} \le \frac{q_L - \lambda - c_L}{c_H - c_L}$, then U∗t(µ) ≥ UCt(µ) for all t and all µ ∈ [0, 1].

B.3.3 Proofs of results

Proof of Lemma 6.

Proof. This follows by an identical induction argument to the proof of Proposition 4, but with a strict inequality in step (d) of (78). See §B.7.

Proof of Lemma 7.

Proof. We establish this in a series of statements. First, we characterize a necessary and sufficient condition for characterizing $\bar{T}$ in terms of $\mu^C_{T-\bar{T}}$ and $\mu^C_{T-\bar{T}+1}$.

Claim 15. If µ∗t+1 = µCt+1, then σ(µCt ) > µCt+1 ⇔ µ∗t < µCt .

Proof. Fix t such that µ∗t+1 = µCt+1.

First, we observe that there is a slope change in U∗t+1 at µ∗t+1, that is, for µ = µ∗t+1:

$$\lim_{\varepsilon \to 0}\frac{U^*_{t+1}(\mu) - U^*_{t+1}(\mu - \varepsilon)}{\varepsilon} < \lim_{\varepsilon \to 0}\frac{U^*_{t+1}(\mu + \varepsilon) - U^*_{t+1}(\mu)}{\varepsilon}$$

To see this, fix 0 < µ1 < µ∗t+1 < µ2 < 1. Note that U∗t+1 and U∗t+2 are convex so differentiable almost everywhere. Moreover, U∗t+1 is linear on the range (0, µ∗t+1), by its definition, so it is differentiable at µ1. In particular, for µ < µ∗t, because U∗t+1 is linear on this range and the buyer’s belief is a martingale, we can simplify the value function as:

$$U^*_t(\mu) = q_L - c_L + \frac{\mu}{\mu^*_{t+1}}(\lambda + c_L - q_L) + U^*_{t+1}(\mu)$$

Also, U∗t+2 is linear on the range (0, µ∗t+2) so is differentiable at µ1. If U∗t+1 or U∗t+2 is not differentiable at µ2, let U∗′t+1 and U∗′t+2 refer to the smallest sub-gradient, which exists for all µ ∈ (0, 1) as Ut+2 is convex. From the definition of U∗t+1:

$$U^{*\prime}_{t+1}(\mu_1) = \frac{\lambda + c_L - q_L}{\mu^*_{t+1}} + U^{*\prime}_{t+2}(\mu_1)$$

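Then:

$$\begin{aligned}
U^{*\prime}_{t+1}(\mu_2) &= q_H - q_L + \frac{d}{d\mu}\left(\eta(\mu_2)U^*_{t+2}(\sigma(\mu_2)) + (1-\eta(\mu_2))U^*_{t+2}(\phi(\mu_2))\right) \\
&= q_H - q_L + (q_H - q_L)U^*_{t+2}(\sigma(\mu_2)) + \eta(\mu_2)\sigma'(\mu_2)U^{*\prime}_{t+2}(\sigma(\mu_2)) \\
&\quad - (q_H - q_L)U^*_{t+2}(\phi(\mu_2)) + (1-\eta(\mu_2))\phi'(\mu_2)U^{*\prime}_{t+2}(\phi(\mu_2)) \\
&= q_H - q_L + (q_H - q_L)\left(U^*_{t+2}(\sigma(\mu_2)) - U^*_{t+2}(\phi(\mu_2))\right) + \frac{q_H q_L}{\eta(\mu_2)}U^{*\prime}_{t+2}(\sigma(\mu_2)) + \frac{(1-q_H)(1-q_L)}{1-\eta(\mu_2)}U^{*\prime}_{t+2}(\phi(\mu_2)) \\
&\overset{(a)}{\ge} q_H - q_L + (q_H - q_L)U^{*\prime}_{t+2}(\mu_1)\left(\sigma(\mu_2) - \phi(\mu_2)\right) + \frac{q_H q_L}{\eta(\mu_2)}U^{*\prime}_{t+2}(\mu_1) + \frac{(1-q_H)(1-q_L)}{1-\eta(\mu_2)}U^{*\prime}_{t+2}(\mu_1) \\
&\overset{(b)}{=} q_H - q_L + U^{*\prime}_{t+2}(\mu_1) > U^{*\prime}_{t+1}(\mu_1)
\end{aligned}$$

(a) follows because U∗′t+2(σ(µ2)) ≥ U∗′t+2(φ(µ2)) = U∗′t+2(µ1). This follows, again, because U∗t+2 is linear on the range (0, µ∗t+2) and convex on [0, 1], so the derivative (or smallest sub-gradient) is greater than or equal to U∗′t+2(µ1) at all µ ∈ (0, 1). (b) follows by algebra.

Therefore, U∗′t+1(µ2) > U∗′t+1(µ1).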

Given this, we can now prove ⇒: Suppose that µ∗t+1 = µCt+1, and σ(µCt) > µCt+1. Since U∗t+1 is convex and the slope changes at µ∗t+1:

$$\eta(\mu^C_t)U^*_{t+1}(\sigma(\mu^C_t)) + \nu(\mu^C_t)U^*_{t+1}(\phi(\mu^C_t)) > U^*_{t+1}(\mu^C_t) \tag{48}$$

Therefore, consider the difference in expected payoff from offering cH and from offering cL in the integrated model with belief equal to µCt:

$$\eta(\mu^C_t) - c_H + \eta(\mu^C_t)U^*_{t+1}(\sigma(\mu^C_t)) + \nu(\mu^C_t)U^*_{t+1}(\phi(\mu^C_t)) - \lambda\frac{\mu^C_t}{\mu^*_{t+1}} - \left(1 - \frac{\mu^C_t}{\mu^*_{t+1}}\right)(q_L - c_L) - U^*_{t+1}(\mu^C_t) \tag{49}$$

By assumption we have µ∗t+1 = µCt+1. And by the proof of Theorem 2, we know that there is a unique point µ∗t ∈ (0, µ∗t+1) where (49) equals 0, and for all µ > µ∗t it is greater than zero and for all µ < µ∗t it is less than zero. Therefore, to show that µ∗t < µCt, it is equivalent to show that (49) is greater than 0 at µCt. Now, we can see that (49) is greater than 0 from (48) and from the definition of µCt as the indifference point which solves:

$$\eta(\mu^C_t) - c_H = \lambda\left(\frac{\mu^C_t}{\mu^C_{t+1}}\right) + \left(1 - \frac{\mu^C_t}{\mu^C_{t+1}}\right)(q_L - c_L).$$

Therefore, we have proven ⇒.

Now to prove ⇐: suppose that µ∗t+1 = µCt+1 and µ∗t < µCt. The argument from above essentially goes in reverse. Again, by the proof of Theorem 2, we know that if µ∗t < µCt < µ∗t+1 = µCt+1, then:

$$\eta(\mu^C_t) - c_H + \eta(\mu^C_t)U^*_{t+1}(\sigma(\mu^C_t)) + \nu(\mu^C_t)U^*_{t+1}(\phi(\mu^C_t)) - \lambda\frac{\mu^C_t}{\mu^*_{t+1}} - \left(1 - \frac{\mu^C_t}{\mu^*_{t+1}}\right)(q_L - c_L) - U^*_{t+1}(\mu^C_t) > 0 \tag{50}$$

Since $\eta(\mu^C_t) - c_H = \lambda\left(\frac{\mu^C_t}{\mu^C_{t+1}}\right) + \left(1 - \frac{\mu^C_t}{\mu^C_{t+1}}\right)(q_L - c_L)$ (by the definition of µCt), this implies that:

$$\eta(\mu^C_t)U^*_{t+1}(\sigma(\mu^C_t)) + \nu(\mu^C_t)U^*_{t+1}(\phi(\mu^C_t)) > U^*_{t+1}(\mu^C_t)$$

Since U∗t+1 is linear for µ ∈ [0, µ∗t+1], if σ(µCt) ≤ µ∗t+1, then

$$\eta(\mu^C_t)U^*_{t+1}(\sigma(\mu^C_t)) + \nu(\mu^C_t)U^*_{t+1}(\phi(\mu^C_t)) = U^*_{t+1}(\mu^C_t)$$

But this would be a contradiction, so we must have σ(µCt) > µ∗t+1 = µCt+1.

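Claim 16. Suppose µCt+1 = µ∗t+1. If

$$\mu^C_{t+1} < \bar{\mu} := \frac{q_H(c_H - c_L) - q_L(q_L - \lambda - c_L)}{(q_H - q_L)(c_H + q_L - c_L)}$$

then µ∗t < µCt. Otherwise, µ∗t = µCt.

Proof. By Claim 15, this is equivalent to showing that, if $\mu^C_{t+1} < \bar{\mu}$, then σ(µCt) > µCt+1, and if $\mu^C_{t+1} \ge \bar{\mu}$, then σ(µCt) ≤ µCt+1. By the definition of µCt as the buyer’s point of indifference between offering cH and cL, we know:

$$\mu^C_t = \frac{(c_H - c_L)\,\mu^C_{t+1}}{\eta(\mu^C_{t+1}) - \lambda - c_L} =: g(\mu^C_{t+1})$$

Using g, we can write σ(µCt) in terms of µCt+1:

$$\begin{aligned}
\sigma(g(\mu)) &= \sigma\left(\frac{(c_H - c_L)\mu}{\eta(\mu) - \lambda - c_L}\right)
= \frac{\dfrac{q_H(c_H - c_L)\mu}{\eta(\mu) - \lambda - c_L}}{\dfrac{q_H(c_H - c_L)\mu}{\eta(\mu) - \lambda - c_L} + \left(1 - \dfrac{(c_H - c_L)\mu}{\eta(\mu) - \lambda - c_L}\right)q_L} \\
&= \frac{q_H(c_H - c_L)\mu}{q_H(c_H - c_L)\mu + q_L(\eta(\mu) - \lambda - c_L) - (c_H - c_L)\mu q_L} \\
&= \frac{q_H(c_H - c_L)\mu}{(q_H - q_L)(c_H - c_L)\mu + q_L\left((q_H - q_L)\mu + q_L - \lambda - c_L\right)} \\
&= \frac{q_H(c_H - c_L)\mu}{(q_H - q_L)\mu(c_H + q_L - c_L) + q_L(q_L - \lambda - c_L)}
\end{aligned}$$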

Using this we can solve for $\bar{\mu}$, which we define as the µ ≠ 0 such that σ(g(µ)) = µ. We then use properties of σ(g(µ)) along with the equalities $\sigma(g(\bar{\mu})) = \bar{\mu}$ and σ(g(0)) = 0 to show the desired relation.

Solving for $\bar{\mu}$:

$$\begin{aligned}
&\mu = \frac{q_H(c_H - c_L)\mu}{(q_H - q_L)\mu(c_H + q_L - c_L) + q_L(q_L - \lambda - c_L)} \\
&\Leftrightarrow\ (q_H - q_L)\mu(c_H + q_L - c_L) + q_L(q_L - \lambda - c_L) = q_H(c_H - c_L) \\
&\Leftrightarrow\ \mu = \frac{q_H(c_H - c_L) - q_L(q_L - \lambda - c_L)}{(q_H - q_L)(c_H + q_L - c_L)}
\end{aligned}$$

We now prove that if $\mu < \bar{\mu}$, then σ(g(µ)) > µ, and if $\mu \ge \bar{\mu}$, then σ(g(µ)) ≤ µ. To do so, we first show that (see below) σ(g(µ)) is concave. Next, we check two cases: (1) $\bar{\mu} > 0$ and (2) $\bar{\mu} \le 0$. If $\bar{\mu} > 0$, then we show the derivative of σ(g(µ)) at 0 is greater than 1. With this and the concavity of σ(g(µ)), we must have σ(g(µ)) > µ for all µ ∈ (0, $\bar{\mu}$) and σ(g(µ)) < µ for all µ ∈ ($\bar{\mu}$, 1).

To prove the second case, we show that if $\bar{\mu} < 0$, then the derivative of σ(g(µ)) is less than 1 at µ = 0, so the derivative is less than 1 everywhere (because of the concavity). Therefore, σ(g(µ)) < µ for all µ ∈ (0, 1].

Therefore, in either case, for µ ∈ [0, 1], if $\mu < \bar{\mu}$, then σ(g(µ)) > µ, and if $\mu \ge \bar{\mu}$, then σ(g(µ)) ≤ µ. From the result of Claim 15, these imply that if $\mu^C_{t+1} < \bar{\mu}$ then µ∗t < µCt and if $\mu^C_{t+1} \ge \bar{\mu}$ then µ∗t ≥ µCt.

Proof of concavity and derivative: Define the following, which are each positive by our assumptions on the region:

$$a := q_H(c_H - c_L), \qquad b := (q_H - q_L)(c_H + q_L - c_L), \qquad c := q_L(q_L - \lambda - c_L)$$

$$\frac{\partial}{\partial\mu}\,\frac{a\mu}{b\mu + c} = \frac{ac}{(b\mu + c)^2}$$

$$\frac{\partial^2}{\partial\mu^2}\,\frac{a\mu}{b\mu + c} = \frac{-2abc}{(b\mu + c)^3} < 0$$

Thus, the function is concave in µ for µ ∈ [0, 1]. Also:

$$\left.\frac{\partial}{\partial\mu}\,\frac{a\mu}{b\mu + c}\right|_{\mu = 0} = \frac{ac}{(b\cdot 0 + c)^2} = \frac{a}{c} = \frac{q_H(c_H - c_L)}{q_L(q_L - \lambda - c_L)}$$

It is now straightforward from the definition of $\bar{\mu}$ to see that if $\bar{\mu} < 0$, then the derivative of σ(g(µ)) at µ = 0 is less than 1, and if $\bar{\mu} > 0$, then the derivative of σ(g(µ)) at µ = 0 is greater than 1.
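The closed form σ(g(µ)) = aµ/(bµ + c), its non-zero fixed point (a − c)/b, and the slope a/c at the origin can be checked numerically. The sketch below is illustrative only: the parameter values are hypothetical (chosen so that qH − cH > qL − cL > λ and a > c), and σ is assumed to be the Bayes update qHµ/(qHµ + qL(1 − µ)), consistent with the expansion in Claim 16.

```python
def sigma_of_g(mu, q_h, q_l, c_h, c_l, lam):
    """Compose the success posterior sigma with the map g from Claim 16."""
    eta = q_h * mu + q_l * (1.0 - mu)
    g = (c_h - c_l) * mu / (eta - lam - c_l)
    return q_h * g / (q_h * g + q_l * (1.0 - g))


def abc(q_h, q_l, c_h, c_l, lam):
    """Constants a, b, c from the proof of concavity."""
    a = q_h * (c_h - c_l)
    b = (q_h - q_l) * (c_h + q_l - c_l)
    c = q_l * (q_l - lam - c_l)
    return a, b, c


# Hypothetical parameters: q_H - c_H = 0.5 > q_L - c_L = 0.4 > lambda = 0.1.
params = dict(q_h=0.9, q_l=0.5, c_h=0.4, c_l=0.1, lam=0.1)
a, b, c = abc(**params)
mu_bar = (a - c) / b   # non-zero fixed point of sigma(g(.))
slope_at_zero = a / c  # derivative of a*mu / (b*mu + c) at mu = 0

print(mu_bar, slope_at_zero, sigma_of_g(mu_bar, **params))
```

For these values the slope at zero exceeds one and the fixed point lies in (0, 1), so σ(g(µ)) sits above the 45-degree line to the left of the fixed point and below it to the right, as the concavity argument predicts.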

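Claim 17. $\lim_{T\to\infty}\mu^*_1 = \max\left\{0, \frac{c_H + \lambda - q_L}{q_H - q_L}\right\}$

Proof. We use the relation between µCt and µCt+1 (defined as g from Claim 16) to inductively show that:

$$\frac{1}{\mu^C_t} = \left(\frac{q_H - q_L}{c_H - c_L}\right)\sum_{t'=0}^{T-t-1}\left(\frac{q_L - c_L - \lambda}{c_H - c_L}\right)^{t'} + \left(\frac{q_L - c_L - \lambda}{c_H - c_L}\right)^{T-t}\left(\frac{q_H - \lambda - c_L}{c_H - c_L}\right).$$

Base Case:

$$\frac{1}{\mu^C_T} = \frac{q_H - \lambda - c_L}{c_H - c_L}$$

Induction Hypothesis: For τ > t:

$$\frac{1}{\mu^C_\tau} = \left(\frac{q_H - q_L}{c_H - c_L}\right)\sum_{t'=0}^{T-\tau-1}\left(\frac{q_L - c_L - \lambda}{c_H - c_L}\right)^{t'} + \left(\frac{q_L - c_L - \lambda}{c_H - c_L}\right)^{T-\tau}\left(\frac{q_H - \lambda - c_L}{c_H - c_L}\right).$$

Induction Step:

$$\begin{aligned}
\frac{1}{\mu^C_t} &= \frac{\eta(\mu^C_{t+1}) - c_L - \lambda}{(c_H - c_L)\,\mu^C_{t+1}} \\
&= \frac{q_H - q_L}{c_H - c_L} + \left(\frac{q_L - c_L - \lambda}{c_H - c_L}\right)\frac{1}{\mu^C_{t+1}} \\
&= \frac{q_H - q_L}{c_H - c_L} + \left(\frac{q_L - c_L - \lambda}{c_H - c_L}\right)\left(\frac{q_H - q_L}{c_H - c_L}\sum_{t'=0}^{T-t-2}\left(\frac{q_L - c_L - \lambda}{c_H - c_L}\right)^{t'} + \left(\frac{q_L - c_L - \lambda}{c_H - c_L}\right)^{T-t-1}\left(\frac{q_H - \lambda - c_L}{c_H - c_L}\right)\right) \\
&= \left(\frac{q_H - q_L}{c_H - c_L}\right)\sum_{t'=0}^{T-t-1}\left(\frac{q_L - c_L - \lambda}{c_H - c_L}\right)^{t'} + \left(\frac{q_L - c_L - \lambda}{c_H - c_L}\right)^{T-t}\left(\frac{q_H - \lambda - c_L}{c_H - c_L}\right).
\end{aligned}$$

If qL − λ > cH, then the last term increases to infinity so (taking the inverse) $\lim_{T\to\infty}\mu^C_1 = 0$. Alternatively, if qL − λ < cH:

$$\begin{aligned}
&\lim_{T\to\infty}\ \frac{q_H - q_L}{c_H - c_L}\sum_{t'=0}^{T-t-1}\left(\frac{q_L - c_L - \lambda}{c_H - c_L}\right)^{t'} + \left(\frac{q_L - c_L - \lambda}{c_H - c_L}\right)^{T-t}\left(\frac{q_H - \lambda - c_L}{c_H - c_L}\right) \\
&= \lim_{T\to\infty}\ \left(\frac{q_H - q_L}{c_H - c_L}\right)\frac{1 - \left(\frac{q_L - c_L - \lambda}{c_H - c_L}\right)^{T-t}}{1 - \frac{q_L - c_L - \lambda}{c_H - c_L}} + \left(\frac{q_L - c_L - \lambda}{c_H - c_L}\right)^{T-t}\left(\frac{q_H - \lambda - c_L}{c_H - c_L}\right) \\
&= \left(\frac{q_H - q_L}{c_H - c_L}\right)\frac{1}{1 - \frac{q_L - c_L - \lambda}{c_H - c_L}} \\
&= \frac{q_H - q_L}{c_H - c_L - q_L + c_L + \lambda} \\
&= \frac{q_H - q_L}{c_H - q_L + \lambda}
\end{aligned}$$

Therefore, (taking the inverse) $\lim_{T\to\infty}\mu^C_1 = \frac{c_H + \lambda - q_L}{q_H - q_L}$.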
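The recursion and its closed form can be cross-checked numerically. The sketch below iterates µCt = g(µCt+1) backward, starting from µCT+1 = 1 (an assumption that reproduces the base case 1/µCT = (qH − λ − cL)/(cH − cL)), and compares the result with the geometric-sum expression for 1/µCt. Parameter values are hypothetical and satisfy qL − λ < cH, so the limit (cH + λ − qL)/(qH − qL) is strictly positive.

```python
def contractual_thresholds(T, q_h, q_l, c_h, c_l, lam):
    """Backward iteration mu_t = g(mu_{t+1}) with mu_{T+1} = 1 (assumed)."""
    mu = {T + 1: 1.0}
    for t in range(T, 0, -1):
        nxt = mu[t + 1]
        eta = q_h * nxt + q_l * (1.0 - nxt)
        mu[t] = (c_h - c_l) * nxt / (eta - lam - c_l)
    return mu


def inverse_threshold_closed_form(t, T, q_h, q_l, c_h, c_l, lam):
    """Closed form for 1 / mu^C_t shown by induction in the proof of Claim 17."""
    r = (q_l - c_l - lam) / (c_h - c_l)
    k = (q_h - q_l) / (c_h - c_l)
    geometric_sum = sum(r ** j for j in range(T - t))
    return k * geometric_sum + r ** (T - t) * (q_h - lam - c_l) / (c_h - c_l)


# Hypothetical parameters: q_L - lambda = 0.4 < c_H = 0.45.
params = dict(q_h=0.9, q_l=0.5, c_h=0.45, c_l=0.1, lam=0.1)
short = contractual_thresholds(10, **params)
long_horizon = contractual_thresholds(200, **params)
limit = (params["c_h"] + params["lam"] - params["q_l"]) / (params["q_h"] - params["q_l"])

print(short[1], long_horizon[1], limit)
```

With a horizon of 200 periods the first-period threshold is already numerically indistinguishable from the limit 0.125, while the shorter horizon gives a visibly larger value.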

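With these three Claims proven, we can complete the proof of the Lemma.

If $\frac{q_H(c_H - c_L)}{q_L(q_L - \lambda - c_L)} \le 1$, then $\bar{\mu} \le 0$, which means that $\mu^C_t \ge \bar{\mu}$ for all t, so by Claim 16, µ∗t = µCt for all t.

If $\frac{q_H(c_H - c_L)}{q_L(q_L - \lambda - c_L)} > 1$, then $\bar{\mu} > 0$. There are two cases to consider:

• First, if qL − cH > λ, then µC1 converges to 0 for large enough T by Claim 17. Therefore, there exists $\bar{T}$ such that $\mu^C_{T-\bar{T}+1} = \mu^*_{T-\bar{T}+1} < \bar{\mu}$, in which case by Claim 16, $\mu^*_{T-\bar{T}} < \mu^C_{T-\bar{T}}$.

• Second, if qL − cL > λ > qL − cH, then we have $\bar{\mu} > \frac{c_H + \lambda - q_L}{q_H - q_L}$ because:

$$\begin{aligned}
&\bar{\mu} > \frac{c_H + \lambda - q_L}{q_H - q_L} \\
&\Leftrightarrow\ \frac{q_H(c_H - c_L) - q_L(q_L - \lambda - c_L)}{(q_H - q_L)(c_H + q_L - c_L)} > \frac{c_H + \lambda - q_L}{q_H - q_L} \\
&\Leftrightarrow\ q_H(c_H - c_L) - q_L(q_L - \lambda - c_L) > (c_H + \lambda - q_L)(c_H + q_L - c_L) \\
&\Leftrightarrow\ q_H(c_H - c_L) > (c_H + \lambda)(c_H - c_L) \\
&\Leftrightarrow\ q_H > c_H + \lambda
\end{aligned}$$

where the last inequality holds by assumption. Then by the convergence of µ∗1 proven in Claim 17, there exists $\bar{T}$ such that $\mu_{T-\bar{T}} < \bar{\mu}$, so by Claim 16, $\mu^*_{T-\bar{T}} < \mu^C_{T-\bar{T}}$.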

Proof of Lemma 8.

Proof. To prove this, it is enough to prove that for all µ ≤ µ∗t, U∗t(µ) < UCt(µ), because both functions are convex (so continuous on the interior). Therefore, if there is a strict difference in the value at µ∗t, then by the continuity of the value functions, there exists an additional interval (µ∗t, $\bar{\mu}_t$) for some $\bar{\mu}_t > \mu^*_t$ such that U∗t(µ) < UCt(µ) for µ ∈ (µ∗t, $\bar{\mu}_t$). Fix µ ∈ (0, µ∗t].

$$\begin{aligned}
U^C_t(\mu) &\overset{(a)}{=} \sum_{t'=t}^{T}\left[\left(1 - \frac{\mu}{\mu^C_{t'+1}}\right)q_L + \left(\frac{\mu}{\mu^C_{t'+1}}\right)\lambda\right] \\
&= (T - t + 1)\,q_L + (\lambda - q_L)\,\mu\sum_{t'=t}^{T}\left(\frac{1}{\mu^C_{t'+1}}\right) \\
&\overset{(b)}{>} (T - t + 1)\,q_L + (\lambda - q_L)\,\mu\sum_{t'=t}^{T}\left(\frac{1}{\mu^*_{t'+1}}\right) = U^*_t(\mu)
\end{aligned} \tag{51}$$

(a) follows because µ∗t+1 ≤ µCt+1 by Lemma 6. Moreover, we can simplify the expression in this way because Ut′ is linear on the range (0, µ∗t′+1) for all t′ > t, so the expected value of Ut′ equals Ut′ of the expected value in each period. (b) follows because λ − qL < 0 and µ∗t+1 < µCt+1.

Proof of Lemma 9.

Proof. Suppose λ > qL − cL. We use a backward induction argument to prove the result.

Base Case: Period T. By the proofs of Theorem 1 and Proposition 2, the value functions are equal, so the result follows immediately.

Induction Hypothesis: Suppose that for t′ = t + 1, ..., T, and for all µ ∈ [0, 1], U∗t′(µ) ≥ UCt′(µ).

Induction Step: Fix µ. In the contractual only model, the buyer’s payoff is, by Proposition 5E:

$$U^C_t(\mu) = \begin{cases}
\eta(\mu) - c_H + U^C_{t+1}(\mu), & \text{if } \mu \ge \mu^C_t \\
\lambda + U^C_{t+1}(\mu), & \text{otherwise}
\end{cases} \tag{52}$$

We can now see that the buyer’s value function in the integrated learning model dominates the value function of the contractual only model because the buyer can replicate the action of the contractual only model and offer cH if µ > µC and offer a price that is rejected otherwise. Using this strategy, by the induction hypothesis and Jensen’s Inequality (because U∗t is convex and the buyer’s belief is a martingale), the buyer’s payoff in the integrated model is weakly greater at µ.

Proof of Lemma 10.

Proof. Suppose qL − cL > λ and $\frac{q_H}{q_L} \le \frac{q_L - \lambda - c_L}{c_H - c_L}$. We use a backward induction argument to prove the result, using the result from Lemma 7 that for all t, µ∗t = µCt.

Base Case: Period T. By the proofs of Theorem 2 and Proposition 3, the value functions are equal, so the result follows immediately.

Induction Hypothesis: Suppose that for t′ = t + 1, ..., T, and for all µ ∈ [0, 1], U∗t′(µ) ≥ UCt′(µ).

Induction Step: Fix µ. In the contractual only model, the buyer’s payoff is, by Proposition 5E:

$$U^C_t(\mu) = \begin{cases}
\eta(\mu) - c_H + U^C_{t+1}(\mu), & \text{if } \mu \ge \mu^C_t \\[4pt]
\left(1 - \dfrac{\mu}{\mu^C_{t+1}}\right)\left(q_L - c_L + U^C_{t+1}(0)\right) + \left(\dfrac{\mu}{\mu^C_{t+1}}\right)\left(\lambda + U^C_{t+1}(\mu^C_{t+1})\right), & \text{otherwise}
\end{cases} \tag{53}$$

We can now see that the buyer’s value function in the integrated learning model dominates the value function of the contractual only model because the buyer can replicate the action of the contractual only model and offer cH if µ > µCt = µ∗t, and offer cL otherwise. Using this strategy, by the induction hypothesis and Jensen’s Inequality (because U∗t is convex and the buyer’s belief is a martingale), the buyer’s payoff in the integrated model is weakly greater if µ > µCt. If µ ≤ µCt, then the value functions are the same as µ∗t+1 = µCt+1.

B.4 Proof of Proposition 1

Proof. Our proof follows by introducing the single-armed bandit problem more formally and showing the equivalence between the optimal policy and the Buyer’s equilibrium strategy. We then use known results regarding the optimality of Gittins Indices for the decision maker in the single-armed bandit problem, which implies that it is optimal for the buyer in the integrated learning model as well.

Consider the following single-armed bandit problem. There is a single decision-maker who selects between two actions (‘arms’) in each period t = 1, ..., T. Arm 1 generates a known reward of λ. Arm 2 generates a stochastic reward yt − cH, where $y = \{y_t\}_{t=1}^{T}$ is a sequence of independent random variables distributed as Ber(θ), where θ is a random variable equal to qH with probability γ and equal to qL with probability 1 − γ. We use xt = 1 to denote that arm 2 is selected in period t and xt = 0 to denote that arm 1 is selected in period t. Define rt := (yt − cH)xt + λ(1 − xt) as the reward observed by the buyer in period t. Denote the history of information available to the decision maker at time t as:

$$h_1 = \langle\gamma\rangle, \qquad h_t = \left\langle (x_{t'}, r_{t'})_{t'=1}^{t-1} \right\rangle.$$

We denote by {Ht = σ(ht) : t = 1, ..., T} the filtration associated with the process and denote by Ht the set of all possible histories in period t. A buyer policy is a sequence of measurable functions, $\tau = \{\tau_t : H_t \to \{0, 1\}\}_{t=1}^{T}$. The buyer seeks a policy which solves:

$$\max_{\tau}\ E_{\theta, y}\left(\sum_{t=1}^{T} x^{\tau}_t(h_t)(y_t - c_H) + \left(1 - x^{\tau}_t(h_t)\right)\lambda\right) \tag{54}$$

As it is a finite horizon problem, an optimal solution, τ∗, can be determined using dynamic programming and solving recursively backwards; see Berry and Fristedt (1985), Lemma 2.3.1. Let µt(ht) = P(θ = qH|ht) and define, recursively, the value functions:

$$V_t(\mu_t) = \max_{x_t}\ E\left(r_t + V_{t+1}(\mu_{t+1})\right)$$

where VT+1(µT+1) = 0.
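The backward recursion can be sketched directly as a memoized function of the period and the belief. This is an illustrative sketch, not the authors' implementation: it assumes the Bayes-update forms η(µ) = qHµ + qL(1 − µ), σ(µ) = qHµ/η(µ) and φ(µ) = (1 − qH)µ/(1 − η(µ)) for (9), uses hypothetical parameter values, and enumerates belief paths naively, so it is only practical for short horizons.

```python
from functools import lru_cache


def make_value_function(T, q_h, q_l, c_h, lam):
    """Return V(t, mu) for the single-armed bandit with safe reward lam."""

    @lru_cache(maxsize=None)
    def value(t, mu):
        if t > T:
            return 0.0  # terminal condition V_{T+1} = 0
        eta = q_h * mu + q_l * (1.0 - mu)       # success probability of arm 2
        after_success = q_h * mu / eta           # sigma(mu), assumed form
        after_failure = (1.0 - q_h) * mu / (1.0 - eta)  # phi(mu), assumed form
        safe = lam + value(t + 1, mu)            # arm 1: belief unchanged
        risky = (eta - c_h
                 + eta * value(t + 1, after_success)
                 + (1.0 - eta) * value(t + 1, after_failure))  # arm 2
        return max(safe, risky)

    return value


# Hypothetical parameters with q_H - c_H = 0.5 > lambda = 0.3.
one_period = make_value_function(T=1, q_h=0.9, q_l=0.5, c_h=0.4, lam=0.3)
five_periods = make_value_function(T=5, q_h=0.9, q_l=0.5, c_h=0.4, lam=0.3)

print(one_period(1, 0.8), one_period(1, 0.0), five_periods(1, 0.5))
```

In the one-period problem the value is simply max{λ, η(µ) − cH}, and at belief µ = 0 the belief never moves, so the five-period value there is the safe payoff collected five times.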

We can now show that the buyer offers cH if it is strictly optimal for the decision-maker to pull arm 2 and vice versa. There are two cases to consider: qH − cH ≤ λ and qH − cH > λ.

Case 1: Suppose that qH − cH > λ ≥ qL − cL. We inductively show that in the optimal policy, the decision maker uses arm 2 if µt > µ∗t and uses arm 1 if µt < µ∗t, where {µ∗t′} is the sequence of thresholds in (IH1). By Proposition 5E, the buyer’s value function in every equilibrium is:

$$U^*_{t'}(\mu) = \begin{cases}
\eta(\mu) - c_H + \eta(\mu)U^*_{t'+1}(\sigma(\mu)) + (1-\eta(\mu))U^*_{t'+1}(\phi(\mu)), & \text{if } \mu \ge \mu^*_{t'} \\
(T - t' + 1)\lambda, & \text{otherwise}
\end{cases} \tag{55}$$

with terminal function UT+1(µ) = 0, where {µ∗t′} is the sequence of thresholds in (IH1), and η, σ and φ are defined in (9).

Base Case: Period T. As the decision maker is an expected value maximizer, it is clear that it is optimal to use arm 2 if µT qH + (1 − µT)qL − cH > λ and to use arm 1 if µT qH + (1 − µT)qL − cH < λ. This is equivalent to using arm 2 if µT > µ∗T and arm 1 if µT < µ∗T. See the proof of Theorem 1. Moreover, the decision-maker’s value function is:

$$V_T(\mu_T) = \begin{cases}
q_H\mu_T + q_L(1 - \mu_T) - c_H, & \text{if } \mu_T \ge \mu^*_T \\
\lambda, & \text{otherwise}
\end{cases} \tag{56}$$

Induction Hypothesis: For periods t′ = t+ 1, ..., T :

(IH1) τ∗t′(µt′) = 1 if µ > µ∗t′ and τ∗t′(µt′) = 0 if µ < µ∗t′ .

(IH2) Vt′(µ) = U∗t′(µ), for all µ ∈ [0, 1].

Induction Step: Fix h ∈ Ht and let µt := µt(h). Using the dynamic programming setup above, the decision maker selects arm 2 if:

$$\eta(\mu) - c_H + \eta(\mu)V^*_{t+1}(\sigma(\mu)) + (1-\eta(\mu))V^*_{t+1}(\phi(\mu)) > \lambda + V_{t+1}(\mu_t)$$

and arm 1 if the inequality holds strictly the other way. Since:

1. Ut+1(µ) = Vt+1(µ) for all µ by (IH2),

2. Arm 2 generates the same expected payoff as offering cH for the buyer, and

3. Arm 1 generates the same expected payoff as offering a price that is rejected,

by the definition of µ∗t, it is strictly optimal to use arm 2 if µt > µ∗t and strictly optimal to use arm 1 if µt < µ∗t. Therefore,

$$V^*_t(\mu) = \begin{cases}
\eta(\mu) - c_H + \eta(\mu)V^*_{t+1}(\sigma(\mu)) + (1-\eta(\mu))V^*_{t+1}(\phi(\mu)), & \text{if } \mu \ge \mu^*_t \\
(T - t + 1)\lambda, & \text{otherwise}
\end{cases} \tag{57}$$

And we have established that (IH1) and (IH2) hold in period t.

Now, from Berry and Fristedt (1985) Chapter 5, Theorem 5.3.1 and Corollary 5.3.2, a Gittins Index policy is optimal in the single-armed bandit problem that we introduced because the discount sequence is ‘regular’ by Proposition 5.2.1 and the subsequent example, A1 with α = 1. Above, we established that it is strictly optimal for the decision maker to use arm 2 if µt > µ∗t and strictly optimal to use arm 1 if µt < µ∗t. Therefore, if µt < µ∗t, we must have Λt(µt) < λ, and if µt > µ∗t, we must have Λt(µt) > λ. Moreover, we must have Λt(µ∗t) = λ as Λt(µ) is continuous in µ by Corollary 5.3.3 in Berry and Fristedt (1985).

Case 2: Suppose that λ ≥ qH − cH > qL − cL. In this case, Λ(µt) ≤ Λ(1) = qH − cH < λ. Thus, the decision-maker never uses arm 2. Moreover, it is straightforward that it is optimal for the buyer to offer a price that is rejected in each period if λ ≥ qH − cH > qL − cL. Therefore, in this case, the proposition holds as Λt(µt) < λ for all t and all µt.

Thus, we have proven Proposition 1.


The proof follows nearly the same argument as the Proof of Theorem 1. See Appendix B.1 where we define preliminary notation.

Preliminaries: Proof Technique and Induction Hypotheses Our proof will be by backward induction in the number of periods. That is, we will use period T, the last period, as the base case. Within each period, we first characterize the necessary conditions for the buyer’s and seller’s equilibrium strategies. In particular, within each period t, we use the fact that the agents are sequentially rational and start backwards by characterizing the seller’s strategy at time t for every history and every price offer. Then, given the seller’s strategy, we characterize necessary conditions for the buyer’s strategy at time t. We show that, for $\mu^C_t = \frac{\lambda + c_H - q_L}{q_H - q_L}$, in every period, the equilibrium strategies in period t must satisfy:

$$P(a_t = 1 \mid \theta = H, h^C_t = h, p_t = p) = \begin{cases}
1, & \text{if } p \ge c_H \\
0, & \text{if } p < c_H
\end{cases} \tag{58}$$

$$P(a_t = 1 \mid \theta = L, h^C_t = h, p_t = p) = \begin{cases}
1, & \text{if } p \ge c_H \\[4pt]
\left(\dfrac{\mu^C_{t+1} - \mu_t}{(1-\mu_t)\,\mu^C_{t+1}}\right)^{+}, & \text{if } c_L < p < c_H \\[8pt]
\psi_t \in \left[0, \left(\dfrac{\mu^C_{t+1} - \mu_t}{(1-\mu_t)\,\mu^C_{t+1}}\right)^{+}\right], & \text{if } p = c_L \\[8pt]
0, & \text{if } p < c_L
\end{cases} \tag{59}$$

$$p_t = \begin{cases}
c_H \text{ w.p. } 1, & \text{if } \mu_t > \mu^C_t \\
c_H \text{ w.p. } \xi_t \in [0, 1], & \text{if } \mu_t = \mu^C_t \\
p \in P_t \text{ w.p. } 1 - \xi_t, & \text{if } \mu_t = \mu^C_t \\
p \in P_t \text{ w.p. } 1, & \text{if } \mu_t < \mu^C_t
\end{cases} \tag{60}$$

where we define µCT+1 := 1 and Pt represents the set of prices that are rejected with probability 1, that is,

$$P_t := \begin{cases}
\{p : p < c_L\}, & \text{if } \psi_t > 0 \\
\{p : p \le c_L\}, & \text{if } \psi_t = 0
\end{cases} \tag{61}$$

Formally, our first induction hypothesis is as follows:

(IH1) Fix t < T. For periods t′ = t + 1, t + 2, ..., T and $\mu^C_{t'} = \frac{\lambda + c_H - q_L}{q_H - q_L}$. In any equilibrium, the price offer and acceptance decision, pt′, at′, satisfy the properties in equations (60), (58), (59), respectively.20

We would like to highlight that this proof technique is possible due to our refinements, as summarized by the following remark:

Remark 2. Note that because of our refinements on the buyer’s off-path belief, the seller’s strategy can be determined without considering whether the buyer has made a price offer pt = p that occurs with positive probability in equilibrium and regardless of whether the history h occurs with positive probability. Without this refinement, we would need to specify how the buyer’s belief is updated after each (p, a, y) in the case where p is off-path.

Next, we show that the set of equilibria is non-empty. In particular, we show that an equilibrium exists for the game with T = 1, and we inductively show that it exists for games with more periods. Formally, our second induction hypothesis is as follows.

20 We highlight that these are necessary conditions for equilibrium strategies but not sufficient; in particular, ξt′ will be pinned down at some (off-path) histories.


Note that (IH1) and (IH2) form the basis of the theorem.

The third induction hypothesis, which we will introduce next, characterizes properties that we will use in the induction step. To that end, define, recursively, the functions UCt′ : [0, 1] → R as:

$$U^C_{t'}(\mu) = \begin{cases}
\eta(\mu) - c_H + U^C_{t'+1}(\mu), & \text{if } \mu \ge \mu^C \\
\lambda + U^C_{t'+1}(\mu), & \text{otherwise}
\end{cases} \tag{62}$$

with terminal function UT+1(µ) = 0, where µC is the threshold characterized in (IH1). Then, our third induction hypothesis can be stated as

(IH3) Fix t < T. Suppose that for periods t′ = t + 1, t + 2, ..., T, (ρ, α, µ) satisfies (IH1). Then, for every history h ∈ HCt′, we have that Uρ,α,µt′(h) = UCt′(µt′(h)), where UCt′ is defined in (62).

Proof of Proposition 2: Contractual Learning Only Equilibria with Moderate Outside Option. Fix parameters qH, cH, λ, qL, cL such that qH − cH > λ > qL − cL. Fix T ∈ N.

Part I: Base Case (Period T) This follows word-for-word from the Proof of Theorem 1. See Appendix B.1.

Part II: Inductive Step (Period t < T). Fix 1 ≤ t < T. Suppose that the Induction Hypotheses (IH1), (IH2), and (IH3) hold for every t′ > t. As before, we start by characterizing the seller’s period-t strategy for every history h ∈ HCt and every price offer pt = p. By sequential rationality and taking the seller’s period-t strategy as given, we characterize the buyer’s period-t strategy, which completes the proof of (IH1). We then proceed to show (IH2) and (IH3).

Seller’s period t strategy. Consider the problem in period t beginning with the seller’s strategy.

Lemma 11. Fix a public history h ∈ HCt, a price offer p ∈ R, and suppose that (IH1), (IH2), and (IH3) hold for every t′ > t. Then, the equilibrium strategy of a high-type seller at time t is:

$$P\left(a_t = 1 \mid \theta = H, h^C_t = h, p_t = p\right) = \begin{cases}
1, & \text{if } p \ge c_H \\
0, & \text{if } p < c_H
\end{cases} \tag{63}$$

The proof follows word-for-word from the proof of Lemma 2 with the correct histories ht and thresholds µCt′.

Lemma 12. Fix a public history h ∈ HCt, a price offer p ∈ R, and suppose that (IH1), (IH2), and (IH3) hold for every t′ > t. Then, the equilibrium strategy of a low-type seller at time t must satisfy:

$$P\left(a_t = 1 \mid \theta = L, h^C_t = h, p_t = p\right) = \begin{cases}
1, & \text{if } p \ge c_H \\[4pt]
\left(\dfrac{\mu^C_{t+1} - \mu_t}{(1-\mu_t)\,\mu^C_{t+1}}\right)^{+}, & \text{if } c_L < p < c_H \\[8pt]
\psi_t \in \left[0, \left(\dfrac{\mu^C_{t+1} - \mu_t}{(1-\mu_t)\,\mu^C_{t+1}}\right)^{+}\right], & \text{if } p = c_L \\[8pt]
0, & \text{if } p < c_L
\end{cases} \tag{64}$$

The proof follows nearly identically to the proof of Lemma 3 with the correct histories hCt′ and thresholds µCt′ substituted in. The only difference emerges in Claim 4, as ξt′ = ξρ,α,µt′ are defined so that:

$$\xi^{\rho,\alpha,\mu}_{t+1} = \begin{cases}
\dfrac{p - c_L}{c_H - c_L}, & \text{if } t = T - 1 \\[10pt]
\left(\dfrac{p - c_L - V^{\rho,\alpha,\mu}_{t+2}(h^{C0}_{t+2})}{c_H - c_L + V^{\rho,\alpha,\mu}_{t+2}(h^{C1}_{t+2}) - V^{\rho,\alpha,\mu}_{t+2}(h^{C0}_{t+2})}\right)^{+}, & \text{otherwise}
\end{cases} \tag{65}$$

where hC1t+2 = 〈θ = L, ht ∪ {pt = p, at = 0; pt+1 = cH, at+1 = 1}〉 and hC0t+2 = 〈θ = L, ht ∪ {pt = p, at = 0; pt+2 ∈ Pt, at+1 = 0}〉. Note that an offer of p ∈ (cL, cH) in period t does not only affect ξt+1 but potentially all ξt′ for t′ > t because the buyer’s belief does not change for the remainder of the game and the thresholds µCt′ are identical by (IH1). Therefore, once the buyer’s belief is equal to µC, it does not change, so all of the subsequent ξt′ will matter. Of course, the buyer can simply set $\xi_{t+1} = \frac{p - c_L}{c_H - c_L}$ and all other ξt′ = 0, which makes a low-type seller indifferent between accepting and rejecting.

Apart from the slightly different definition of ξt′, the proof follows in the same way.

Buyer’s period t strategy.

Lemma 13. Fix a public history h ∈ HCt and let µt := µt(h). The period-t buyer’s strategy must satisfy:

$$p_t = \begin{cases}
c_H \text{ w.p. } 1, & \text{if } \mu_t > \mu^C \\
c_H \text{ w.p. } \xi_t \in [0, 1], & \text{if } \mu_t = \mu^C \\
p \in P_t \text{ w.p. } 1 - \xi_t, & \text{if } \mu_t = \mu^C \\
p \in P_t \text{ w.p. } 1, & \text{if } \mu_t < \mu^C
\end{cases} \tag{66}$$

Proof. As an intermediate step to proving Lemma 13, we first show in Claim 18 that the only price offers that can arise with positive probability in equilibrium are those in {cH} ∪ Pt.

Claim 18. Fix a history h ∈ HCt. Then, P(pt ∈ {cH} ∪ Pt) = 1, that is, with probability 1, the buyer offers either cH or a price offer that is rejected with probability 1.

Proof. The proof follows the proof of Claim 6, so we briefly summarize it but omit the details for the sake of brevity as they are nearly identical. cH dominates any price offer greater than cH, and any price cL + δ is dominated by cL + δ/2 for any δ > 0. Moreover, if µ < µCt+1, offering p ∈ Pt dominates offering cL, and otherwise offering cH dominates offering cL.

We can now compare the expected payoff of offering p ∈ Pt and of offering cH. Fix history h ∈ Ht with belief µt = µt(h). The expected payoff of offering cH is:

$$\eta(\mu_t) - c_H + U^C_{t+1}(\mu_t), \tag{67}$$

and the payoff of offering p ∈ Pt is

$$\lambda + U^C_{t+1}(\mu_t), \tag{68}$$

as there is no learning from either offer. Thus, the buyer’s continuation payoff is the same from each offer, so the trade-off is the same as in the last period. Thus, $\mu^C_t = \mu^C = \frac{\lambda + c_H - q_L}{q_H - q_L}$.

We have proven that, in an equilibrium, the buyer and seller strategies must satisfy the properties stated in the Lemma and have shown that (IH1) holds in period t.
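Because the continuation values in (67) and (68) coincide, the threshold solves the static indifference η(µ) − cH = λ. A minimal numerical check, assuming η(µ) = qHµ + qL(1 − µ) as in (9) and using hypothetical parameter values with qH − cH > λ:

```python
def static_threshold(q_h, q_l, c_h, lam):
    """Belief at which eta(mu) - c_H equals the outside option lam."""
    return (lam + c_h - q_l) / (q_h - q_l)


def flow_payoff_gap(mu, q_h, q_l, c_h, lam):
    """Expression (67) minus expression (68); the continuation terms cancel."""
    eta = q_h * mu + q_l * (1.0 - mu)
    return eta - c_h - lam


# Hypothetical parameters: q_H - c_H = 0.5 > lambda = 0.3.
q_h, q_l, c_h, lam = 0.9, 0.5, 0.4, 0.3
mu_c = static_threshold(q_h, q_l, c_h, lam)
print(mu_c, flow_payoff_gap(mu_c, q_h, q_l, c_h, lam))
```

The gap is zero at the threshold, positive above it (offer cH) and negative below it (offer a rejected price), matching the case structure in (66).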

(IH2): Equilibrium existence for $T - t + 1$ periods. We now construct an equilibrium to show the set of equilibria is non-empty for $T - t + 1$ periods and that (IH2) holds in period $t$. By (IH2), there exists a continuation equilibrium for periods $t + 1$ to $T$ that satisfies the other induction hypotheses. For $t' > t$, let $\rho_{t'}, \alpha_{t'}, \mu_{t'}$ be as described in this continuation equilibrium. However, for $p_{t'}$, set $\xi_{t'+1} = \frac{p - c_L}{c_H - c_L}$ for (off-path) histories where $p_{t'} = p \in (c_L, c_H)$ and equal to 0 otherwise. For period $t$, set $p_t$ according to (66) with $p \in \mathcal{P}_t$ equal to $c_L - 1$. Define $\alpha_t$ as in (63) and (64) with $\psi_t$ equal to 0 for all histories; and let $\mu_t$ be calculated as required by our equilibrium concept where possible and equal to 0 at any other histories. It is straightforward from the previous analysis that $(\rho, \alpha, \mu)$ describes an equilibrium for $T - t + 1$ periods. Thus, (IH2) holds.

(IH3): For every history $h \in \mathcal{H}^C_t$, we have that $U^{\rho,\alpha,\mu}_t(h) = U^C_t(\mu_t(h))$ whenever $(\rho, \alpha, \mu)$ satisfies (IH1) for all periods $t' \ge t$. Fix a history $h \in \mathcal{H}^C_t$ and suppose that $(\rho, \alpha, \mu)$ satisfies (IH1). By our previous discussion, the buyer’s period-$t$ value function, $U^{\rho,\alpha,\mu}_t(h)$, is given by the expression in (67) if $\mu_t(h) \ge \mu^C$ and by the expression in (68) if $\mu_t(h) < \mu^C$ (these expressions are equal at $\mu^C$). Note that this can be expressed as a function of $\mu_t(h)$, and it agrees with the definition of $U^C_t(\cdot)$. Moreover, $U^C_t$ is convex because it is the pointwise maximum of two linear functions.



The proof follows nearly the same argument as the Proof of Theorem 2. See Appendix B.2.

Preliminaries: Proof Technique and Induction Hypotheses. Our proof will be by backward induction in the number of periods. That is, we will use period $T$, the last period, as the base case. Within each period, we first characterize the necessary conditions for the buyer’s and seller’s equilibrium strategies. In particular, within each period $t$, we use the fact that the agents are sequentially rational and work backwards by characterizing the seller’s strategy at time $t$ for every history and every price offer. Then, given the seller’s strategy, we characterize necessary conditions for the buyer’s strategy at time $t$. We show that, in every period, there exists a threshold $\mu^C_t$ with $0 < \mu^C_1 < \mu^C_2 < \cdots < \mu^C_T < 1$ such that, for all $h \in \mathcal{H}^C_t$, $p \in \mathbb{R}$, the equilibrium strategies in period $t$ must satisfy:

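$$
\mathbb{P}(a_t = 1 \mid \theta = H, h^C_t = h, p_t = p) =
\begin{cases}
1, & \text{if } p \ge c_H \\
0, & \text{if } p < c_H
\end{cases}
\tag{69}
$$

$$
\mathbb{P}(a_t = 1 \mid \theta = L, h^C_t = h, p_t = p) =
\begin{cases}
1, & \text{if } p \ge c_H \\
\left(\frac{\mu^C_{t+1} - \mu_t}{(1 - \mu_t)\mu^C_{t+1}}\right)^+, & \text{if } c_L < p < c_H \\
\psi_t \in \left[0, \left(\frac{\mu^C_{t+1} - \mu_t}{(1 - \mu_t)\mu^C_{t+1}}\right)^+\right], & \text{if } p = c_L \\
0, & \text{if } p < c_L
\end{cases}
\tag{70}
$$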

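$$
p_t =
\begin{cases}
c_H \text{ w.p. } 1, & \text{if } \mu_t > \mu^C_t \\
c_H \text{ w.p. } \xi_t \in [0, 1], & \text{if } \mu_t = \mu^C_t \\
c_L \text{ w.p. } 1 - \xi_t, & \text{if } \mu_t = \mu^C_t \\
c_L \text{ w.p. } 1, & \text{if } \mu_t < \mu^C_t
\end{cases}
\tag{71}
$$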

where we define µCT+1 := 1.

Formally, our first induction hypothesis is as follows:

(IH1) Fix $t < T$. For periods $t' = t+1, t+2, \ldots, T$, there exists a sequence of thresholds $0 < \mu^C_{t+1} < \mu^C_{t+2} < \cdots < \mu^C_T < 1$ such that, in any equilibrium, the price offer and acceptance decision, $p_{t'}$, $a_{t'}$, satisfy the properties in equations (71), (69), (70), respectively (see footnote 21). Moreover, offering $c_H$ with probability one at beliefs $\mu_{t'} > \mu^C_{t'}$ and offering $c_L$ with probability one at beliefs $\mu_{t'} < \mu^C_{t'}$ are strictly dominant actions for the buyer.

We, again, would like to highlight that this proof technique is possible due to our refinements, as summarized by the following remark:

Remark 3. Note that because of our refinements on the buyer’s off-path belief, the seller’s strategy can be determined without considering whether the buyer has made a price offer $p_t = p$ that occurs with positive probability in equilibrium and regardless of whether the history $h$ occurs with positive probability. Without this refinement, we would need to specify how the buyer’s belief is updated after each $(p, a, y)$ in the case where $p$ is off-path.

Next, we show that the set of equilibria is non-empty. In particular, we show that an equilibrium exists for the game with $T = 1$, and we inductively show that it exists for games with more periods. Formally, our second induction hypothesis is:


(IH1) and (IH2) form the basis of much of the Proposition. However, they do not comprise the entire statement as we also need to prove that $\psi_t$ is pinned down after an offer of $p_t = c_L$ is made, and we can only establish this when considering the buyer’s action in period $t$. Additionally, we need to establish that $\xi_{t+1} = 0$ if $p_t = c_L$, which can only be done in the inductive step. Therefore, working backwards we characterize necessary conditions that the equilibrium strategies must satisfy as in the proof of Theorem 1, but in the inductive step, for some histories, we further pin down $\psi_{t'}$ and $\xi_{t'}$ by working forward in time.

The third induction hypothesis, which we will introduce next, characterizes properties that we will use in the induction step. To that end, define, recursively, the functions $U^C_{t'} : [0, 1] \to \mathbb{R}$ as:

$$
U^C_{t'}(\mu) =
\begin{cases}
\eta(\mu) - c_H + U^C_{t'+1}(\mu), & \text{if } \mu \ge \mu^C_{t'} \\
\left(1 - \frac{\mu}{\mu^C_{t'+1}}\right)\left(q_L - c_L + U^C_{t'+1}(0)\right) + \left(\frac{\mu}{\mu^C_{t'+1}}\right)\left(\lambda + U^C_{t'+1}(\mu^C_{t'+1})\right), & \text{otherwise}
\end{cases}
\tag{72}
$$

with terminal function $U^C_{T+1}(\mu) = 0$, where $\{\mu^C_{t'}\}$ is the sequence of thresholds in (IH1), and $\eta$ is defined in (9). Then, our third induction hypothesis can be stated as:

(IH3) Fix $t < T$. Suppose that for periods $t' = t+1, t+2, \ldots, T$, $(\rho, \alpha, \mu)$ satisfies (IH1). Then, for every history $h \in \mathcal{H}^C_{t'}$, we have that $U^{\rho,\alpha,\mu}_{t'}(h) = U^C_{t'}(\mu_{t'}(h))$, where $U^C_{t'}$ is defined in (72).

Footnote 21: We highlight that these are necessary conditions for equilibrium strategies but not sufficient; in particular, $\xi_{t'}$ and $\psi_{t'}$ will be pinned down at some histories.


Proof of Proposition 3: Contractual Learning Only Equilibria with Bad Outside Option. Fix parameters $q_H, c_H, \lambda, q_L, c_L$ such that $q_H - c_H > q_L - c_L > \lambda$. Fix $T \in \mathbb{N}$.

Part I: Base Case (Period $T$). This follows identically to the Proof of Theorem 2. See Appendix B.2.

Part II: Inductive Step (Period $t < T$). Suppose that the Induction Hypotheses (IH1), (IH2), and (IH3) hold for every $t' > t$. As before, we start by characterizing the seller’s period-$t$ strategy for every history $h \in \mathcal{H}_t$ and every price offer $p_t = p$. By sequential rationality and taking the seller’s period-$t$ strategy as given, we characterize the buyer’s period-$t$ strategy, which completes the proof of (IH1). We then proceed to show (IH2) and (IH3).

Seller’s period $t$ strategy. Consider the problem in period $t$ beginning with the seller’s strategy. The proof of the seller strategy is identical to the proofs for Theorems 1 and 2 (see footnote 22). For every $h \in \mathcal{H}^C_t$ and $p \in \mathbb{R}$, the high-type seller strategy satisfies, see Lemma 2:

$$
\mathbb{P}(a_t = 1 \mid \theta = H, h^C_t = h, p_t = p) =
\begin{cases}
1, & \text{if } p \ge c_H \\
0, & \text{if } p < c_H
\end{cases}
\tag{73}
$$

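The low-type seller strategy is, see Lemma 3:

$$
\mathbb{P}(a_t = 1 \mid \theta = L, h^C_t = h, p_t = p) =
\begin{cases}
1, & \text{if } p \ge c_H \\
\left(\frac{\mu^C_{t+1} - \mu_t}{(1 - \mu_t)\mu^C_{t+1}}\right)^+, & \text{if } c_L < p < c_H \\
\psi_t \in \left[0, \left(\frac{\mu^C_{t+1} - \mu_t}{(1 - \mu_t)\mu^C_{t+1}}\right)^+\right], & \text{if } p = c_L \\
0, & \text{if } p < c_L
\end{cases}
\tag{74}
$$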

Moreover, similar to Period $T$, for $h \in \mathcal{H}^C_t$, we show in the next section that $\psi_t$ must equal $\left(\frac{\mu^C_{t+1} - \mu_t}{(1 - \mu_t)\mu^C_{t+1}}\right)^+$ if $\mathbb{P}(p_t = c_L \mid h^C_t = h) > 0$.
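The mixing probability in (74) can be read as the acceptance rate that moves the buyer's belief exactly to the next-period threshold after a rejection. The sketch below illustrates this under the assumption that beliefs are updated by Bayes' rule and that the high type rejects every offer below $c_H$, as in (73). The function names and the numerical values are hypothetical and chosen only for illustration.

```python
def low_type_accept_prob(mu: float, mu_next: float) -> float:
    """Low-type acceptance probability for offers in (c_L, c_H), as in (74).

    mu is the current belief (assumed in (0, 1)); mu_next is the threshold mu^C_{t+1}.
    """
    return max((mu_next - mu) / ((1.0 - mu) * mu_next), 0.0)


def posterior_after_rejection(mu: float, accept_prob_low: float) -> float:
    """Bayes' rule after a rejected offer p < c_H.

    The high type rejects such an offer with probability 1 (see (73)).
    """
    reject_high = 1.0
    reject_low = 1.0 - accept_prob_low
    return mu * reject_high / (mu * reject_high + (1.0 - mu) * reject_low)


def acceptance_probability(mu: float, accept_prob_low: float) -> float:
    """Unconditional probability of acceptance; only the low type accepts p < c_H."""
    return (1.0 - mu) * accept_prob_low


# Hypothetical belief and next-period threshold.
mu, mu_next = 0.2, 0.5
a = low_type_accept_prob(mu, mu_next)
print("posterior after rejection:", posterior_after_rejection(mu, a))
print("probability of acceptance:", acceptance_probability(mu, a))
print("1 - mu / mu_next:", 1.0 - mu / mu_next)
```

Under these assumptions the probability of acceptance equals $1 - \mu_t/\mu^C_{t+1}$, which is the weight on the accepted-offer branch in (72).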

Buyer’s period $t$ strategy. So far we have characterized some properties that must be satisfied by the period-$t$ seller’s equilibrium strategies. We now use them to characterize the buyer’s period-$t$ strategy.

Footnote 22: In this region, there are equilibria where a high-type seller does not accept an offer of $c_H$. Our refinement is needed to rule them out.


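Lemma 14. Fix a public history $h \in \mathcal{H}^C_t$. The period-$t$ buyer’s strategy must satisfy:

$$
p_t =
\begin{cases}
c_H \text{ w.p. } 1, & \text{if } \mu_t > \mu^C_t \\
c_H \text{ w.p. } \xi_t \in [0, 1], & \text{if } \mu_t = \mu^C_t \\
c_L \text{ w.p. } 1 - \xi_t, & \text{if } \mu_t = \mu^C_t \\
c_L \text{ w.p. } 1, & \text{if } \mu_t < \mu^C_t
\end{cases}
\tag{75}
$$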

for $\mu^C_t = \frac{(c_H - c_L)\mu^C_{t+1}}{\eta(\mu^C_{t+1}) - \lambda - c_L}$.

As the proof changes slightly from the proof of Theorem 2, we repeat the claims we use here but only prove the ones that require a new proof.

Claim 19. Fix a history $h \in \mathcal{H}^C_t$. Then, $\mathbb{P}(p_t \le c_H) = 1$.

Proof. See Claim 10.

Claim 20. Fix a history $h \in \mathcal{H}^C_t$ such that $\mu_t(h) \ge \mu^C_{t+1}$. Offering $p_t = c_H$ with probability 1 is a strictly dominant strategy, so $\mathbb{P}(p_t = c_H \mid h^C_t = h) = 1$.

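Proof. Fix a history $h \in \mathcal{H}_t$ such that $\mu_t(h) \ge \mu^C_{t+1}$. Since $\mu_t \ge \mu^C_{t+1}$, by the seller’s strategy, see (73) and (74), every offer $p < c_H$ is rejected. Offering $c_H$ generates expected payoff:

$$\eta(\mu_t) - c_H + U^C_{t+1}(\mu_t)$$

Offering a price that is rejected generates expected payoff:

$$\lambda + U^C_{t+1}(\mu_t)$$

Thus, we must show that $\eta(\mu_t) - c_H > \lambda$. This follows because:

$$
\eta(\mu_t) - c_H \ge \eta(\mu^C_{t+1}) - c_H \overset{(a)}{=} \left(1 - \frac{\mu^C_{t+1}}{\mu^C_{t+2}}\right)(q_L - c_L) + \left(\frac{\mu^C_{t+1}}{\mu^C_{t+2}}\right)\lambda \overset{(b)}{>} \lambda
$$

(a) follows because the buyer is indifferent between offering $c_H$ and $c_L$ at $\mu^C_{t+1}$ by (IH1). (b) follows by (IH1) because $\mu^C_{t+1} < \mu^C_{t+2}$, together with $q_L - c_L > \lambda$.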

Claim 21. $\mathbb{P}(p_t < c_L \mid h^C_t = h) = \mathbb{P}(p_t \in (c_L, c_H) \mid h^C_t = h) = 0$.



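Claim 22. Fix a history $h \in \mathcal{H}^C_t$ such that $\mu_t(h) < \mu^C_{t+1}$. If $\mathbb{P}(p_t = c_L \mid h^C_t = h) > 0$, then

$$\mathbb{P}(a_t = 1 \mid \theta = L, h^C_t = h, p_t = c_L) = \frac{\mu^C_{t+1} - \mu_t}{(1 - \mu_t)\mu^C_{t+1}}.$$

Moreover, if $p_t = c_L$, then for all $t' > t$, $\mathbb{P}(p_{t'} = c_L) = 1$ and:

$$\mathbb{P}(a_{t'} = 1 \mid \theta = L, h^C_{t'} = h, p_{t'} = c_L) = \frac{\mu^C_{t'+1} - \mu_{t'}}{(1 - \mu_{t'})\mu^C_{t'+1}}.$$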


Claim 23. Fix a history $h \in \mathcal{H}^C_t$ and define $\mu^C_t = \frac{(c_H - c_L)\mu^C_{t+1}}{\eta(\mu^C_{t+1}) - \lambda - c_L}$. If $\mu_t(h) < \mu^C_t$, then offering $p_t = c_L$ with probability one is strictly dominant, and if $\mu_t(h) > \mu^C_t$, then offering $p_t = c_H$ is strictly dominant.

Proof. For all histories, we have established that the buyer can only offer $c_H$ or $c_L$ with positive probability. Fix a history $h \in \mathcal{H}^C_t$ such that $\mu_t(h) < \mu^C_{t+1}$. Offering $c_H$ generates expected payoff:

$$\eta(\mu_t) - c_H + U^C_{t+1}(\mu_t) \tag{76}$$

Offering $c_L$ generates expected payoff:

$$\left(1 - \frac{\mu_t}{\mu^C_{t+1}}\right)(q_L - c_L) + \left(\frac{\mu_t}{\mu^C_{t+1}}\right)\lambda + U^C_{t+1}(\mu_t) \tag{77}$$

In comparing the relative value, the $U^C_{t+1}(\mu_t)$ terms cancel and we must compare the remaining terms. Note that $\eta(\mu_t) - c_H$ is increasing in $\mu_t$ since $q_H - q_L > 0$, and $\left(1 - \frac{\mu_t}{\mu^C_{t+1}}\right)(q_L - c_L) + \left(\frac{\mu_t}{\mu^C_{t+1}}\right)\lambda$ is decreasing in $\mu_t$ as $q_L - c_L > \lambda$. We have already shown in Claim 20 that offering $c_H$ is optimal at $\mu_t = \mu^C_{t+1}$. It is straightforward to see that offering $c_L$ is dominant at $\mu_t = 0$ because $q_L - c_L > q_L - c_H$. Moreover, the buyer’s expected payoff from each offer is continuous in $\mu_t$ as $U^C_{t+1}$ is convex (so continuous on $(0, 1)$) and the remaining terms are clearly continuous. Note that the expected payoffs are also continuous at $\mu_t = 0$ as, for $\mu < \mu^C_{t+1}$, $U^C_{t+1}(\mu) = (T - t)(q_L - c_L) + \mu(\lambda - q_L + c_L)\sum_{t'=t+1}^{T} \frac{1}{\mu^C_{t'+1}}$ because $c_L$ is offered in every period. Therefore, to prove the claim, we show that $\mu^C_t$ defined above is the indifference point between the two price offers. We have:

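$$
\begin{aligned}
& \eta(\mu_t) - c_H = \left(1 - \frac{\mu_t}{\mu^C_{t+1}}\right)(q_L - c_L) + \left(\frac{\mu_t}{\mu^C_{t+1}}\right)\lambda \\
\Leftrightarrow\ & (q_H - q_L)\mu_t - c_H = -c_L + \left(\frac{\mu_t}{\mu^C_{t+1}}\right)(\lambda - q_L + c_L) \\
\Leftrightarrow\ & (q_H - q_L)\mu_t - \left(\frac{\mu_t}{\mu^C_{t+1}}\right)(\lambda - q_L + c_L) = c_H - c_L \\
\Leftrightarrow\ & \mu_t = \frac{c_H - c_L}{(q_H - q_L) - \left(\frac{1}{\mu^C_{t+1}}\right)(\lambda - q_L + c_L)} \\
\Leftrightarrow\ & \mu_t = \frac{(c_H - c_L)\mu^C_{t+1}}{(q_H - q_L)\mu^C_{t+1} + q_L - \lambda - c_L}
\end{aligned}
$$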

Thus, the claim follows.
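The recursion in Claim 23 can be explored numerically. The following is a minimal sketch, not part of the proof: it assumes the linear form $\eta(\mu) = q_L + (q_H - q_L)\mu$ used in the algebra above, starts from $\mu^C_{T+1} := 1$, and uses hypothetical parameter values that satisfy $q_H - c_H > q_L - c_L > \lambda$. The function names are illustrative.

```python
def eta(mu, q_h, q_l):
    """Expected quality at belief mu (assumed linear form of eta)."""
    return q_l + (q_h - q_l) * mu


def contractual_thresholds(T, q_h, c_h, q_l, c_l, lam):
    """Backward recursion for mu^C_t from Claim 23, with mu^C_{T+1} := 1."""
    thresholds = {T + 1: 1.0}
    for t in range(T, 0, -1):
        nxt = thresholds[t + 1]
        thresholds[t] = (c_h - c_l) * nxt / (eta(nxt, q_h, q_l) - lam - c_l)
    return thresholds


def payoff_gap(mu, mu_next, q_h, c_h, q_l, c_l, lam):
    """Difference between the c_H and c_L payoffs in (76) and (77).

    The common continuation term U^C_{t+1}(mu) cancels and is omitted.
    """
    high = eta(mu, q_h, q_l) - c_h
    low = (1.0 - mu / mu_next) * (q_l - c_l) + (mu / mu_next) * lam
    return high - low


# Hypothetical parameters: q_H - c_H = 0.5 > q_L - c_L = 0.3 > lambda = 0.1.
params = dict(q_h=1.0, c_h=0.5, q_l=0.4, c_l=0.1, lam=0.1)
th = contractual_thresholds(5, **params)
for t in range(1, 6):
    # The gap should vanish (up to rounding) at each threshold.
    print(t, round(th[t], 4), payoff_gap(th[t], th[t + 1], **params))
```

For these illustrative parameters the computed thresholds increase in $t$, in line with the ordering stated in (IH1); other parameter choices should be checked against the maintained assumptions before drawing conclusions.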

We have proven that, in an equilibrium, the buyer and seller strategies must satisfy the properties stated in the theorem and have shown that (IH1) holds in period $t$.

(IH2): Equilibrium existence for $T - t + 1$ periods. We now construct an equilibrium to show the set of equilibria is non-empty for $T - t + 1$ periods and that (IH2) holds in period $t$. By (IH2), there exists a continuation equilibrium for periods $t + 1$ to $T$ that satisfies the other induction hypotheses. For $t' > t$, let $\rho_{t'}, \alpha_{t'}, \mu_{t'}$ be as described in this continuation equilibrium. However, for $p_{t+1}$, set $\xi_{t+1} = \frac{p_t - c_L}{c_H - c_L}$ for (off-path) histories where $p_t = p \in (c_L, c_H)$ and equal to 0 at histories $h \in \mathcal{H}_{t+1}$ where $p_t = c_L$. Define $p_t$ according to (75). Define $\alpha_t$ as in (73) and (74) with $\psi_t$ equal to $\frac{\mu^C_{t+1} - \mu_t}{(1 - \mu_t)\mu^C_{t+1}}$ for all histories; and let $\mu_t$ be calculated as required by our equilibrium concept where possible and equal to 0 at any other histories. It is straightforward from the previous analysis that $(\rho, \alpha, \mu)$ describes an equilibrium for $T - t + 1$ periods. Thus, (IH2) holds.

(IH3): For every history $h \in \mathcal{H}^C_t$, we have that $U^{\rho,\alpha,\mu}_t(h) = U^C_t(\mu_t(h))$ whenever $(\rho, \alpha, \mu)$ satisfies (IH1) for all $t' \ge t$. Fix a history $h \in \mathcal{H}_t$ and suppose that $(\rho, \alpha, \mu)$ satisfies (IH1). By our previous discussion, the buyer’s period-$t$ value function, $U^{\rho,\alpha,\mu}_t(h)$, is given by the expression in (76) if $\mu_t(h) \ge \mu^C_t$ and by the expression in (77) if $\mu_t(h) < \mu^C_t$ (these expressions are equal at $\mu^C_t$). Note that this can be expressed as a function of $\mu_t(h)$, and it agrees with the definition of $U^C_t(\cdot)$ in (72). Moreover, $U^C_t$ is convex because there are a finite number of possible continuation policies (given that the optimal policy will be to offer either $c_H$ or $c_L$), each of which generates an expected payoff that is linear in $\mu_t$. $U^C_t(\mu)$ is the pointwise maximum of these policies and so is convex.
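The two properties used here, agreement of the two branches at the threshold and convexity, can be checked numerically on an example. The sketch below evaluates the recursion in (72) directly; it assumes the linear form $\eta(\mu) = q_L + (q_H - q_L)\mu$ and the threshold recursion of Claim 23, and all parameter values and function names are hypothetical.

```python
from functools import lru_cache

# Hypothetical parameters satisfying q_H - c_H > q_L - c_L > lambda.
Q_H, C_H, Q_L, C_L, LAM = 1.0, 0.5, 0.4, 0.1, 0.1
T = 5  # hypothetical horizon


def eta(mu: float) -> float:
    """Expected quality at belief mu (assumed linear form of eta)."""
    return Q_L + (Q_H - Q_L) * mu


@lru_cache(maxsize=None)
def threshold(t: int) -> float:
    """Threshold mu^C_t from Claim 23, with mu^C_{T+1} := 1."""
    if t == T + 1:
        return 1.0
    nxt = threshold(t + 1)
    return (C_H - C_L) * nxt / (eta(nxt) - LAM - C_L)


def value(t: int, mu: float) -> float:
    """U^C_t(mu) following the two branches of (72), with U^C_{T+1} = 0."""
    if t == T + 1:
        return 0.0
    if mu >= threshold(t):
        # Offer c_H: trade occurs and the belief is unchanged.
        return eta(mu) - C_H + value(t + 1, mu)
    # Offer c_L: acceptance reveals the low type; rejection moves the
    # belief to the next-period threshold.
    nxt = threshold(t + 1)
    w = mu / nxt
    return ((1.0 - w) * (Q_L - C_L + value(t + 1, 0.0))
            + w * (LAM + value(t + 1, nxt)))


for t in range(1, T + 1):
    thr = threshold(t)
    # Jump of the value function across the threshold (should be near zero).
    print(t, round(thr, 4), value(t, thr) - value(t, thr - 1e-9))
```

This is only a numerical illustration for one parameter choice; it does not replace the argument in the text.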



Proof. There are two cases to consider.

Case 1: Suppose $\lambda > q_L - c_L$. From Proposition 2, $\mu^C = \frac{c_H + \lambda - q_L}{q_H - q_L}$ for all $t$. From the proof of the buyer’s Period $T$ strategy in the proof of Theorem 1 in Appendix B.1 (see (20) and (21)), $\mu^*_T = \frac{c_H + \lambda - q_L}{q_H - q_L}$. Finally, from Theorem 1, $\mu^*_t < \mu^*_T$ for all $t < T$. Hence $\mu^*_t \le \mu^C$ for all $t$.

Case 2: For qL − cL ≥ λ, we prove the result by backward induction over time periods.

Base Case: Period $T$. Assume the buyer has belief $\mu \in [0, 1]$. The buyer’s expected payoff from offering $c_L$ equals $\mu\lambda + (1 - \mu)(q_L - c_L)$ in both models since a low-type seller accepts with probability one in each model and it is the last period. The buyer’s expected payoff from offering $c_H$ equals $(q_H - q_L)\mu + q_L - c_H$ in both models as well. Therefore, the statement for Period $T$ holds as $\mu^*_T = \mu^C_T = \frac{c_H - c_L}{q_H - \lambda - c_L}$.
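For completeness, the last-period threshold can be recovered by equating the two payoffs stated above; the following short derivation uses only those two expressions.

```latex
\begin{align*}
\mu\lambda + (1 - \mu)(q_L - c_L) &= (q_H - q_L)\mu + q_L - c_H \\
\iff\quad q_L - c_L + \mu(\lambda - q_L + c_L) &= (q_H - q_L)\mu + q_L - c_H \\
\iff\quad c_H - c_L &= \mu\,(q_H - \lambda - c_L) \\
\iff\quad \mu &= \frac{c_H - c_L}{q_H - \lambda - c_L}.
\end{align*}
```

The result coincides with the recursion of Claim 23 evaluated at $\mu^C_{T+1} := 1$, assuming $\eta(1) = q_H$.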

Induction Hypothesis: Fix $t < T$.

(IH) Suppose that, for all $t' = t+1, \ldots, T$, $\mu^*_{t'} \le \mu^C_{t'}$.

Induction Step: Fix $\mu \in [0, \mu^*_t]$. In this interval, $c_L$ is the optimal offer for the buyer in the Integrated Learning model. We will show that the buyer must also offer $c_L$ in the contractual-only model.

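$$
\begin{aligned}
U^*_t(\mu) &= \left(1 - \frac{\mu}{\mu^*_{t+1}}\right)\left(q_L - c_L + U^*_{t+1}(0)\right) + \left(\frac{\mu}{\mu^*_{t+1}}\right)\left(\lambda + U^*_{t+1}(\mu^*_{t+1})\right) \\
&\overset{(a)}{\ge} (q_H - q_L)\mu + q_L - c_H + \eta(\mu)U^*_{t+1}(\sigma(\mu)) + (1 - \eta(\mu))U^*_{t+1}(\phi(\mu)) \\
\overset{(b)}{\Rightarrow}\ & \left(1 - \frac{\mu}{\mu^*_{t+1}}\right)(q_L - c_L) + \left(\frac{\mu}{\mu^*_{t+1}}\right)\lambda + U^*_{t+1}(\mu) \ge (q_H - q_L)\mu + q_L - c_H + \mathbb{E}_{y_t}\left(U^*_{t+1}(\mu_{t+1}) \mid \mu\right) \\
\overset{(c)}{\Rightarrow}\ & \left(1 - \frac{\mu}{\mu^*_{t+1}}\right)(q_L - c_L) + \left(\frac{\mu}{\mu^*_{t+1}}\right)\lambda + U^*_{t+1}(\mu) \ge (q_H - q_L)\mu + q_L - c_H + U^*_{t+1}(\mu) \\
\Rightarrow\ & \left(1 - \frac{\mu}{\mu^*_{t+1}}\right)(q_L - c_L) + \left(\frac{\mu}{\mu^*_{t+1}}\right)\lambda \ge (q_H - q_L)\mu + q_L - c_H \\
\overset{(d)}{\Rightarrow}\ & \left(1 - \frac{\mu}{\mu^C_{t+1}}\right)(q_L - c_L) + \left(\frac{\mu}{\mu^C_{t+1}}\right)\lambda \ge (q_H - q_L)\mu + q_L - c_H \\
\Rightarrow\ & \left(1 - \frac{\mu}{\mu^C_{t+1}}\right)(q_L - c_L) + \left(\frac{\mu}{\mu^C_{t+1}}\right)\lambda + U^C_{t+1}(\mu) \ge (q_H - q_L)\mu + q_L - c_H + U^C_{t+1}(\mu) \\
\overset{(e)}{\Rightarrow}\ & \left(1 - \frac{\mu}{\mu^C_{t+1}}\right)\left(q_L - c_L + U^C_{t+1}(0)\right) + \left(\frac{\mu}{\mu^C_{t+1}}\right)\left(\lambda + U^C_{t+1}(\mu^C_{t+1})\right) \ge (q_H - q_L)\mu + q_L - c_H + U^C_{t+1}(\mu)
\end{aligned}
\tag{78}
$$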

(a) follows because the buyer’s payoff from offering $c_L$ was shown to be greater than her payoff from offering $c_H$ in the proof of Theorem 2. (b) $U^*_{t+1}$ is linear on the range $[0, \mu^*_{t+1}]$ by the definition of $U^*_{t+1}$ and, by assumption, we have $\mu \le \mu^*_t < \mu^*_{t+1}$, where the second inequality follows by Theorem 2. (c) By Jensen’s Inequality and the convexity of $U^*_{t+1}$ (see proof of Theorem 2). (d) By (IH) for $t' = t + 1$, since $q_L - c_L > \lambda$. (e) $U^C_{t+1}$ is linear on the range $[0, \mu^C_{t+1}]$ by the definition of $U^C_{t+1}$ and, by assumption, we have $\mu \le \mu^*_t < \mu^*_{t+1} \le \mu^C_{t+1}$, where the last inequality follows by (IH).

Note that the last expression implies that the expected payoff from offering $c_L$ is greater than the expected payoff from offering $c_H$ in the contractual model. Therefore, if $c_L$ is optimal for the Integrated Learning model, it is optimal in the contractual-only model, so $\mu^*_t \le \mu^C_t$, which confirms (IH) and proves the claim.

B.8 Proposition 5 - Extended

In this extended version of Proposition 5, we explicitly define the functions $U^*_t$ and $U^C_t$ for $t = 1, \ldots, T$. Note that the functions have different definitions depending on whether $q_L - c_L > \lambda$ or vice versa. We leave this implicit in our notation as the proper definition will be clear from context.

Moderate Outside Option: Suppose qL − cL ≤ λ. Define, recursively, the functions U∗t′ : [0, 1]→ R as:

U∗t′(µ) =



with terminal function $U^*_{T+1}(\mu) = 0$, where $\{\mu^*_{t'}\}$ is the sequence of thresholds in Theorem 1, and $\eta$, $\sigma$ and $\phi$ are defined in (9).

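Moreover, define, recursively, the functions $U^C_{t'} : [0, 1] \to \mathbb{R}$ as:

$$
U^C_{t'}(\mu) =
\begin{cases}
\eta(\mu) - c_H + U^C_{t'+1}(\mu), & \text{if } \mu \ge \mu^C_{t'} \\
\lambda + U^C_{t'+1}(\mu), & \text{otherwise}
\end{cases}
\tag{80}
$$

with terminal function $U^C_{T+1}(\mu) = 0$, where $\mu^C_{t'} = \mu^C$, which is characterized by Proposition 2.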

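Bad Outside Option: Suppose that $q_L - c_L > \lambda$. Define, recursively, the functions $U^*_{t'} : [0, 1] \to \mathbb{R}$ as:

$$
U^*_{t'}(\mu) =
\begin{cases}
\eta(\mu) - c_H + \eta(\mu)U^*_{t'+1}(\sigma(\mu)) + (1 - \eta(\mu))U^*_{t'+1}(\phi(\mu)), & \text{if } \mu \ge \mu^*_{t'} \\
\left(1 - \frac{\mu}{\mu^*_{t'+1}}\right)\left(q_L - c_L + U^*_{t'+1}(0)\right) + \left(\frac{\mu}{\mu^*_{t'+1}}\right)\left(\lambda + U^*_{t'+1}(\mu^*_{t'+1})\right), & \text{otherwise}
\end{cases}
\tag{81}
$$

with terminal function $U^*_{T+1}(\mu) = 0$, where $\{\mu^*_{t'}\}$ is the sequence of thresholds in Theorem 2.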

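Moreover, define, recursively, the functions $U^C_{t'} : [0, 1] \to \mathbb{R}$ as:

$$
U^C_{t'}(\mu) =
\begin{cases}
\eta(\mu) - c_H + U^C_{t'+1}(\mu), & \text{if } \mu \ge \mu^C_{t'} \\
\left(1 - \frac{\mu}{\mu^C_{t'+1}}\right)\left(q_L - c_L + U^C_{t'+1}(0)\right) + \left(\frac{\mu}{\mu^C_{t'+1}}\right)\left(\lambda + U^C_{t'+1}(\mu^C_{t'+1})\right), & \text{otherwise}
\end{cases}
\tag{82}
$$

with terminal function $U^C_{T+1}(\mu) = 0$, where $\{\mu^C_{t'}\}$ is the sequence of thresholds in Proposition 3.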

Proposition 5E. For every equilibrium $(\rho, \alpha, \mu)$, period $t = 1, \ldots, T$, and history $h \in \mathcal{H}_t$, $U^{\rho,\alpha,\mu}_t(h) = U^*_t(\mu_t(h))$, where $\{U^*_t\}$ are defined in (79) and (81). Moreover, for every equilibrium $(\rho^C, \alpha^C, \mu^C)$, period $t = 1, \ldots, T$, and history $h \in \mathcal{H}^C_t$, $U^{\rho^C,\alpha^C,\mu^C}_t(h) = U^C_t(\mu^C_t(h))$, where $\{U^C_t\}$ are defined in (80) and (82).

Proof. For the functions $\{U^*_t\}$, the result follows from the proofs of Theorems 1 and 2; specifically, from (IH3) in each proof. See equations (14) and (32). For the functions $\{U^C_t\}$, the result follows from the proofs of Propositions 2 and 3; specifically, from (IH3) in each proof. See equations (62) and (72).
