Repeated Games and Reputations
Long-Run Relationships

George J. Mailath and Larry Samuelson

Oxford University Press, Inc., publishes works that further Oxford University's objective of excellence in research, scholarship, and education.

Copyright 2006 by Oxford University Press
Published by Oxford University Press, Inc.
198 Madison Avenue, New York, New York 10016
www.oup.com
Oxford is a registered trademark of Oxford University Press

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of Oxford University Press.

Library of Congress Cataloging-in-Publication Data
Mailath, George Joseph.
Repeated games and reputations : long-run relationships / George J. Mailath, Larry Samuelson.
p. cm.
Includes bibliographical references and index.
ISBN-13 978-0-19-530079-6
ISBN 0-19-530079-3
1. Game theory. 2. Economics, Mathematical. I. Samuelson, Larry, 1953- II. Title.
HB144.M32 2006
519.3 dc22 2005049518

Printed in the United States of America on acid-free paper

To Loretta (GJM)
To Mark, Brian, and Samantha (LS)

Acknowledgments

We thank the many colleagues who have encouraged us throughout this project. In particular, we thank Martin Cripps, Jeffrey Ely, Eduardo Faingold, Drew Fudenberg, Qingmin Liu, Stephen Morris, Georg Nöldeke, Ichiro Obara, Wojciech Olszewski, Andrew Postlewaite, Patrick Rey, Andrzej Skrzypacz, and Ennio Stacchetti for comments on various chapters; Roberto Pinheiro for exceptionally close proofreading; Bo Chen, Emin Dokumaci, Ratul Lahkar, and José Rodrigues-Neto for research assistance; and a large collection of anonymous referees for detailed and helpful reports. We thank Terry Vaughn and Lisa Stallings at Oxford University Press for their help throughout this project.

We also thank our coauthors, who have played an important role in shaping our thinking for so many years.
Their direct and indirect contributions to this work are significant.

We thank the University of Copenhagen and University College London for their hospitality in the final stages of the book.

We thank the National Science Foundation for supporting our research that is described in various chapters.

Contents

1 Introduction
  1.1 Intertemporal Incentives
  1.2 The Prisoner's Dilemma
  1.3 Oligopoly
  1.4 The Prisoner's Dilemma under Imperfect Monitoring
  1.5 The Product-Choice Game
  1.6 Discussion
  1.7 A Reader's Guide
  1.8 The Scope of the Book

Part I: Games with Perfect Monitoring

2 The Basic Structure of Repeated Games with Perfect Monitoring
  2.1 The Canonical Repeated Game
    2.1.1 The Stage Game
    2.1.2 Public Correlation
    2.1.3 The Repeated Game
    2.1.4 Subgame-Perfect Equilibrium of the Repeated Game
  2.2 The One-Shot Deviation Principle
  2.3 Automaton Representations of Strategy Profiles
  2.4 Credible Continuation Promises
  2.5 Generating Equilibria
    2.5.1 Constructing Equilibria: Self-Generation
    2.5.2 Example: Mutual Effort
    2.5.3 Example: The Folk Theorem
    2.5.4 Example: Constructing Equilibria for Low δ
    2.5.5 Example: Failure of Monotonicity
    2.5.6 Example: Public Correlation
  2.6 Constructing Equilibria: Simple Strategies and Penal Codes
    2.6.1 Simple Strategies and Penal Codes
    2.6.2 Example: Oligopoly
  2.7 Long-Lived and Short-Lived Players
    2.7.1 Minmax Payoffs
    2.7.2 Constraints on Payoffs

3 The Folk Theorem with Perfect Monitoring
  3.1 Examples
  3.2 Interpreting the Folk Theorem
    3.2.1 Implications
    3.2.2 Patient Players
    3.2.3 Patience and Incentives
    3.2.4 Observable Mixtures
  3.3 The Pure-Action Folk Theorem for Two Players
  3.4 The Folk Theorem with More than Two Players
    3.4.1 A Counterexample
    3.4.2 Player-Specific Punishments
  3.5 Non-Equivalent Utilities
  3.6 Long-Lived and Short-Lived Players
  3.7 Convexifying the Equilibrium Payoff Set Without Public Correlation
  3.8 Mixed-Action Individual Rationality

4 How Long Is Forever?
  4.1 Is the Horizon Ever Infinite?
  4.2 Uncertain Horizons
  4.3 Declining Discount Factors
  4.4 Finitely Repeated Games
  4.5 Approximate Equilibria
  4.6 Renegotiation
    4.6.1 Finitely Repeated Games
    4.6.2 Infinitely Repeated Games

5 Variations on the Game
  5.1 Random Matching
    5.1.1 Public Histories
    5.1.2 Personal Histories
  5.2 Relationships in Context
    5.2.1 A Frictionless Market
    5.2.2 Future Benefits
    5.2.3 Adverse Selection
    5.2.4 Starting Small
  5.3 Multimarket Interactions
  5.4 Repeated Extensive Forms
    5.4.1 Repeated Extensive-Form Games Have More Subgames
    5.4.2 Player-Specific Punishments in Repeated Extensive-Form Games
    5.4.3 Extensive-Form Games and Imperfect Monitoring
    5.4.4 Extensive-Form Games and Weak Individual Rationality
    5.4.5 Asynchronous Moves
    5.4.6 Simple Strategies
  5.5 Dynamic Games: Introduction
    5.5.1 The Game
    5.5.2 Markov Equilibrium
    5.5.3 Examples
  5.6 Dynamic Games: Foundations
    5.6.1 Consistent Partitions
    5.6.2 Coherent Consistency
    5.6.3 Markov Equilibrium
  5.7 Dynamic Games: Equilibrium
    5.7.1 The Structure of Equilibria
    5.7.2 A Folk Theorem

6 Applications
  6.1 Price Wars
    6.1.1 Independent Price Shocks
    6.1.2 Correlated Price Shocks
  6.2 Time Consistency
    6.2.1 The Stage Game
    6.2.2 Equilibrium, Commitment, and Time Consistency
    6.2.3 The Infinitely Repeated Game
  6.3 Risk Sharing
    6.3.1 The Economy
    6.3.2 Full Insurance Allocations
    6.3.3 Partial Insurance
    6.3.4 Consumption Dynamics
    6.3.5 Intertemporal Consumption Sensitivity

Part II: Games with (Imperfect) Public Monitoring

7 The Basic Structure of Repeated Games with Imperfect Public Monitoring
  7.1 The Canonical Repeated Game
    7.1.1 The Stage Game
    7.1.2 The Repeated Game
    7.1.3 Recovering a Recursive Structure: Public Strategies and Perfect Public Equilibria
  7.2 A Repeated Prisoner's Dilemma Example
    7.2.1 Punishments Happen
    7.2.2 Forgiving Strategies
    7.2.3 Strongly Symmetric Behavior Implies Inefficiency
  7.3 Decomposability and Self-Generation
  7.4 The Impact of Increased Precision
  7.5 The Bang-Bang Result
  7.6 An Example with Short-Lived Players
    7.6.1 Perfect Monitoring
    7.6.2 Imperfect Public Monitoring of the Long-Lived Player
  7.7 The Repeated Prisoner's Dilemma Redux
    7.7.1 Symmetric Inefficiency Revisited
    7.7.2 Enforcing a Mixed-Action Profile
  7.8 Anonymous Players

8 Bounding Perfect Public Equilibrium Payoffs
  8.1 Decomposing on Half-Spaces
  8.2 The Inefficiency of Strongly Symmetric Equilibria
  8.3 Short-Lived Players
    8.3.1 The Upper Bound on Payoffs
    8.3.2 Binding Moral Hazard
  8.4 The Prisoner's Dilemma
    8.4.1 Bounds on Efficiency: Pure Actions
    8.4.2 Bounds on Efficiency: Mixed Actions
    8.4.3 A Characterization with Two Signals
    8.4.4 Efficiency with Three Signals
    8.4.5 Efficient Asymmetry

9 The Folk Theorem with Imperfect Public Monitoring
  9.1 Characterizing the Limit Set of PPE Payoffs
  9.2 The Rank Conditions and a Public Monitoring Folk Theorem
  9.3 Perfect Monitoring Characterizations
    9.3.1 The Folk Theorem with Long-Lived Players
    9.3.2 Long-Lived and Short-Lived Players
  9.4 Enforceability and Identifiability
  9.5 Games with a Product Structure
  9.6 Repeated Extensive-Form Games
  9.7 Games of Symmetric Incomplete Information
    9.7.1 Equilibrium
    9.7.2 A Folk Theorem
  9.8 Short Period Length

10 Private Strategies in Games with Imperfect Public Monitoring
  10.1 Sequential Equilibrium
  10.2 A Reduced-Form Example
    10.2.1 Pure Strategies
    10.2.2 Public Correlation
    10.2.3 Mixed Public Strategies
    10.2.4 Private Strategies
  10.3 Two-Period Examples
    10.3.1 Equilibrium Punishments Need Not Be Equilibria
    10.3.2 Payoffs by Correlation
    10.3.3 Inconsistent Beliefs
  10.4 An Infinitely Repeated Prisoner's Dilemma
    10.4.1 Public Transitions
    10.4.2 An Infinitely Repeated Prisoner's Dilemma: Indifference

11 Applications
  11.1 Oligopoly with Imperfect Monitoring
    11.1.1 The Game
    11.1.2 Optimal Collusion
    11.1.3 Which News Is Bad News?
    11.1.4 Imperfect Collusion
  11.2 Repeated Adverse Selection
    11.2.1 General Structure
    11.2.2 An Oligopoly with Private Costs: The Game
    11.2.3 A Uniform-Price Equilibrium
    11.2.4 A Stationary-Outcome Separating Equilibrium
    11.2.5 Efficiency
    11.2.6 Nonstationary-Outcome Equilibria
  11.3 Risk Sharing
  11.4 Principal-Agent Problems
    11.4.1 Hidden Actions
    11.4.2 Incomplete Contracts: The Stage Game
    11.4.3 Incomplete Contracts: The Repeated Game
    11.4.4 Risk Aversion: The Stage Game
    11.4.5 Risk Aversion: Review Strategies in the Repeated Game

Part III: Games with Private Monitoring

12 Private Monitoring
  12.1 A Two-Period Example
    12.1.1 Almost Public Monitoring
    12.1.2 Conditionally Independent Monitoring
    12.1.3 Intertemporal Incentives from Second-Period Randomization
  12.2 Private Monitoring Games: Basic Structure
  12.3 Almost Public Monitoring: Robustness in the Infinitely Repeated Prisoner's Dilemma
    12.3.1 The Forgiving Profile
    12.3.2 Grim Trigger
  12.4 Independent Monitoring: A Belief-Based Equilibrium for the Infinitely Repeated Prisoner's Dilemma
  12.5 A Belief-Free Example

13 Almost Public Monitoring Games
  13.1 When Is Monitoring Almost Public?
  13.2 Nearby Games with Almost Public Monitoring
    13.2.1 Payoffs
    13.2.2 Continuation Values
    13.2.3 Best Responses
    13.2.4 Equilibrium
  13.3 Public Profiles with Bounded Recall
  13.4 Failure of Coordination under Unbounded Recall
    13.4.1 Examples
    13.4.2 Incentives to Deviate
    13.4.3 Separating Profiles
    13.4.4 Rich Monitoring
    13.4.5 Coordination Failure
  13.5 Patient Players
    13.5.1 Patient Strictness
    13.5.2 Equilibria in Nearby Games
  13.6 A Folk Theorem

14 Belief-Free Equilibria in Private Monitoring Games
  14.1 Definition and Examples
    14.1.1 Repeated Prisoner's Dilemma with Perfect Monitoring
    14.1.2 Repeated Prisoner's Dilemma with Private Monitoring
  14.2 Strong Self-Generation

Part IV: Reputations

15 Reputations with Short-Lived Players
  15.1 The Adverse Selection Approach to Reputations
  15.2 Commitment Types
  15.3 Perfect Monitoring Games
    15.3.1 Building a Reputation
    15.3.2 The Reputation Bound
    15.3.3 An Example: Time Consistency
  15.4 Imperfect Monitoring Games
    15.4.1 Stackelberg Payoffs
    15.4.2 The Reputation Bound
    15.4.3 Small Players with Idiosyncratic Signals
  15.5 Temporary Reputations
    15.5.1 Asymptotic Beliefs
    15.5.2 Uniformly Disappearing Reputations
    15.5.3 Asymptotic Equilibrium Play
  15.6 Temporary Reputations: The Proof of Proposition 15.5.1
    15.6.1 Player 2's Posterior Beliefs
    15.6.2 Player 2's Beliefs about Her Future Behavior
    15.6.3 Player 1's Beliefs about Player 2's Future Behavior
    15.6.4 Proof of Proposition 15.5.1

16 Reputations with Long-Lived Players
  16.1 The Basic Issue
  16.2 Perfect Monitoring and Minmax-Action Reputations
    16.2.1 Minmax-Action Types and Conflicting Interests
    16.2.2 Examples
    16.2.3 Two-Sided Incomplete Information
  16.3 Weaker Reputations for Any Action
  16.4 Imperfect Public Monitoring
  16.5 Commitment Types Who Punish
  16.6 Equal Discount Factors
    16.6.1 Example 1: Common Interests
    16.6.2 Example 2: Conflicting Interests
    16.6.3 Example 3: Strictly Dominant Action Games
    16.6.4 Example 4: Strictly Conflicting Interests
    16.6.5 Bounded Recall
    16.6.6 Reputations and Bargaining
  16.7 Temporary Reputations

17 Finitely Repeated Games
  17.1 The Chain Store Game
  17.2 The Prisoner's Dilemma
  17.3 The Product-Choice Game
    17.3.1 The Last Period
    17.3.2 The First Period, Player 1
    17.3.3 The First Period, Player 2

18 Modeling Reputations
  18.1 An Alternative Model of Reputations
    18.1.1 Modeling Reputations
    18.1.2 The Market
    18.1.3 Reputation with Replacements
    18.1.4 How Different Is It?
  18.2 The Role of Replacements
  18.3 Good Types and Bad Types
    18.3.1 Bad Types
    18.3.2 Good Types
  18.4 Reputations with Common Consumers
    18.4.1 Belief-Free Equilibria with Idiosyncratic Consumers
    18.4.2 Common Consumers
    18.4.3 Reputations
    18.4.4 Replacements
    18.4.5 Continuity at the Boundary and Markov Equilibria
    18.4.6 Competitive Markets
  18.5 Discrete Choices
  18.6 Lost Consumers
    18.6.1 The Purchase Game
    18.6.2 Bad Reputations: The Stage Game
    18.6.3 The Repeated Game
    18.6.4 Incomplete Information
    18.6.5 Good Firms
    18.6.6 Captive Consumers
  18.7 Markets for Reputations
    18.7.1 Reputations Have Value
    18.7.2 Buying Reputations

Bibliography
Symbols
Index

1 Introduction

1.1 Intertemporal Incentives

In Puccini's opera Gianni Schicchi, the deceased Buoso Donati has left his estate to a monastery, much to the consternation of his family.1 Before anyone outside the family learns of the death, Donati's relatives engage the services of the actor Gianni Schicchi, who is to impersonate Buoso Donati as living but near death, to write a new will leaving the fortune to the family, and then die. Anxious that Schicchi do nothing to risk exposing the plot, the family explains that there are severe penalties for tampering with a will and that any misstep puts Schicchi at risk. All goes well until Schicchi (acting as Buoso Donati) writes the new will, at which point he instructs that the entire estate be left to "the great actor Gianni Schicchi." The dumbstruck relatives watch in horror, afraid to object lest their plot be exposed and they pay the penalties with which they had threatened Schicchi.

Ron Luciano, who worked in professional baseball as an umpire, occasionally did not feel well enough to umpire. In his memoir, Luciano writes,2

    Over a period of time I learned to trust certain catchers so much that I actually let them umpire for me on bad days. The bad days usually followed the good nights. . . . On those days there wasn't much I could do but take two aspirins and call as little as possible. If someone I trusted was catching . . . I'd tell them, "Look, it's a bad day. You'd better take it for me. If it's a strike, hold your glove in place for an extra second. If it's a ball, throw it right back. And please, don't yell." . . . No one I worked with ever took advantage of the situation.

1. Our description of Gianni Schicchi is taken from Hamermesh (2004, p. 164), who uses it to illustrate the incentives that arise in isolated interactions.
2. The use of this passage (originally from Luciano and Fisher 1982, p. 166) as an illustration of the importance of repeated interactions is due to Axelrod (1984, p. 178), who quotes and discusses it.

In each case, the prospect for opportunistic behavior arises. Gianni Schicchi sees a chance to grab a fortune and does so. Any of Luciano's catchers could have tipped the game in their favor by making the appropriate calls, secure in the knowledge that Luciano would not expose them for doing his job, but none did so. What is the difference between the two situations? Schicchi anticipates no further dealings with the family of Buoso Donati.
In the language of game theory, theirs is a one-shot game. Luciano's catchers know there is a good chance they will again play games umpired by Luciano and that opportunistic behavior may have adverse future consequences, even if currently unexposed. Theirs is a repeated game.

These two stories illustrate the basic principle that motivates interest in repeated games: Repeated interactions give rise to incentives that differ fundamentally from those of isolated interactions. As a simple illustration, consider the game in figure 1.1.1.

              A         B         C
    A       5, 5      0, 0     12, 0
    B       0, 0      2, 2      0, 0
    C       0, 12     0, 0    10, 10

Figure 1.1.1 A modified coordination game. Pure-strategy Nash equilibria include AA and BB but not CC.

This game has two strict Nash equilibria, AA and BB. When this game is played once, players can do no better than to play AA for a payoff of 5. If the game is played twice, with payoffs summed over the two periods, there is an equilibrium with a higher average payoff. The key is to use first-period play to coordinate equilibrium play in the second period. The players choose CC in the first period and AA in the second. Any other first-period outcome leads to BB in the second period. Should one player attempt to exploit the other by playing A in the first period, he gains 2 in the first period but loses 3 in the second. The deviation is unprofitable, and so we have an equilibrium with a total payoff of 15 to each player.

We see these differences between repeated and isolated interactions throughout our daily lives. Suppose, on taking your car for a routine oil change, you are told that an engine problem has been discovered and requires an immediate and costly repair. Would your confidence that this diagnosis is accurate depend on whether you regularly do business with the service provider or whether you are passing through on vacation? Would you be more willing to buy a watch in a jewelry store than on the street corner? Would you be more or less inclined to monitor the quality of work done by a provider who is going out of business after doing your job?

Repeated games are the primary tool for understanding such situations. This preliminary chapter presents four examples illustrating the issues that arise in repeated games.

1.2 The Prisoner's Dilemma

The prisoner's dilemma is perhaps the best known and most studied (and most abused) of games.3 We will frequently use the example given in figure 1.2.1.

              E         S
    E       2, 2     -1, 3
    S      3, -1      0, 0

Figure 1.2.1 The prisoner's dilemma.

We interpret the prisoner's dilemma as a partnership game in which each player can either exert effort (E) or shirk (S).4 Shirking is strictly dominant, while higher payoffs for both players are achieved if both exert effort. In an isolated interaction, there is no escape from this dilemma. We must either change the game so that it is not a prisoner's dilemma or must accept the inefficient outcome. Any argument in defense of effort must ultimately be an argument that either the numbers in the matrix do not represent the players' preferences, or that some other aspect of the game is not really descriptive of the actual interaction.

3. Binmore (1994, chapter 3) discusses the many (unsuccessful) attempts to extract cooperation from the prisoner's dilemma.
4. Traditionally, the dominant action is called defect and the dominated action cooperate. In our view, the words cooperate and defect are too useful to restrict their usage to actions in a single (even if important) game.

Things change considerably if the game is repeated. Suppose the game is played in periods 0, 1, 2, . . . . The players make their choices simultaneously in each period, and then observe that period's outcome before proceeding to the next. Let $a_i^t$ and $a_j^t$ be the actions (exert effort or shirk) chosen by players i and j in period t, and let $u_i$ be player i's utility function, as given in figure 1.2.1. Player i maximizes the normalized discounted sum of payoffs
$$(1-\delta)\sum_{t=0}^{\infty}\delta^t u_i(a_i^t, a_j^t),$$
where $\delta \in [0, 1)$ is the common discount factor. Suppose that the strategy of each player is to exert effort in the first period and to continue to do so in every subsequent period as long as both players have previously exerted effort, while shirking in all other circumstances. Suppose finally that period $\tau$ has been reached, and no one has yet shirked. What should player i do? One possibility is to continue to exert effort, and in fact to do so forever. Given the strategy of the other player, this yields a (normalized, discounted) payoff of 2. The only other candidate for an optimal strategy is to shirk in period $\tau$ (if one is ever going to shirk, one might as well do it now), after which one can do no better than to shirk in all subsequent periods (since player j will do so), for a payoff of $(1-\delta)\big(3 + \sum_{t=\tau+1}^{\infty}\delta^{t-\tau}\cdot 0\big) = 3(1-\delta)$. Continued effort is optimal if
$$2 \geq 3(1-\delta), \quad\text{or}\quad \delta \geq \tfrac{1}{3}.$$
By making future play depend on current actions, we can thus alter the incentives that shape these current actions, in this case allowing equilibria in which the players exert effort. For these new incentives to have an effect on behavior, the players must be sufficiently patient, and hence future payoffs sufficiently important.
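The threshold $\delta \geq 1/3$ is easy to confirm numerically. The following sketch is not from the book; the helper name and the truncation horizon T are illustrative choices. It compares the normalized discounted payoff from perpetual effort against an immediate deviation, for discount factors on either side of 1/3.

```python
# Numerical check of the grim-trigger threshold delta >= 1/3 in the
# prisoner's dilemma of figure 1.2.1 (flow payoffs: 2 on the effort path,
# 3 from shirking once, 0 under mutual shirking thereafter).

def discounted(payoffs, delta):
    """Normalized discounted value of a finite payoff stream."""
    return (1 - delta) * sum(u * delta**t for t, u in enumerate(payoffs))

T = 2000  # truncation horizon; delta**T is negligible for the deltas below

for delta in [0.20, 1/3, 0.50, 0.90]:
    effort = discounted([2] * T, delta)               # both exert effort forever
    deviate = discounted([3] + [0] * (T - 1), delta)  # shirk now, mutual shirking after
    print(f"delta = {delta:.3f}: effort {effort:.3f} vs deviation {deviate:.3f} "
          f"-> effort optimal: {effort >= deviate - 1e-9}")
```

Running this shows the incentive constraint binding exactly at $\delta = 1/3$, as derived above.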
This is not the only equilibrium of the infinitely repeated prisoner's dilemma. It is also an equilibrium for each player to relentlessly shirk, regardless of past play. Indeed, for any payoffs that are feasible in the stage game and strictly individually rational (i.e., give each player a positive payoff), there is an equilibrium of the repeated game producing those payoffs, if the players are sufficiently patient. This is an example of the folk theorem for repeated games.

1.3 Oligopoly

We can recast this result in an economic context. Consider the Cournot oligopoly model. There are n firms, denoted by i = 1, . . . , n, who costlessly produce an identical product. The firms simultaneously choose quantities of output, with firm i's output denoted by $a_i \in \mathbb{R}_+$ and with the market price then given by $1 - \sum_{i=1}^n a_i$. Given these choices, the profits of firm j are given by
$$u_j(a_1, \ldots, a_n) = a_j\left(1 - \sum_{i=1}^n a_i\right).$$
For any number of firms n, this game has a unique Nash equilibrium, which is symmetric and calls for each firm i to produce output $a^N$ and earn profits $u_i(a^N, \ldots, a^N)$, where5
$$a^N = \frac{1}{n+1}, \quad\text{and}\quad u_i(a^N, \ldots, a^N) = \left(\frac{1}{n+1}\right)^2.$$
When n = 1, this is a monopoly market. As the number of firms n grows arbitrarily large, the equilibrium outcome approaches that of a competitive market, with zero price and total quantity equal to one, while the consumer surplus increases and the welfare loss prompted by imperfect competition decreases.

5. Firm j's first-order condition for profit maximization is $1 - 2a_j - \sum_{i \neq j} a_i = 0$, or $a_j = 1 - A$, where A is the total quantity produced in the market. It is then immediate that the equilibrium must be symmetric, giving a first-order condition of $1 - (n+1)a^N = 0$.

The analysis of imperfectly competitive markets has advanced far beyond this simple model. However, the intuition remains that less concentrated markets are more competitive and yield higher consumer welfare, providing the organizing theme for many discussions of merger and antitrust policy.

It may instead be reasonable to view the interaction between a handful of firms as a repeated game.6 The welfare effects of market concentration and the forces behind these effects are now much less clear. Much like the case of the prisoner's dilemma, consider strategies in which each firm produces 1/(2n) as long as every firm has done so in every previous period, and otherwise produces output 1/(n+1). The former output allows the firms to jointly reproduce the monopoly outcome in this market, splitting the profits equally, whereas the latter is the Nash equilibrium of the stage game. As long as the discount factor is sufficiently high, these strategies are a subgame-perfect equilibrium.7 Total monopoly profits are 1/4 and the profits of each firm are given by 1/(4n) in each period of this equilibrium. The increased payoffs available to a firm who cheats on the implicit agreement to produce the monopoly output are overwhelmed by the future losses involved in switching to the stage-game equilibrium. If these collusive strategies are the equilibrium realized in the repeated game, then reductions in the number of firms may have no effect on consumer welfare at all. No longer can we regard less concentrated markets as more competitive.

6. To evaluate whether the stage game or the repeated game is a more likely candidate for usefully examining a market, one might reasonably begin with questions about the qualitative nature of behavior in that market. When firm i sets its current quantity or price, does it consider the effect that this quantity and price might have on the future behavior of its rivals? For example, does an airline wonder whether a fare cut will prompt similar cuts on the part of its competitors? Does an auto manufacturer ask whether rebates and financial incentives will prompt similar initiatives on the part of other auto manufacturers? If so, then a repeated game is the obvious tool for modelling the interaction.

7. After some algebra, the condition is that
$$\frac{1}{16}\left(\frac{n+1}{n}\right)^2 \leq \frac{1}{1-\delta}\left(\frac{1}{4n} - \delta\frac{1}{(n+1)^2}\right).$$
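As a numerical companion to footnote 7, the critical discount factor can be computed directly. This is a sketch under stated assumptions, not the book's code: the deviator plays the one-shot best response $(n+1)/(4n)$ to the others' collusive outputs and is thereafter held to stage-game Nash profits.

```python
# Critical discount factor for the collusive strategies of section 1.3.

def critical_delta(n):
    collusive = 1 / (4 * n)         # per-firm profit at the monopoly output 1/(2n)
    nash = 1 / (n + 1) ** 2         # stage-game Nash profit
    # Best response to the others' collusive outputs (n-1)/(2n) is (n+1)/(4n),
    # which also leaves a price of (n+1)/(4n), so deviation profit is its square:
    deviation = ((n + 1) / (4 * n)) ** 2
    # Collusion is sustainable iff (1-d)*deviation + d*nash <= collusive,
    # which is footnote 7's inequality rearranged for d:
    return (deviation - collusive) / (deviation - nash)

for n in [2, 3, 5, 10]:
    print(f"n = {n:2d}: collusion sustainable for delta >= {critical_delta(n):.4f}")
```

For n = 2 this returns 9/17, roughly 0.53, and the required patience rises with the number of firms.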

1.4 The Prisoner's Dilemma under Imperfect Monitoring

Our first two examples have been games of perfect monitoring, in the sense that the players can observe each other's actions. Consider again the prisoner's dilemma of section 1.2, but now suppose a player can observe the outcome of the joint venture but cannot observe whether his partner has exerted effort. In addition, the outcome is either a success or a failure and is a random function of the partners' actions. A success appears with probability p if both partners exert effort, with probability q if one exerts effort and one shirks, and probability r if both shirk, where p > q > r > 0.

This is now a game of imperfect public monitoring. It is clear that the strategies presented in section 1.2 for sustaining effort as an equilibrium outcome in the repeated game (exert effort in the absence of any shirking, and shirk otherwise) will no longer work, because players cannot tell when someone has shirked. However, all is not lost. Suppose the players begin by exerting effort and do so as long as the venture is successful, switching to permanent shirking as soon as a failure is observed. For sufficiently patient players, this is an equilibrium (section 7.2.1 derives the necessary condition $\delta \geq 1/(3p - 2q)$). The equilibrium embodies a rather bleak future, in the sense that a failure will eventually occur and the players will shirk thereafter, but supports at least some effort.

The difficulty here is that the punishment supporting the incentives to exert effort, consisting of permanent shirking after the first failure, is often more severe than necessary. This was no problem in the perfect-monitoring case, where the punishment was safely off the equilibrium path and hence need never be carried out. Here, the imperfect monitoring ensures that punishments will occur. The players would thus prefer the punishments be as lenient as possible, consistent with creating the appropriate incentives for exerting effort. Chapter 7 explains how equilibria can be constructed with less severe punishments.

Imperfect monitoring fundamentally changes the nature of the equilibrium. If nontrivial intertemporal incentives are to be created, then over the course of equilibrium play the players will find themselves in a punishment phase infinitely often. This happens despite the fact that the players know, when the punishment is triggered by a failure, that both have in fact followed the equilibrium prescription of exerting effort. Then why do they carry through the punishment? Given that the other players are entering the punishment phase, it is a best response to do likewise. But why would equilibria arise that routinely punish players for offenses not committed? Because the expected payoffs in such equilibria can be higher than those produced by simply playing a Nash equilibrium of the stage game.

Given the inevitability of punishments, one might suspect that the set of feasible outcomes in games of imperfect monitoring is rather limited. In particular, it appears as if the inevitability of some periods in which players shirk makes efficient outcomes impossible. However, chapter 9 establishes conditions under which we again have a folk theorem result. The key is to work with asymmetric punishments, sliding along the frontier of efficient payoffs so as to reward some players as others are penalized.8 There may thus be a premium on asymmetric strategies, despite the lack of any asymmetry in the game.

The players in this example at least have the advantage that the information they receive is public. Either both observe a success or both a failure. This ensures that they can coordinate their future behavior, as a function of current outcomes, so as to create the appropriate current incentives. Chapters 12-14 consider the case of private monitoring, in which the players potentially receive different private information about what has transpired. It now appears as if the ability to coordinate future behavior in response to current events has evaporated completely, and with it the ability to support any outcome other than persistent shirking. Perhaps surprisingly, there is still considerable latitude for equilibria featuring effort.

8. This requires that the imperfect monitoring give players (noisy) indications not only that a deviation from equilibrium play has occurred but also who might have been the deviator. The two-signal example in this section fails this condition.
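The condition $\delta \geq 1/(3p - 2q)$ cited above can be checked against the value recursion for this profile. The sketch below is an illustration, not the book's derivation; p = 0.9 and q = 0.6 are arbitrary choices. On the path of play the continuation value v solves $v = (1-\delta)2 + \delta p v$, and a one-shot deviation yields $(1-\delta)3 + \delta q v$.

```python
# Incentive constraint for "exert effort until the first failure" in the
# imperfect-monitoring prisoner's dilemma (success prob p if both work,
# q if exactly one works).

def on_path_value(d, p):
    # v = (1-d)*2 + d*p*v  =>  v = 2(1-d)/(1-d*p)
    return 2 * (1 - d) / (1 - d * p)

def effort_is_optimal(d, p, q):
    v = on_path_value(d, p)
    deviation = 3 * (1 - d) + d * q * v   # shirk once: flow 3, success prob q
    return v >= deviation - 1e-12

p, q = 0.9, 0.6
threshold = 1 / (3 * p - 2 * q)           # the condition from section 7.2.1
for d in [threshold - 0.05, threshold, threshold + 0.05]:
    print(f"delta = {d:.4f}: effort optimal -> {effort_is_optimal(d, p, q)}")
```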
1.5 The Product-Choice Game

Consider the game shown in figure 1.5.1. Think of player 1 as a firm who can exert either high effort (H) or low effort (L) in the production of its output. Player 2 is a consumer who can buy either a high-priced product, h, or a low-priced product, ℓ.

              h         ℓ
    H       2, 3      0, 2
    L       3, 0      1, 1

Figure 1.5.1 The product-choice game.

For example, we might think of player 1 as a restaurant whose menu features both elegant dinners and hamburgers, or as a surgeon who treats respiratory problems with either heart surgery or folk remedies.

Player 2 prefers the high-priced product if the firm has exerted high effort, but prefers the low-priced product if the firm has not. One might prefer a fine dinner or heart surgery from a chef or doctor who exerts high effort, while preferring fast food or an ineffective but unobtrusive treatment from a shirker. The firm prefers that consumers purchase the high-priced product and is willing to commit to high effort to induce that choice by the consumer. In a simultaneous move game, however, the firm cannot observably choose effort before the consumer chooses the product. Because high effort is costly, the firm prefers low effort, no matter the choice of the consumer.

The stage game has a unique Nash equilibrium, in which the firm exerts low effort and the consumer purchases the low-priced product. Suppose the game is played infinitely often, with perfect monitoring. In doing so, we will often interpret player 2 not as a single player but as a succession of short-lived players, each of whom plays the game only once. We assume that each new short-lived player can observe signals about the firm's previous choices.9 As long as the firm is sufficiently patient, there is an equilibrium in the repeated game in which the firm exerts high effort and consumers purchase the high-priced product. The firm is deterred from taking the immediate payoff boost accompanying low effort by the prospect of future consumers then purchasing the low-priced product.10 Again, however, there are other equilibria, including one in which low effort is exerted and the low-priced product purchased in every period.

9. If this is not the case, we have a large collection of effectively unrelated single-shot games that happen to have a common player on one side, rather than a repeated game.
10. Purchasing the high-priced product is a best response for the consumer to high effort, so that no incentive issues arise concerning player 2's behavior. When consumers are short-lived, there is no prospect for using future play to alter current incentives.

Now suppose that consumers are not entirely certain of the characteristics of the firm. They may attach high probability to the firm's being normal, meaning that it has the payoffs just given, but they also entertain some (possibly very small) probability that they face a firm who fortuitously has a technology or some other characteristic that ensures high effort. Refer to the latter as the commitment type of firm. This is now a game of incomplete information, with the consumers uncertain of the firm's type. As long as the firm is sufficiently patient, then in any Nash equilibrium of the repeated game, the firm's payoff must be arbitrarily close to 2.
This result holds no matter how unlikely consumers think the commitment type, though increasing patience is required from the normal firm as the commitment type becomes less likely.

To see the intuition behind this result, suppose that we have a candidate equilibrium in which the normal firm receives a payoff less than $2 - \varepsilon$. Then the normal and commitment types must be making different choices over the course of the repeated game, because an equilibrium in which they behave identically would induce consumers to choose h and would yield a payoff of 2. Now, one option open to the normal firm is to mimic the behavior of the commitment type. If the normal firm does so over a sufficiently long period of time, then the short-run players (who expect the normal type of firm to behave differently) will become convinced that the firm is actually the commitment type and will play their best response of h. Once this happens, the normal firm thereafter earns a payoff of 2. Of course, it may take a while for this to happen, and the firm may have to endure much lower payoffs in the meantime, but these initial payoffs are not very important if the firm is patient. If the firm is patient enough, it thus has a strategy available that ensures a payoff arbitrarily close to 2. Our initial hypothesis, that the firm's equilibrium payoff fell short of $2 - \varepsilon$, must then have been incorrect. Any equilibrium must give the (sufficiently patient) normal type of firm a payoff above $2 - \varepsilon$.

The common interpretation of this argument is that the normal firm can acquire and maintain a reputation for behaving like the commitment type. This reputation-building possibility, which excludes many of the equilibrium outcomes of the complete-information game, may appear to be quite special. Why should consumers' uncertainty about the firm take precisely the form we have assumed? What happens if there is a single buyer who reappears in each period, so that the buyer also faces intertemporal incentive considerations and may also consider building a reputation? However, the result generalizes far beyond the special structure of this example (chapter 15).

1.6 Discussion

The unifying theme of work in repeated games is that links between current and future behavior can create incentives that would not be apparent if one examined a current interaction in isolation. We routinely rely on the importance of such links in our daily lives.

Markets create boundaries within which a vast variety of behavior is possible. These markets can function effectively only if there is a shared understanding of what constitutes appropriate behavior. Trade in many markets involves goods and services whose characteristics are sufficiently difficult to verify as to make legal enforcement a hopelessly clumsy tool. This is an especially important consideration in markets involving expert providers, with the markets for medical care and a host of maintenance and repair services being the most prominent examples, in which the party providing the service is also best positioned to assess the service. More generally, legal sanctions cannot explain why people routinely refrain from opportunistic behavior, such as attempting to renegotiate prices, taking advantage of sunk costs, or cheating on a transaction. But if such behavior reigned unchecked, our markets would collapse.

The common force organizing market transactions is the prospect of future interactions. We decline opportunities to renege on deals or turn them to our advantage because we expect to have future dealings with the other party.
The development of trading practices that transferred enough information to create effective intertemporal incentives was a turning point in the development of our modern market economy (e.g., Greif 1997, 2005; Greif, Milgrom, and Weingast 1994).

Despite economists' emphasis on markets, many of the most critical activities in our lives take place outside of markets. We readily cede power and authority to some people, either formally through a political process or through informal acquiescence. We have social norms governing when one is allowed to take advantage of another and when one should refrain from doing so. We have conventions for how families are formed, including who is likely to mate with whom, and how they are organized, including who has an obligation to support whom and who can expect resources from whom. Our society relies on institutions to perform some functions, whereas other quite similar functions are performed outside of institutions; the helpless elderly are routinely institutionalized, but not infants.

A unifying view of these observations is that they reflect equilibria of the repeated game that we implicitly play with one another.11 It is then no surprise, given the tendency for repeated games to have multiple equilibria, that we see great variation across the world in how societies and cultures are organized. This same multiplicity opens the door to the possibility that we might think about designing our society to function more effectively. The theory of repeated games provides the tools for this task.

11. Ellickson (1991) provides a discussion of how neighbors habitually rely on informal intertemporal arrangements to mediate their interactions rather than relying exclusively on current incentives, even when the latter are readily available. Binmore (1994, 1998) views an understanding of the incentives created by repeated interactions as being sufficiently common as to appropriately replace previous notions, such as the categorical imperative, as the foundation for theories of social justice.

The best known results in the theory of repeated games, the folk theorems, focus attention on the multiplicity of equilibria in such games, a source of great consternation for some. We consider multiple equilibria a virtue; how else can one hope to explain the richness of behavior that we observe around us?

It is also important to note that the folk theorem characterizes the payoffs available to arbitrarily patient players. Much of our interest and much of the work in this book concerns cases in which players are patient enough for intertemporal incentives to have some effect, but not arbitrarily patient. In addition, we are concerned with the properties of equilibrium behavior as well as payoffs.

1.7 A Reader's Guide

Chapter 2 is the obvious point of departure, introducing the basic tools for working with repeated games, including the dynamic programming approach to repeated games. The reader then faces a choice. One can proceed through the chapters in part I of the book, treating games of perfect monitoring. Chapter 3 uses constructive arguments to prove the folk theorem. Chapter 4 examines a number of issues, such as what we should make of an infinite horizon, that arise in interpreting repeated games, while chapter 5 pushes the analysis beyond the confines of the canonical repeated game. Chapter 6 illustrates the techniques with a collection of economic applications.

Alternatively, one can jump directly to part II.
Here, chapters 7 and 8 present more powerful (though initially seemingly more abstract) techniques that allow a unified treatment of games of perfect and imperfect monitoring. These allow us to work with the limiting case of perfectly patient players as well as cases in which players may be less patient and in which the sufficient conditions for a folk theorem fail. Chapter 9 presents the public monitoring folk theorem, and chapter 10 explores features that arise out of imperfections in the monitoring process. Chapter 11 again provides economic illustrations.

The reader now faces another choice. Part III (chapters 12-14) considers the case of private monitoring. Here, we expect a familiarity with the material in chapter 7. Alternatively, the reader can proceed to part IV, on reputations, whether arriving from part I or part II. Chapters 15 and 16 form the core here, presenting the classical reputation results for games with a single long-lived player and for games with multiple long-lived players, with the remaining chapters exploring extensions and alternative formulations.

For an epilogue, see Samuelson (2006, section 6).

1.8 The Scope of the Book

The analysis of long-run relationships is relevant for virtually every area of economics. The literature on intertemporal incentives is vast. If a treatment of the subject is to be kept manageable, some things must be excluded. We have not attempted a comprehensive survey or history of the literature.

Our canonical setting is one in which a fixed stage game is played in each of an infinite number of time periods by players who maximize the average discounted payoffs. We concentrate on cases in which players have the same discount factor (see remark 2.1.4) or on cases in which long-lived players with common discount factors are joined by myopic short-lived players.

We are sometimes interested in cases in which the players are quite patient, typically captured by examining the limit as their discount factors approach 1, because intertemporal incentives are most effective with patient players. An alternative is to work directly with arbitrarily patient players, using criteria such as the limit-of-the-means payoffs or the overtaking criteria to evaluate payoffs. We touch on this subject briefly (section 3.2.2), but otherwise restrict attention to discounted games. In particular, we are also often interested in cases with nontrivial intertemporal incentives, but sufficient impatience to impose constraints on equilibrium behavior. In addition, the no-discounting case is more useful for studying payoffs than behavior. We are interested not only in players' payoffs but also in the strategies that deliver these payoffs.

We discuss finitely repeated games (section 4.4 and chapter 17) only enough to argue that there is in general no fundamental discontinuity when passing from finite to infinitely repeated games. The analyses of finite and infinite horizon games are governed by the same conceptual issues. We then concentrate on infinitely repeated games, where a more convenient body of recursive techniques generally allows a more powerful analysis.

We concentrate on cases in which the stage game is identical across periods.
In a dynamic game, the stage game evolves over the course of the repeated interaction in response to actions taken by the players.12 Dynamic games are also referred to as stochastic games, emphasizing the potential randomness in the evolution of the state. We offer an introduction to such issues in sections 5.5-5.7, suggesting Mertens (2002), Sorin (2002), and Vieille (2002) as introductions to the literature.

12. For example, the oligopolists in section 1.3 might also have the opportunity to invest in cost-reducing research and development in each period. Each period then brings a new stage game, characterized by the cost levels relevant for the period, along with a new opportunity to affect future stage games as well as secure a payoff in the current game. Inventory levels for a firm, debt levels for a government, education levels for a worker, or weapons stocks for a country may all be sources of similar intertemporal evolution in a stage game. Somewhat further afield, a bargaining game is a dynamic game, with the stage game undergoing a rather dramatic transformation to becoming a trivial game in which nothing happens once an agreement is reached.

A literature in its own right has grown around the study of differential games, or perfect-monitoring dynamic games played in continuous time, much of it in engineering and mathematics. As a result of the continuous time setting, the techniques for working with such games resemble those of control theory, whereas those of repeated games more readily prompt analogies to dynamic programming. We say nothing about such games, suggesting Friedman (1994) and Clemhout and Wan (1994) for introductions to the topic.

The first three parts of this book consider games of complete information, where players share identical information about the structure of the game. In contrast, many economic applications are concerned with cases of incomplete information. A seller may not know the buyer's utility function. A buyer may not know whether a firm plans to continue in business or is on the verge of absconding. A potential entrant may not know whether the existing firm can wage a price war with impunity or stands to lose tremendously from doing so. In the final part of this book, we consider a special class of games of incomplete information whose study has been particularly fruitful, namely, reputation games. A more general treatment of games of incomplete information is given by Zamir (1992) and Forges (1992).

The key to the incentives that arise in repeated games is the ability to establish a link between current and future play. If this is to be done, players must be able to observe or monitor current play. Much of our work is organized around assumptions about players' abilities to monitor others' behavior. Imperfections in monitoring can impose constraints on equilibrium payoffs. A number of publications have shown that these constraints can be relaxed if players have the ability to communicate with one another (e.g., Compte 1998 and Kandori and Matsushima 1998). We do not consider such possibilities.

We say nothing here about the reputations of expert advisors. An expert may have preferences about the actions of a decision maker whom he advises, but the advice itself is cheap talk, in the sense that it affects the expert's payoff only through its effect on the decision maker's action. If this interaction occurs once, then we have a straightforward cheap talk game whose study was pioneered by Crawford and Sobel (1982).
If the interaction is repeated, then the expert's recommendations have an effect not only on current actions but also possibly on how much influence the expert will have in the future. The expert may then prefer to hedge the current recommendation in an attempt to be more influential in the future (e.g., Morris 2001).

Finally, the concept of a reputation has appeared in a number of contexts, some of them quite different from those that appear in this book, in both the academic literature and popular use. Firms offering warranties are often said to be cultivating reputations for high quality, advertising campaigns are designed to create a reputation for trendiness, forecasters are said to have reputations for accuracy, or advisors for giving useful counsel. These concepts of reputation touch on ideas similar to those with which we work in a number of places. We have no doubt that the issues surrounding reputations are much richer than those we capture here. We regard this area as a particularly important one for further work.

Part I: Games with Perfect Monitoring

2 The Basic Structure of Repeated Games with Perfect Monitoring

2.1 The Canonical Repeated Game

2.1.1 The Stage Game

The construction of the repeated game begins with a stage game. There are n players, numbered 1, . . . , n.

We refer to choices in the stage game as actions, reserving strategy for behavior in the repeated game. The set of pure actions available to player i in the stage game is denoted $A_i$, with typical element $a_i$. The set of pure action profiles is given by $A \equiv \prod_i A_i$. We assume each $A_i$ is a compact subset of the Euclidean space $\mathbb{R}^k$ for some k. Some of the results further assume each $A_i$ is finite. Stage game payoffs are given by a continuous function,
$$u : \prod_i A_i \to \mathbb{R}^n.$$
The set of mixed actions for player i is denoted by $\Delta(A_i)$, with typical element $\alpha_i$, and the set of mixed profiles by $\prod_i \Delta(A_i)$. The payoff function is extended to mixed actions by taking expectations.

The set of stage-game payoffs generated by the pure action profiles in A is
$$F \equiv \{v \in \mathbb{R}^n : \exists a \in A \text{ s.t. } v = u(a)\}.$$
The set of feasible payoffs,
$$F^\dagger \equiv \operatorname{co} F,$$
is the convex hull of the set of payoffs F.1 As we will see, for sufficiently patient players, intertemporal averaging allows us to obtain payoffs in $F^\dagger \setminus F$. A payoff $v \in F^\dagger$ is inefficient if there exists another payoff $v' \in F^\dagger$ with $v'_i > v_i$ for all i; the payoff $v'$ strictly dominates v. A payoff is efficient (or Pareto efficient) if it is not inefficient. If, for $v, v' \in F^\dagger$, $v'_i \geq v_i$ for all i with a strict inequality for some i, then $v'$ weakly dominates v. A feasible payoff is strongly efficient if it is efficient and not weakly dominated by any other feasible payoff.

1. The convex hull of a set $A \subset \mathbb{R}^n$, denoted $\operatorname{co} A$, is the smallest convex set containing A.

By Nash's (1951) existence theorem, if the stage game is finite, then it has a (possibly mixed) Nash equilibrium. In general, because payoffs are given by continuous functions on the compact set $\prod_i A_i$, it follows from Glicksberg's (1952) fixed point theorem that the infinite stage games we consider also have Nash equilibria. It is common when working with infinite action stage games to additionally require that the action spaces be convex and $u_i$ quasi-concave in $a_i$, so that pure-strategy Nash equilibria exist (Fudenberg and Tirole 1991, section 1.3.3).

For ease of reference, we list the maintained assumptions on the stage game.

Assumption 2.1.1
1. $A_i$ is either finite, or a compact and convex subset of the Euclidean space $\mathbb{R}^k$ for some k. We refer to compact and convex action spaces as continuum action spaces.
2. If $A_i$ is a continuum action space, then $u : A \to \mathbb{R}^n$ is continuous, and $u_i$ is quasiconcave in $a_i$.

Remark 2.1.1 (Pure strategies given continuum action spaces) When action spaces are continua, to avoid some tedious measurability details, we only consider pure strategies. Because the basic analysis of finite action games (with pure or mixed actions) and continuum action games (with pure actions) is identical, we use $\alpha_i$ to both denote pure or mixed strategies in finite games, and pure strategies only in continuum action games.

Much of the work in repeated games is concerned with characterizing the payoffs consistent with equilibrium behavior in the repeated game. This characterization in turn often begins by identifying the worst payoff consistent with individual optimization. Player i always has the option of playing a best response to the (mixed) actions chosen by the other players. In the case of pure strategies, the worst outcome in the stage game for player i, consistent with i behaving optimally, is then that the other players choose the profile $a_{-i} \in A_{-i} \equiv \prod_{j \neq i} A_j$ that minimizes the payoff i earns when i plays a best response to $a_{-i}$. This bound, player i's (pure action) minmax payoff, is given by
$$\underline{v}_i^p \equiv \min_{a_{-i} \in A_{-i}} \max_{a_i \in A_i} u_i(a_i, a_{-i}).$$
The compactness of A and continuity of $u_i$ ensure $\underline{v}_i^p$ is well defined. A (pure action) minmax profile (which may not be unique) for player i is a profile $\hat{a}^i = (\hat{a}^i_i, \hat{a}^i_{-i})$ with the properties that $\hat{a}^i_i$ is a stage-game best response for i to $\hat{a}^i_{-i}$ and $\underline{v}_i^p = u_i(\hat{a}^i_i, \hat{a}^i_{-i})$. Hence, player i's minmax action profile gives i his minmax payoff and ensures that no alternative action on i's part can raise his payoff. In general, the other players will not be choosing best responses in profile $\hat{a}^i$, and hence $\hat{a}^i$ will not be a Nash equilibrium of the stage game.2

A payoff vector $v = (v_1, \ldots, v_n)$ is weakly (pure action) individually rational if $v_i \geq \underline{v}_i^p$ for all i, and is strictly (pure action) individually rational if $v_i > \underline{v}_i^p$ for all i. The set of feasible and strictly individually rational payoffs is given by
$$F^{p*} \equiv \{v \in F^\dagger : v_i > \underline{v}_i^p, \, i = 1, \ldots, n\}.$$
The set of strictly individually rational payoffs generated by pure action profiles is given by
$$F^p \equiv \{v \in F : v_i > \underline{v}_i^p, \, i = 1, \ldots, n\}.$$

2. An exception is the prisoner's dilemma, where mutual shirking is both the unique Nash equilibrium of the stage game and the minmax action profile for both players. Many of the special properties of the prisoner's dilemma arise out of this coincidence.
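For finite games these definitions are directly computable. Here is a minimal sketch (not from the book; the helper `pure_minmax` is a hypothetical name) that recovers the pure minmax payoffs, which are (0, 0) in the prisoner's dilemma of figure 1.2.1, consistent with footnote 2.

```python
# Pure-action minmax payoffs in a finite two-player game.
# u1, u2 map (row action, column action) pairs to each player's payoff.

def pure_minmax(u1, u2):
    rows = {a for a, _ in u1}
    cols = {b for _, b in u1}
    # Player 2 picks the column that minimizes player 1's best-response payoff:
    v1 = min(max(u1[(a, b)] for a in rows) for b in cols)
    # Player 1 picks the row that minimizes player 2's best-response payoff:
    v2 = min(max(u2[(a, b)] for b in cols) for a in rows)
    return v1, v2

# Prisoner's dilemma of figure 1.2.1, actions E and S:
u1 = {("E", "E"): 2, ("E", "S"): -1, ("S", "E"): 3, ("S", "S"): 0}
u2 = {("E", "E"): 2, ("E", "S"): 3, ("S", "E"): -1, ("S", "S"): 0}
print(pure_minmax(u1, u2))  # (0, 0): mutual shirking minmaxes both players
```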
Remark 2.1.2 (Mixed-action individual rationality) In finite games, lower payoffs can sometimes be enforced when we allow players to randomize. In particular, allowing the players other than player i to randomize yields the mixed-action minmax payoff,
$$\underline{v}_i \equiv \min_{\alpha_{-i} \in \prod_{j \neq i} \Delta(A_j)} \max_{a_i \in A_i} u_i(a_i, \alpha_{-i}), \tag{2.1.1}$$
which can be lower than the pure action minmax payoff, $\underline{v}_i^p$.3 A (mixed) action minmax profile for player i is a profile $\hat{\alpha}^i = (\hat{\alpha}^i_i, \hat{\alpha}^i_{-i})$ with the properties that $\hat{\alpha}^i_i$ is a stage-game best response for i to $\hat{\alpha}^i_{-i}$ and $\underline{v}_i = u_i(\hat{\alpha}^i_i, \hat{\alpha}^i_{-i})$.

              H         T
    H      1, -1     -1, 1
    T      -1, 1     1, -1

Figure 2.1.1 Matching pennies.

In matching pennies (figure 2.1.1), for example, player 1's pure minmax payoff is 1, because for any of player 2's pure strategies, player 1 has a best response giving a payoff of 1. Pure minmax action profiles for player 1 are given by (H, H) and (T, T). In contrast, player 1's mixed minmax payoff is 0, implied by player 2's mixed action of $\frac{1}{2} \circ H + \frac{1}{2} \circ T$.4

We use the same term, individual rationality, to indicate both $v_i \geq \underline{v}_i^p$ and $v_i \geq \underline{v}_i$, with the context indicating the appropriate choice. We denote the set of feasible and strictly individually rational payoffs (relative to the mixed minmax utility, $\underline{v}_i$) by
$$F^* \equiv \{v \in F^\dagger : v_i > \underline{v}_i, \, i = 1, \ldots, n\}.$$

3. Allowing player i to mix will not change i's minmax payoff, because every action in the support of a mixed best reply is also a best reply for player i.
4. We denote the mixture that assigns probability $\alpha_i(a_i)$ to action $a_i$ by $\sum_{a_i} \alpha_i(a_i) \circ a_i$.
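A small numerical check of the remark, again an illustration of ours: a coarse grid over player 2's mixtures stands in for the linear program that would compute the exact minimizer, and it shows player 1's guaranteed payoff dropping from 1 (against pure actions) to 0 (against the uniform mixture).

```python
# Mixed minmax for player 1 in matching pennies (figure 2.1.1), by grid search
# over player 2's mixtures.

u1 = {("H", "H"): 1, ("H", "T"): -1, ("T", "H"): -1, ("T", "T"): 1}

def best_response_payoff(p_heads):
    """Player 1's best payoff when player 2 plays H with probability p_heads."""
    return max(
        p_heads * u1[(a, "H")] + (1 - p_heads) * u1[(a, "T")]
        for a in ("H", "T")
    )

grid = [k / 100 for k in range(101)]
mixed_minmax = min(best_response_payoff(p) for p in grid)
pure_minmax = min(best_response_payoff(p) for p in (0.0, 1.0))
print(f"pure minmax: {pure_minmax}, mixed minmax: {mixed_minmax}")  # 1.0, 0.0
```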
2.1.2 Public Correlation

It is sometimes natural to allow players to use a public correlating device. Such a device captures a variety of public events that players might use to coordinate their actions. Perhaps the best known example is an agreement in the electrical equipment industry in the 1950s to condition bids in procurement auctions on the current phase of the moon.5

5. A small body of literature has studied this case. See Carlton and Perloff (1992, pp. 213-216) for a brief introduction.

Definition 2.1.1 A public correlating device is a probability space $([0, 1], \mathscr{B}, \lambda)$, where $\mathscr{B}$ is the Borel $\sigma$-algebra and $\lambda$ is Lebesgue measure. In the stage game with public correlation, a realization $\omega \in [0, 1]$ of a public random variable is first drawn, which is observed by all players, and then each player i chooses an action $a_i \in A_i$.

A stage-game action for player i in the stage game with public correlation is a (measurable) function $a_i : [0, 1] \to \Delta(A_i)$. If $a_i : [0, 1] \to A_i$, then $a_i$ is a pure action. When actions depend nontrivially on the outcome of the public correlating device, we calculate player i's expected payoff in the obvious manner, by taking expectations over the outcome $\omega \in [0, 1]$. Any strategy profile $a \equiv (a_1, \ldots, a_n)$ induces a joint distribution over $\prod_i \Delta(A_i)$. When evaluating the profitability of a deviation from a, because the realization of the correlating device is public, the calculation is ex post, that is, conditional on the realization of $\omega$. If a is a Nash equilibrium of the stage game with public correlation, then every $\alpha$ in the support of a is a Nash equilibrium of the stage game without public correlation; in particular, most correlated equilibria (Aumann 1974) are not equilibria of the stage game with public correlation.

It is possible to replace the public correlating device with communication, by using jointly controlled lotteries, introduced by Aumann, Maschler, and Stearns (1968). For example, for two players, suppose they simultaneously announce a number from [0, 1]. Let $\omega$ equal their sum, if the sum is less than 1, and equal their sum minus 1 otherwise. It is easy to verify that if one player uniformly randomizes over his selection, then $\omega$ is uniformly distributed on [0, 1] for any choice by the other player. Consequently, neither player can influence the probability distribution. We will not discuss communication in this book, so we use public correlating devices rather than jointly controlled lotteries.

Trivially, every payoff in $F^\dagger$ can be achieved in pure actions using public correlation. On the other hand, not all payoffs in $F^\dagger$ can be achieved in mixed actions without public correlation. For example, consider the game in figure 2.1.2.

              L         R
    T       2, 2      1, 5
    B       5, 1      0, 0

Figure 2.1.2 The game of chicken.

The set F is given by
$$F = \{(2, 2), (5, 1), (1, 5), (0, 0)\}.$$
The set $F^\dagger$ is the set of all convex combinations of these four payoff vectors. Some of the payoffs that are in $F^\dagger$ but not F can be obtained via independent mixtures, ignoring the correlating device, over the sets {T, B} and {L, R}. For example, $F^\dagger$ contains (20/9, 20/9), obtained by independent mixtures that place probability 2/3 on T (or L) and 1/3 on B (or R). A pure strategy that uses the public correlation device to place probability 4/9 on (T, L), 2/9 on each of (T, R) and (B, L), and 1/9 on (B, R) achieves the same payoff. In addition, the public correlating device allows the players to achieve some payoffs in $F^\dagger$ that cannot be obtained from independent mixtures. For example, the players can attach probability 1/2 to each of the outcomes (T, R) and (B, L), giving payoffs (3, 3). No independent mixtures can achieve such payoffs, because any such mixtures must attach positive probability to payoffs (2, 2) and (0, 0), ensuring that the sum of the two players' average payoffs falls below 6.
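The chicken computations are easy to verify numerically. The sketch below (an illustration, not from the text) reproduces (20/9, 20/9), the correlated payoff (3, 3), and the fact that no independent mixture comes close to (3, 3).

```python
# Chicken (figure 2.1.2): independent mixing vs. public correlation.

u = {("T", "L"): (2, 2), ("T", "R"): (1, 5),
     ("B", "L"): (5, 1), ("B", "R"): (0, 0)}

def independent(p_T, q_L):
    """Expected payoffs when 1 plays T w.p. p_T and 2 plays L w.p. q_L."""
    probs = {("T", "L"): p_T * q_L, ("T", "R"): p_T * (1 - q_L),
             ("B", "L"): (1 - p_T) * q_L, ("B", "R"): (1 - p_T) * (1 - q_L)}
    return tuple(sum(probs[s] * u[s][i] for s in u) for i in (0, 1))

print(independent(2/3, 2/3))   # (20/9, 20/9), as computed in the text

# Publicly correlating 1/2 on (T, R) and 1/2 on (B, L) yields (3, 3):
print(tuple(0.5 * u[("T", "R")][i] + 0.5 * u[("B", "L")][i] for i in (0, 1)))

# No independent mixture attains (3, 3): grid-search the distance to it.
gap = min(max(abs(v - 3) for v in independent(p / 100, q / 100))
          for p in range(101) for q in range(101))
print(round(gap, 2))           # strictly positive (0.75 on this grid)
```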
2.1.3 The Repeated Game

In the repeated game,⁶ the stage game is played in each of the periods $t \in \{0, 1, \ldots\}$. In formulating the notation for this game, we use subscripts to refer to players, typically identifying the element of a profile of actions, strategies, or payoffs corresponding to a particular player. Superscripts will either refer to periods or denote particular profiles of interest, with the use being clear from the context.

This chapter introduces repeated games of perfect monitoring. At the end of each period, all players observe the action profile chosen. In other words, the actions of every player are perfectly monitored by all other players. If a player is randomizing, only the realized choice is observed.

The set of period $t \geq 0$ histories is given by

$$H^t \equiv A^t,$$

where we define the initial history to be the null set, $A^0 \equiv \{\emptyset\}$, and $A^t$ to be the $t$-fold product of $A$. A history $h^t \in H^t$ is thus a list of $t$ action profiles, identifying the actions played in periods 0 through $t - 1$. The addition of a period $t$ action profile then yields a period $t + 1$ history $h^{t+1}$, an element of the set $H^{t+1} = H^t \times A$. The set of all possible histories is

$$H \equiv \bigcup_{t=0}^{\infty} H^t.$$

A pure strategy for player $i$ is a mapping from the set of all possible histories into the set of pure actions,⁷

$$\sigma_i : H \to A_i.$$

A mixed strategy for player $i$ is a mixture over the set of all pure strategies. Without loss of generality, we typically find it more convenient to work with behavior strategies rather than mixed strategies.⁸ A behavior strategy for player $i$ is a mapping

$$\sigma_i : H \to \Delta(A_i).$$

Because a pure strategy is trivially a special case of a behavior strategy, we use the same notation $\sigma_i$ for both pure and behavior strategies. Unless indicating otherwise, we then use the word strategy to denote a behavior strategy, which may happen to be pure. Recall from remark 2.1.1 that we consider only pure strategies for a player whose action space is a continuum (even though for notational simplicity we sometimes use $\alpha_i$ to denote the stage game action).

6. The early literature often used the term supergame for the repeated game.
7. Because there is a natural bijection (one-to-one and onto mapping) between $H$ and each player's collection of information sets, this is the standard notion of an extensive-form strategy.
8. Two strategies for a player $i$ are realization equivalent if, fixing the strategies of the other players, the two strategies of player $i$ induce the same distribution over outcomes. It is a standard result for finite extensive-form games that every mixed strategy has a realization-equivalent behavior strategy (Kuhn's theorem; see Ritzberger 2002, theorem 3.3, p. 127), and the same is true here. See Mertens, Sorin, and Zamir 1994, theorem 1.6, p. 66 for a proof (though the proof is conceptually identical to the finite case, the infinite horizon introduces some technical issues).

For any history $h^t \in H$, we define the continuation game to be the infinitely repeated game that begins in period $t$, following history $h^t$. For any strategy profile $\sigma$, player $i$'s continuation strategy induced by $h^t$, denoted $\sigma_i|_{h^t}$, is given by

$$\sigma_i|_{h^t}(h^\tau) = \sigma_i(h^t h^\tau), \quad \forall h^\tau \in H,$$

where $h^t h^\tau$ is the concatenation of the history $h^t$ followed by the history $h^\tau$. This is the behavior implied by the strategy $\sigma_i$ in the continuation game that follows history $h^t$. We write $\sigma|_{h^t}$ for $(\sigma_1|_{h^t}, \ldots, \sigma_n|_{h^t})$. Because for each history $h^t$, $\sigma_i|_{h^t}$ is a strategy in the original repeated game, that is, $\sigma_i|_{h^t} : H \to \Delta(A_i)$, the continuation game associated with each history is a subgame that is strategically identical to the original repeated game. Thus, repeated games have a recursive structure, and this plays an important role in their study.

An outcome path (or more simply, outcome) in the infinitely repeated game is an infinite sequence of action profiles $\mathbf{a} \equiv (a^0, a^1, a^2, \ldots) \in A^\infty$. Notice that an outcome is distinct from a history. Outcomes are infinite sequences of action profiles, whereas histories are finite-length sequences (whose length identifies the period for which the history is relevant). We denote the first $t$ periods of an outcome $\mathbf{a}$ by $\mathbf{a}^t = (a^0, a^1, \ldots, a^{t-1})$. Thus, $\mathbf{a}^t$ is the history in $H^t$ corresponding to the outcome $\mathbf{a}$.

The pure strategy profile $\sigma \equiv (\sigma_1, \ldots, \sigma_n)$ induces the outcome $\mathbf{a}(\sigma) \equiv (a^0(\sigma), a^1(\sigma), a^2(\sigma), \ldots)$ recursively as follows. In the first period, the action profile

$$a^0(\sigma) \equiv (\sigma_1(\emptyset), \ldots, \sigma_n(\emptyset))$$

is played. In the second period, the history $a^0(\sigma)$ implies that action profile

$$a^1(\sigma) \equiv (\sigma_1(a^0(\sigma)), \ldots, \sigma_n(a^0(\sigma)))$$

is played. In the third period, the history $(a^0(\sigma), a^1(\sigma))$ is observed, implying the action profile

$$a^2(\sigma) \equiv (\sigma_1(a^0(\sigma), a^1(\sigma)), \ldots, \sigma_n(a^0(\sigma), a^1(\sigma)))$$

is played, and so on.

Analogously, a behavior strategy profile $\sigma$ induces a path of play. In the first period, $\sigma(\emptyset)$ is the initial mixed action profile $\alpha^0 \in \prod_i \Delta(A_i)$. In the second period, for each history $a^0$ in the support of $\alpha^0$, $\sigma(a^0)$ is the mixed action profile $\alpha^1(a^0)$, and so on. For a pure strategy profile, the induced path of play and induced outcome are the same. If the profile has some mixing, however, then the profile induces a path of play that specifies, for each period $t$, a probability distribution over the histories $\mathbf{a}^t$. The underlying behavior strategy specifies a period $t$ profile of mixed stage-game actions for each such history $\mathbf{a}^t$, in turn inducing a probability distribution $\alpha^{t+1}(\mathbf{a}^t)$ over period $t + 1$ action profiles $a^{t+1}$, and hence a probability distribution over period $t + 1$ histories $\mathbf{a}^{t+1}$.
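The recursion defining the induced outcome translates directly into code. The sketch below is ours (representing histories as tuples of action profiles is a choice of the sketch, not the text's notation); it computes the first few periods of $\mathbf{a}(\sigma)$ for a grim-trigger-style strategy.

```python
from typing import Callable, Tuple

History = Tuple[Tuple[str, str], ...]   # a history is a tuple of action profiles
Strategy = Callable[[History], str]     # a pure strategy maps histories to own actions

def outcome(sigma: Tuple[Strategy, Strategy], periods: int):
    """First `periods` profiles of the outcome a(sigma) induced by a pure
    strategy profile: a^0 = sigma(null), a^1 = sigma(a^0), and so on."""
    history: History = ()               # the null history
    path = []
    for _ in range(periods):
        profile = tuple(s(history) for s in sigma)
        path.append(profile)
        history = history + (profile,)  # h^{t+1} = (h^t, a^t)
    return path

# Illustration: grim trigger in the prisoners' dilemma (E = effort, S = shirk).
def grim(history: History) -> str:
    return 'E' if all(a == ('E', 'E') for a in history) else 'S'

print(outcome((grim, grim), 5))   # [('E', 'E'), ('E', 'E'), ...]
```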
Suppose $\sigma$ is a pure strategy profile. In period $t$, the induced pure action profile $a^t(\sigma)$ yields a flow payoff of $u_i(a^t(\sigma))$ to player $i$. An outcome $\mathbf{a}(\sigma)$ thus implies an infinite stream of stage-game payoffs for each player $i$, given by $(u_i(a^0(\sigma)), u_i(a^1(\sigma)), u_i(a^2(\sigma)), \ldots) \in \mathbb{R}^\infty$. Each player discounts these payoffs with the discount factor $\delta \in [0, 1)$, so that the average discounted payoff to player $i$ from the infinite sequence of payoffs $(u_i^0, u_i^1, u_i^2, \ldots)$ is given by

$$(1 - \delta) \sum_{t=0}^{\infty} \delta^t u_i^t.$$

The payoff from a pure strategy profile $\sigma$ is then given by

$$U_i(\sigma) = (1 - \delta) \sum_{t=0}^{\infty} \delta^t u_i(a^t(\sigma)). \qquad (2.1.2)$$

As usual, the payoff to player $i$ from a profile of mixed or behavior strategies is the expected value of the payoffs of the realized outcomes, also denoted $U_i(\sigma)$.

Observe that we normalize the payoffs in (2.1.2) (and throughout) by the factor $(1 - \delta)$. This ensures that $U(\sigma) = (U_1(\sigma), \ldots, U_n(\sigma)) \in F^\dagger$ for all repeated-game strategy profiles $\sigma$. We can then readily compare payoffs in the repeated game and the stage game, and compare repeated-game payoffs for different (common) discount factors.
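Equation (2.1.2) is straightforward to implement. A minimal sketch (ours; the `continuation` argument exploits the fact that a constant payoff $c$ from period $T$ onward contributes $\delta^T c$ to the normalized sum):

```python
def discounted_average(flow_payoffs, delta, continuation=None):
    """Normalized discounted value (1 - delta) * sum_t delta^t u^t of a payoff
    stream, as in (2.1.2). If `continuation` is given, the stream is the listed
    payoffs followed by that constant payoff in every later period."""
    total = sum((1 - delta) * delta**t * u for t, u in enumerate(flow_payoffs))
    if continuation is not None:
        total += delta**len(flow_payoffs) * continuation
    return total

# A constant stream of payoff c has normalized value c for every delta:
print(discounted_average([], 0.9, continuation=3.0))      # 3.0
# Payoff 4 once, then 1 forever, at delta = 0.3: (1 - 0.3)*4 + 0.3*1 = 3.1
print(discounted_average([4.0], 0.3, continuation=1.0))   # 3.1
```

The first print illustrates the point of the normalization: repeating a stage-game payoff forever reproduces that payoff exactly, so repeated-game and stage-game payoffs live on the same scale.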
Remark 2.1.3 (Public correlation notation) In the repeated game with public correlation, a $t$-period history is a list of $t$ action profiles and $t$ realizations of the public correlating device, $(\omega^0, a^0; \omega^1, a^1; \ldots; \omega^{t-1}, a^{t-1})$. In period $t$, as a measurable function of the period $t$ history and the period $t$ realization $\omega^t$, a behavior strategy specifies $\alpha_i \in \Delta(A_i)$. As for games without public correlation, every $t$-period history induces a subgame that is strategically equivalent to the original game. In addition, there are subgames corresponding to the period $t$ realizations $\omega^t$.

Rather than explicitly describing the correlating device and the players' actions as a function of its realization, strategy profiles are sometimes described by simply specifying a correlated action in each period. Such a strategy profile in the repeated game with public correlation specifies in each period $t$, as a function of history $h^{t-1} \in H^{t-1}$, a correlated action profile, that is, a joint distribution over the action profiles $\prod_i A_i$. We also denote the reduction of the compound lottery induced by the public correlating device and subsequent individual randomization by $\alpha$. The precise meaning will be clear from context.

Remark 2.1.4 (Common discount factors) With the exception of the discussion of reputations in chapter 16, we assume that long-lived players share a common discount factor $\delta$. This assumption is substantive. Consider the battle of the sexes in figure 2.1.3.

             L        R
   T       0, 0     3, 1
   B       1, 3     0, 0

   Figure 2.1.3 A battle-of-the-sexes game.

The set of feasible payoffs $F^\dagger$ is the convex hull of the set $\{(3, 1), (0, 0), (1, 3)\}$. For any common discount factor $\delta \in [0, 1)$, the set of feasible repeated-game payoffs is also the convex hull of $\{(3, 1), (0, 0), (1, 3)\}$. Suppose, however, players 1 and 2 have discount factors $\delta_1$ and $\delta_2$ with $\delta_1 > \delta_2$, so that player 1 is more patient than player 2. Then any repeated-game strategy that calls for (B, L) to be played in periods $0, \ldots, T - 1$ and (T, R) to be played in subsequent periods yields a repeated-game payoff vector outside the convex hull of $\{(3, 1), (0, 0), (1, 3)\}$, being in particular above the line segment joining payoffs (3, 1) and (1, 3). This outcome averages over the payoffs (3, 1) and (1, 3), but places relatively high player 2 payoffs in early periods and relatively high player 1 payoffs in later periods, giving repeated-game payoffs to the two players of

$$\text{player 1: } (1 - \delta_1^T) + 3\delta_1^T \qquad \text{and} \qquad \text{player 2: } 3(1 - \delta_2^T) + \delta_2^T.$$

Because $\delta_1 > \delta_2$, each player's convex combination is pushed in the direction of the outcome that is relatively lucrative for that player. This arrangement capitalizes on the differences in the two players' discount factors, with the impatient player 2 essentially borrowing payoffs from the more patient player 1 in early periods to be repaid in later periods, to expand the set of feasible repeated-game payoffs beyond those of the stage game. Lehrer and Pauzner (1999) examine repeated games with differing discount factors.

2.1.4 Subgame-Perfect Equilibrium of the Repeated Game

As usual, a Nash equilibrium is a strategy profile in which each player is best responding to the strategies of the other players:

Definition 2.1.2 The strategy profile $\sigma$ is a Nash equilibrium of the repeated game if for all players $i$ and all strategies $\tilde{\sigma}_i$,

$$U_i(\sigma) \geq U_i(\tilde{\sigma}_i, \sigma_{-i}).$$

We have the following formalization of the discussion in section 2.1.1 and remark 2.1.2 on minmax utilities:

Lemma 2.1.1 If $\sigma$ is a pure-strategy Nash equilibrium, then for all $i$, $U_i(\sigma) \geq v_i^p$. If $\sigma$ is a (possibly mixed) Nash equilibrium, then for all $i$, $U_i(\sigma) \geq \underline{v}_i$.

Proof Consider a Nash equilibrium $\sigma$. Player $i$ can always play the strategy that specifies a best reply to $\sigma_{-i}(h^t)$ after every history $h^t$. In each period, $i$'s payoff is thus at least $v_i^p$ if $\sigma_{-i}$ is pure ($\underline{v}_i$ if $\sigma_{-i}$ is mixed), and so $i$'s payoff in the equilibrium must be at least $v_i^p$ ($\underline{v}_i$, respectively).

We frequently make implicit use of the observation that a strategy profile of the repeated game with public correlation is a Nash equilibrium if and only if for almost all realizations of the public correlating device, the induced strategy profile is a Nash equilibrium.

In games with a nontrivial dynamic structure, Nash equilibrium is too permissive: there are Nash equilibrium outcomes that violate basic notions of optimality by specifying irrational behavior at out-of-equilibrium information sets. Similar considerations arise from the dynamic structure of a repeated game, even if actions are chosen simultaneously in the stage game. Consider a Nash equilibrium of an infinitely repeated game with perfect monitoring. Associated with each history that cannot occur in equilibrium is a subgame. The notion of a Nash equilibrium imposes no optimality conditions in these subgames, opening the door to violations of sequential rationality.

Subgame perfection strengthens Nash equilibrium by imposing the sequential rationality requirement that behavior be optimal in all circumstances, both those that arise in equilibrium (as required by Nash equilibrium) and those that arise out of equilibrium. In finite horizon games of perfect information, such sequential rationality is conveniently captured by requiring backward induction. We cannot appeal to backward induction in an infinitely repeated game, which has no last period. We instead appeal to the underlying definition of sequential rationality as requiring equilibrium behavior in every subgame, where we exploit the strategic equivalence of the repeated game and the continuation game induced by history $h^t$.

Definition 2.1.3 A strategy profile $\sigma$ is a subgame-perfect equilibrium of the repeated game if for all histories $h^t \in H$, $\sigma|_{h^t}$ is a Nash equilibrium of the repeated game.

The existence of subgame-perfect equilibria in a repeated game is immediate: Any profile of strategies that induces the same Nash equilibrium of the stage game after every history of the repeated game is a subgame-perfect equilibrium of the latter. For example, strategies that specify shirking after every history are a subgame-perfect equilibrium of the repeated prisoners' dilemma, as are strategies that specify low effort and the low-priced choice in every period (and after every history) of the product-choice game. If the stage game has more than one Nash equilibrium, strategies that assign any stage-game Nash equilibrium to each period $t$, independently of the history leading to period $t$, constitute a subgame-perfect equilibrium. Playing one's part of a Nash equilibrium is always a best response in the stage game, and hence, as long as future play is independent of current actions, doing so is a best response in each period of a repeated game, regardless of the history of play.

Although the notion of subgame perfection is intuitively appealing, it raises some potentially formidable technical difficulties.
In principle, checking for subgame perfection involves checking whether an infinite number of strategy profiles are Nash equilibria: the set $H$ of histories is countably infinite even if the stage-game action spaces are finite. Moreover, checking whether a profile $\sigma$ is a Nash equilibrium involves checking that player $i$'s strategy $\sigma_i$ is no worse than an infinite number of potential deviations (because player $i$ could deviate in any period, or indeed in any combination of periods). The following sections show that we can simplify this task immensely, first by limiting the number of alternative strategies that must be examined, then by organizing the subgames that must be checked for Nash equilibria into equivalence classes, and finally by identifying a simple constructive method for characterizing equilibrium payoffs.

2.2 The One-Shot Deviation Principle

This section describes a critical insight from dynamic programming that allows us to restrict attention to a simple class of deviations when checking for subgame perfection.

A one-shot deviation for player $i$ from strategy $\sigma_i$ is a strategy $\hat{\sigma}_i \neq \sigma_i$ with the property that there exists a unique history $\tilde{h}^t \in H$ such that for all $h^\tau \neq \tilde{h}^t$,

$$\hat{\sigma}_i(h^\tau) = \sigma_i(h^\tau).$$

Under public correlation, the history $\tilde{h}^t$ includes the period $t$ realization of the public correlating device. The strategy $\hat{\sigma}_i$ plays identically to strategy $\sigma_i$ in every period other than $t$ and plays identically in period $t$ if the latter is reached with some history other than $\tilde{h}^t$. A one-shot deviation thus agrees with the original strategy everywhere except at one history where the one-shot deviation occurs. However, a one-shot deviation can have a substantial effect on the resulting outcome.

Example 2.2.1 Consider the grim trigger strategy profile in the infinitely repeated prisoners' dilemma of section 1.2. The equilibrium outcome when two players each choose grim trigger is that both players exert effort in every period. Now consider the one-shot deviation $\hat{\sigma}_1$ under which 1 plays as in grim trigger, with the exception of shirking in period 4 if there has been no previous shirking, that is, with the exception of shirking after the history (EE, EE, EE, EE). The deviating strategy shirks in every period after period 4, as does grim trigger, and hence we have an outcome that differs from the mutual play of grim trigger in infinitely many periods. However, once the deviation has occurred, it is a prescription of grim trigger that one shirk thereafter. The only deviation from the original strategy hence occurs after the history (EE, EE, EE, EE).

Definition 2.2.1 Fix a profile of opponents' strategies $\sigma_{-i}$. A one-shot deviation $\hat{\sigma}_i$ from strategy $\sigma_i$ is profitable if, at the history $\tilde{h}^t$ for which $\hat{\sigma}_i(\tilde{h}^t) \neq \sigma_i(\tilde{h}^t)$,

$$U_i(\hat{\sigma}_i|_{\tilde{h}^t}, \sigma_{-i}|_{\tilde{h}^t}) > U_i(\sigma|_{\tilde{h}^t}).$$

Notice that profitability of $\hat{\sigma}_i$ is defined conditional on the history $\tilde{h}^t$ being reached, though $\tilde{h}^t$ may not be reached in equilibrium. Hence, a Nash equilibrium can have profitable one-shot deviations.
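One-shot deviation checks like the one in example 2.2.1 reduce to comparing two discounted values. The sketch below is ours, and the stage payoffs are our assumption, chosen for illustration (mutual effort 2, defecting against effort 3, mutual shirking 0; they are not specified in this section): it checks, at a history of mutual effort, the one-shot deviation of shirking once against grim trigger and then conforming.

```python
# Illustrative prisoners' dilemma stage payoffs (our assumption):
R, T, P = 2.0, 3.0, 0.0   # mutual effort, defect vs. effort, mutual shirking

def conform_value(delta):
    """Grim trigger against grim trigger: mutual effort in every period."""
    return R   # normalized value of the constant stream (R, R, ...)

def one_shot_deviation_value(delta):
    """Shirk once, then conform to grim trigger. The deviation triggers the
    punishment, so conforming thereafter means mutual shirking forever."""
    return (1 - delta) * T + delta * P

for delta in (0.2, 1/3, 0.5):
    gain = one_shot_deviation_value(delta) - conform_value(delta)
    print(f"delta = {delta:.3f}: deviation gain = {gain:+.3f}")
# With these payoffs the deviation is unprofitable exactly when
# 2 >= 3(1 - delta), that is, delta >= 1/3.
```

The same comparison applies verbatim at the history (EE, EE, EE, EE) of example 2.2.1, since the continuation game there is strategically identical to the original game.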
Example 2.2.2 Consider again the prisoners' dilemma. Suppose that strategies $\sigma_1$ and $\sigma_2$ both specify effort in the first period and effort as long as there has been no previous shirking, with any shirking prompting players to alternate between 10 periods of shirking and 1 of effort, regardless of any subsequent actions. For sufficiently large discount factors, these strategies constitute a Nash equilibrium, inducing an outcome of mutual effort in every period. However, there are profitable one-shot deviations. In particular, consider a history $\tilde{h}^t$ featuring mutual effort in every period except $t - 11$, at which point one player shirked, and periods $t - 10, \ldots, t - 1$, in which both players shirked. The equilibrium strategy calls for both players to exert effort in period $t$, and then continue alternating 10 periods of shirking with a period of effort. A profitable one-shot deviation for player 1 is to shirk after history $\tilde{h}^t$, otherwise adhering to the equilibrium strategy. There are other profitable one-shot deviations, as well as profitable deviations that alter play after more than just a single history. However, all of these deviations increase profits only after histories that do not occur along the equilibrium path, and hence none of them increases equilibrium profits or vitiates the fact that the proposed strategies are a Nash equilibrium.

Proposition 2.2.1 (The one-shot deviation principle) A strategy profile $\sigma$ is subgame perfect if and only if there are no profitable one-shot deviations.

To confirm that a strategy profile $\sigma$ is a subgame-perfect equilibrium, we thus need only consider alternative strategies that deviate from the action proposed by $\sigma$ once and then return to the prescriptions of the equilibrium strategy. As our prisoners' dilemma example illustrates, this does not imply that the path of generated actions will differ from the equilibrium strategies in only one period. The deviation prompts a different history than does the equilibrium, and the equilibrium strategies may respond to this history by making different subsequent prescriptions.

The importance of the one-shot deviation principle lies in the implied reduction in the space of deviations that need to be considered. In particular, we do not have to worry about alternative strategies that might deviate from the equilibrium strategy in period $t$, and then again in period $t' > t$, and again in period $t'' > t'$, and so on. For example, we need not consider a strategy that deviates from grim trigger in the prisoners' dilemma by shirking in period 0, and then deviates from the equilibrium path (now featuring mutual shirking) in period 3, and perhaps again in period 6, and so on. Although this is obvious when examining such simple candidate equilibria in the prisoners' dilemma, it is less clear in general.

Proof We give the proof only for pure-strategy equilibria in the game without public correlation. The extensions to mixed strategies and public correlation, though conceptually identical, are notationally cumbersome.

If a profile is subgame perfect, then clearly there can be no profitable one-shot deviations.

Conversely, we suppose that a profile $\sigma$ is not subgame perfect and show there must then be a profitable one-shot deviation. Because the profile is not subgame perfect, there exists a history $\tilde{h}^t$, player $i$, and a strategy $\tilde{\sigma}_i$, such that

$$U_i(\sigma_i|_{\tilde{h}^t}, \sigma_{-i}|_{\tilde{h}^t}) < U_i(\tilde{\sigma}_i, \sigma_{-i}|_{\tilde{h}^t}).$$

Let $\varepsilon = U_i(\tilde{\sigma}_i, \sigma_{-i}|_{\tilde{h}^t}) - U_i(\sigma_i|_{\tilde{h}^t}, \sigma_{-i}|_{\tilde{h}^t})$. Let $m = \min_{i,a} u_i(a)$ and $M = \max_{i,a} u_i(a)$. Let $T$ be large enough that $\delta^T(M - m) < \varepsilon/2$. Then,

$$(1 - \delta)\sum_{\tau=0}^{T-1} \delta^\tau u_i(a^\tau(\sigma_i|_{\tilde{h}^t}, \sigma_{-i}|_{\tilde{h}^t})) + (1 - \delta)\sum_{\tau=T}^{\infty} \delta^\tau u_i(a^\tau(\sigma_i|_{\tilde{h}^t}, \sigma_{-i}|_{\tilde{h}^t}))$$
$$= (1 - \delta)\sum_{\tau=0}^{T-1} \delta^\tau u_i(a^\tau(\tilde{\sigma}_i, \sigma_{-i}|_{\tilde{h}^t})) + (1 - \delta)\sum_{\tau=T}^{\infty} \delta^\tau u_i(a^\tau(\tilde{\sigma}_i, \sigma_{-i}|_{\tilde{h}^t})) - \varepsilon,$$

so

$$(1 - \delta)\sum_{\tau=0}^{T-1} \delta^\tau u_i(a^\tau(\sigma_i|_{\tilde{h}^t}, \sigma_{-i}|_{\tilde{h}^t})) < (1 - \delta)\sum_{\tau=0}^{T-1} \delta^\tau u_i(a^\tau(\tilde{\sigma}_i, \sigma_{-i}|_{\tilde{h}^t})) - \frac{\varepsilon}{2}, \qquad (2.2.1)$$

because $\delta^T(M - m) < \varepsilon/2$ ensures that regardless of how the deviation in question affects play in period $t + T$ and beyond, these variations in play have an effect on player $i$'s period $t$ continuation payoff of strictly less than $\varepsilon/2$. This in turn implies that the strategy $\hat{\sigma}_i$, defined by

$$\hat{\sigma}_i(h^\tau) = \begin{cases} \tilde{\sigma}_i(h^\tau), & \text{if } \tau < T, \\ \sigma_i|_{\tilde{h}^t}(h^\tau), & \text{if } \tau \geq T, \end{cases} \quad = \quad \begin{cases} \tilde{\sigma}_i(h^\tau), & \text{if } \tau < T, \\ \sigma_i(\tilde{h}^t h^\tau), & \text{if } \tau \geq T, \end{cases}$$

is a profitable deviation. In particular, strategy $\hat{\sigma}_i$ agrees with $\tilde{\sigma}_i$ over the first $T$ periods, and hence captures the payoff gains of $\varepsilon/2$ promised by (2.2.1). The strategy only differs from $\sigma_i|_{\tilde{h}^t}$ in the first $T$ periods. We have thus shown that if an equilibrium is not subgame perfect, there must be a profitable $T$-period deviation. The proof is now completed by arguing recursively on the value of $T$. Let $\hat{h}^{T-1} \equiv (\hat{a}^0, \ldots, \hat{a}^{T-2})$ denote the $T - 1$ period history induced by $(\hat{\sigma}_i, \sigma_{-i}|_{\tilde{h}^t})$. There are two possibilities:

1. Suppose $U_i(\sigma_i|_{\tilde{h}^t\hat{h}^{T-1}}, \sigma_{-i}|_{\tilde{h}^t\hat{h}^{T-1}}) < U_i(\hat{\sigma}_i|_{\hat{h}^{T-1}}, \sigma_{-i}|_{\tilde{h}^t\hat{h}^{T-1}})$. In this case, we have a profitable one-shot deviation, after the history $\tilde{h}^t\hat{h}^{T-1}$ (note that $\hat{\sigma}_i$ agrees with $\sigma_i|_{\tilde{h}^t}$ in period $T$ and every period after $T$).

2. Alternatively, suppose $U_i(\sigma_i|_{\tilde{h}^t\hat{h}^{T-1}}, \sigma_{-i}|_{\tilde{h}^t\hat{h}^{T-1}}) \geq U_i(\hat{\sigma}_i|_{\hat{h}^{T-1}}, \sigma_{-i}|_{\tilde{h}^t\hat{h}^{T-1}})$. In this case, we define a new strategy $\check{\sigma}_i$ as follows:

$$\check{\sigma}_i(h^\tau) = \begin{cases} \hat{\sigma}_i(h^\tau), & \text{if } \tau < T - 1, \\ \sigma_i|_{\tilde{h}^t}(h^\tau), & \text{if } \tau \geq T - 1. \end{cases}$$

Now,

$$U_i(\hat{\sigma}_i|_{\hat{h}^{T-2}}, \sigma_{-i}|_{\tilde{h}^t\hat{h}^{T-2}}) = (1 - \delta)u_i(\hat{a}^{T-2}) + \delta U_i(\hat{\sigma}_i|_{\hat{h}^{T-1}}, \sigma_{-i}|_{\tilde{h}^t\hat{h}^{T-1}})$$
$$\leq (1 - \delta)u_i(\hat{a}^{T-2}) + \delta U_i(\sigma_i|_{\tilde{h}^t\hat{h}^{T-1}}, \sigma_{-i}|_{\tilde{h}^t\hat{h}^{T-1}})$$
$$= U_i(\check{\sigma}_i|_{\hat{h}^{T-2}}, \sigma_{-i}|_{\tilde{h}^t\hat{h}^{T-2}}),$$

which implies

$$U_i(\hat{\sigma}_i, \sigma_{-i}|_{\tilde{h}^t}) \leq U_i(\check{\sigma}_i, \sigma_{-i}|_{\tilde{h}^t}),$$

and so $\check{\sigma}_i$ is a profitable deviation at $\tilde{h}^t$ that only differs from $\sigma_i|_{\tilde{h}^t}$ in the first $T - 1$ periods.

Proceeding in this way, we must find a profitable one-shot deviation.

A key step in the proof is the observation that because payoffs are discounted, any strategy that offers a higher payoff than an equilibrium strategy must do so within a finite number of periods. A backward induction argument then allows us to show that if there is a profitable deviation, there is a profitable one-shot deviation.
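To make that key step explicit (the following bound is our elaboration, using the $m$, $M$, $\varepsilon$, and $T$ of the proof): for any two outcome paths $\mathbf{a}$ and $\mathbf{b}$ that agree in periods $0, \ldots, T - 1$, the induced payoffs differ only in the discounted tail, so that

$$\Bigl|(1 - \delta)\sum_{\tau=T}^{\infty} \delta^{\tau}\bigl[u_i(a^\tau) - u_i(b^\tau)\bigr]\Bigr| \leq (1 - \delta)\sum_{\tau=T}^{\infty} \delta^{\tau}(M - m) = \delta^{T}(M - m) < \frac{\varepsilon}{2}.$$

Any strategy that improves on $\sigma_i|_{\tilde{h}^t}$ by $\varepsilon$ must therefore already secure a gain of more than $\varepsilon/2$ within the first $T$ periods, which is exactly what (2.2.1) records.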
Fudenberg and Tirole (1991, section 4.2) show the one-shot deviation principle holds for a more general class of games with perfect monitoring, those with payoffs that are continuous at infinity (a condition that essentially requires that actions in the far future have a negligible impact on current payoffs). In addition, the principle holds for sequential equilibria in any finite extensive-form game (Osborne and Rubinstein 1994, exercise 227.1), as well as for perfect public equilibria of repeated games with public monitoring (proposition 7.1.1) and sequential equilibria of private-monitoring games with no observable deviations (proposition 12.2.2).

Suppose we have a Nash equilibrium that is not subgame perfect. Then, from proposition 2.2.1, there must be a profitable one-shot deviation from the strategy profile $\sigma$. However, because $\sigma$ is a Nash equilibrium, no deviation can increase either player's equilibrium payoff. The profitable one-shot deviation must then occur after a history that is not reached in the course of the Nash equilibrium. Example 2.2.2 provided an illustration.

In light of this last observation, do we have a corresponding one-shot deviation principle for Nash equilibria? Is a strategy profile $\sigma$ a Nash equilibrium if and only if there are no one-shot deviations whose differences from $\sigma$ occur after histories that arise along the equilibrium path? The answer is no. It is immediate from the definition of Nash equilibrium that there can be no profitable one-shot deviations along the equilibrium path. However, their absence does not suffice for Nash equilibrium, as we now show.

Example 2.2.3 Consider the prisoners' dilemma, but with payoffs given in figure 2.2.1.⁹

             E         S
   E       3, 3     -1, 4
   S       4, -1     1, 1

   Figure 2.2.1 The prisoners' dilemma with incentives to shirk that depend on the opponent's action.

Consider the strategy profile in which both players play tit-for-tat, exerting effort in the first period and thereafter mimicking in each period the action chosen by the opponent in the previous period. The induced outcome is mutual effort in every period, yielding an equilibrium payoff of 3. To ensure that there are no profitable one-shot deviations whose differences appear after equilibrium histories, we need only consider a strategy for player 1 that shirks in the first period and otherwise plays as does tit-for-tat. Such a strategy induces a cyclic outcome of the form SE, ES, SE, ES, ..., for a payoff of

$$(1 - \delta)\bigl[4(1 + \delta^2 + \delta^4 + \cdots) - 1 \cdot (\delta + \delta^3 + \delta^5 + \cdots)\bigr] = \frac{4 - \delta}{1 + \delta}.$$

There are then no profitable one-shot deviations whose differences from the equilibrium strategy appear after equilibrium histories if and only if $(4 - \delta)/(1 + \delta) \leq 3$, that is, if and only if $\delta \geq \frac{1}{4}$.

However, when $\delta \geq 1/4$, the most attractive deviation from tit-for-tat in this game is perpetual shirking, which is not a one-shot deviation. For this deviation to be unprofitable, it must be that

$$3 \geq (1 - \delta)4 + \delta = 4 - 3\delta,$$

and hence $\delta \geq \frac{1}{3}$. For $\delta \in [1/4, 1/3)$ tit-for-tat is thus not a Nash equilibrium, despite the absence of profitable one-shot deviations that differ from tit-for-tat only after equilibrium histories.

9. With the payoffs of the prisoners' dilemma of section 1.2, the incentives to shirk are independent of the action of the partner, and so the set of discount factors for which tit-for-tat is a Nash equilibrium coincides with the set for which there are no profitable one-shot deviations on histories that appear along the equilibrium path.
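The two cutoffs in example 2.2.3 are easy to verify numerically; the following check is ours (function names are illustrative):

```python
def cycle_payoff(delta):
    """One-shot deviation from tit-for-tat: shirk once, triggering the
    alternating outcome SE, ES, SE, ...; value (4 - delta)/(1 + delta)."""
    return (4 - delta) / (1 + delta)

def perpetual_shirk_payoff(delta):
    """Shirk forever against tit-for-tat: 4 today, then 1 in every later
    period, for a value of (1 - delta)*4 + delta*1 = 4 - 3*delta."""
    return 4 - 3 * delta

for delta in (0.20, 0.25, 0.30, 1/3, 0.40):
    print(f"delta = {delta:.3f}: "
          f"cycle {cycle_payoff(delta):.3f} vs 3, "
          f"perpetual {perpetual_shirk_payoff(delta):.3f} vs 3")
# The cycle payoff drops to 3 at delta = 1/4, but perpetual shirking only
# drops to 3 at delta = 1/3: for delta in [1/4, 1/3), tit-for-tat passes the
# on-path one-shot test yet fails to be a Nash equilibrium.
```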
What goes wrong if we mimic the proof of proposition 2.2.1 in an effort to show that if there are no profitable one-shot deviations from equilibrium histories, then we have a Nash equilibrium? Proceeding again with the contrapositive, we would begin with a strategy profile that is not a Nash equilibrium. A profitable deviation may involve a deviation on the equilibrium path, as well as subsequent deviations off the equilibrium path. Beginning with a profitable deviation, and following the argument of the proof of proposition 2.2.1, we find a profitable one-shot deviation. The difficulty is that this one-shot deviation may occur off the equilibrium path. Although this is immaterial for subgame perfection, this difficulty scuttles the relationship between Nash equilibrium and profitable one-shot deviations along the equilibrium path.

2.3 Automaton Representations of Strategy Profiles

The one-shot deviation principle simplifies the set of alternative strategies we must check when evaluating subgame perfection. However, there still remains a potentially daunting number of histories to be evaluated. This evaluation can often be simplified by grouping histories into equivalence classes, where each member of an equivalence class induces an identical continuation strategy. We achieve this grouping by representing repeated-game strategies as automata, where the states of the automata represent equivalence classes of histories.

An automaton (or machine) $(\mathcal{W}, w^0, f, \tau)$ consists of a set of states $\mathcal{W}$, an initial state $w^0 \in \mathcal{W}$, an output (or decision) function $f : \mathcal{W} \to \prod_i \Delta(A_i)$ associating mixed action profiles with states, and a transition function, $\tau : \mathcal{W} \times A \to \mathcal{W}$. The transition function identifies the next state of the automaton, given its current state and the realized stage-game pure action profile.

If the function $f$ specifies a pure output at state $w$, we write $f(w)$ for the resulting action profile. If a mixture is specified by $f$ at $w$, $f_w(a)$ denotes the probability attached to profile $a$, so that $\sum_{a \in A} f_w(a) = 1$ (recall that we only consider mixtures over finite action spaces, see remark 2.1.1). We emphasize that even if two automata only differ in their initial state, they nonetheless are different automata.

Any automaton $(\mathcal{W}, w^0, f, \tau)$ with $f$ specifying a pure action at every state induces an outcome $\{a^0, a^1, \ldots\}$ as follows:

$$a^0 = \sigma(\emptyset) = f(w^0),$$
$$a^1 = \sigma(a^0) = f(\tau(w^0, a^0)),$$
$$a^2 = \sigma(a^0, a^1) = f(\tau(\tau(w^0, a^0), a^1)),$$
$$\vdots$$

We extend this to identify the strategy induced by an automaton. First, extend the transition function from the domain $\mathcal{W} \times A$ to the domain $\mathcal{W} \times (H \setminus \{\emptyset\})$ by recursively defining

$$\tau(w, h^t) = \tau(\tau(w, h^{t-1}), a^{t-1}).$$

With this definition, we have the strategy $\sigma$ described by $\sigma(\emptyset) = f(w^0)$ and

$$\sigma(h^t) = f(\tau(w^0, h^t)).$$

Similarly, an automaton for which $f$ sometimes specifies mixed actions induces a path of play and a strategy.

Conversely, it is straightforward that any strategy profile can be represented by an automaton. Take the set of histories $H$ as the set of states, the null history $\emptyset$ as the initial state, $f(h^t) = \sigma(h^t)$, and $\tau(h^t, a) = h^{t+1}$, where $h^{t+1} \equiv (h^t, a)$ is the concatenation of the history $h^t$ with the action profile $a$.

This representation leaves us in the position of working with the full set of histories $H$. However, strategy profiles can often be represented by automata with finite sets $\mathcal{W}$. The set $\mathcal{W}$ is then a partition on $H$, grouping together those histories that prompt identical continuation strategies.

We say that a state $w' \in \mathcal{W}$ is accessible from another state $w \in \mathcal{W}$ if there exists a sequence of action profiles such that beginning at $w$, the automaton eventually reaches $w'$. More formally, there exists $h^t$ such that $w' = \tau(w, h^t)$. Accessibility is not symmetric. Consequently, in an automaton $(\mathcal{W}, w^0, f, \tau)$, even if every state in $\mathcal{W}$ is accessible from the initial state $w^0$, this may not be true if some other state replaced $w^0$ as the initial state (see example 2.3.1).

Remark 2.3.1 (Individual automata) For most of parts I and II, it is sufficient to use a single automaton to represent strategy profiles. We can also represent a single strategy $\sigma_i$ by an automaton $(\mathcal{W}_i, w_i^0, f_i, \tau_i)$.
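The individual automata of remark 2.3.1 are easy to make concrete. The following sketch is ours (the class and state names are our inventions): it encodes grim trigger for one player as a two-state automaton, with the two states standing for the two equivalence classes of histories, and recovers the induced strategy via the extended transition function.

```python
from typing import Dict, Tuple

Profile = Tuple[str, str]
PROFILES = [('E', 'E'), ('E', 'S'), ('S', 'E'), ('S', 'S')]

class Automaton:
    """A single-player strategy automaton (W, w0, f, tau) with pure output."""
    def __init__(self, states, w0, f: Dict, tau: Dict):
        self.states, self.w0, self.f, self.tau = set(states), w0, f, tau

    def state_after(self, history: Tuple[Profile, ...]):
        """Extended transition tau(w0, h^t), applied profile by profile."""
        w = self.w0
        for a in history:
            w = self.tau[(w, a)]
        return w

    def strategy(self, history):
        """The induced strategy: sigma(h^t) = f(tau(w0, h^t))."""
        return self.f[self.state_after(history)]

# Grim trigger as a two-state automaton: wE (cooperate) and wS (punish).
f = {'wE': 'E', 'wS': 'S'}
tau = {('wE', a): ('wE' if a == ('E', 'E') else 'wS') for a in PROFILES}
tau.update({('wS', a): 'wS' for a in PROFILES})

grim = Automaton({'wE', 'wS'}, 'wE', f, tau)
print(grim.strategy(()))                          # 'E' at the null history
print(grim.strategy((('E', 'E'), ('E', 'S'))))    # 'S' after any shirking
```

The two states partition $H$ exactly as the text describes: all histories of unbroken mutual effort map to wE, and every other history maps to wS, so checking behavior state by state replaces checking it history by history.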