Game Theory Lecture 7: Repeated Games and Equilibria

EC941 - Game Theory

Prof. Francesco Squintani

Email: f.squintani@warwick.ac.uk

Lecture 7

Structure of the Lecture



Infinitely Repeated Games



Nash and Subgame-Perfect Equilibrium



Finitely Repeated Games

Repeated Games



Repeated games are a special class of interactions, represented as extensive form games.

A simultaneous move game, represented as a normal form game, is repeated over time.

Repetition enlarges the set of equilibria if players are sufficiently patient. For example, cooperation is a subgame perfect equilibrium outcome in the infinitely repeated prisoner’s dilemma.

Definition

Let $G = (N, A, u)$ be a strategic game. Let $T$ be finite or infinite. The $T$-repeated game of $G$ for the discount factor $\delta$ is the extensive game in which:

the set of players is $N$;

the set of terminal histories is the set of sequences $(a^1, a^2, \ldots)$ of action profiles in $G$ (infinite sequences when $T = \infty$);

the player function assigns the set of all players to every proper sub-history of every terminal history;

the set of actions of player $i$ after any history is $A_i$;

each player $i$ evaluates each terminal history $(a^1, a^2, \ldots)$ according to its discounted average

$(1 - \delta) \sum_{t=1}^{T} \delta^{t-1} u_i(a^t)$.

Repeated Prisoner Dilemma



Suppose that the following game is infinitely repeated with discount factor $\delta$ (the row player’s payoff is listed first).

            C        D
    C     2, 2     0, 3
    D     3, 0     1, 1
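To make the discounted-average criterion concrete, here is a minimal Python sketch. It assumes the payoff table above; the names `PAYOFFS`, `discounted_average` and `constant` are illustrative and not part of the lecture, and the infinite sum is approximated by truncating it after a large number of periods.

```python
from itertools import islice

# Stage-game payoffs of the Prisoner's Dilemma above:
# (row action, column action) -> (row payoff, column payoff)
PAYOFFS = {
    ("C", "C"): (2, 2), ("C", "D"): (0, 3),
    ("D", "C"): (3, 0), ("D", "D"): (1, 1),
}

def discounted_average(stream, delta, horizon=2000):
    """Approximate (1 - delta) * sum_{t>=1} delta^(t-1) * u_t.

    `stream` is any iterable of per-period payoffs; the sum is truncated
    after `horizon` periods (an assumption of this sketch).
    """
    total = 0.0
    for t, u in enumerate(islice(stream, horizon)):
        total += (delta ** t) * u
    return (1 - delta) * total

def constant(u):
    """Infinite stream that pays u in every period."""
    while True:
        yield u

# Mutual cooperation forever pays player 1 the stage payoff of (C, C).
coop_value = discounted_average(constant(PAYOFFS[("C", "C")][0]), delta=0.9)
```

The factor (1 - delta) rescales the discounted sum so that a constant stream of u has discounted average u, which is why repeated-game payoffs can be compared directly with stage-game payoffs.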

Strategies



A player’s strategy in an extensive game specifies her action after all possible histories after which it is her turn to move.

A strategy of player $i$ in an infinitely repeated game of the strategic game $G$ specifies an action of player $i$ (a member of $A_i$) for every sequence $(a^1, \ldots, a^T)$ of outcomes of $G$.

Grim Trigger Strategy



Consider the repeated prisoner’s dilemma.

The strategy prescribes that the player initially cooperates, and continues to do so if both players cooperated at all previous times.

$s_i(a^1, \ldots, a^T) = D$ if $a^t \neq (C, C)$ for some $t = 1, \ldots, T$.

$s_i(a^1, \ldots, a^T) = C$ otherwise.

Note that a player defects if either she or her opponent defected in the past.

Automaton Representation

An automaton for player $i$ is $(X, x_0, f, g)$.

1. $X$ is a set of states.

2. $x_0$ is the initial state of the automaton.

3. $f : X \times A \to X$ is the transition across states, as a function of the play.

4. $g : X \to A_i$ is the play output at each state.

The automaton of the Grim Trigger Strategy is as follows:

There are two states: C, in which C is chosen, and D, in which D is chosen.

The initial state is C.

If the play is not (C,C) in any period then the state changes to D.

If the automaton is in state D, there it remains forever.

[Diagram: two-state automaton. The initial state C (marked *) loops on (C,C) and moves to state D on (C,D), (D,C) or (D,D); state D loops on every outcome.]
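The automaton description above can be sketched in Python as follows. This is an illustrative encoding, not the lecture's notation: the class `Automaton`, the helper `play`, and the `forced` argument (used to impose a deviation on player 1) are all assumptions of this sketch. Outcomes are passed to each player's transition function as (own action, opponent's action).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Outcome = Tuple[str, str]  # (own action, opponent's action)

@dataclass
class Automaton:
    initial: str                                # x_0
    transition: Callable[[str, Outcome], str]   # f
    output: Dict[str, str]                      # g

def grim_transition(state: str, outcome: Outcome) -> str:
    # Stay in C only while the play is (C, C); state D is absorbing.
    if state == "C" and outcome == ("C", "C"):
        return "C"
    return "D"

GRIM = Automaton(initial="C", transition=grim_transition,
                 output={"C": "C", "D": "D"})

def play(auto1: Automaton, auto2: Automaton, periods: int,
         forced: Optional[Dict[int, str]] = None):
    """Run two automata against each other and return the outcome path.

    `forced` maps a period index (starting at 0) to an action that
    player 1 plays instead of her automaton's output in that period.
    """
    forced = forced or {}
    x1, x2 = auto1.initial, auto2.initial
    path = []
    for t in range(periods):
        a1 = forced.get(t, auto1.output[x1])
        a2 = auto2.output[x2]
        path.append((a1, a2))
        x1 = auto1.transition(x1, (a1, a2))
        x2 = auto2.transition(x2, (a2, a1))
    return path

on_path = play(GRIM, GRIM, 4)
after_deviation = play(GRIM, GRIM, 4, forced={1: "D"})
```

In this sketch a single forced defection in the second period moves both automata to state D, after which (D, D) is played in every period.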

Tit for Tat



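The player initially cooperates.

At subsequent rounds, she plays the action played by the opponent at the previous round.

$s_i(a^1, \ldots, a^T) = C$ if $a_j^T = C$, or in the first period (empty history).

$s_i(a^1, \ldots, a^T) = D$ if $a_j^T = D$.

[Diagram: two-state automaton. The initial state C (marked *) loops on ( . ,C) and moves to state D on ( . ,D); state D loops on ( . ,D) and returns to state C on ( . ,C).]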

Grim Trigger Nash Equilibrium



Suppose that player $j$ adopts the grim trigger strategy.

If player $i$ plays grim trigger, then the outcome is $(C, C)$ in every period with payoffs $(2, 2, \ldots)$. The discounted average is 2.

If $i$ deviates from the grim trigger strategy, then there is one period (at least) in which she chooses $D$.

In all subsequent periods player $j$ chooses $D$. So the best deviation for player $i$ is choosing $D$ in every subsequent period (because $D$ is her unique best response to $D$).

If $i$ can increase her payoff by deviating then she can do so by deviating to $D$ in the first period.

She obtains the stream of payoffs $(3, 1, 1, \ldots)$ with discounted average

$(1 - \delta)[3 + \delta + \delta^2 + \delta^3 + \cdots] = 3(1 - \delta) + \delta$.

Thus player $i$ cannot increase her payoff by deviating if and only if

$2 \geq 3(1 - \delta) + \delta$, or $\delta \geq 1/2$.

Hence, if $\delta \geq 1/2$, then playing the grim trigger strategy by both players is a Nash equilibrium of the infinitely repeated Prisoner’s Dilemma.
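As a numerical cross-check of the threshold derived above, the following sketch compares the closed form 3(1 - δ) + δ with a truncated version of the series and tests the no-deviation condition. The function names and the truncation horizon are assumptions made for illustration.

```python
def truncated_average(first, rest, delta, horizon=5000):
    """(1 - delta) * [first + rest*delta + rest*delta^2 + ...],
    truncated after `horizon` periods."""
    total = first + sum(rest * delta ** t for t in range(1, horizon))
    return (1 - delta) * total

def deviation_payoff(delta):
    """Closed form for the stream (3, 1, 1, ...)."""
    return 3 * (1 - delta) + delta

def grim_trigger_is_nash(delta):
    """Cooperation (discounted average 2) must be at least as good as deviating."""
    return 2 >= deviation_payoff(delta)

# The truncated series and the closed form should agree closely.
gap = abs(truncated_average(3, 1, 0.8) - deviation_payoff(0.8))
```

For example, at δ = 0.6 the deviation is worth 1.8 < 2, while at δ = 0.4 it is worth 2.2 > 2, in line with the condition δ ≥ 1/2.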

Tit for Tat Nash Equilibrium



Suppose that player $j$ adopts the tit for tat strategy.

If player $i$ plays tit for tat, then the outcome is $(C, C)$ in every period with payoffs $(2, 2, \ldots)$.

If $i$ deviates, then there is one period (at least) in which she chooses $D$.

In the subsequent period player $j$ chooses $D$. If player $i$ plays $D$ again, she triggers one further $D$ by $j$. If $i$ chooses $C$, then $j$ returns to $C$ in the following period.

If $i$ can increase her payoff by deviating, then she can do so by deviating to $D$ in the first period.

Player $i$ can obtain either the stream $(3, 1, 1, 1, \ldots)$, with discounted average $3(1 - \delta) + \delta$, or the stream $(3, 0, 3, 0, \ldots)$ with discounted average

$(1 - \delta)[3 + 0 \cdot \delta + 3\delta^2 + 0 \cdot \delta^3 + \cdots] = 3(1 - \delta) \sum_{t=0}^{\infty} \delta^{2t} = 3(1 - \delta)/(1 - \delta^2) = 3/(1 + \delta)$.

Thus player $i$ cannot increase her payoff by deviating if and only if

$2 \geq 3(1 - \delta) + \delta$ and $2 \geq 3/(1 + \delta)$, or $\delta \geq 1/2$.
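The two deviation streams against tit for tat can be generated mechanically. The sketch below assumes the stage payoffs used throughout (2 for mutual cooperation, 3 for defecting on a cooperator, 0 for cooperating with a defector, 1 for mutual defection); the helper names are illustrative only.

```python
# Player i's stage payoff for the outcome (a_i, a_j).
PAYOFF_I = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}

def stream_against_tit_for_tat(own_actions):
    """Player i's payoffs when j plays tit for tat and i plays `own_actions`."""
    payoffs = []
    a_j = "C"                      # tit for tat starts by cooperating
    for a_i in own_actions:
        payoffs.append(PAYOFF_I[(a_i, a_j)])
        a_j = a_i                  # next period j copies i's current action
    return payoffs

def discounted_average(payoffs, delta):
    """(1 - delta) * sum_t delta^t * u_t over a finite list of payoffs."""
    return (1 - delta) * sum(u * delta ** t for t, u in enumerate(payoffs))

always_defect = stream_against_tit_for_tat(["D"] * 4)      # 3, 1, 1, 1
alternate = stream_against_tit_for_tat(["D", "C"] * 2)     # 3, 0, 3, 0

# A long alternating stream approximates the closed form 3 / (1 + delta).
delta = 0.9
approx = discounted_average(stream_against_tit_for_tat(["D", "C"] * 2000), delta)
```

With δ = 0.9 the alternating deviation is worth about 1.58 and permanent defection 1.2, both below the cooperative payoff of 2.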



Nash Folk Theorem in the Prisoner’s Dilemma

Definition

The set of feasible payoff profiles of a strategic game is the set of all weighted averages of payoff profiles in the game.

For any feasible pair $(x_1, x_2)$ of payoffs there is a finite sequence $(a^1, \ldots, a^k)$ of outcomes for which each player $i$’s average payoff is close to $x_i$:

$[u_i(a^1) + \cdots + u_i(a^k)]/k - \varepsilon_1 < x_i < \varepsilon_1 + [u_i(a^1) + \cdots + u_i(a^k)]/k$.



The discounted average payoff is as close as desired to $x_i$ when taking the discount factor close enough to 1:

$(1 - \delta) \sum_{t=1}^{\infty} \delta^{t-1} u_i(a^t) - \varepsilon_2 < x_i < \varepsilon_2 + (1 - \delta) \sum_{t=1}^{\infty} \delta^{t-1} u_i(a^t)$.

Consider the feasible payoff pair $(x_1, x_2)$, and the outcome path $b$ that consists of repetitions of the sequence $(a^1, \ldots, a^k)$:

$b^{nk+l} = a^l$ for $l = 1, \ldots, k$ and $n = 0, 1, 2, \ldots$

Consider the strategy

$s_i(h^1, \ldots, h^{T-1}) = b_i^T$ if $h^t = b^t$ for $t = 1, \ldots, T - 1$,

$s_i(h^1, \ldots, h^{T-1}) = D$ otherwise.
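The claim that a repeated cycle of outcomes yields a discounted average close to the cycle's simple average can be checked numerically. This is a rough sketch: the particular cycle, the function names and the truncation horizon are assumptions chosen for illustration, using the Prisoner's Dilemma payoffs from earlier in the lecture.

```python
# (row action, column action) -> (row payoff, column payoff)
PAYOFFS = {
    ("C", "C"): (2, 2), ("C", "D"): (0, 3),
    ("D", "C"): (3, 0), ("D", "D"): (1, 1),
}

def cycle_simple_average(cycle, player):
    """Average stage payoff of `player` (0 or 1) over one pass of the cycle."""
    return sum(PAYOFFS[o][player] for o in cycle) / len(cycle)

def cycle_discounted_average(cycle, delta, player, horizon=20000):
    """Discounted average payoff when the cycle is repeated, truncated at `horizon`."""
    total = 0.0
    for t in range(horizon):
        outcome = cycle[t % len(cycle)]
        total += delta ** t * PAYOFFS[outcome][player]
    return (1 - delta) * total

# Hypothetical cycle: cooperate twice, then let player 1 defect once.
cycle = [("C", "C"), ("C", "C"), ("D", "C")]
targets = [cycle_simple_average(cycle, p) for p in (0, 1)]            # 7/3 and 4/3
patient = [cycle_discounted_average(cycle, 0.999, p) for p in (0, 1)]
```

Both simple averages in this example (7/3 and 4/3) exceed the mutual-defection payoff of 1, and at δ = 0.999 the discounted averages are within 0.01 of them.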



As long as $x_1 > u_1(D, D)$ and $x_2 > u_2(D, D)$, the pair of these “grim trigger” strategies is a Nash Equilibrium when the discount factor is close enough to 1.

We conclude that any feasible payoff pair $(x_1, x_2)$ such that $x_1 > u_1(D, D)$ and $x_2 > u_2(D, D)$ is a Nash Equilibrium payoff of the infinitely repeated Prisoner’s Dilemma game.

Nash Folk Theorem



Consider a one-shot game. Suppose that each player $i$ can guarantee herself a “minimum” payoff $m_i$.

We will show that every feasible payoff profile $w$ such that $w_i > m_i$ can be achieved as the discounted average payoff profile of a Nash equilibrium in the infinitely repeated game, when $\delta$ is close to 1.

This payoff can be achieved with strategies similar to grim trigger strategies. Deviation from the path by player $i$ is punished by minimizing $i$’s payoff forever.



For the Prisoner’s Dilemma, the minimum payoff of player $i$ supported by a Nash equilibrium is $u_i(D, D)$.

Player $j$ can ensure (by choosing $D$) that player $i$’s payoff does not exceed $u_i(D, D)$, and there is no lower payoff with this property.

Hence, $u_i(D, D)$ is the lowest payoff that player $j$ can force upon player $i$.

What is this minimum payoff for player $i$ in an arbitrary game?



For any collection $\alpha_{-i}$ of the other players’ mixed actions, player $i$’s highest possible payoff is

$\max_{a_i \in A_i} u_i(a_i, \alpha_{-i})$.

As $\alpha_{-i}$ changes, this maximal payoff changes. The collection $\alpha_{-i}$ of “punishments” that make this maximum as small as possible is the solution of

$\min_{\alpha_{-i} \in \Delta(A_{-i})} \max_{a_i \in A_i} u_i(a_i, \alpha_{-i})$.

This payoff is known as player $i$’s minmax payoff.
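A simplified illustration of the minmax computation is sketched below. It is restricted to pure actions of the opponent in a two-player game, which is an assumption of this sketch: the definition above minimizes over mixed actions, and the two values can differ (in matching pennies the pure-action value is 1 while the mixed minmax payoff is 0). In the Prisoner's Dilemma both coincide at $u_i(D, D) = 1$. The function and variable names are hypothetical.

```python
def pure_minmax(payoff_i, own_actions, opp_actions):
    """min over the opponent's pure actions of max over own pure actions of u_i.

    `payoff_i` maps (own action, opponent action) to player i's payoff.
    Only pure punishments are considered (a simplification).
    """
    return min(
        max(payoff_i[(a_i, a_j)] for a_i in own_actions)
        for a_j in opp_actions
    )

# Player i's payoffs in the Prisoner's Dilemma used in this lecture.
PD_I = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}
pd_minmax = pure_minmax(PD_I, ["C", "D"], ["C", "D"])

# Matching pennies (player i wins 1 on a match): pure punishments are too weak.
MP_I = {("H", "H"): 1, ("H", "T"): -1, ("T", "H"): -1, ("T", "T"): 1}
mp_pure_minmax = pure_minmax(MP_I, ["H", "T"], ["H", "T"])
```

In the Prisoner's Dilemma the opponent's punishing action is D: against D, player i's best reply is D, which gives her 1.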

Theorem (Nash Folk Theorem). Let $G$ be a strategic game. Let $w$ be a feasible payoff profile of $G$ for which each player’s payoff exceeds her minmax payoff. Then, for all $\varepsilon > 0$, there exists $\underline{\delta} < 1$ such that if the discount factor exceeds $\underline{\delta}$, then the infinitely repeated game of $G$ has a Nash equilibrium whose discounted average payoff profile $w'$ satisfies $|w' - w| < \varepsilon$.

For any discount factor $\delta$ with $0 < \delta < 1$, the discounted average payoff of every player in any Nash equilibrium of the infinitely repeated game of $G$ is at least her minmax payoff.



Let $x$ be the payoff profile induced by the actions $a$. By hypothesis, each $x_i$ exceeds player $i$’s minmax payoff.

For each player $i$, let $\alpha_{-i}$ be a profile of mixed actions for the players other than $i$ that holds player $i$ down to her minmax payoff.

Define each player $i$’s strategy as follows. In each period, play $a_i$ as long as the play was $a$ in every previous period. Otherwise play $(\alpha_{-j})_i$, where $j$ is the player who deviated in the first period in which the play was not $a$.



Let $H^*$ be the set of histories in which there is a period in which exactly one player $j$ chose an action different from $a_j$. Refer to $j$ as a lone deviant.

The strategy of player $i$ is defined as follows:

$s_i(\varnothing) = a_i$,

$s_i(h) = a_i$ if $h$ is not in $H^*$,

$s_i(h) = (\alpha_{-j})_i$ if $h \in H^*$ and $j$ is the first lone deviant in $h$.



We now show that the profile $s$ is a Nash equilibrium.

If each player $i$ adheres to $s_i$, then her payoff is $x_i$ in every period.

If player $i$ deviates from $s_i$, then she may gain in the period in which she deviates, but she loses in every subsequent period, obtaining at most her minmax payoff, rather than $x_i$.

Thus for a discount factor close enough to 1, $s_i$ is a best response to $s_{-i}$ for every player $i$.

Subgame Perfect Equilibrium

Theorem (One-Shot Deviation Property of subgame perfect equilibria of infinitely repeated games). A strategy profile in an infinitely repeated game is a subgame perfect equilibrium if and only if no player can gain by changing her action after any single history, given both the strategies of the other players and the remainder of her own strategy.



The One-Shot Deviation Principle is deceptively simple. Its application is often not straightforward.

First, it requires that players cannot gain by deviating once, in any history of play.

But there are infinitely many histories, so they cannot be checked one by one.

They must be grouped according to the prescriptions of the strategy profile we are considering.

Second, the one-shot deviation may change future play, according to the strategy that we are considering. The deviation is one shot from a strategy, not from a play.

Grim Trigger Strategy



Consider the repeated prisoner’s dilemma.

The strategy prescribes that the player initially cooperates, and continues to do so if both players cooperated at all previous times. Otherwise, they should defect forever.

$s_i(a^1, \ldots, a^T) = D$ if $a^t \neq (C, C)$ for some $t = 1, \ldots, T$.

$s_i(a^1, \ldots, a^T) = C$ otherwise.

Grim Trigger SPE



Suppose that both players adopt the grim trigger strategy.

There are two “groups” of histories: those for which the grim trigger strategy prescribes that the players play (C,C), and those for which the grim trigger strategy prescribes that they play (D,D).

In the first set of histories, if player $i$ plays grim trigger, then the outcome is $(C, C)$ in every period with payoffs $(2, 2, \ldots)$, whose discounted average is 2.



If $i$ deviates only once, she plays $D$. Then she reverts to the grim trigger strategy, which prescribes playing $D$ in all subsequent periods.

The opponent, playing the grim trigger strategy, plays $D$ forever as a consequence of $i$’s one-shot deviation.

The one-shot deviation (OSD) yields the stream of payoffs $(3, 1, 1, \ldots)$ with discounted average

$(1 - \delta)[3 + \delta + \delta^2 + \delta^3 + \cdots] = 3(1 - \delta) + \delta$.

Thus player $i$ cannot increase her payoff by deviating if and only if

$2 \geq 3(1 - \delta) + \delta$, or $\delta \geq 1/2$.



In the second set of histories, if player $i$ plays grim trigger, then the outcome is $(D, D)$ in every period with payoffs $(1, 1, \ldots)$, whose discounted average is 1.

If $i$ deviates only once, she plays $C$. Then she reverts to the grim trigger strategy, which prescribes playing $D$ in all subsequent periods.

The opponent, playing the grim trigger strategy, plays $D$ forever, whatever $i$ does in the period of her one-shot deviation.

The OSD yields the stream of payoffs $(0, 1, 1, \ldots)$ with discounted average

$(1 - \delta)[0 + \delta + \delta^2 + \delta^3 + \cdots] = \delta$.

Player $i$ cannot increase her payoff by deviating: $1 \geq \delta$.

We conclude that if $\delta \geq 1/2$, then the strategy pair in which each player’s strategy is the grim trigger strategy is a Subgame-Perfect Equilibrium of the infinitely repeated Prisoner’s Dilemma.
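The grouping of histories by what the strategy prescribes suggests a mechanical check: for each of the two automaton states, compare the value of following grim trigger with the value of deviating once and then conforming. The sketch below is one way to do this under the payoffs used in the lecture; the function names, the truncation horizon and the numerical tolerance are assumptions of the sketch.

```python
# (a_1, a_2) -> (payoff of player 1, payoff of player 2)
PAYOFF = {("C", "C"): (2, 2), ("C", "D"): (0, 3),
          ("D", "C"): (3, 0), ("D", "D"): (1, 1)}

def grim_next(state, outcome):
    """Grim trigger transition: stay in C only after (C, C)."""
    return "C" if state == "C" and outcome == ("C", "C") else "D"

def value(state, delta, first_action=None, horizon=2000):
    """Player 1's discounted average payoff starting from a history in which
    both grim trigger automata are in `state`. If `first_action` is given,
    player 1 plays it once and then follows grim trigger again."""
    total, s = 0.0, state
    for t in range(horizon):
        a1 = first_action if (t == 0 and first_action is not None) else s
        a2 = s                      # in state s, grim trigger plays action s
        total += delta ** t * PAYOFF[(a1, a2)][0]
        s = grim_next(s, (a1, a2))  # both automata move to the same state
    return (1 - delta) * total

def passes_one_shot_deviation(delta, tol=1e-9):
    """True if no one-shot deviation is profitable in either group of histories."""
    for state in ("C", "D"):
        follow = value(state, delta)
        for action in ("C", "D"):
            if value(state, delta, action) > follow + tol:
                return False
    return True
```

Because both players' automata always share the same state under grim trigger, two starting states cover all histories, which is exactly the grouping used in the argument above.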

SPE Folk Theorem

Theorem (Simplified Subgame Perfect Folk Theorem for Two-Player Games). Let $G$ be a two-player strategic game. Let $w$ be a feasible payoff profile of $G$ for which each player’s payoff exceeds her (pure-strategy) minmax payoff. Then for all $\varepsilon > 0$ there exists $\underline{\delta} < 1$ such that if the discount factor exceeds $\underline{\delta}$ then the infinitely repeated game of $G$ has a subgame perfect equilibrium whose discounted average payoff profile $w'$ satisfies $|w' - w| < \varepsilon$.





Take an outcome $a$ such that both players’ discounted payoffs exceed their pure-strategy minmax payoffs.

Let $p_j$ be an action of player $i$ that holds player $j$ down to her minmax payoff, and let $p = (p_2, p_1)$.

If the minmax profile $p$ is a Nash Equilibrium of the stage game, then consider a modified grim strategy such that both players play the sequence $a^t$ at any time $t$; and that, if either player deviates, $p$ is played forever.

Because both players’ discounted payoffs for $a$ exceed their minmax payoffs, if the discount factor $\delta$ is sufficiently close to one, the players will adhere to the modified grim trigger strategy, yielding the outcome $a$.



If $p$ is not a Nash Equilibrium, the proof is as follows.

Let $s_i$ be a strategy of player $i$ that starts off choosing $a_{i,0}$, and continues to choose $a_{i,t}$ so long as the previous outcome was the prescribed one, $a_{t-1}$; otherwise, it chooses the action $p_j$ that holds player $j$ to her minmax payoff.

Once punishment begins, it continues for $k$ periods, as long as both players choose their punishment actions, and then players revert to $a$.

If any player $j$ deviates from the assigned punishment action, then the punishments are re-started, and player $j$ is now punished.



To prove that $(s_1, s_2)$ is a subgame perfect equilibrium, we now find $\delta'$ and $k(\delta')$ such that if $\delta > \delta'$ then the strategy pair $(s_1, s_2)$ is a subgame perfect equilibrium of the infinitely repeated game.

Suppose that player $j$ adheres to $s_j$.

If player $i$ adheres to $s_i$ in any history with no deviations, then her discounted average payoff is $u_i(a)$.

If she deviates, she obtains at most her maximal payoff in the game, say $u_i^*$, in the period of her deviation, then $u_i(p)$ for $k$ periods, and subsequently $u_i(a)$ in the future.



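Her discounted average payoff from the deviation is at most

$(1 - \delta)[u_i^* + \delta u_i(p) + \cdots + \delta^k u_i(p)] + \delta^{k+1} u_i(a) = (1 - \delta) u_i^* + \delta(1 - \delta^k) u_i(p) + \delta^{k+1} u_i(a)$.

Hence, she does not deviate if

$u_i(a) \geq (1 - \delta) u_i^* + \delta(1 - \delta^k) u_i(p) + \delta^{k+1} u_i(a)$.

If player $i$ adheres to $s_i$ in any history where the players play $p$, she gets $u_i(p)$ for at most $k$ periods, then $u_i(a)$ in every subsequent period.

In the worst case, with $k$ punishment periods still to come, this yields a discounted average payoff of $(1 - \delta^k) u_i(p) + \delta^k u_i(a)$.

Note that $u_i(p) \leq m_i$, her minmax payoff, and $u_i(a) > m_i$.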



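If she deviates from $s_i$, she obtains at most her minmax payoff in the period of her deviation, then $u_i(p)$ for $k$ periods, then $u_i(a)$ in the future.

This yields a discounted average payoff of at most

$(1 - \delta) m_i + \delta(1 - \delta^k) u_i(p) + \delta^{k+1} u_i(a)$.

She does not deviate if

$(1 - \delta^k) u_i(p) + \delta^k u_i(a) \geq (1 - \delta) m_i + \delta(1 - \delta^k) u_i(p) + \delta^{k+1} u_i(a)$,

or $(1 - \delta^k) u_i(p) + \delta^k u_i(a) \geq m_i$.

For each value of $\delta$ sufficiently close to 1 we can find $k(\delta)$ such that $(\delta, k(\delta))$ satisfies the 2 no-deviation inequalities:

$u_i(a) \geq (1 - \delta) u_i^* + \delta(1 - \delta^k) u_i(p) + \delta^{k+1} u_i(a)$,

$(1 - \delta^k) u_i(p) + \delta^k u_i(a) \geq m_i$.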
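To see how the punishment length $k$ interacts with $\delta$, the two no-deviation inequalities can be evaluated directly. The sketch below uses purely hypothetical payoff numbers ($u_i(a) = 2$, $u_i^* = 3$, $u_i(p) = 0$, $m_i = 1$) that do not come from the lecture; the function names are likewise illustrative.

```python
def no_deviation_ok(delta, k, u_a, u_star, u_p, m):
    """Check both no-deviation inequalities for a punishment of length k."""
    # No profitable deviation from the equilibrium path.
    on_path = u_a >= ((1 - delta) * u_star
                      + delta * (1 - delta ** k) * u_p
                      + delta ** (k + 1) * u_a)
    # No profitable deviation while carrying out the punishment.
    in_punishment = (1 - delta ** k) * u_p + delta ** k * u_a >= m
    return on_path and in_punishment

def feasible_lengths(delta, u_a, u_star, u_p, m, k_max=200):
    """All punishment lengths 1..k_max satisfying both inequalities."""
    return [k for k in range(1, k_max + 1)
            if no_deviation_ok(delta, k, u_a, u_star, u_p, m)]

# Hypothetical numbers, chosen only to illustrate the trade-off.
patient = feasible_lengths(0.9, u_a=2, u_star=3, u_p=0, m=1)
impatient = feasible_lengths(0.3, u_a=2, u_star=3, u_p=0, m=1)
```

With these numbers, a longer punishment makes the first inequality easier to satisfy and the second harder, so only an intermediate range of $k$ works at δ = 0.9, and no $k$ works at δ = 0.3.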

Finitely Repeated Games



Consider any Subgame Perfect Equilibrium of a finitely repeated game.

In the final stage, a Nash Equilibrium of the stage game must be played.

Hence, the set of equilibria is enlarged only if there are multiple equilibria in the stage game.

Otherwise, in the unique Subgame Perfect Equilibrium of the repeated game, the unique Nash Equilibrium of the stage game is played in every period.

Prisoner’s Dilemma



The following game is repeated for $T$ periods.

            C        D
    C     2, 2     0, 3
    D     3, 0     1, 1

Proceeding by backward induction, in the last period, the unique Nash equilibrium is (D,D).

Because in the last period players play (D,D) regardless of the previous play, in the second to last period future payoffs do not depend on current play.

It is as if players were playing the following game.

            C        D
    C     2, 2     0, 3
    D     3, 0     1, 1

The unique Nash Equilibrium is (D,D).

Proceeding by backward induction, the unique subgame-perfect equilibrium is (D,D) in every period.

Expanded Prisoner’s Dilemma

The following game is repeated for $T$ periods.

            C         D         E
    C     2, 2      0, 3     -2,-2
    D     3, 0      1, 1     -2,-2
    E    -2,-2     -2,-2     -1,-1

There are 2 pure-strategy Nash Equilibria: (D, D) and (E, E).

Expanded Prisoner’s Dilemma



In period $T$, a Nash Equilibrium is played, either (D, D), with payoffs (1, 1), or (E, E), with payoffs (-1, -1).

We construct a Subgame Perfect Equilibrium as follows.

1. In period $T$, the profile (D, D) is played.

2. In all periods $t = 1, \ldots, T-1$, the profile (C, C) is played with payoffs (2, 2).

3. If either player deviates to D, then the future play switches to (E, E) in all remaining periods.

This is a SPE if and only if players do not have an incentive to deviate at the period before the last.

In fact, the punishment (E, E) is more severe if there are more periods left to play.

In the period before the last, each player must prefer to play C, with payoff $2 + \delta$, to playing D, with payoff $3 - \delta$.

Hence, the strategies are a SPE if and only if:

$3 - \delta \leq 2 + \delta$, i.e. $\delta \geq 1/2$.
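The claim that the period before the last is the binding one can be checked by computing, for every number of remaining periods, the payoff from conforming and from deviating to D. This is a sketch under the construction above: payoffs are discounted sums from the current period on (not averages), only the deviation to D is considered, and the function names are assumptions.

```python
def cooperate_value(delta, n):
    """Discounted sum from the current period when n >= 1 periods remain
    after it: (C, C) now and until the last period, then (D, D)."""
    return 2 + sum(2 * delta ** s for s in range(1, n)) + 1 * delta ** n

def deviate_value(delta, n):
    """Defect now for 3, then (E, E) with payoff -1 in all n remaining periods."""
    return 3 + sum(-1 * delta ** s for s in range(1, n + 1))

def is_spe(delta, T):
    """No profitable deviation to D in any of the periods 1, ..., T-1."""
    return all(cooperate_value(delta, n) >= deviate_value(delta, n)
               for n in range(1, T))

# With one period left (n = 1) the comparison is 2 + delta versus 3 - delta.
margin_last = cooperate_value(0.6, 1) - deviate_value(0.6, 1)
margin_earlier = cooperate_value(0.6, 2) - deviate_value(0.6, 2)
```

At δ = 0.6 the margin in favour of cooperating is smallest when one period remains and larger earlier on, consistent with the punishment being more severe when more periods are left.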

Summary of the Lecture



Infinitely Repeated Games



Nash and Subgame-Perfect Equilibrium



Finitely Repeated Games

Preview of the Next Lecture



Coalitional Games and the Core



Ownership and the Distribution of Wealth



Horse Trading and House Exchanges



Voting and Matching


Exploring the concept of repeated games in game theory, this lecture covers infinitely repeated games, Nash and subgame-perfect equilibria, strategies, a Grim Trigger Strategy in the Prisoner's Dilemma, and automaton representation. Understanding how interactions evolve over time offers insights into cooperation, equilibrium, and decision-making strategies in various game scenarios.

Uploaded by dery519 on Sep 10, 2024


Presentation Transcript

EC941 - Game Theory Lecture 7 Prof. Francesco Squintani Email: f.squintani@warwick.ac.uk 1

Structure of the Lecture Infinitely Repeated Games Nash and Subgame-Perfect Equilibrium Finitely Repeated Games 2

Repeated Games Repeated games are a special class of interactions, represented as extensive form games. A simultaneous move game, represented as a normal form game, is repeated over time. This yields to enlarging the set of equilibria, if players are sufficiently patient. For example, cooperation is a subgame perfect equilibrium in the prisoner s dilemma. 3

Definition Let G = (N, A, u) be a strategic game. Let T be finite or infinite. The T-repeated game of G for the discount factor is the extensive game in which: the set of players is N the set of terminal histories is the set of infinite sequences (a1, a2, . . .) of action profiles in G the player function assigns the set of all players to every proper sub-history of every terminal history the set of actions of player i after any history is Ai each player i evaluates each terminal history (a1, a2, . . .) according to its discounted average (1 ) Tt=1 t 1 ui (at). 4

Repeated Prisoner Dilemma Suppose that the following game is infinitely repeated with discount factor C D C 0, 3 2, 2 D 3, 0 1, 1 5

Strategies A player s strategy in an extensive game specifies her action after all possible histories after which it is her turn to move. A strategy of player i in an infinitely repeated game of the strategic game G specifies an action of player i (a member of Ai) for every sequence (a1, , aT) of outcomes of G. 6

Grim Trigger Strategy Consider the repeated prisoner s dilemma. The strategy prescribes that the player initially cooperates, and continues to do so if both players cooperated at all previous times. si(a1, . . . , aT) = D if at= (C,C) for some t = 1, . . . , T. si(a1, . . . , aT) = C otherwise. Note that a player defects if either she or her opponent defected in the past. 7

Automaton Representation An automaton for player i is (X, x0 , f, g ). X is a set of states. x0 is the initial state of the automaton. f : X x A X is the transition across states, as a function of the play. g: X Ai is the play output at each state. 1. 2. 3. 4. 8

The automaton of the Grim Trigger Strategy is as follows: There are two states: C in which C is chosen, and D, in which D is chosen. The initial state is C. If the play is not (C,C) in any period then the state changes to D. If the automaton is in state D, there it remains forever. (C,D) (D,C) (D,D) (C,C) D * C 9

Tit for Tat The player initially cooperates. At subsequent rounds, she plays the strategy played by the opponent at the previous round. si(a1, . . . , aT) = C if aTj= C or T=1. si(a1, . . . , aT) = D if aTj= D ( . ,D) ( . ,C) D * C ( . ,C) ( . ,D) 10

Grim Trigger Nash Equilibrium Suppose that player j adopt the grim trigger strategy. If player i plays grim trigger, then the outcome is (C, C) in every period with payoffs (2, 2, . . .). The discounted average is 2. If i deviates from the grim trigger strategy, then there is one period (at least) in which she chooses D. All subsequent periods player j chooses D. So the best deviation for player i is choosing D in every subsequent period (because D is her unique best response to D). 11

If i can increase her payoff by deviating then she can do so by deviating to D in the first period. She obtains the stream of payoffs (3, 1, 1, . . .) with discounted average (1 )[3 + + 2 + 3 + ] = 3(1 ) + . Thus player i cannot increase her payoff by deviating if and only if 2 3(1 ) + , or 1/2. Hence, if 1/2, then playing the grim trigger strategy by both players is a Nash equilibrium of the infinitely repeated Prisoner s Dilemma. 12

Tit for Tat Nash Equilibrium Suppose that player j adopts the tit for tat strategy. If player i plays tit for tat, then the outcome is (C, C) in every period with payoffs (2, 2, . . .). If i deviates, then there is one period (at least) in which she chooses D. At the subsequent period player j chooses D. If player i plays D, she triggers one further D by j. If i chooses C, then she obtains C. 13

If i can increase her payoff by deviating, then she can do so by deviating to D in the first period. Player i can obtain either the stream (3, 1, 1, 1, ) or the stream (3, 0, 3, 0, . . .) with discounted average (1 )[3 + 0 + 3 2 + 0 3+ ] = 3 (1 ) t=0 2t= 3 (1 )/(1 - 2) = 3/(1 + ) Thus player i cannot increase her payoff by deviating if and only if 2 3(1 ) + and 2 3/(1 + ), or 1/2. 14

Nash Folk Theorem in the Prisoner Dilemma Definition The set of feasiblepayoff profiles of a strategic game is the set of all weighted averages of payoff profiles in the game. For any feasible pair (x1, x2) of payoffs there is a finite sequence (a1, . . . , ak) of outcomes for which each player i s average payoff is close to xi: [ui(a1)+ + ui(ak)]/k 1 < xi < 1 + [ui(a1)+ + ui(ak)]/k. 15

The discounted average payoff is as close as possible to xi when taking the discount factor close enough to 1: (1 ) t=1 t-1 ui(at) 2 < xi < 2 + (1 ) t=1 t-1 ui(at). Consider the feasible payoff pair (x1, x2), and the outcome path b that consists of repetitions of the sequence (a1, . . . , ak): bnk+l = alfor l = 1, ,k. Consider the strategy si (h1, . . . , hT-1) = bT if ht = btfor t = 1, . . . , T 1 si (h1, . . . , hT-1) = D otherwise. 16

As long as x1 > u1 (D,D) and x2 > u2 (D,D), this grim trigger strategy is a Nash Equilibrium. We conclude that any feasible payoff pair (x1, x2) such that x1 > u1 (D,D) and x2 > u2 (D,D) is a Nash Equilibrium payoff of the Prisoner s Dilemma game. 17

Nash Folk Theorem Consider a one-shot game. Suppose that each player i can guarantee herself a minimum payoff mi. We will show that every feasible payoff profile w such that wi > mi can be achieved as the discounted average payoff profile of a Nash equilibrium in the infinitely repeated game, when is close to 1. This payoff can be achieved with strategies similar to grim trigger strategies. Deviation from path by player i is punished by minimizing i s payoff forever. 18

For the Prisoners Dilemma, the minimum payoff of player i supported by a Nash equilibrium is ui(D, D). Player j can ensure (by choosing D) that player i s payoff does not exceed ui(D, D), and there is no lower payoff with this property. Hence, ui(D, D) is the lowest payoff that player j can force upon player i. What is this minimum payoff for player i in an arbitrary game? 19

For any collection iof the other players mixed actions, player i s highest possible payoff is max ui(ai, i). {ai Ai} As ichanges, this maximal payoff changes. The collection iof punishments that make this maximum as small as possible is the solution of min max ui(ai, i). { i (A-i)} {ai Ai} This payoff is known as player i s minmax payoff. 20

Theorem (Nash Folk Theorem). Let G be a strategic game. Let w be a feasible payoff profile of G for which each player s payoff exceeds her minmax payoff. Then, for all > 0, there exists < 1 such that if the discount factor exceeds , then the infinitely repeated game of G has a Nash equilibrium whose discounted average payoff profile w satisfies |w w| < . For any discount factor with 0 < < 1, the discounted average payoff of every player in any Nash equilibrium of the infinitely repeated game of G is at least her minmax payoff. 21

Let x be the payoff profile induced by the actions a. By hypothesis, each xiexceeds player i s minmax payoff. For each player i, let -ibe a profile of mixed actions for the players other than i that holds player i down to her minmax payoff. Define each player i s strategy as follows. In each period, play aias long as the play was a in every previous period. Otherwise play ( -j)i, where j is the player who deviated in the first period in which the play was not a. 22

Let Hbe the set of histories in which there is a period in which exactly one player j chose an action different from aj. Refer to j as a lone deviant. The strategy of player i is defined as follows: si( ) = ai, si(h) = aiif h is not in H , si(h) = ( -j)iif h H and j is the first lone deviant in h. 23

We now show that the profile s is a Nash equilibrium. If each player i adheres to si, then her payoff is xiin every period. If player i deviates from si, then she may gain in the period in which she deviates, but she loses in every subsequent period, obtaining at most her minmax payoff, rather than xi. Thus for a discount factor close enough to 1, siis a best response to s-ifor every player i. 24

Subgame Perfect Equilibrium Theorem (One-Shot Deviation Property of subgame perfect equilibria of infinitely repeated games) A strategy profile in an infinitely repeated game is a subgame perfect equilibrium if and only if no player can gain by changing her action after any history, given both the strategies of the other players and the remainder of her own strategy. 25

The One-Shot Deviation Principle is deceptively simple. Its application is often not straightforward. First, it requires that players cannot gain by deviating once, in any history of play. But there are infinite many histories... So they cannot be checked one by one. They must grouped according to the prescriptions of the strategy profile we are considering. Second, the one-shot deviation may change future play, according to the strategy that we are considering. The deviation is one shot from a strategy, not from a play. 26

Grim Trigger Strategy Consider the repeated prisoner s dilemma. The strategy prescribes that the player initially cooperates, and continues to do so if both players cooperated at all previous times. Otherwise, they should defect forever. si(a1, . . . , aT) = D if at= (C,C) for some t = 1, . . . , T. si(a1, . . . , aT) = C otherwise. 27

Grim Trigger SPE Suppose that both players adopt the grim trigger strategy. There are two groups of histories. Those for which grim trigger strategy prescribes that the players play (C,C) and those for which the grim trigger strategy prescribes that they play (D,D). In the first set of histories, if player i plays grim trigger, then the outcome is (C, C) in every period with payoffs (2, 2, . . .), whose discounted average is 2. 28

If i deviates only once, she plays D. Then she reverts to the grim trigger strategy, that prescribes to play D at all subsequent periods. The opponent, playing grim trigger strategy, plays D forever as a consequence of i s one-shot deviation. The OSD yields the stream of payoffs (3, 1, 1, . . .) with discounted average (1 )[3 + + 2 + 3 + ] = 3(1 ) + . Thus player i cannot increase her payoff by deviating if and only if 2 3(1 ) + , or 1/2. 29

In the second set of histories, if player i plays grim trigger, then the outcome is (D, D) in every period with payoffs (1, 1, . . .), whose discounted average is 1. If I deviates only once, she plays C. Then she reverts to the grim trigger strategy, that prescribes to play D at all subsequent periods. The opponent, playing grim trigger strategy, plays D forever as a consequence of i s one-shot deviation. The OSD yields the stream of payoffs (0, 1, 1, . . .) with discounted average (1 )[0 + + 2 + 3 + ] = . 30

Player i cannot increase her payoff by deviating: 1 . We conclude that if 1/2, then the strategy pair in which each player s strategy is the grim trigger strategy is a Subgame-Perfect Equilibrium of the infinitely repeated Prisoner s Dilemma. 31

SPE Folk Theorem Theorem (Simplified Subgame Perfect Folk Theorem for Two-Player Games) Let G be a two-player strategic game. Let w be a feasible payoff profile of G for which each player s payoff exceeds her (pure-strategy) minmax payoff. Then for all > 0 there exists < 1 such that if the discount factor exceeds then the infinitely repeated game of G has a subgame perfect equilibrium whose discounted average payoff profile w satisfies|w w| < . 32

Take an outcome a such that both players discounted payoffs exceed their pure-strategy minmax payoffs. Let pjbe an action of player i that holds player j down to her minmax payoff, and let p = (p2, p1). If the minmax profile p is a Nash Equilibrium of the stage game, then consider a modified grim strategy such that both players play the sequence at at any time t; and that, if either player deviates, p is played for ever. Because both players discounted payoffs for a exceed their minmax payoffs, if the discount factor is sufficiently close to one, the players will obey to the modified grim trigger strategy, yielding the outcome a. 33

If p is not a Nash Equilibrium, the proof is as follows. Let sibe a strategy of player i that starts off choosing ai,0, and continues to choose ai,tso long as the previous outcome was a ; otherwise, it chooses the action pjthat holds player j to her minmax payoff. Once punishment begins, it continues for k periods, as long as both players choose their punishment actions, and then players revert to a. If any player j deviates from the assigned punishment action, then the punishments are re-started, and player j is now punished. 34

To prove that $(s_1, s_2)$ is a subgame perfect equilibrium, we now find $\underline{\delta}$ and $k(\delta)$ such that if $\delta > \underline{\delta}$, then the strategy pair $(s_1, s_2)$ is a subgame perfect equilibrium of the infinitely repeated game.

Suppose that player j adheres to $s_j$. If player i adheres to $s_i$ in any history with no deviations, then her discounted average payoff is $u_i(a)$.

If she deviates, she obtains at most her maximal payoff in the game, say $u_i^*$, in the period of her deviation, then $u_i(p)$ for $k$ periods, and $u_i(a)$ in every period thereafter.

Her discounted average payoff from the deviation is at most

$(1-\delta)[u_i^* + \delta u_i(p) + \cdots + \delta^k u_i(p)] + \delta^{k+1} u_i(a) = (1-\delta) u_i^* + \delta(1-\delta^k) u_i(p) + \delta^{k+1} u_i(a)$.

Hence, she does not deviate if

$u_i(a) \ge (1-\delta) u_i^* + \delta(1-\delta^k) u_i(p) + \delta^{k+1} u_i(a)$.

If player i adheres to $s_i$ in any history where the players play $p$, she gets $u_i(p)$ for at most $k$ periods, then $u_i(a)$ in every subsequent period. This yields a discounted average payoff of at least $(1-\delta^k) u_i(p) + \delta^k u_i(a)$.

Note that $u_i(p) < m_i$, her minmax payoff, and $u_i(a) > m_i$.

If she deviates from $s_i$, she obtains at most her minmax payoff $m_i$ in the period of her deviation, then $u_i(p)$ for $k$ periods, then $u_i(a)$ in the future. This yields a discounted average payoff of at most

$(1-\delta) m_i + \delta(1-\delta^k) u_i(p) + \delta^{k+1} u_i(a)$.

She does not deviate if

$(1-\delta^k) u_i(p) + \delta^k u_i(a) \ge (1-\delta) m_i + \delta(1-\delta^k) u_i(p) + \delta^{k+1} u_i(a)$,

or, equivalently,

$(1-\delta^k) u_i(p) + \delta^k u_i(a) \ge m_i$.

For each value of $\delta$ sufficiently close to 1, we can find $k(\delta)$ such that $(\delta, k(\delta))$ satisfies the two no-deviation inequalities:

$u_i(a) \ge (1-\delta) u_i^* + \delta(1-\delta^k) u_i(p) + \delta^{k+1} u_i(a)$,

$(1-\delta^k) u_i(p) + \delta^k u_i(a) \ge m_i$.
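To see how the punishment length $k(\delta)$ can be chosen, the two no-deviation inequalities can be scanned over $k$ for a fixed $\delta$. The following is a minimal sketch, not part of the lecture: the function name, the cap `k_max`, and the payoff numbers in the usage note are hypothetical, chosen only so that $u_i(p) < m_i < u_i(a) \le u_i^*$.

```python
def feasible_k(delta, u_star, u_p, m, u_a, k_max=200):
    """Return every punishment length k in 1..k_max that satisfies
    both no-deviation inequalities for one player."""
    result = []
    for k in range(1, k_max + 1):
        # Upper bound on the payoff from deviating from the agreed outcome a.
        dev_from_path = ((1 - delta) * u_star
                         + delta * (1 - delta ** k) * u_p
                         + delta ** (k + 1) * u_a)
        # Payoff from adhering when all k punishment periods still remain.
        punish_value = (1 - delta ** k) * u_p + delta ** k * u_a
        if u_a >= dev_from_path and punish_value >= m:
            result.append(k)
    return result
```

With the hypothetical values `u_star=3, u_p=0, m=1, u_a=2`, calling `feasible_k(0.9, 3, 0, 1, 2)` returns the lengths 1 through 6, while `feasible_k(0.3, 3, 0, 1, 2)` returns an empty list: the punishment must be long enough to deter deviations from $a$, yet short enough that carrying it out is still better than the minmax payoff.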

Finitely Repeated Games

Consider any subgame perfect equilibrium of a finitely repeated game.

In the final stage, a Nash equilibrium of the stage game must be played.

Hence, the set of equilibria is enlarged only if the stage game has multiple Nash equilibria.

Otherwise, in the unique subgame perfect equilibrium of the repeated game, the unique Nash equilibrium of the stage game is played in every period.

Prisoner's Dilemma

The following game is repeated for T periods.

        C       D
C     2, 2    0, 3
D     3, 0    1, 1

Proceeding by backward induction, in the last period, the unique Nash equilibrium is (D, D).

Because in the last period players play (D, D) regardless of the previous play, in the second-to-last period future payoffs do not depend on current play.

It is as if players were playing the following game.

        C       D
C     2, 2    0, 3
D     3, 0    1, 1

The unique Nash equilibrium is (D, D).

Proceeding by backward induction, the unique subgame perfect equilibrium is (D, D) in every period.

Expanded Prisoner's Dilemma

The following game is repeated for T periods.

         C        D        E
C      2, 2     0, 3    -2, -2
D      3, 0     1, 1    -2, -2
E    -2, -2   -2, -2    -1, -1

There are two Nash equilibria: (D, D) and (E, E).
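The equilibrium counts for the two stage games can be checked by brute force over pure action profiles. This is a small illustrative sketch; the function and variable names are hypothetical, and only pure-strategy equilibria are enumerated.

```python
def pure_nash_equilibria(actions, payoffs):
    """payoffs[(r, c)] = (row player's payoff, column player's payoff)."""
    equilibria = []
    for r in actions:
        for c in actions:
            u_row, u_col = payoffs[(r, c)]
            # Neither player can gain by a unilateral change of action.
            row_best = all(payoffs[(r2, c)][0] <= u_row for r2 in actions)
            col_best = all(payoffs[(r, c2)][1] <= u_col for c2 in actions)
            if row_best and col_best:
                equilibria.append((r, c))
    return equilibria


# Prisoner's Dilemma from the lecture.
PD = {("C", "C"): (2, 2), ("C", "D"): (0, 3),
      ("D", "C"): (3, 0), ("D", "D"): (1, 1)}

# Expanded game: add action E for both players.
EXPANDED = dict(PD)
for x in ["C", "D"]:
    EXPANDED[(x, "E")] = (-2, -2)
    EXPANDED[("E", x)] = (-2, -2)
EXPANDED[("E", "E")] = (-1, -1)
```

Calling `pure_nash_equilibria(["C", "D"], PD)` gives only (D, D), whereas `pure_nash_equilibria(["C", "D", "E"], EXPANDED)` gives (D, D) and (E, E); the second equilibrium is what makes a credible end-game punishment possible.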

Expanded Prisoner's Dilemma

In period T, a Nash equilibrium is played: either (D, D), with payoffs (1, 1), or (E, E), with payoffs (-1, -1).

We construct a subgame perfect equilibrium as follows.

1. In period T, the profile (D, D) is played.
2. In all periods t = 1, . . ., T-1, the profile (C, C) is played, with payoffs (2, 2).
3. If either player deviates to D, then play switches to (E, E) in all remaining periods.

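This is a subgame perfect equilibrium if and only if players have no incentive to deviate in the period before the last. Indeed, the punishment (E, E) in all remaining periods is more severe the more periods are left to play.

In period T-1, playing C yields 2 today and 1 in period T, for a payoff of $2 + \delta$; playing D yields 3 today and -1 in period T, for a payoff of $3 - \delta$. Each player must prefer the former.

Hence, the strategies are a subgame perfect equilibrium if and only if $3 - \delta < 2 + \delta$, i.e. $\delta > 1/2$.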
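The claim that the period before the last is the binding one can be checked by comparing, in every cooperative period, the payoff from adhering with the payoff from deviating to D. The sketch below is only illustrative: the function name is hypothetical, payoffs are discounted sums measured from the current period (as in the comparison of $2 + \delta$ with $3 - \delta$), and the strict inequality follows the slide.

```python
def cooperation_is_spe(delta, T):
    """Check the deviation to D in each cooperative period t = 1, ..., T-1
    of the T-period expanded Prisoner's Dilemma."""
    for t in range(1, T):
        remaining = T - t  # number of periods left after period t
        # Adhere: 2 in periods t, ..., T-1, then 1 from (D, D) in period T.
        adhere = sum(2 * delta ** s for s in range(remaining)) + delta ** remaining
        # Deviate: 3 today, then -1 from (E, E) in every remaining period.
        deviate = 3 - sum(delta ** s for s in range(1, remaining + 1))
        if not deviate < adhere:
            return False
    return True
```

For example, `cooperation_is_spe(0.6, 5)` holds while `cooperation_is_spe(0.4, 5)` fails, and the failure occurs at t = T-1, where the comparison reduces to $3 - \delta < 2 + \delta$.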

Summary of the Lecture

Infinitely Repeated Games

Nash and Subgame-Perfect Equilibrium

Finitely Repeated Games

Preview of the Next Lecture

Coalitional Games and the Core

Ownership and the Distribution of Wealth

Horse Trading and House Exchanges

Voting and Matching
