No Regret Algorithms in Games
This presentation covers no-regret algorithms in games and social interactions, based on work presented by Georgios Piliouras (Georgia Institute of Technology and Johns Hopkins University). It discusses the application of these algorithms in decision making, highlighting how they guarantee reasonable behavior and learning outcomes, and explores strategies, profitable deviations, and equilibria to understand how such algorithms shape interactions in dynamic environments.
Presentation Transcript
No Regret Algorithms in Games Georgios Piliouras Georgia Institute of Technology Johns Hopkins University
No Regret Algorithms in Social Interactions Georgios Piliouras Georgia Institute of Technology Johns Hopkins University
No Regret Behavior in Social Interactions Georgios Piliouras Georgia Institute of Technology Johns Hopkins University
Reasonable Behavior in Social Interactions Georgios Piliouras Georgia Institute of Technology Johns Hopkins University
No Regret Learning (review). Regret(T) over a history of T periods: total profit of the best fixed action in hindsight minus the total profit of the algorithm. Intuitively: no single action significantly outperforms the dynamic. An algorithm is characterized as no-regret if for every input sequence the regret grows sublinearly in T. [Blackwell 56], [Hannan 57], [Fudenberg, Levine 94]
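As a formalization (notation mine, not the slides': a_t is the algorithm's action in period t and u_t the period-t payoff function):

```latex
% Regret over T periods, full-information setting.
% a_t = algorithm's action in period t, u_t = period-t payoff function.
\mathrm{Regret}(T) \;=\; \max_{s \in S} \sum_{t=1}^{T} u_t(s) \;-\; \sum_{t=1}^{T} u_t(a_t),
\qquad
\text{no regret} \iff \mathrm{Regret}(T) = o(T) \text{ for every input sequence.}
```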
No Regret Learning (review). Example (weather/profit table on the slide): over the sequence of days shown, the total profit of the Algorithm is 3, of always choosing Umbrella is 3, and of always choosing Sunscreen is 1; no single action significantly outperforms the dynamic.
Games (i.e. Social Interactions): interacting entities, pursuing their own goals, with a lack of centralized control.
Games (review). n players; a set of strategies Si for each player i; possible states (strategy profiles) S = ∏i Si; utility ui : S → R; social welfare Q : S → R. Extend to allow probabilities Δ(Si), Δ(S), with ui(σ) = E[ui(s)] and Q(σ) = E[Q(s)] for σ ∈ Δ(S).
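Spelled out, the extension to probabilities reads as follows (a standard formalization; the symbol σ for a mixed profile is mine):

```latex
% Expected utility and social welfare under a product of mixed strategies
% \sigma = (\sigma_1, \dots, \sigma_n), \sigma_i \in \Delta(S_i).
u_i(\sigma) \;=\; \mathbb{E}_{s \sim \sigma}\![u_i(s)]
\;=\; \sum_{s \in S} \Big( \prod_{j} \sigma_j(s_j) \Big)\, u_i(s),
\qquad
Q(\sigma) \;=\; \mathbb{E}_{s \sim \sigma}\![Q(s)].
```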
Games & Equilibria. Rock-Paper-Scissors, with each player mixing uniformly (1/3, 1/3, 1/3). Payoffs listed as (row player, column player):

            Rock     Paper    Scissors
  Rock      0, 0     -1, 1    1, -1
  Paper     1, -1    0, 0     -1, 1
  Scissors  -1, 1    1, -1    0, 0

Nash: a product of mixed strategies s.t. no player has a profitable deviating strategy.

Games & Equilibria. The uniform mixed strategies above induce a product distribution putting probability 1/9 on each of the nine outcomes. Nash, restated: a probability distribution over outcomes that is a product of mixed strategies, s.t. no player has a profitable deviating strategy.

Games & Equilibria. Dropping the requirement that the distribution be a product of mixed strategies generalizes Nash to Coarse Correlated Equilibria (CCE): a probability distribution over outcomes s.t. no player has a profitable deviating strategy.

Games & Equilibria. A CCE that is not a product distribution: choose any of the six off-diagonal outcomes uniformly (prob. 1/6 each).
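A quick numerical sanity check of the last slide (a minimal Python sketch; the game representation is mine): the uniform distribution over the six off-diagonal outcomes is a CCE, since every fixed deviation faces the uniform column marginal and earns the same expected payoff.

```python
import itertools

# Row player's payoff in Rock-Paper-Scissors; the column player gets the negative.
A = {('R', 'R'): 0, ('R', 'P'): -1, ('R', 'S'): 1,
     ('P', 'R'): 1, ('P', 'P'): 0, ('P', 'S'): -1,
     ('S', 'R'): -1, ('S', 'P'): 1, ('S', 'S'): 0}

# Uniform distribution over the 6 off-diagonal outcomes.
mu = {o: 1/6 for o in itertools.product('RPS', repeat=2) if o[0] != o[1]}

value = sum(p * A[o] for o, p in mu.items())  # row player's expected payoff: 0

# Best fixed deviation for the row player: action a against mu's column marginal.
for a in 'RPS':
    deviation = sum(p * A[(a, o[1])] for o, p in mu.items())
    assert deviation <= value + 1e-9  # no profitable deviation: mu is a CCE
print("uniform over off-diagonal outcomes is a CCE; value =", value)
```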
Algorithms Playing Games. Alg 1 (row player) and Alg 2 (column player) repeatedly play Rock-Paper-Scissors. Online algorithm: takes as input the past history of play until day t, and chooses a randomized action for day t+1.
Today's class: the same setting, but both Alg 1 and Alg 2 are no-regret. Online algorithm: takes as input the past history of play until day t, and chooses a randomized action for day t+1.
No Regret Algorithms & CCE A history of no-regret algorithms is a sequence of outcomes s.t. no agent has a single deviating action that can increase her average payoff. A Coarse Correlated Equilibrium is a probability distribution over outcomes s.t. no agent has a single deviating action that can increase her expected payoff.
How good are the CCE? It depends: which class of games are we interested in? Which notion of social welfare? Today: class of games: potential games; social welfare: makespan. [Kleinberg, P., Tardos STOC 09]
Congestion Games. n players and m resources ("edges"). Each strategy corresponds to a set of resources ("paths"). Each edge has a cost function ce(x) that gives its cost as a function of the number of players using it. Cost experienced by a player = sum of edge costs along her path. [Slide figure: a small network with edge costs x and 2x; Cost(red path) = 6, Cost(green path) = 8.]
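For concreteness, a minimal sketch of how player costs are computed in a congestion game (the helper name and the tiny instance are illustrative, not from the talk):

```python
def player_costs(strategies, cost_fns):
    """Cost of each player = sum of c_e(load_e) over the resources she uses.

    strategies: list of sets of resources (one "path" per player)
    cost_fns:   dict mapping resource -> cost function c_e(x)
    """
    load = {}
    for path in strategies:
        for e in path:
            load[e] = load.get(e, 0) + 1
    return [sum(cost_fns[e](load[e]) for e in path) for path in strategies]

# Two players sharing edge 'a' (cost 2x) while 'b' and 'c' cost x each:
costs = {'a': lambda x: 2 * x, 'b': lambda x: x, 'c': lambda x: x}
print(player_costs([{'a', 'b'}, {'a', 'c'}], costs))  # [5, 5]
```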
Equilibria and Social Welfare. Load balancing: n identical jobs on n identical machines (links), each with cost function c(x) = x. Makespan: expected maximum latency over all links.
Equilibria and Social Welfare. Pure Nash: one job per machine. Makespan = 1.
Equilibria and Social Welfare. (Mixed) Nash: each job picks each machine with probability 1/n. Makespan = Θ(log n / log log n) > 1. [Koutsoupias, Mavronicolas, Spirakis 02], [Czumaj, Vöcking 02]
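A quick simulation of the fully mixed Nash (a sketch; the parameter choices are arbitrary) shows the slow growth of the expected makespan, consistent with the Θ(log n / log log n) bound:

```python
import random

def mixed_nash_makespan(n, trials=200):
    # Each of n jobs picks one of n machines uniformly at random,
    # i.e., the fully mixed Nash of load balancing with c(x) = x.
    total = 0
    for _ in range(trials):
        loads = [0] * n
        for _ in range(n):
            loads[random.randrange(n)] += 1
        total += max(loads)
    return total / trials  # empirical E[max load]

# Grows like log n / log log n (classic balls-into-bins behavior).
for n in (10, 100, 1000, 10000):
    print(n, mixed_nash_makespan(n))
```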
Equilibria and Social Welfare. Coarse Correlated Equilibria: Makespan = Θ(√n) >> Θ(log n / log log n). [Blum, Hajiaghayi, Ligett, Roth 08]
Linear Load Balancing (Q = makespan). Q(worst CCE) = Θ(√n) >> Q(worst Nash) = Θ(log n / log log n) >> Q(worst pure Nash) = Q(OPT). [Slide figure: nested sets Δ(S) ⊇ CCE ⊇ Nash ⊇ Pure Nash, with OPT inside.]

Linear Load Balancing (Q = makespan). The same picture in ratio form: Price of Total Anarchy = Θ(√n) >> Price of Anarchy = Θ(log n / log log n) >> Price of Pure Anarchy = 1.
Our Hope: natural no-regret algorithms should be able to steer away from worst-case equilibria.
The Multiplicative Weights Algorithm (MWA) [Littlestone, Warmuth 94], [Freund, Schapire 99]. Pick strategy s with probability proportional to (1−ε)^total(s), where total(s) denotes s's combined cost over all past periods. Provable performance guarantees against arbitrary opponents. No regret: against any sequence of opponents' play, average payoff converges to that of the best fixed option (or better).
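A minimal sketch of MWA in the full-information setting (variable names are mine; costs[t][s] is assumed to hold the cost of action s in period t, revealed after play):

```python
import random

def multiplicative_weights(costs, eps=0.1):
    """Pick action s in each period with probability proportional to
    (1 - eps) ** total(s), where total(s) is s's combined past cost."""
    n = len(costs[0])
    total = [0.0] * n           # cumulative cost of each action so far
    history = []
    for period_costs in costs:
        weights = [(1 - eps) ** total[s] for s in range(n)]
        # Sample an action with probability proportional to its weight.
        choice = random.choices(range(n), weights=weights)[0]
        history.append(choice)
        for s in range(n):      # full information: every action's cost is seen
            total[s] += period_costs[s]
    return history
```

Against any sequence of cost vectors, an appropriate choice of ε makes the regret of this scheme grow sublinearly in T.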
(Multiplicative Weights) Algorithm in (Potential) Games. σ(t) is the current state of the system (a tuple of randomized strategies, one for each player). Each player tosses her coins and a specific outcome is realized; depending on the outcome of these random events, we transition to the next state σ(t+1). The result is an infinite Markov chain with infinitely many states, living in Δ(S).
(Multiplicative Weights) Algorithm in (Potential) Games. Problem 1: hard to get intuition about the problem, let alone analyze it. Let's try to come up with a discounted version of the problem. Ideas?
(Multiplicative Weights) Algorithm in (Potential) Games. Idea 1: analyze the expected motion.

(Multiplicative Weights) Algorithm in (Potential) Games. The system evolution is now deterministic: there exists a function f s.t. E[σ(t+1)] = f(σ(t), ε). I wish to analyze this function (e.g., find its fixed points).
(Multiplicative Weights) Algorithm in (Potential) Games. Problem 2: the function f is still rather complicated. Idea 2: analyze the MWA dynamics for small ε. Use a Taylor expansion to find a first-order approximation to f: f(σ(t), ε) = f(σ(t), 0) + ε ∂f/∂ε(σ(t), 0) + O(ε²), i.e. f(σ(t), ε) ≈ f(σ(t), 0) + ε ∂f/∂ε(σ(t), 0).
(Multiplicative Weights) Algorithm in (Potential) Games. As ε → 0, the quantity (f(σ(t), ε) − f(σ(t), 0)) / ε → ∂f/∂ε(σ(t), 0) specifies a vector at each point σ(t) of our state space Δ(S) (i.e., a vector field). This vector field defines a system of ODEs, which we are going to analyze.
Deriving the ODE. Take expectations, differentiate with respect to ε, and evaluate at ε = 0: the resulting vector field is the replicator dynamic studied in evolutionary game theory.
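The slide's equations did not survive extraction; in standard notation (mine, not the slides'), the resulting ODE is the cost-minimizing replicator equation:

```latex
% p_{i\gamma}: probability that player i plays strategy \gamma;
% c_{i\gamma}: expected cost of \gamma given the others' mixed strategies;
% \hat{c}_i = \sum_{\gamma'} p_{i\gamma'} c_{i\gamma'}: player i's expected cost.
\frac{d p_{i\gamma}}{dt} \;=\; p_{i\gamma} \left( \hat{c}_i - c_{i\gamma} \right).
```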
Motivating Example: two jobs choosing between two machines, each machine with cost function c(x) = x.
Motivating Example. Each player's mixed strategy is summarized by a single number (the probability of picking machine 1), so we can plot the mixed strategy profile in R². [Slide figure: the unit square, with the pure Nash equilibria and the mixed Nash marked.]
Motivating Example. Even in this simplest case of two balls and two bins with linear utility, the replicator equation has a nonlinear form (see the worked case below).
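A worked instance (my derivation from the replicator equation above): let p and q be the probabilities that players 1 and 2 pick machine 1. With c(x) = x, player 1's expected costs are 1 + q on machine 1 and 2 − q on machine 2, so:

```latex
\dot{p} \;=\; p \left( \hat{c}_1 - c_{1,\text{machine 1}} \right)
        \;=\; p (1 - p) (1 - 2q),
\qquad
\dot{q} \;=\; q (1 - q) (1 - 2p).
% Cubic in (p, q): nonlinear even in this two-ball, two-bin case.
```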
The potential function. The congestion game has a potential function Φ [Monderer-Shapley 96]. Let Ψ = E[Φ]. A calculation yields dΨ/dt ≤ 0; hence Ψ decreases except when every player randomizes over paths of equal expected cost, i.e., Ψ is a Lyapunov function of the dynamics.
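For reference, the standard congestion-game potential (Rosenthal's function, as used in the potential-games framework of [Monderer-Shapley 96]; notation mine):

```latex
% \ell_e(s): number of players using resource e in state s.
\Phi(s) \;=\; \sum_{e} \sum_{j=1}^{\ell_e(s)} c_e(j),
\qquad
\Psi = \mathbb{E}[\Phi], \qquad \frac{d\Psi}{dt} \le 0
\text{ along the replicator dynamic.}
```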
Unstable vs. stable fixed points. The derivative of the vector field is a matrix J (the Jacobian) whose spectrum distinguishes stable from unstable fixed points: unstable if some eigenvalue has positive real part, else neutrally stable. Non-Nash fixed points are unstable. (Easy.) Which Nash are unstable?
Unstable Nash. Both players mix (.5, .5) over the two machines with c(x) = x: the fully mixed Nash.

Motivating Example. Perturb player 1 to (.49, .51) while player 2 stays at (.5, .5).

Motivating Example. Player 2 drifts to (.4, .6) while player 1 is at (.49, .51).

Motivating Example. Player 1 drifts further to (.35, .65) against (.4, .6): the dynamic moves away from the fully mixed Nash (a numerical check follows below).
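A numerical check of the instability just illustrated (a sketch using the 2x2 replicator field derived earlier; numpy assumed available):

```python
import numpy as np

# Replicator field for the two-jobs/two-machines example:
#   dp/dt = p (1 - p) (1 - 2q),   dq/dt = q (1 - q) (1 - 2p)
def jacobian(p, q):
    return np.array([[(1 - 2*p) * (1 - 2*q), -2 * p * (1 - p)],
                     [-2 * q * (1 - q),      (1 - 2*p) * (1 - 2*q)]])

# At the fully mixed Nash (1/2, 1/2):
print(np.linalg.eigvals(jacobian(0.5, 0.5)))  # eigenvalues +0.5 and -0.5
# One eigenvalue with positive real part, so the mixed Nash is unstable,
# matching the drift seen on the previous slides.
```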