No Regret Algorithms in Games
Georgios Piliouras
Georgia Institute of Technology
Johns Hopkins University
No Regret Learning (review)

Regret(T) in a history of T periods:

    Regret(T) = (total profit of the best fixed action in hindsight) - (total profit of the algorithm)

An algorithm is characterized as "no regret" if for every input sequence the regret grows sublinearly in T.

[Blackwell '56], [Hannan '57], [Fudenberg, Levine '94], …

No single action significantly outperforms the dynamic.
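To make the definition concrete, here is a minimal sketch in Python (the function and variable names are ours, not the slides'): it measures the regret of a play history against the best fixed action in hindsight.

    def regret(alg_profit, payoff_table):
        """alg_profit: total profit the algorithm actually earned over T periods.
        payoff_table[t][a]: profit action a would have earned in period t.
        Regret(T) = best fixed action in hindsight - algorithm's total."""
        T = len(payoff_table)
        actions = range(len(payoff_table[0]))
        best_fixed = max(sum(payoff_table[t][a] for t in range(T)) for a in actions)
        return best_fixed - alg_profit

An algorithm is "no regret" exactly when this quantity is o(T) for every input sequence.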
Games (i.e. Social Interactions)
Interacting entities
Pursuing their own goals
Lack of centralized control
Games (review)

n players
Set of strategies S_i for each player i
Possible states (strategy profiles) S = ×_i S_i
Utility u_i : S → R
Social Welfare Q : S → R
Extend to allow probabilities Δ(S_i), Δ(S):
    u_i(Δ(S)) = E[u_i(S)]        Q(Δ(S)) = E[Q(S)]
Games & Equilibria

Rock-Paper-Scissors (row player's payoff listed first):

                Rock     Paper    Scissors
    Rock        0, 0     -1, 1    1, -1
    Paper       1, -1    0, 0     -1, 1
    Scissors    -1, 1    1, -1    0, 0

Nash: A product of mixed strategies s.t. no player has a profitable deviating strategy. Here each player mixes uniformly: (1/3, 1/3, 1/3).

Equivalently, Nash: A probability distribution over outcomes that is a product of mixed strategies, s.t. no player has a profitable deviating strategy. Choose any of the nine outcomes uniformly (prob. 1/9 each).
Games & Equilibria

Coarse Correlated Equilibria (CCE): A probability distribution over outcomes (not necessarily a product of mixed strategies), s.t. no player has a profitable deviating strategy.

Example in Rock-Paper-Scissors: choose any of the six off-diagonal outcomes uniformly (prob. 1/6 each). This is a CCE, but it is not a product distribution, hence not a Nash equilibrium.
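A quick numerical check of this example (a sketch; orientation follows the matrix above):

    import numpy as np

    # Row player's payoff in Rock-Paper-Scissors.
    A = np.array([[ 0, -1,  1],
                  [ 1,  0, -1],
                  [-1,  1,  0]])

    # Uniform distribution over the six off-diagonal outcomes.
    D = (np.ones((3, 3)) - np.eye(3)) / 6

    expected = (D * A).sum()                                 # row player's expected payoff
    best_dev = max(A[i] @ D.sum(axis=0) for i in range(3))   # best fixed deviation
    print(expected, best_dev)                                # 0.0 0.0 -> no profitable deviation

By the game's symmetry the column player's check is identical, so the distribution is indeed a CCE.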
Algorithms Playing Games

Online algorithm: takes as input the past history of play until day t, and chooses a randomized action for day t+1.

[Figure: Alg 1 (rows) plays Alg 2 (columns) in Rock-Paper-Scissors.]
Today's class

Online algorithm: takes as input the past history of play until day t, and chooses a randomized action for day t+1, where both Alg 1 and Alg 2 are no-regret.
No Regret Algorithms & CCE

A history of no-regret algorithms is a sequence of outcomes s.t. no agent has a single deviating action that can increase her average payoff.

A Coarse Correlated Equilibrium is a probability distribution over outcomes s.t. no agent has a single deviating action that can increase her expected payoff.

Hence the empirical distribution of play of no-regret algorithms converges to the set of CCE.
How good are the CCE?

It depends…
Which class of games are we interested in?
Which notion of social welfare?

Today:
Class of games: potential games
Social welfare: makespan

[Kleinberg, P., Tardos STOC 09]
Congestion Games

n players and m resources ("edges")
Each strategy corresponds to a set of resources ("paths")
Each edge has a cost function c_e(x) that determines the cost as a function of the # of players using it.
Cost experienced by a player = sum of edge costs along the chosen path.

[Figure: a small network with linear edge costs x and 2x; for the two highlighted paths, Cost(red)=6 and Cost(green)=8.]
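In code, a congestion game's cost computation is a few lines. A sketch (the edge names and toy network below are ours, not the slide's):

    def player_cost(player, profile, cost_fn):
        """Cost of `player` = sum over the edges of its path of c_e(x),
        where x is the number of players whose path uses edge e."""
        load = {}
        for path in profile.values():
            for e in path:
                load[e] = load.get(e, 0) + 1
        return sum(cost_fn[e](load[e]) for e in profile[player])

    cost_fn = {"e1": lambda x: x, "e2": lambda x: 2 * x, "e3": lambda x: x}
    profile = {"red": ["e1", "e2"], "green": ["e2", "e3"]}
    print(player_cost("red", profile, cost_fn))   # e1: c(1)=1, e2: c(2)=4 -> 5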
Equilibria and Social Welfare: Load Balancing

[Figure: n players (balls) and n identical machines (links), each with cost function c(x)=x; the slide draws three machines.]

Makespan: Expected maximum latency over all links.
Equilibria and Social Welfare: Pure Nash

[Figure: each player sits alone on one machine; every load is 1.]

Makespan = 1
Equilibria and Social Welfare: (Mixed) Nash

[Figure: each player puts probability 1/n on every machine.]

Makespan = Θ(log n / log log n) > 1

[Koutsoupias, Mavronicolas, Spirakis '02], [Czumaj, Vöcking '02]
Equilibria and Social Welfare: Coarse Correlated Equilibria

Makespan = Ω(√n) >> Θ(log n / log log n)

[Blum, Hajiaghayi, Ligett, Roth '08]
Linear Load Balancing (Q = makespan)

[Diagram: nested sets inside Δ(S): Pure Nash ⊆ Nash ⊆ CCE, with OPT marked.]

Q(worst CCE) = Θ(√n)  >>  Q(worst Nash) = Θ(log n / log log n)  >>  Q(worst pure Nash) = Q(OPT)
Linear Load Balancing (Q = makespan), restated as ratios to OPT:

Price of Total Anarchy = Θ(√n)  >>  Price of Anarchy = Θ(log n / log log n)  >>  Price of Pure Anarchy = 1
        
Our Hope

Natural no-regret algorithms should be able to steer away from worst-case equilibria.
The Multiplicative Weights Algorithm (MWA)
[Littlestone, Warmuth '94], [Freund, Schapire '99]

Pick s with probability proportional to (1-ε)^total(s), where total(s) denotes s's combined cost over all past periods.

Provable performance guarantees against arbitrary opponents.
No Regret: against any sequence of opponents' play, avg. payoff converges to that of the best fixed option (or better).
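A minimal sketch of one MWA player (names are ours; `random.choices` normalizes the weights itself):

    import random

    def mw_pick(strategies, total_cost, eps=0.1):
        """Pick s with probability proportional to (1 - eps) ** total_cost[s]."""
        weights = [(1 - eps) ** total_cost[s] for s in strategies]
        return random.choices(strategies, weights=weights)[0]

    def mw_record(total_cost, realized_cost):
        """After each period, add every strategy's realized cost to its running total."""
        for s, c in realized_cost.items():
            total_cost[s] += c

Running two such players against each other in Rock-Paper-Scissors drives both regrets sublinear, so by the earlier slide the empirical distribution of play approaches the set of CCE.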
 
  
(Multiplicative Weights) Algorithm in (Potential) Games

x(t) ∈ Δ(S) is the current state of the system (a tuple of randomized strategies, one for each player). Each player tosses their coins and a specific outcome is realized. Depending on the outcome of these random events, we transition to the next state x(t+1), each candidate next state lying within O(ε) of x(t).

This is an infinite Markov chain with infinitely many states.

Problem 1: hard to get intuition about the problem, let alone analyze. Let's try to come up with a "discounted" version of the problem. Ideas??
Idea 1: analyze the expected motion. The expected evolution of the system is deterministic: there exists a function f such that

    E[x(t+1)] = f(x(t), ε)

I wish to analyze this function (e.g. find its fixed points).
 
 
 
Problem 2: the function f is still rather complicated.

Idea 2: analyze the MWA dynamics for small ε. Use a Taylor expansion to find a first-order approximation to f:

    f(x(t), ε) = f(x(t), 0) + ε · f′(x(t), 0) + O(ε²)

(where f′ denotes the derivative of f with respect to ε).
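A numerical sanity check of this first-order term (a sketch for a fixed cost vector, with the expectation over the opponents' randomness suppressed; names are ours). The finite difference of the MW update matches p_s · (average cost - c_s), which is exactly the replicator term derived below:

    import numpy as np

    def mw_update(p, c, eps):
        """One MW step on mixed strategy p against realized costs c."""
        w = p * (1 - eps) ** c
        return w / w.sum()

    p = np.array([0.2, 0.3, 0.5])
    c = np.array([1.0, 2.0, 3.0])
    eps = 1e-4
    numeric = (mw_update(p, c, eps) - p) / eps
    first_order = p * (p @ c - c)        # p_s * (avg cost - c_s)
    print(numeric, first_order)          # nearly identical for small eps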
Dropping the O(ε²) term and rearranging:

    [f(x(t), ε) - f(x(t), 0)] / ε  →  f′(x(t), 0)    as ε → 0.

In the limit ε → 0, this equation specifies a vector at each point of our state space (i.e. a vector field). This vector field defines a system of ODEs, which we are going to analyze.
Deriving the ODE

Taking expectations, differentiating w.r.t. ε, and letting ε → 0 yields, in cost form (with p_is the probability player i assigns to strategy s, and c_is player i's expected cost when playing s),

    dp_is/dt = p_is · ( Σ_s′ p_is′ c_is′ - c_is ).

This is the replicator dynamic studied in evolutionary game theory.
Motivating Example

[Figure: two players (balls) and two machines (bins), each machine with cost c(x)=x.]
Motivating Example

Each player's mixed strategy is summarized by a single number (the probability of picking machine 1), so the mixed strategy profile can be plotted in R².

[Figure: the unit square of profiles; the mixed Nash sits at the center, the pure Nash at two corners.]
Motivating Example

Even in the simplest case of two balls and two bins with linear costs, the replicator equation has a nonlinear form.
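To make that concrete, a sketch of the two-player system (the closed form is our own derivation: with c(x)=x, player 1's expected costs on the two machines are 1+q and 2-q, so the replicator ODE reduces to dp/dt = p(1-p)(1-2q), and symmetrically for player 2):

    def replicator_step(p, q, dt=0.01):
        """p, q: probabilities that players 1 and 2 pick machine 1."""
        dp = p * (1 - p) * (1 - 2 * q)   # p grows when machine 1 is cheaper for player 1
        dq = q * (1 - q) * (1 - 2 * p)
        return p + dt * dp, q + dt * dq

    p, q = 0.49, 0.50                    # start just off the mixed Nash (0.5, 0.5)
    for _ in range(5000):
        p, q = replicator_step(p, q)
    print(round(p, 3), round(q, 3))      # drifts to a pure Nash, here (0.0, 1.0)

The product terms p(1-p) and the cross terms in q are the nonlinearity the slide refers to.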
The potential function

The congestion game has a potential function [Monderer, Shapley '96]:

    Φ(s) = Σ_e Σ_{x=1}^{load_e(s)} c_e(x)

Let Ψ = E[Φ]. A calculation yields dΨ/dt ≤ 0 along the dynamics. Hence Ψ decreases except when every player randomizes over paths of equal expected cost (i.e. Ψ is a Lyapunov function of the dynamics).
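A sketch of this potential in the same representation as the congestion-game snippet above (names ours; this is the standard Rosenthal construction):

    from collections import Counter

    def potential(profile, cost_fn):
        """Phi(s) = sum over edges e of c_e(1) + ... + c_e(load_e(s)).
        A unilateral deviation changes Phi by exactly the deviator's
        change in cost -- the exact-potential property used above."""
        load = Counter(e for path in profile.values() for e in path)
        return sum(cost_fn[e](x) for e in load for x in range(1, load[e] + 1))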
Unstable vs. stable fixed points

The derivative of the vector field ξ is a matrix J (the Jacobian) whose spectrum distinguishes stable from unstable fixed points: unstable if some eigenvalue has positive real part, else neutrally stable.

Non-Nash fixed points are unstable. (Easy)
Which Nash are unstable?
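As an illustration, the Jacobian of the two-bin replicator field above at the fully mixed Nash (the entries are our own calculation):

    import numpy as np

    # Field: dp/dt = p(1-p)(1-2q), dq/dt = q(1-q)(1-2p), at (p, q) = (1/2, 1/2):
    J = np.array([[ 0.0, -0.5],
                  [-0.5,  0.0]])
    print(np.linalg.eigvals(J))   # eigenvalues +0.5 and -0.5: positive real part -> unstable

So the fully mixed Nash of the motivating example is unstable, matching the perturbation pictures that follow.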
Unstable Nash

[Figures: in the two-balls/two-bins example (c(x)=x on both machines), start at the fully mixed Nash (.5, .5). A small perturbation (e.g. .51/.49) is amplified by the dynamics (through .6/.4, then .65/.35), driving the two players toward opposite pure strategies.]
Unstable vs. stable Nash

Let J_p be the submatrix of J restricted to the strategies played with positive probability. Every eigenvalue of J_p is an eigenvalue of J.

Computing the entries of J_p reveals that at a Nash equilibrium Tr(J_p) = 0. If Tr(J_p) = 0 and no eigenvalue has positive real part, then all eigenvalues are pure imaginary, so Tr(J_p²) ≤ 0. But clearly Tr(J_p²) ≥ 0, forcing equality.

Hence E[cost_i(R, Q, s_{-i,j})] = E[cost_i(R′, Q, s_{-i,j})] whenever p_{iR}, p_{iR′}, p_{jQ} > 0.

A new refinement of Nash equilibria!
Weakly stable Nash equilibrium

Definition: A weakly stable Nash equilibrium is a mixed Nash equilibrium (σ_1, …, σ_n) such that, for all players i, j: if i switches to using any pure strategy in support(σ_i), then j remains indifferent between all the strategies in support(σ_j).

Weakly stable Nash are Nash: Weakly stable Nash ⊆ Nash ⊆ Δ(S).
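A brute-force check of this definition (a sketch; all names are ours, and `cost` is an oracle mapping a player and a pure outcome to that player's cost):

    import itertools

    def expected_cost(j, s, mix, cost):
        """E[cost_j] when j plays pure strategy s and every other player i
        mixes according to mix[i] (a dict: strategy -> probability)."""
        others = [i for i in mix if i != j]
        total = 0.0
        for pures in itertools.product(*(mix[i].items() for i in others)):
            outcome, prob = {j: s}, 1.0
            for i, (a, pr) in zip(others, pures):
                outcome[i] = a
                prob *= pr
            total += prob * cost(j, outcome)
        return total

    def is_weakly_stable(mix, cost, tol=1e-9):
        """mix is assumed to be a mixed Nash; check the indifference condition."""
        for i in mix:
            for a in mix[i]:                       # pure strategy in support(sigma_i)
                deviated = dict(mix)
                deviated[i] = {a: 1.0}             # i switches to pure strategy a
                for j in mix:
                    if j == i:
                        continue
                    costs = [expected_cost(j, s, deviated, cost) for s in mix[j]]
                    if max(costs) - min(costs) > tol:
                        return False               # some j is no longer indifferent
        return True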
Pure Nash are weakly stable (their supports are singletons, so the indifference condition holds vacuously): Pure Nash ⊆ Weakly stable Nash ⊆ Nash ⊆ Δ(S).
How bad can a weakly stable NE be?

For linear load balancing: Price of weakly stable Anarchy = Price of Pure Anarchy (= 1 here), in contrast to the Price of Anarchy (over all mixed Nash) and the Price of Total Anarchy (over all CCE), which are far larger.
Mixed weakly stable NE are due to rare coincidences

Example (two machines with c(x)=x): the fully mixed equilibrium relies on a coincidence, namely two edges having equal cost functions. If we perturb the cost functions (e.g. scale each one by an independent random factor between 1-δ and 1+δ), then the game has no mixed equilibrium with the same support sets.
Games as vectors

If one fixes the set of players, facilities and strategy sets (the combinatorial structure), the game is determined by its vector of edge costs, a point of R^{c} with one coordinate per entry c_e(1), c_e(2), …, c_e′(1), c_e′(2), …

Game + strategy profile: a point of R^{p} × R^{c}.
Weakly Stable Nash Restrictions

The assertion

    "{p_iR}_{i∈N, R∈P(i)} is a fully mixed weakly stable equilibrium of the game with edge costs {c_e(x)}_{e∈E, 1≤x≤n}"

entails many equations among the {p_i} and {c_e(x)}. These are all polynomial equations (in fact, multilinear). Do they imply a nonzero polynomial equation among the {c_e(x)} alone?
Sard's Theorem

The solution set of these polynomials is an algebraic variety X in R^{p} × R^{c}. Project it onto R^{c}: is the image dense?

Sard's Theorem: if f : X → Y is smooth, the image of the set of critical points of f has measure zero.

Use an algebraic-geometric version of Sard's Theorem, and prove that the derivative of f is rank-deficient everywhere.
Summary

Multiplicative Updates in Potential Games
    → (expectations & ε → 0, Taylor series manipulations) the replicator ODE
    → (Jacobian stability analysis) weakly stable Nash
    → (algebraic geometry) performance governed by the Price of Pure Anarchy (= 1 here), rather than the Price of Anarchy or the Price of Total Anarchy.
 
Learning Algs + Social Welfare: how low can we go?

Price of Anarchy  >>  Price of Pure Anarchy  >>  Price of Stability  >>  1 (= OPT)

[Kleinberg, P., Tardos STOC 09], [Roughgarden STOC 09]: any no-regret alg in potential games, when social welfare is the sum of utilities.
[Balcan, Blum, Mansour ICS 2010]: specific classes of potential games, via public advertising.
[Kleinberg, Ligett, P., Tardos ICS 2011]: cycles of the replicator dynamics in specific games, reaching 1 (= OPT).
Thank you