No Regret Algorithms in Games
This presentation covers no-regret algorithms in games and social interactions, based on work presented by Georgios Piliouras (Georgia Institute of Technology and Johns Hopkins University). It discusses the application of these algorithms in decision making, highlighting how they guarantee reasonable behavior and learning outcomes, and explores strategies, profitable deviations, and equilibria to understand how such algorithms shape interactions in dynamic environments.
Presentation Transcript
No Regret Algorithms in Games Georgios Piliouras Georgia Institute of Technology Johns Hopkins University
No Regret Algorithms in Social Interactions Georgios Piliouras Georgia Institute of Technology Johns Hopkins University
No Regret Behavior in Social Interactions Georgios Piliouras Georgia Institute of Technology Johns Hopkins University
Reasonable Behavior in Social Interactions Georgios Piliouras Georgia Institute of Technology Johns Hopkins University
No Regret Learning (review). Regret(T) over a history of T periods: total profit of the best fixed action in hindsight minus the total profit of the algorithm. Intuitively: no single action significantly outperforms the dynamic. An algorithm is characterized as no-regret if for every input sequence the regret grows sublinearly in T. [Blackwell 56], [Hannan 57], [Fudenberg, Levine 94]
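As a formalization (notation mine, not the slides': a_t is the algorithm's action in period t and u_t the period-t payoff function):

```latex
% Regret over T periods, full-information setting.
% a_t = algorithm's action in period t, u_t = period-t payoff function.
\mathrm{Regret}(T) \;=\; \max_{s \in S} \sum_{t=1}^{T} u_t(s) \;-\; \sum_{t=1}^{T} u_t(a_t),
\qquad
\text{no regret} \iff \mathrm{Regret}(T) = o(T) \text{ for every input sequence.}
```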
No Regret Learning (review). Example (weather/profit table on the slide): over the sequence of days shown, the total profit of the Algorithm is 3, of always choosing Umbrella is 3, and of always choosing Sunscreen is 1; no single action significantly outperforms the dynamic.
Games (i.e. Social Interactions): interacting entities, pursuing their own goals, with a lack of centralized control.
Games (review). n players; a set of strategies Si for each player i; possible states (strategy profiles) S = ∏i Si; utility ui : S → R; social welfare Q : S → R. Extend to allow probabilities Δ(Si), Δ(S), with ui(σ) = E[ui(s)] and Q(σ) = E[Q(s)] for σ ∈ Δ(S).
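Spelled out, the extension to probabilities reads as follows (a standard formalization; the symbol σ for a mixed profile is mine):

```latex
% Expected utility and social welfare under a product of mixed strategies
% \sigma = (\sigma_1, \dots, \sigma_n), \sigma_i \in \Delta(S_i).
u_i(\sigma) \;=\; \mathbb{E}_{s \sim \sigma}\![u_i(s)]
\;=\; \sum_{s \in S} \Big( \prod_{j} \sigma_j(s_j) \Big)\, u_i(s),
\qquad
Q(\sigma) \;=\; \mathbb{E}_{s \sim \sigma}\![Q(s)].
```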
Games & Equilibria. Rock-Paper-Scissors, with each player mixing uniformly (1/3, 1/3, 1/3). Payoffs listed as (row player, column player):

            Rock     Paper    Scissors
  Rock      0, 0     -1, 1    1, -1
  Paper     1, -1    0, 0     -1, 1
  Scissors  -1, 1    1, -1    0, 0

Nash: a product of mixed strategies s.t. no player has a profitable deviating strategy.

Games & Equilibria. The uniform mixed strategies above induce a product distribution putting probability 1/9 on each of the nine outcomes. Nash, restated: a probability distribution over outcomes that is a product of mixed strategies, s.t. no player has a profitable deviating strategy.

Games & Equilibria. Dropping the requirement that the distribution be a product of mixed strategies generalizes Nash to Coarse Correlated Equilibria (CCE): a probability distribution over outcomes s.t. no player has a profitable deviating strategy.

Games & Equilibria. A CCE that is not a product distribution: choose any of the six off-diagonal outcomes uniformly (prob. 1/6 each).
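A quick numerical sanity check of the last slide (a minimal Python sketch; the game representation is mine): the uniform distribution over the six off-diagonal outcomes is a CCE, since every fixed deviation faces the uniform column marginal and earns the same expected payoff.

```python
import itertools

# Row player's payoff in Rock-Paper-Scissors; the column player gets the negative.
A = {('R', 'R'): 0, ('R', 'P'): -1, ('R', 'S'): 1,
     ('P', 'R'): 1, ('P', 'P'): 0, ('P', 'S'): -1,
     ('S', 'R'): -1, ('S', 'P'): 1, ('S', 'S'): 0}

# Uniform distribution over the 6 off-diagonal outcomes.
mu = {o: 1/6 for o in itertools.product('RPS', repeat=2) if o[0] != o[1]}

value = sum(p * A[o] for o, p in mu.items())  # row player's expected payoff: 0

# Best fixed deviation for the row player: action a against mu's column marginal.
for a in 'RPS':
    deviation = sum(p * A[(a, o[1])] for o, p in mu.items())
    assert deviation <= value + 1e-9  # no profitable deviation: mu is a CCE
print("uniform over off-diagonal outcomes is a CCE; value =", value)
```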
Algorithms Playing Games. Alg 1 (row player) and Alg 2 (column player) repeatedly play Rock-Paper-Scissors. Online algorithm: takes as input the past history of play until day t, and chooses a randomized action for day t+1.
Today's class: the same setting, but both Alg 1 and Alg 2 are no-regret. Online algorithm: takes as input the past history of play until day t, and chooses a randomized action for day t+1.
No Regret Algorithms & CCE A history of no-regret algorithms is a sequence of outcomes s.t. no agent has a single deviating action that can increase her average payoff. A Coarse Correlated Equilibrium is a probability distribution over outcomes s.t. no agent has a single deviating action that can increase her expected payoff.
How good are the CCE? It depends: which class of games are we interested in? Which notion of social welfare? Today: class of games: potential games; social welfare: makespan. [Kleinberg, P., Tardos STOC 09]
Congestion Games. n players and m resources ("edges"). Each strategy corresponds to a set of resources ("paths"). Each edge has a cost function ce(x) that gives its cost as a function of the number of players using it. Cost experienced by a player = sum of edge costs along her path. [Slide figure: a small network with edge costs x and 2x; Cost(red path) = 6, Cost(green path) = 8.]
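For concreteness, a minimal sketch of how player costs are computed in a congestion game (the helper name and the tiny instance are illustrative, not from the talk):

```python
def player_costs(strategies, cost_fns):
    """Cost of each player = sum of c_e(load_e) over the resources she uses.

    strategies: list of sets of resources (one "path" per player)
    cost_fns:   dict mapping resource -> cost function c_e(x)
    """
    load = {}
    for path in strategies:
        for e in path:
            load[e] = load.get(e, 0) + 1
    return [sum(cost_fns[e](load[e]) for e in path) for path in strategies]

# Two players sharing edge 'a' (cost 2x) while 'b' and 'c' cost x each:
costs = {'a': lambda x: 2 * x, 'b': lambda x: x, 'c': lambda x: x}
print(player_costs([{'a', 'b'}, {'a', 'c'}], costs))  # [5, 5]
```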
Equilibria and Social Welfare. Load balancing: n identical jobs on n identical machines (links), each with cost function c(x) = x. Makespan: expected maximum latency over all links.
Equilibria and Social Welfare. Pure Nash: one job per machine. Makespan = 1.
Equilibria and Social Welfare. (Mixed) Nash: each job picks each machine with probability 1/n. Makespan = Θ(log n / log log n) > 1. [Koutsoupias, Mavronicolas, Spirakis 02], [Czumaj, Vöcking 02]
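A quick simulation of the fully mixed Nash (a sketch; the parameter choices are arbitrary) shows the slow growth of the expected makespan, consistent with the Θ(log n / log log n) bound:

```python
import random

def mixed_nash_makespan(n, trials=200):
    # Each of n jobs picks one of n machines uniformly at random,
    # i.e., the fully mixed Nash of load balancing with c(x) = x.
    total = 0
    for _ in range(trials):
        loads = [0] * n
        for _ in range(n):
            loads[random.randrange(n)] += 1
        total += max(loads)
    return total / trials  # empirical E[max load]

# Grows like log n / log log n (classic balls-into-bins behavior).
for n in (10, 100, 1000, 10000):
    print(n, mixed_nash_makespan(n))
```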
Equilibria and Social Welfare. Coarse Correlated Equilibria: Makespan = Θ(√n) >> Θ(log n / log log n). [Blum, Hajiaghayi, Ligett, Roth 08]
Linear Load Balancing (Q = makespan). Q(worst CCE) = Θ(√n) >> Q(worst Nash) = Θ(log n / log log n) >> Q(worst pure Nash) = Q(OPT). [Slide figure: nested sets Δ(S) ⊇ CCE ⊇ Nash ⊇ Pure Nash, with OPT inside.]

Linear Load Balancing (Q = makespan). The same picture in ratio form: Price of Total Anarchy = Θ(√n) >> Price of Anarchy = Θ(log n / log log n) >> Price of Pure Anarchy = 1.
Our Hope: natural no-regret algorithms should be able to steer away from worst-case equilibria.
The Multiplicative Weights Algorithm (MWA) [Littlestone, Warmuth 94], [Freund, Schapire 99]. Pick strategy s with probability proportional to (1−ε)^total(s), where total(s) denotes s's combined cost over all past periods. Provable performance guarantees against arbitrary opponents. No regret: against any sequence of opponents' play, average payoff converges to that of the best fixed option (or better).
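A minimal sketch of MWA in the full-information setting (variable names are mine; costs[t][s] is assumed to hold the cost of action s in period t, revealed after play):

```python
import random

def multiplicative_weights(costs, eps=0.1):
    """Pick action s in each period with probability proportional to
    (1 - eps) ** total(s), where total(s) is s's combined past cost."""
    n = len(costs[0])
    total = [0.0] * n           # cumulative cost of each action so far
    history = []
    for period_costs in costs:
        weights = [(1 - eps) ** total[s] for s in range(n)]
        # Sample an action with probability proportional to its weight.
        choice = random.choices(range(n), weights=weights)[0]
        history.append(choice)
        for s in range(n):      # full information: every action's cost is seen
            total[s] += period_costs[s]
    return history
```

Against any sequence of cost vectors, an appropriate choice of ε makes the regret of this scheme grow sublinearly in T.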
(Multiplicative Weights) Algorithm in (Potential) Games. σ(t) is the current state of the system (a tuple of randomized strategies, one for each player). Each player tosses her coins and a specific outcome is realized; depending on the outcome of these random events, we transition to the next state σ(t+1). The result is an infinite Markov chain with infinitely many states, living in Δ(S).
(Multiplicative Weights) Algorithm in (Potential) Games. Problem 1: hard to get intuition about the problem, let alone analyze it. Let's try to come up with a discounted version of the problem. Ideas?
(Multiplicative Weights) Algorithm in (Potential) Games. Idea 1: analyze the expected motion.

(Multiplicative Weights) Algorithm in (Potential) Games. The system evolution is now deterministic: there exists a function f s.t. E[σ(t+1)] = f(σ(t), ε). I wish to analyze this function (e.g., find its fixed points).
(Multiplicative Weights) Algorithm in (Potential) Games. Problem 2: the function f is still rather complicated. Idea 2: analyze the MWA dynamics for small ε. Use a Taylor expansion to find a first-order approximation to f: f(σ(t), ε) = f(σ(t), 0) + ε ∂f/∂ε(σ(t), 0) + O(ε²), i.e. f(σ(t), ε) ≈ f(σ(t), 0) + ε ∂f/∂ε(σ(t), 0).
(Multiplicative Weights) Algorithm in (Potential) Games. As ε → 0, the quantity (f(σ(t), ε) − f(σ(t), 0)) / ε → ∂f/∂ε(σ(t), 0) specifies a vector at each point σ(t) of our state space Δ(S) (i.e., a vector field). This vector field defines a system of ODEs, which we are going to analyze.
Deriving the ODE. Take expectations, differentiate with respect to ε, and evaluate at ε = 0: the resulting vector field is the replicator dynamic studied in evolutionary game theory.
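The slide's equations did not survive extraction; in standard notation (mine, not the slides'), the resulting ODE is the cost-minimizing replicator equation:

```latex
% p_{i\gamma}: probability that player i plays strategy \gamma;
% c_{i\gamma}: expected cost of \gamma given the others' mixed strategies;
% \hat{c}_i = \sum_{\gamma'} p_{i\gamma'} c_{i\gamma'}: player i's expected cost.
\frac{d p_{i\gamma}}{dt} \;=\; p_{i\gamma} \left( \hat{c}_i - c_{i\gamma} \right).
```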
Motivating Example: two jobs choosing between two machines, each machine with cost function c(x) = x.
Motivating Example. Each player's mixed strategy is summarized by a single number (the probability of picking machine 1), so we can plot the mixed strategy profile in R². [Slide figure: the unit square, with the pure Nash equilibria and the mixed Nash marked.]
Motivating Example. Even in this simplest case of two balls and two bins with linear utility, the replicator equation has a nonlinear form (see the worked case below).
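A worked instance (my derivation from the replicator equation above): let p and q be the probabilities that players 1 and 2 pick machine 1. With c(x) = x, player 1's expected costs are 1 + q on machine 1 and 2 − q on machine 2, so:

```latex
\dot{p} \;=\; p \left( \hat{c}_1 - c_{1,\text{machine 1}} \right)
        \;=\; p (1 - p) (1 - 2q),
\qquad
\dot{q} \;=\; q (1 - q) (1 - 2p).
% Cubic in (p, q): nonlinear even in this two-ball, two-bin case.
```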
The potential function. The congestion game has a potential function Φ [Monderer-Shapley 96]. Let Ψ = E[Φ]. A calculation yields dΨ/dt ≤ 0; hence Ψ decreases except when every player randomizes over paths of equal expected cost, i.e., Ψ is a Lyapunov function of the dynamics.
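For reference, the standard congestion-game potential (Rosenthal's function, as used in the potential-games framework of [Monderer-Shapley 96]; notation mine):

```latex
% \ell_e(s): number of players using resource e in state s.
\Phi(s) \;=\; \sum_{e} \sum_{j=1}^{\ell_e(s)} c_e(j),
\qquad
\Psi = \mathbb{E}[\Phi], \qquad \frac{d\Psi}{dt} \le 0
\text{ along the replicator dynamic.}
```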
Unstable vs. stable fixed points. The derivative of the vector field is a matrix J (the Jacobian) whose spectrum distinguishes stable from unstable fixed points: unstable if some eigenvalue has positive real part, else neutrally stable. Non-Nash fixed points are unstable. (Easy.) Which Nash are unstable?
Unstable Nash. Both players mix (.5, .5) over the two machines with c(x) = x: the fully mixed Nash.

Motivating Example. Perturb player 1 to (.49, .51) while player 2 stays at (.5, .5).

Motivating Example. Player 2 drifts to (.4, .6) while player 1 is at (.49, .51).

Motivating Example. Player 1 drifts further to (.35, .65) against (.4, .6): the dynamic moves away from the fully mixed Nash (a numerical check follows below).
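A numerical check of the instability just illustrated (a sketch using the 2x2 replicator field derived earlier; numpy assumed available):

```python
import numpy as np

# Replicator field for the two-jobs/two-machines example:
#   dp/dt = p (1 - p) (1 - 2q),   dq/dt = q (1 - q) (1 - 2p)
def jacobian(p, q):
    return np.array([[(1 - 2*p) * (1 - 2*q), -2 * p * (1 - p)],
                     [-2 * q * (1 - q),      (1 - 2*p) * (1 - 2*q)]])

# At the fully mixed Nash (1/2, 1/2):
print(np.linalg.eigvals(jacobian(0.5, 0.5)))  # eigenvalues +0.5 and -0.5
# One eigenvalue with positive real part, so the mixed Nash is unstable,
# matching the drift seen on the previous slides.
```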