Algorithmic Game Theory: Learning in Games by Viliam Lisý

The content covers learning in games from the perspective of algorithmic game theory: online learning and prediction, best response dynamics, no-regret dynamics, and convergence to Nash and correlated equilibria. It explores how simple learning agents can achieve equilibrium outcomes and how these algorithms apply across game scenarios.


Presentation Transcript


  1. Algorithmic Game Theory: Learning in Games. Viliam Lisý, Artificial Intelligence Center, Department of Computer Science, Faculty of Electrical Engineering, Czech Technical University in Prague (Apr 15, 2019)

  2. Plan
  - Online learning and prediction: a single agent learns to select the best action
  - Learning in normal form games: the same algorithms used by multiple agents
  - Learning in extensive form games: generalizing these ideas to sequential games
  - DeepStack

  3. Algorithmic Game Theory: Learning in Normal Form Games. Viliam Lisý, Artificial Intelligence Center, Department of Computer Science, Faculty of Electrical Engineering, Czech Technical University in Prague (Apr 15, 2019)

  4. Introduction
  How can simple learning agents achieve equilibrium outcomes?
  - Best Response Dynamics (fictitious play): best response to the average empirical play; needs to know the game
  - No-Regret Dynamics: each player uses a no-regret algorithm; may know only their own actions and received rewards

  5. Best response dynamics: fictitious play
  Players maintain the empirical distribution of the opponent's past actions (often in the form of frequencies):
  $\bar{\sigma}_{-i}^{T}(a_{-i}) = \frac{1}{T} \sum_{t=1}^{T} \mathbb{1}[a_{-i}^{t} = a_{-i}]$
  In each round, each player plays a best response (BR) to this distribution:
  $a_i^{T+1} = \arg\max_{a_i \in A_i} u_i(a_i, \bar{\sigma}_{-i}^{T})$
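As an illustration of these dynamics, here is a minimal sketch of fictitious play for a two-player bimatrix game; the function name, the numpy payoff-matrix representation, and tie-breaking by lowest index are assumptions of the example, not from the slides.

```python
import numpy as np

def fictitious_play(A, B, T=10000):
    """Fictitious play on a bimatrix game.

    A[i, j], B[i, j]: rewards of the row and column player for actions (i, j).
    Returns the empirical action frequencies of both players after T rounds.
    """
    m, n = A.shape
    row_counts = np.zeros(m)  # how often the row player chose each action
    col_counts = np.zeros(n)  # how often the column player chose each action
    row_counts[0] = col_counts[0] = 1.0  # arbitrary initial play
    for _ in range(T):
        # each player best-responds to the opponent's empirical distribution
        br_row = np.argmax(A @ (col_counts / col_counts.sum()))
        br_col = np.argmax((row_counts / row_counts.sum()) @ B)
        row_counts[br_row] += 1
        col_counts[br_col] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()
```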

  6. Result of FP in case of convergence
  Theorem: If the empirical action frequencies of fictitious play converge ($\bar{\sigma}^T \to \sigma$), they converge to a Nash equilibrium of the game.
  Proof: For contradiction, assume $\sigma$ is not a NE. Then there exist a player $i$ and actions $a_i, a_i'$ with $\sigma_i(a_i) > 0$ such that $u_i(a_i', \sigma_{-i}) > u_i(a_i, \sigma_{-i})$. Choose $\varepsilon$ such that $0 < \varepsilon < \frac{1}{2}\big(u_i(a_i', \sigma_{-i}) - u_i(a_i, \sigma_{-i})\big)$. Since $\bar{\sigma}^T \to \sigma$, there is $T$ such that for all $t > T$ and all actions $a$, $|u_i(a, \bar{\sigma}_{-i}^{t}) - u_i(a, \sigma_{-i})| < \varepsilon$. For all $t > T$, we have
  $u_i(a_i, \bar{\sigma}_{-i}^{t}) < u_i(a_i, \sigma_{-i}) + \varepsilon < u_i(a_i', \sigma_{-i}) - \varepsilon < u_i(a_i', \bar{\sigma}_{-i}^{t})$.
  Hence $a_i$ is not played after $T$, which contradicts $\sigma_i(a_i) > 0$. ∎

  7. Convergence of FP
  Theorem: The empirical frequencies of FP converge to a NE in:
  - constant-sum games
  - two-player games where each player has up to two actions
  - games solvable by iterated strict dominance
  - identical-interest games
  - potential games

  8. Why it may not converge
  Shapley's example, a modified rock-paper-scissors (row payoff, column payoff):

        R      S      P
  R   0, 0   1, 0   0, 1
  S   0, 1   0, 0   1, 0
  P   1, 0   0, 1   0, 0

  The unique NE is the uniform strategy for both players. Let $\sigma_1^0 = (1,0,0)$ and $\sigma_2^0 = (0,1,0)$. Play may be (P,R),(P,R),... for $k$ steps until the column player switches to S. Then (P,S) follows until the row player switches to R (for $\beta k$ steps, $\beta > 1$). Then (R,S) follows until the column player switches to P (for $\beta^2 k$ steps). The play cycles among all 6 non-diagonal profiles with periods of ever-increasing length; hence, the empirical frequencies cannot converge.
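Plugging Shapley's game into the fictitious-play sketch above shows the effect empirically: the frequencies keep drifting instead of settling at the uniform strategy. The payoff matrices transcribe the table; the horizon values are arbitrary choices for the demonstration.

```python
# Shapley's modified rock-paper-scissors: A = row payoffs, B = column payoffs
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]])
B = np.array([[0, 0, 1],
              [1, 0, 0],
              [0, 1, 0]])
for T in (100, 10_000, 100_000):
    print(T, fictitious_play(A, B, T))  # empirical frequencies do not converge
```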

  9. Convergence of FP
  Theorem (Brandt, Fischer, Harrenstein, 2010): In symmetric two-player constant-sum games, FP may require exponentially many rounds (in the size of the representation of the game) before an equilibrium action is eventually played. This holds even for games solvable via iterated strict dominance.
  Proof sketch: consider the symmetric game

         a     b     c
  a      0    -1    -ε
  b      1     0    -ε
  c      ε     ε     0

  With $\varepsilon = 2^{-k}$, FP may take $2^k$ rounds to play the equilibrium action c for the first time: the play is (a,a),(b,b),...,(b,b), with (b,b) repeated $2^k - 1$ times.

  10. NO-REGRET DYNAMICS

  11. No-Regret Learning: Summary
  Immediate regret at time $t$ for not choosing action $a$:
  $r^t(a) = u^t(a) - u^t(a^t)$
  Cumulative external regret for playing $a^0, a^1, \dots, a^T$:
  $R^T = \max_{a \in A} \sum_{t=0}^{T} r^t(a) = \max_{a \in A} \sum_{t=0}^{T} u^t(a) - \sum_{t=0}^{T} u^t(a^t)$
  Average external regret for playing $a^0, a^1, \dots, a^T$:
  $\bar{R}^T = \frac{1}{T} R^T$
  An algorithm has no regret if for any sequence of rewards $u^0, u^1, \dots$ it produces $a^0, a^1, \dots$ such that $\bar{R}^T \to 0$ as $T \to \infty$.
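These definitions translate directly into code. A sketch computing the cumulative and average external regret from a hypothetical log of reward vectors `u` and played actions `a` (both names are assumptions of the example):

```python
import numpy as np

def external_regret(u, a):
    """Cumulative and average external regret.

    u: (T, n) array with u[t, i] = reward of action i at time t
    a: length-T integer array of the actions actually played
    """
    T = len(a)
    realized = u[np.arange(T), a].sum()   # sum_t u^t(a^t)
    best_fixed = u.sum(axis=0).max()      # max_a sum_t u^t(a)
    R = best_fixed - realized             # cumulative external regret R^T
    return R, R / T                       # average regret -> 0 for a no-regret algorithm
```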

  12. From External to Swap Regret
  Cumulative swap regret for playing $a^0, a^1, \dots, a^T$:
  $R_{\mathrm{swap}}^T = \max_{\phi: A \to A} \sum_{t=0}^{T} \big(u^t(\phi(a^t)) - u^t(a^t)\big)$
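Since the switching function $\phi$ may remap each action independently, the best $\phi$ can be found one action at a time. A sketch in the same style as `external_regret` above:

```python
def swap_regret(u, a):
    """Cumulative swap regret: best total gain from remapping each played action.

    u: (T, n) reward array; a: length-T numpy integer array of played actions.
    """
    T, n = u.shape
    gain = 0.0
    for i in range(n):
        mask = (a == i)  # time steps at which action i was played
        if mask.any():
            # best fixed replacement phi(i) = j over exactly those steps
            gain += u[mask].sum(axis=0).max() - u[mask, i].sum()
    return gain
```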

  13. From External to Swap Regret
  Theorem (Blum & Mansour, 2007): If there is a no-external-regret algorithm for a setting, there is also a no-swap-regret algorithm.
  Proof: polynomial black-box reduction, maintaining one copy $A_i$ of the base algorithm per action $i$ (diagram of the reduction with base algorithms $A_1, \dots, A_n$ omitted).

  14. From External to Swap Regret
  Proof: The average expected reward of the overall algorithm is
  $\frac{1}{T}\sum_{t=1}^{T}\sum_{j=1}^{n} p_j^t\, u^t(j)$.
  Algorithm $A_i$ chooses $q_i^t = (q_{i1}^t, \dots, q_{in}^t)$ and gets the scaled rewards $(p_i^t u^t(1), \dots, p_i^t u^t(n))$. Thus, by its no-external-regret guarantee, for every action $j$:
  $\frac{1}{T}\sum_{t=1}^{T}\sum_{k=1}^{n} q_{ik}^t\, p_i^t\, u^t(k) \;\ge\; \frac{1}{T}\sum_{t=1}^{T} p_i^t\, u^t(j) \;-\; \bar{R}_i$
  Fix an arbitrary $\phi: A \to A$ and sum over all $i \in A$:
  $\frac{1}{T}\sum_{t=1}^{T}\sum_{i=1}^{n}\sum_{k=1}^{n} q_{ik}^t\, p_i^t\, u^t(k) \;\ge\; \frac{1}{T}\sum_{t=1}^{T}\sum_{i=1}^{n} p_i^t\, u^t(\phi(i)) \;-\; \sum_{i=1}^{n} \bar{R}_i$

  15. From External to Swap Regret
  We are done if we ensure, for every action $j$ and every $t$,
  $\sum_{i=1}^{n} p_i^t\, q_{ij}^t = p_j^t$
  so the left-hand side above becomes the overall algorithm's average reward. This is true if $p^t$ is an eigenvector of the matrix $Q^t = (q_{ij}^t)$ for eigenvalue $\lambda = 1$. Equivalently, $p^t$ is the stationary distribution of the Markov chain with transition matrix $Q^t$. Such a vector $p^t$ always exists and can be easily found!
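A minimal sketch of this one nontrivial step, computing $p^t$ from the row-stochastic matrix $Q^t$ by power iteration; the fixed iteration count is an arbitrary choice for the example.

```python
import numpy as np

def stationary_distribution(Q, iters=1000):
    """Return p with p = p @ Q for a row-stochastic matrix Q (power iteration)."""
    p = np.full(Q.shape[0], 1.0 / Q.shape[0])
    for _ in range(iters):
        p = p @ Q
    return p / p.sum()  # renormalize against numerical drift
```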

  16. From External to Swap Regret
  Corollary: Let $R(T) \to 0$ be the external-regret convergence bound of the base algorithm used in the black-box reduction with $|A|$ actions. Then the swap regret of the overall algorithm is $R_{\mathrm{swap}} \le |A| \cdot R(T)$.
  Corollary: The black-box reduction with Hedge for all actions produces an algorithm with $R_{\mathrm{swap}} \le 2|A|\sqrt{\ln|A|/T}$.
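For concreteness, a minimal Hedge sketch as it could be used inside the reduction or in the dynamics below; rewards are assumed to lie in [0, 1], and the class interface is an assumption of this example.

```python
import numpy as np

class Hedge:
    """Multiplicative-weights no-regret learner for rewards in [0, 1]."""

    def __init__(self, n_actions, eta):
        self.w = np.ones(n_actions)
        self.eta = eta  # e.g. eta = sqrt(ln(n_actions) / T) for a known horizon T

    def strategy(self):
        return self.w / self.w.sum()

    def update(self, rewards):
        # full-information update: exponentially reweight by the reward vector
        self.w *= np.exp(self.eta * np.asarray(rewards))
```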

  17. No-Regret Dynamics: full information
  Definition:
  1) Each player $i$ chooses independently a mixed strategy $\sigma_i^t$ using a no-regret algorithm.
  2) Each player receives, for all $a_i \in A_i$, the rewards $u_i^t(a_i) = \mathbb{E}_{a_{-i} \sim \sigma_{-i}^t}[u_i(a_i, a_{-i})]$.

  18. No-Regret Dynamics: full information
  Theorem: If after $T$ iterations of no-regret dynamics each player has external regret lower than $\epsilon$, then $\bar{\sigma} = \frac{1}{T}\sum_{t=1}^{T}\sigma^t$ is an $\epsilon$-coarse correlated equilibrium of the game. I.e., for any player $i$ and any $a_i' \in A_i$:
  $\mathbb{E}_{a \sim \bar{\sigma}}[u_i(a)] \;\ge\; \mathbb{E}_{a \sim \bar{\sigma}}[u_i(a_i', a_{-i})] - \epsilon$
  Corollary: If we run Hedge in a game with at most $|A|$ actions for each player for $T$ iterations, the resulting average strategy is an $O(\sqrt{\ln(|A|)/T})$-coarse correlated equilibrium of the game.
  Corollary: If we run regret matching+ in a game with at most $|A|$ actions for each player for $T$ iterations, the resulting average strategy is an $O(\sqrt{|A|/T})$-coarse correlated equilibrium of the game.
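A sketch of these full-information dynamics for a bimatrix game, using the Hedge class above and returning the average joint strategy together with the row player's best fixed-deviation gain, which the theorem bounds by $\epsilon$. Both payoff matrices are assumed to have rewards in [0, 1]; the function name is an assumption of the example.

```python
def no_regret_dynamics(A, B, T, eta):
    """Run Hedge self-play for T rounds; return (average joint strategy, row CCE gap)."""
    m, n = A.shape
    p1, p2 = Hedge(m, eta), Hedge(n, eta)
    joint = np.zeros((m, n))
    for _ in range(T):
        x, y = p1.strategy(), p2.strategy()
        joint += np.outer(x, y)   # sigma^t is the product distribution of x^t and y^t
        p1.update(A @ y)          # expected reward of each row action
        p2.update(x @ B)          # expected reward of each column action
    joint /= T
    # row player's coarse-correlated-equilibrium gap under the average strategy
    row_payoff = (A * joint).sum()
    best_deviation = (A @ joint.sum(axis=0)).max()  # best fixed action vs column marginal
    return joint, best_deviation - row_payoff       # gap <= eps by the theorem
```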

  19. Minimax Theorem
  Note: In zero-sum games, coarse correlated equilibria are Nash (their marginals form a Nash equilibrium).
  Theorem (Minimax Theorem): For any matrix game $M$,
  $\max_{x} \min_{y} x^\top M y = \min_{y} \max_{x} x^\top M y$
  Proof: For contradiction, assume that for some $\epsilon > 0$
  $\max_{x} \min_{y} x^\top M y < \min_{y} \max_{x} x^\top M y - \epsilon$.
  Set $\eta = \frac{\epsilon}{2}$ and let both players run Hedge for time $T = 2\ln n/\eta^2$, so that each has average regret at most $\eta$. Let $\hat{x}, \hat{y}$ be the empirical frequencies of their play and $v$ the average reward of the maximizer. Then
  $\max_{x} \min_{y} x^\top M y \;\ge\; \min_{y} \hat{x}^\top M y \;\ge\; v - \eta \;\ge\; \max_{x} x^\top M \hat{y} - 2\eta \;\ge\; \min_{y} \max_{x} x^\top M y - \epsilon$,
  a contradiction.
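In a constant-sum game the same dynamics approximate the value numerically. As an illustrative usage of `no_regret_dynamics` above (the game choice is an assumption; B = 1 − A keeps both reward matrices in [0, 1] and is strategically equivalent to the zero-sum game):

```python
A = np.array([[1.0, 0.0],
              [0.0, 1.0]])  # matching pennies for the row player; value 1/2
T = 10_000
joint, gap = no_regret_dynamics(A, 1.0 - A, T, eta=np.sqrt(np.log(2) / T))
print(joint.sum(axis=1), joint.sum(axis=0))  # both marginals approach (1/2, 1/2)
```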

  20. No-Regret Dynamics
  Theorem: If after $T$ iterations of no-regret dynamics each player has swap regret lower than $\epsilon$, then $\bar{\sigma} = \frac{1}{T}\sum_{t=1}^{T}\sigma^t$ is an $\epsilon$-correlated equilibrium of the game. I.e., for any player $i$ and switching function $\phi: A_i \to A_i$:
  $\mathbb{E}_{a \sim \bar{\sigma}}[u_i(a)] \;\ge\; \mathbb{E}_{a \sim \bar{\sigma}}[u_i(\phi(a_i), a_{-i})] - \epsilon$
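The corresponding check for an $\epsilon$-correlated equilibrium evaluates, for each of the row player's actions, the best remapping under the opponent's conditional play. A sketch reusing the `joint` array produced by the dynamics above:

```python
def correlated_eq_gap(A, joint):
    """Row player's max gain from any switching function phi under joint play."""
    gap = 0.0
    for i in range(A.shape[0]):
        cond = joint[i]                        # unnormalized P(a2, a1 = i)
        gap += (A @ cond).max() - A[i] @ cond  # best remap of i minus playing i
    return gap  # <= eps in an eps-correlated equilibrium
```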

  21. No-Regret Dynamics: bandit case
  Definition:
  1) Each player $i$ chooses independently a mixed strategy $\sigma_i^t$ using a no-regret algorithm and independently samples $a_i^t \sim \sigma_i^t$.
  2) Each player receives the single reward $u_i^t(a_i^t) = u_i(a_i^t, a_{-i}^t)$.

  22. No-Regret Dynamics: bandit case
  Theorem: For any $\delta \in (0,1)$ there are parameters for Exp3.P such that if both players use Exp3.P to choose their actions for $T$ time steps, then $\bar{\sigma} = \frac{1}{T}\sum_{t=1}^{T}\sigma^t$ is an $\epsilon$-coarse correlated equilibrium of the game with probability at least $\delta$, where
  $\epsilon = 5.15\,\sqrt{\frac{n \ln(n/\delta)}{T}}$
  Proof sketch: It is enough to run Exp3.P for long enough so that both players have regret lower than $\epsilon$ at once with high probability. It can be achieved by applying the Exp3.P high-probability convergence bound to each player and taking a union bound over the two failure events.
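Exp3.P adds explicit exploration and confidence-width terms to the basic Exp3 scheme. As a hedged illustration of the bandit feedback loop only, here is a minimal plain Exp3 sketch (not Exp3.P, and not the tuned parameters behind the bound above):

```python
import numpy as np

class Exp3:
    """Basic Exp3: only the played action's reward (in [0, 1]) is observed."""

    def __init__(self, n_actions, gamma, seed=0):
        self.w = np.ones(n_actions)
        self.gamma = gamma  # exploration rate
        self.n = n_actions
        self.rng = np.random.default_rng(seed)

    def strategy(self):
        p = self.w / self.w.sum()
        return (1 - self.gamma) * p + self.gamma / self.n  # mix in uniform exploration

    def play(self):
        self.p = self.strategy()
        self.action = self.rng.choice(self.n, p=self.p)
        return self.action

    def update(self, reward):
        est = np.zeros(self.n)
        est[self.action] = reward / self.p[self.action]  # importance-weighted estimate
        self.w *= np.exp(self.gamma * est / self.n)
```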

  23. References
  - Asu Ozdaglar. 6.254: Game Theory with Engineering Applications. Lecture 11: Learning in Games. MIT, March 11, 2010.
  - Brandt, Felix, Felix Fischer, and Paul Harrenstein. "On the rate of convergence of fictitious play." International Symposium on Algorithmic Game Theory. Springer Berlin Heidelberg, 2010.
  - T. Roughgarden. Lecture Notes: Algorithmic Game Theory. Tech. rep., Stanford University, 2013.
