Algorithms and Game Theory Overview
An overview of algorithms, uninformed and informed searches, heuristics, adversarial searches, and game theory. Explore the concepts with examples like the Prisoner's Dilemma and deterministic games in multiagent environments. Learn about the application of game theory in AI and how games are well-suited for AI development.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
269202 ALGORITHMS FOR ISNE DR. KENNETH COSH WEEK 12
REVIEW Searching Uninformed Searches Breadth First Depth First Iterative Informed Searches Best First Heuristics Hill Climbing Simulated Annealing
THIS WEEK Adversarial Searches Investigating how to plan ahead in environments where other agents are planning against us. Game Theory
WHO STOLE THE FINAL EXAM? OK, I know one of you stole the final exam You have 3 choices, Admit it , Accuse someone else or Deny knowledge . If you Deny Knowledge you will get a D, unless everyone denies knowledge, when you ll get a B If you Accuse someone you will get a C, or if everyone else accuses the same person you will get a B, unless they admit it when you get a D. If you Admit it you get an F, unless everyone else accuses you, when you get an A.
GAME THEORY That example is similar to the famous Prisoner s Dilemma Two suspects, A and B, are arrested by the police. The police have insufficient evidence for a conviction, and, having separated both prisoners, visit each of them to offer the same deal: if one testifies for the prosecution against the other and the other remains silent, the betrayer goes free and the silent accomplice receives the full 10-year sentence. If both stay silent, the police can sentence both prisoners to only six months in jail for a minor charge. If each betrays the other, each will receive a two-year sentence. Each prisoner must make the choice of whether to betray the other or to remain silent. However, neither prisoner knows for sure what choice the other prisoner will make. So the question this dilemma poses is: What will happen? How will the prisoners act?
GAME THEORY A branch of economics investigates any multiagent environment where one agents actions impact on another agent, whether co-operatively or competitively. Games are deterministic, turn-taking, two-player, zero-sum games with perfect information. In other words a deterministic and fully observable environment with two agents who alternate actions. Utility values at the end of the game are equal and opposite - the winner of chess receives 1 , the loser -1 (or possibly 0 in stalemate).
GAMES Games fit well with A.I. as they are easy to represent and agents are restricted to a small number of actions, the results of the actions are well defined. This is in contrast to physical games such as football - here there are imprecise rules and a wide variety of actions. Computer game manufacturers attempts at producing such games often result in limited and repetitive versions.
GAME SUCCESS AI has had some success at game playing; Machines are better than humans in Othello and checkers. Sometimes they can winand at chess and backgammon. Why not more success? At Go AI only performs amateurly. And at Go The average game of chess has a branching factor of 35, with roughly 50 moves per player - which is 35100or 10154nodes on a search graph! So we have to make a decision, even if it isn t optimal. Now we study Stockfish (for example) to learn how to become better Chess players, and then there is Alphazero
FORMAL DEFINITION Initial State - the initial board position indicating who s move. Successor Function - returns a list of moves and their resulting states. Terminal Test - determines if it is a terminal state, i.e. if the game is over. Utility Function -An objective function giving a numeric value for states - terminal states could be -1, 0 or 1.
SEARCHING From that definition we can create a game tree, which can be searched for optimal moves. Normal searches produce a sequence of actions leading to a terminal win state, but in games the other agent has an impact. Therefore the search needs to find a contingent strategy . The optimal strategy is one which finds an outcome at least as good as any other against an infallible opponent.
GAME TREES X X X Initial Moves (X): X X X X X X XO X OO X X X X X X Move 2 (O): . O OO O O XOXX XO XO X XO XO XO X XO Move 3 (X): . XX X . . . The search tree even for a game as simple as tic tac toe, is v. complex XOX XOX XOX O OX X X Terminal States: OOO X X OO X Utility: -1 0 1
MINIMAX ALGORITHM Assuming the opponent will play optimally, the minimax algorithm is designed to maximise the worst case outcome - in tictactoe this is a draw. A a1 a3 a2 B C D b3 c3 b1 c1 d1 d3 b2 c2 d2 2 3 2 12 6 10 11 7 4 In this simple example - which action should A take (a1, a2 or a3)?
MINIMAX ALGORITHM A a1 a3 a2 2 3 2 B C D b3 c3 b1 c1 d1 d3 b2 c2 d2 2 3 2 12 6 10 11 7 4 Even though the best utility is found with a2 (12), and the best average is found with a3 (7.67), a1 is the choice as we assume that the opponent will choose their optimum strategy.
MULTIPLAYER GAMES In games with more than 2 players, the utility is scored as a vector containing utilities for each player (5,2,3). Minimax functions will choose an action which still leads to their best worst case scenario. In multiplayer games, often alliances are formed, with more than one player joining forces in order to stop another player from reaching a terminal win state.
MINIMAX EFFICIENCY If the maximum depth of the tree is m and there are b legal moves at each point, then time complexity is O(bm) and space complexity is O(bm), i.e. Exponential time efficiency. We can improve the efficiency, for instance through Alpha-Beta Pruning; If a player has a better choice m either at the parent node of n or any node higher up, then n will never be reached, so effectively we can prune it.
PRUNING EXAMPLE A a1 a3 a2 First we examine choice a1 with depth first search, finding the worst case 3 . 3 2 2 B C D b3 c3 b1 c1 d1 d3 b2 c2 d2 3 2 12 6 10 11 2 7 4 A a1 a3 a2 3 B b3 b1 b2 7 4 3
PRUNING EXAMPLE 2 A a1 a3 a2 3 2 Examining choice a2 , we immediately find a worse worst case 2 , so no need to continue searching B C b3 c3 b1 c1 b2 c2 A 3 2 7 4 a1 a3 a2 3 2 2 B C D When we look at choice a3 , we have to search all 3 leaves before we find a worse worst case. b3 c3 b1 c1 d1 d3 b2 c2 d2 3 2 10 11 2 7 4
PRUNING EFFICIENCY Pruning choices in this way has dramatic effects on efficiency. We only need to examine O(bd/2) nodes This is assuming that we can examine successors that are likely to be best first. -You know what ASS U ME means. Even with random selection, the efficiency improves, however, for many games this time inefficiency makes playing the game impractical - a minimax decision for chess used to take longer than the allowed 2 hour game.
IMPERFECT, REAL-TIME DECISIONS Our algorithm needs to be able to make a decision within a specified time period, based on imperfect information. Humans use (sub-consciously) a heuristic utility function to judge the value of a position as humans are even more limited in searching capabilities - so a similar heuristic is needed here I.e. Cut the search off at a non-terminal state.
DESIGNING A HEURISTIC A heuristic needs to comprise various features, weighted by their importance - perhaps using probabilities of success based on previous experiences (in this case 72% of previous instances have lead to success). Human experience can be used to guide the heuristic, for example in chess; Material Value - each piece can be given a value, pawn = 1, queen =8 Pawn Structure - a good pawn layout might be worth more. King Safety - a well protected King might be worth a few points. 2 bishops together are worth a little over 2*1bishop. Bishops are more valuable during the end game than at the start.
SEARCH CUT-OFF Given the sophisticated heuristic function, which can be applied to various states, a cut off point must be chosen so the amount of time used is acceptable. The wrong cut off point could give rise to what is known as the horizon effect - where a poor worst case is just over the horizon of the search.
GAMES WITH CHANCE Some games, such as Backgammon combine chance with skill, where chance is controlled by the roll of a dice. We can t construct a game tree as we don t know what our opponents possible moves are - but we do know what their likely moves are. So the game tree needs to be expanded to include probabilities of the opponent receiving certain dice rolls.
PROBABILITY Probability is based on our percepts of the environment - what we know about the environment. My doctor said golf caused my shoulder injury as soon as he knew I played golf - even though the shoulder injury dates from before I started playing golf. When we pick a card there is a 1/52 chance it is the Ace of Spades, after we look at it, the chance is either 0 or 1. Probabilities can change when more evidence is acquired.
ACTING UNDER UNCERTAINTY Suppose we need to check in at the airport and need to choose a time to set off. Plan A60involves leaving 60 minutes before check-in - we could assume this was a good plan. We can t say Plan A60will get us to the airport in time , only Plan A60 will get us to the airport in time so long as we have enough gas, and there are no accidents, and the car doesn t break down, and check-in doesn t close early, and Perhaps Plan B90would be better?
ACTING UNDER UNCERTAINTY While Plan B90increases the degree of belief that we will get to the airport on time, it also introduces a likely long unproductive wait at the airport. To maximise an agents performance, the relative importance of both goals needs to be considered; getting to the airport avoiding a long wait The rational decision is A60, but how do we draw that conclusion?
UTILITY THEORY Utility Theory represents preferences for certain states - the quality of a state being useful; Is it preferable to choose plan C1440(leaving for the airport 24 hours early) which has a 99.999% probability of success over plan A60, which has a 95% probability of success? Given the poor utility of the long wait, perhaps not. Should I stop playing golf to improve my lack of shoulder pain utility, and risk lowering my playing golf pleasure utility?
DECISION THEORY Decision Theory = Probability Theory + Utility Theory. An agent is rational if they choose the action which yields the highest utility averaged across all possible outcomes of the action. The principle of Maximum Expected Utility (MEU).
DIAGNOSIS Diagnosis is a task aimed at dealing with uncertainty normally using probability theory to construct a degree of belief in various statements. Your car won t start, so there s a 80% chance the battery is dead. You ve got pain in your left calf so there s a 70% chance you ve pulled a muscle, and a 2% chance your leg has fallen off. Probability provides a way of summarising the uncertainty that comes from our laziness and ignorance.
DOMAIN RANDOM VARIABLES Boolean Random Variables Late<True, False> Discrete RandomVariables Weather<Sunny, Rainy, Snowy, Cloudy> Continuous Random Variables Temperature = 30.2
ATOMIC EVENTS Atomic Events are a particular combination of states; Late = True, Weather = Cloudy Late = False, Weather = Sunny The existence of certain atomic events can lead to certain understandings; Late = False, Weather = Rainy, means <Rainy => Late> = False
PRIOR PROBABILITY Unconditional Probabilities can be assigned to each state - degree of belief with no other information; P(Weather = Sunny) = 0.8 P(Weather = Rainy) = 0.1 P(Weather = Cloudy) = 0.0999 P(Weather = Snowy) = 0.0001
JOINT PROBABILITY The probabilities for a combination of random variables can be stored in a grid - 2 or more dimensional. Take a simple example with 3 boolean variables, Late, Male, AM. Late Not Late Male Not Male Male Not Male AM 0.22 0.08 0.25 0.02 not AM 0.1 0.18 0.1 0.05
JOINT PROBABILITY The sum of all probabilities adds up to 1. Suppose we want to know the probability of being male OR late; P(Male OR Late) = 0.22+0.25+0.1+0.18+0.08+0.1 = 0.93 Or the probability of being male AND late; P(Male AND Late) = 0.22+0.1 = 0.32 Thus we can deduce probabilities given certain inputs.
JOINT PROBABILITY Suppose we add a further random variable - the weather. We have to expand our table, in reality adding 4 times the size of the weather, one for each weather condition. In doing this it is reasonable to ask how the former and latter table are related; how does P(Late, AM, Male, Weather = sunny) relate to P(Late, AM, Male)?
PRODUCT RULE We can use the Product Rule The probability of a and b is the same as the probability of b multiplied by the probability of a given that b is the case; P(a^b) = P(a|b)P(b) Or in our case; P(Late, AM, Male, Weather = sunny) = P(Weather = sunny | Late, AM, Male)P(Late, AM, Male) The probability of a man being late on a sunny morning is the same as the probability of it being sunny given that a man is late in the morning, multiplied by the probability of a man being late in the morning.
HANG ON A MINUTE! Let s review that; The probability of a man being late on a sunny morning is the same as the probability of it being sunny given that a man is late in the morning, multiplied by the probability of a man being late in the morning. Unless we are a weathermonger, the probability of it being sunny isn t influenced by a man s morning tardiness!
PRODUCT RULE The Product Rule can be stated in 2 ways; P(a^b) = P(a|b)P(b) P(a^b) = P(b|a)P(a) Or in our case; P(Late, AM, Male, Weather = sunny) = P(Late, AM, Male | Weather = sunny)P(Weather = sunny) The probability of a man being late on a sunny morning is the same as the probability of a man is late in the morning given that it is sunny, multiplied by the probability of it being sunny.
JOINT PROBABILITIES There is intuitively something more satisfactory about this statement - perhaps the weather could be a factor in determining lateness. We could expand our knowledge about the domain by adding further random variables - perhaps adding Transport<Car, Bike, Walk> or FavouriteFood<Icecream, Steak, Fish>. Transport might have a direct influence over tardiness - it could be argued that walkers should set off earlier, but surely FavouriteFood can be considered Independent . When Independence can be found, the subset of data to be analysed can be greatly reduced.
BAYES LAW Given P(a^b) = P(a|b)P(b) P(a^b) = P(b|a)P(a) Then, P(b|a)P(a) = P(a|b)P(b) And so, P(b|a) = P(a|b)P(b) / P(a) Great! - so what?
BAYES LAW Requires 2 unconditional probabilities and 1 conditional probability, to calculate 1 further conditional probability! But, if we have those probabilities, then it can be very useful. Meningitis causes patients to have stiff necks 50% of the time. The probability of having meningitis is 1 in 50,000 and the probability that any patient has a stiff neck is 1 in 20. P(s|m) = 0.5 P(m) = 1/50000 P(s) = 1/20 P(m|s) = P(s|m)P(m)/P(s) = (0.5*(1/50000))/(1/20) = 0.0002 or 1 in 5,000
WELCOME TO WUMPUS WORLD Wumpus World is a cave consisting of rooms connected by passageways. Somewhere in the cave is the evil Wumpus beast who eats anyone who enters the room. An agent can kill the Wumpus though by shooting an arrow - the arrow goes straight until it hits the Wumpus or a wall. Wumpus World also contains deathly pits in the floor which trap anyone who enters the room (except the Wumpus). On the positive side there is a big heap of gold that can be won by a successful agent. In rooms surrounding the Wumpus the agent can perceive a stench, in rooms surrounding a pit the agent perceives a breeze, and in rooms surrounding the gold the agent perceives a glitter. Agents can move through the 4*4 grid by turning left or right (90 degrees) or walking forward.
WUMPUS WORLD PIT Breeze Glitter Stench WUMPUS Breeze Stench Breeze PIT Glitter GOLD Glitter Breeze Stench PIT Breeze Breeze
WUMPUS WORLD OK OK
WUMPUS WORLD OK ?P Breeze ?P
WUMPUS WORLD W! OK Stench Breeze P!
WUMPUS WORLD OK ?G W! OK ?G Glitter Stench Breeze P!
QUESTION? To what extent does Bayes Law affect the probability of locating a pit? Consider the layout to the right, what can you deduce about the probability of where the pit is? Breeze Breeze
SUMMARY Games are an excellent test for A.I. The minimax algorithm assumes the opponent is infallible and chooses the best worst case scenario for each turn. First developing a search tree and then performing depth first search on it. The search tree is pruned by alpha beta pruning, but in many cases it is still too large a search tree - so decisions have to be made when to cut off the search.