
Understanding Markov Chains: Properties and Applications
Explore the concept of Markov chains through discussions on reversible and symmetric chains, along with practical applications like sampling from large spaces. Learn about stationary distributions, random walks on graphs, and more in this informative guide.
Presentation Transcript
Introduction to Markov chains (part 2). Haim Kaplan and Uri Zwick, Algorithms in Action, Tel Aviv University. Last updated: April 18, 2016.
Reversible Markov chain. A distribution $\pi$ is reversible for a Markov chain if $\pi_i P_{ij} = \pi_j P_{ji}$ for all $i, j$ (detailed balance). A Markov chain is reversible if it has a reversible distribution. Lemma: A reversible distribution is a stationary distribution. Proof: compute the product $(\pi_1, \pi_2, \pi_3, \pi_4) P$ of the distribution with the transition matrix $P = (P_{ij})$ entry by entry (next slide).
Reversible Markov chain. The first entry of $(\pi_1, \pi_2, \pi_3, \pi_4) P$ is $\pi_1 P_{11} + \pi_2 P_{21} + \pi_3 P_{31} + \pi_4 P_{41} = P_{11}\pi_1 + P_{12}\pi_1 + P_{13}\pi_1 + P_{14}\pi_1 = \pi_1 (P_{11} + P_{12} + P_{13} + P_{14}) = \pi_1$, using detailed balance and the fact that each row of $P$ sums to 1. The other entries are handled in the same way, so $\pi P = \pi$.
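A quick numerical sanity check of the lemma (a sketch, not part of the slides; the two-state chain below is hand-picked so that detailed balance holds):

```python
import numpy as np

# Hand-picked two-state chain: detailed balance pi_i * P[i, j] == pi_j * P[j, i]
# forces pi = (2/3, 1/3); the lemma says this pi must then be stationary.
P = np.array([[0.7, 0.3],
              [0.6, 0.4]])
pi = np.array([2 / 3, 1 / 3])

D = pi[:, None] * P                 # D[i, j] = pi_i * P[i, j]
assert np.allclose(D, D.T)          # detailed balance
assert np.allclose(pi @ P, pi)      # hence pi is stationary
```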
Symmetric Markov chain. A Markov chain is symmetric if $P_{ij} = P_{ji}$. What is the stationary distribution of an irreducible symmetric Markov chain?
Example: Random walk on a graph. Given a connected undirected graph $G$, define a Markov chain whose states are the vertices of the graph. We move from a vertex $v$ to each of its neighbors with equal probability (in the figure, $v$ has three neighbors $u_1, u_2, u_3$, each reached with probability $1/3$). Consider $\pi = \left(\frac{d(v_1)}{2m}, \frac{d(v_2)}{2m}, \dots, \frac{d(v_n)}{2m}\right)$, where $d(v)$ is the degree of $v$ and $m$ is the number of edges.
Example: Random walk on a graph. Consider $\pi = \left(\frac{d(v_1)}{2m}, \frac{d(v_2)}{2m}, \dots, \frac{d(v_n)}{2m}\right)$. For every edge $\{v, u\}$ we have $\pi_v P_{vu} = \frac{d(v)}{2m} \cdot \frac{1}{d(v)} = \frac{1}{2m} = \frac{d(u)}{2m} \cdot \frac{1}{d(u)} = \pi_u P_{uv}$, so $\pi$ satisfies detailed balance and is therefore stationary. Where do we use the fact that the graph is undirected?
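A small sketch (the example graph and variable names are illustrative) that builds this random-walk chain from an adjacency list and checks that $\pi_v = d(v)/2m$ satisfies detailed balance:

```python
import numpy as np

# Undirected example graph as an adjacency list.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
n = len(adj)
m = sum(len(nbrs) for nbrs in adj.values()) // 2            # number of edges

P = np.zeros((n, n))
for v, nbrs in adj.items():
    for u in nbrs:
        P[v, u] = 1 / len(nbrs)                             # move to a uniformly random neighbor

pi = np.array([len(adj[v]) / (2 * m) for v in range(n)])    # pi_v = d(v) / 2m

D = pi[:, None] * P
assert np.allclose(D, D.T)                                  # detailed balance
assert np.allclose(pi @ P, pi)                              # hence stationary
```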
Reversible Markov chain. If $X_0$ is drawn from $\pi$, then $\Pr[X_0 = x_0, X_1 = x_1, \dots, X_k = x_k] = \Pr[X_0 = x_k, X_1 = x_{k-1}, \dots, X_k = x_0]$. Prove as an exercise.
Another major application of Markov chains
Sampling from large spaces. Given a distribution $\pi$ on a set $\Omega$, we want to draw an object from $\Omega$ with distribution $\pi$. Say we want to estimate the average size of an independent set in a graph. Suppose we could draw an independent set uniformly at random. Then we can draw multiple times and use the average size of the independent sets we drew as an estimate. This is useful also for approximate counting.
Markov chain Monte Carlo. Given a distribution $\pi$ on a set $\Omega$, we want to draw an object from $\Omega$ with distribution $\pi$. Build a Markov chain whose stationary distribution is $\pi$. Run the chain for a sufficiently long time (until it mixes) from some starting position. Your position is then a random draw from a distribution close to $\pi$.
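A generic sketch of this recipe (the `step` function, standing for one transition of whatever chain was built, and the burn-in length are placeholders):

```python
def mcmc_sample(step, start, burn_in=10_000):
    """Run the chain for `burn_in` transitions from `start`; the final state is
    (approximately) a draw from the chain's stationary distribution."""
    state = start
    for _ in range(burn_in):
        state = step(state)      # one transition of the chain
    return state
```

In practice the burn-in should be chosen relative to the mixing time of the chain.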
Independent sets. Say we are given a graph $G$ and we want to sample an independent set uniformly at random.
Independent sets. Transitions: Pick a vertex $v$ uniformly at random and flip a coin. Heads: switch to $I \cup \{v\}$ if $I \cup \{v\}$ is an independent set. Tails: switch to $I \setminus \{v\}$. Each such transition has probability $\frac{1}{2n}$. This chain is irreducible and aperiodic (why?)
Independent sets. Transitions: Pick a vertex $v$ uniformly at random and flip a coin. Heads: switch to $I \cup \{v\}$ if $I \cup \{v\}$ is an independent set. Tails: switch to $I \setminus \{v\}$. Each such transition has probability $\frac{1}{2n}$. What is the stationary distribution?
Independent sets. So if we walk on this chain for a sufficiently long time, we obtain an independent set that is almost uniformly random. Let's generalize this.
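One transition of this chain, as a sketch (the graph is a dict mapping each vertex to the set of its neighbors; names are illustrative):

```python
import random

def independent_set_step(adj, I):
    """One step of the chain on independent sets: pick a vertex uniformly,
    then flip a coin to try adding it (heads) or removing it (tails)."""
    v = random.choice(list(adj))
    if random.random() < 0.5:                   # heads
        if v not in I and not (adj[v] & I):     # I + v must remain independent
            I = I | {v}
    else:                                       # tails
        I = I - {v}
    return I
```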
Gibbs samplers. We have a distribution $\pi$ over functions $f: V \to Q = \{1, 2, \dots, 5\}$. There are $|Q|^{|V|}$ such $f$'s (states). We want to sample from $\pi$. (The figure shows a graph whose vertices are labeled with values from $Q$.)
Gibbs samplers. Chain: At state $f$, pick a vertex $v$ uniformly at random. There are $|Q|$ states $f_v^1, \dots, f_v^{|Q|}$ in which $f$ restricted to $V \setminus \{v\}$ is kept fixed ($f_v^j$ is $f$ with $v$ assigned the value $j$). Pick $f_v^j$ with probability $\frac{\pi(f_v^j)}{\sum_i \pi(f_v^i)}$, so the transition probability is $P(f, f_v^j) = \frac{1}{n} \cdot \frac{\pi(f_v^j)}{\sum_i \pi(f_v^i)}$, where $n = |V|$. (The figure shows a labeling and the labelings obtained from it by changing the value of a single vertex.)
Gibbs samplers. Claim: This chain is reversible with respect to $\pi$. Need to verify: for all $f, g$: $\pi(f) P_{fg} = \pi(g) P_{gf}$. Note that $P_{fg} = 0$ iff $P_{gf} = 0$. Otherwise $g = f_v^j$ and $f = g_v^i$ for some vertex $v$ (that is, $f$ and $g$ differ only at $v$). We need to verify that $\pi(f) \cdot \frac{1}{n} \cdot \frac{\pi(f_v^j)}{\sum_k \pi(f_v^k)} = \pi(f_v^j) \cdot \frac{1}{n} \cdot \frac{\pi(f)}{\sum_k \pi(f_v^k)}$.
Gibbs samplers. Indeed, $\pi(f) \cdot \frac{1}{n} \cdot \frac{\pi(f_v^j)}{\sum_k \pi(f_v^k)} = \pi(f_v^j) \cdot \frac{1}{n} \cdot \frac{\pi(f)}{\sum_k \pi(f_v^k)}$, since both sides equal $\frac{\pi(f)\,\pi(f_v^j)}{n \sum_k \pi(f_v^k)}$ (the sums agree because $f$ and $g = f_v^j$ differ only at $v$). It is easy to check that the chain is aperiodic, so if it is also irreducible then we can use it for sampling.
Gibbs for uniform q-coloring. Transitions: Pick a vertex $v$ uniformly at random, then pick a (new) color for $v$ uniformly at random from the set of colors not attained by a neighbor of $v$. (The figure shows an example with $q = 5$.)
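A sketch of one such transition, assuming the graph is given as a dict of neighbor lists, the current proper coloring is a dict from vertices to colors, and $q$ exceeds the maximum degree so that an allowed color always exists (all names illustrative):

```python
import random

def gibbs_coloring_step(adj, coloring, q):
    """One Gibbs step for sampling proper q-colorings (approximately) uniformly:
    the conditional distribution at v is uniform over the colors not used by
    v's neighbors."""
    v = random.choice(list(adj))
    forbidden = {coloring[u] for u in adj[v]}
    allowed = [c for c in range(q) if c not in forbidden]
    coloring[v] = random.choice(allowed)
    return coloring
```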
Gibbs for uniform q-coloring. Notice that $\pi(f)$ is hard to compute, but the conditional distribution $\pi_v$ used by the transition is easy to compute. ($q = 5$ in the figure.)
Gibbs samplers (summary). Chain: At state $f$, pick a vertex $v$ uniformly at random. There are $|Q|$ states $f_v^1, \dots, f_v^{|Q|}$ consistent with $f$ on $V \setminus \{v\}$ ($f_v^j$ is $f$ with $v$ assigned $j$). Pick $f_v^j$ with probability $\frac{\pi(f_v^j)}{\sum_i \pi(f_v^i)}$. Call this distribution $\pi_v$. Notice that even if $\pi(f)$ may be hard to compute, it is typically easy to compute $\pi_v(f_v^j) = \frac{\pi(f_v^j)}{\sum_i \pi(f_v^i)}$.
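A general sketch of one Gibbs step, written in terms of an unnormalized weight function `g` proportional to $\pi$ (so the normalizing constant is never needed); `f` is the current labeling as a dict from vertices to values and `Q` is the set of values (names illustrative):

```python
import random

def gibbs_step(g, f, Q):
    """Resample the value at a uniformly random vertex from the conditional
    distribution pi_v; only the unnormalized weights g(f_v^j) are used."""
    v = random.choice(list(f))
    values = list(Q)
    weights = []
    for j in values:
        f[v] = j
        weights.append(g(f))     # unnormalized weight of f with v assigned j
    f[v] = random.choices(values, weights=weights)[0]
    return f
```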
Metropolis chain. We want to construct a chain over $\omega_1, \omega_2, \dots, \omega_n$ with a stationary distribution $\pi$. States do not necessarily correspond to labelings of the vertices of a graph.
Metropolis chain. Start with some base chain over $\omega_1, \omega_2, \dots, \omega_n$, say with $\Psi_{ij} = \Psi_{ji}$ (symmetric). We need $\Psi_{ij}$ to be easy to compute when at $\omega_i$.
Metropolis chain. We now modify the chain and obtain a Metropolis chain. At $\omega_i$: 1) Suggest a neighbor $\omega_j$ with probability $\Psi_{ij}$. 2) Move to $\omega_j$ with probability $\min\!\left(\frac{\pi_j}{\pi_i}, 1\right)$ (otherwise stay at $\omega_i$).
Metropolis chain. The resulting transition probabilities are $P_{ij} = \Psi_{ij}\min\!\left(\frac{\pi_j}{\pi_i}, 1\right)$ and $P_{ji} = \Psi_{ji}\min\!\left(\frac{\pi_i}{\pi_j}, 1\right)$ for $i \neq j$; the remaining probability mass stays at the current state.
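A sketch of one Metropolis transition with a symmetric proposal chain; `propose` draws a neighbor of the current state and `g` is any positive function proportional to $\pi$ (both are placeholders):

```python
import random

def metropolis_step(propose, g, state):
    """Propose a neighbor via the symmetric base chain, accept it with
    probability min(pi_candidate / pi_state, 1), otherwise stay put."""
    candidate = propose(state)
    if random.random() < min(g(candidate) / g(state), 1.0):
        return candidate
    return state
```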
A more general presentation. $\Psi$ is not necessarily symmetric, but $\Psi_{ij} > 0 \iff \Psi_{ji} > 0$. The Metropolis chain with respect to $\pi$: At $\omega_i$: 1) Suggest a neighbor $\omega_j$ with probability $\Psi_{ij}$. 2) Move to $\omega_j$ with probability $\min\!\left(\frac{\pi_j \Psi_{ji}}{\pi_i \Psi_{ij}}, 1\right)$ (otherwise stay at $\omega_i$).
A more general presentation. The resulting transition probabilities are $P_{ij} = \Psi_{ij}\min\!\left(\frac{\pi_j \Psi_{ji}}{\pi_i \Psi_{ij}}, 1\right)$ and $P_{ji} = \Psi_{ji}\min\!\left(\frac{\pi_i \Psi_{ij}}{\pi_j \Psi_{ji}}, 1\right)$ for $i \neq j$; the remaining probability mass stays at the current state.
Detailed balance conditions. We need $\pi_i \Psi_{ij}\min\!\left(\frac{\pi_j \Psi_{ji}}{\pi_i \Psi_{ij}}, 1\right) = \pi_j \Psi_{ji}\min\!\left(\frac{\pi_i \Psi_{ij}}{\pi_j \Psi_{ji}}, 1\right)$. Assume $\pi_j \Psi_{ji} \ge \pi_i \Psi_{ij}$. Then the left-hand side is $\pi_i \Psi_{ij} \cdot 1 = \pi_i \Psi_{ij}$, and the right-hand side is $\pi_j \Psi_{ji} \cdot \frac{\pi_i \Psi_{ij}}{\pi_j \Psi_{ji}} = \pi_i \Psi_{ij}$, so the two sides are equal. The other case is symmetric.
Metropolis/Gibbs. Often $\pi(\omega_i) = \frac{g(\omega_i)}{Z}$, where $Z = \sum_i g(\omega_i)$. Then it is possible to compute the transition probabilities in the Gibbs and Metropolis chains even when $Z$ itself is hard to compute, since they depend only on ratios of $\pi$ and $Z$ cancels out.
Metropolis chain for bisection. The quality of a cut $(S, \bar S)$ is $f(S) = |\{(u, v) \in E : u \in S,\ v \in \bar S\}| + \alpha\,(|S| - |\bar S|)^2$. We introduce a parameter $T$ and take the exponent of this quality measure: $g_T(\omega) = e^{-f(\omega)/T}$. Our target distribution is proportional to $g_T$.
Boltzmann distribution. $\pi_T(\omega) = \frac{1}{Z_T}\, e^{-f(\omega)/T}$, where $Z_T = \sum_{\omega} e^{-f(\omega)/T}$. Generate a Metropolis chain for $\pi_T$.
Boltzmann distribution. (Figure: a plot illustrating the Boltzmann distribution.)
The base chain. Consider the chain over the cuts in the graph where the neighbors of a cut $(S, \bar S)$ are the cuts we can obtain from $(S, \bar S)$ by flipping the side of a single vertex, e.g. $(S \setminus \{v\}, \bar S \cup \{v\})$. Each neighbor is proposed with probability $\Psi_{ij} = \Psi_{ji} = \frac{1}{n}$, so the base chain is symmetric.
Metropolis chain for bisection. At $\omega_i$: 1) Suggest a neighbor $\omega_j$ with probability $\frac{1}{n}$. 2) Move to $\omega_j$ with probability $\min\!\left(\frac{\pi_T(\omega_j)}{\pi_T(\omega_i)}, 1\right) = \min\!\left(e^{-(f(\omega_j) - f(\omega_i))/T}, 1\right)$ (otherwise stay at $\omega_i$), where $f(S) = |\{(u, v) \in E : u \in S,\ v \in \bar S\}| + \alpha\,(|S| - |\bar S|)^2$.
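Putting the pieces together, a sketch of one transition of this chain at temperature $T$ (the graph is an edge list, `S` is the set of vertices on one side of the cut, and `alpha` is the imbalance penalty; names illustrative):

```python
import math
import random

def cost(edges, vertices, S, alpha):
    """f(S) = number of edges crossing the cut + alpha * (|S| - |V \\ S|)^2."""
    cut = sum(1 for (u, v) in edges if (u in S) != (v in S))
    return cut + alpha * (2 * len(S) - len(vertices)) ** 2

def bisection_step(edges, vertices, S, alpha, T):
    """Flip the side of a random vertex; accept with probability
    min(exp(-(f_new - f_old) / T), 1)."""
    v = random.choice(list(vertices))
    new_S = S - {v} if v in S else S | {v}
    delta = cost(edges, vertices, new_S, alpha) - cost(edges, vertices, S, alpha)
    if delta <= 0 or random.random() < math.exp(-delta / T):
        return new_S
    return S
```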
Properties of the Boltzmann distribution. Let $M = \{\omega_1, \omega_2, \dots, \omega_m\}$ be the set of global minima, with $f(\omega_i) = f^*$ for every $\omega_i \in M$, and let $Z_T = \sum_{\omega} e^{-f(\omega)/T}$. Then $\pi_T(M) = \sum_{i=1}^{m} \frac{e^{-f^*/T}}{Z_T} = \frac{m\, e^{-f^*/T}}{Z_T}$.
Properties of the Boltzmann distribution. $\pi_T(M) = \frac{m\, e^{-f^*/T}}{m\, e^{-f^*/T} + \sum_{\omega : f(\omega) > f^*} e^{-f(\omega)/T}} = \frac{m}{m + \sum_{\omega : f(\omega) > f^*} e^{-(f(\omega) - f^*)/T}}$, and therefore $\lim_{T \to 0} \pi_T(M) = 1$, since every term in the remaining sum tends to 0.
Properties of the Boltzmann distribution. As $T$ gets smaller, $\pi_T$ gets concentrated on the global minima.
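A tiny numerical illustration of this concentration (the cost values below are made up): compute $\pi_T$ over a handful of states for decreasing $T$ and watch the probability mass of the global minima approach 1.

```python
import numpy as np

f = np.array([1.0, 1.0, 2.0, 3.0, 5.0])    # costs; the two global minima have f* = 1
for T in [10.0, 1.0, 0.1]:
    w = np.exp(-f / T)                      # Boltzmann weights
    pi_T = w / w.sum()
    print(T, pi_T[:2].sum())                # mass of the global minima grows toward 1
```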
Generalization of local search. This is a generalization of local search that allows non-improving moves. We take a non-improving move with a probability that decreases with the amount of degradation in the quality of the bisection.
Generalization of local search. As $T$ decreases, it becomes harder to take non-improving moves. For very small $T$, this is like local search. For very large $T$, this is like a random walk. So which $T$ should we use?
Simulated annealing. Start with a relatively large $T$. Perform $N$ iterations, then decrease $T$, and repeat.
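A sketch of the overall annealing loop built on a Metropolis step such as the bisection step above (the initial temperature, cooling factor, iteration count, and freezing threshold are illustrative choices, not values from the slides):

```python
def simulated_annealing(step_at_T, state, T0=10.0, cooling=0.95,
                        iters_per_T=1000, T_min=1e-3):
    """Run the Metropolis chain at the current temperature for a while,
    then lower the temperature, until the chain is essentially frozen."""
    T = T0
    while T > T_min:
        for _ in range(iters_per_T):
            state = step_at_T(state, T)   # one Metropolis move at temperature T
        T *= cooling                      # geometric cooling schedule
    return state
```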
Motivated by physics. Growing crystals: first we melt the raw material, then we start cooling it. We need to cool carefully/slowly in order to get a good crystal. We want to bring the crystal into a state with the lowest possible energy, and we don't want to get stuck in a local optimum.
Experiments with annealing. Average running times: annealing 6 min, local search 1 sec, KL 3.7 sec.
The annealing parameters. Two parameters control the range of temperatures considered. INITPROB: pick the initial temperature so that you accept an INITPROB fraction of the moves. MINPERCENT: you freeze when you accept at most a MINPERCENT fraction of the moves at 5 temperatures since the last winner was found.
INITPROB = 0.9, MINPERCENT = 0.1
Tails of 2 runs. Left: INITPROB = 0.4, MINPERCENT = 0.2. Right: INITPROB = 0.9, MINPERCENT = 0.1. Same quality in half the time!
Running time/quality tradeoff. Two natural parameters control this: $N$ and $r$. $N$ was set to SIZEFACTOR times the neighborhood size, i.e. $16n$, and $r = 0.95$. Doubling SIZEFACTOR doubles the running time. Changing $r$ to $\sqrt{r}$ should double the running time (an experiment shows that it grows only by a factor of 1.85).