Insights into Recent Progress on Sampling Problems in Convex Optimization
Recent research highlights advances in sampling problems in convex optimization, exemplified by work of Yin Tat Lee and Santosh Vempala. The complexity of convex problems, such as the Minimum Cost Flow Problem and Submodular Minimization, is being unraveled through new formulations and algorithms. The interplay between algorithmic convex geometry and the integration, membership, separation, width, optimization, and sampling operations is paving the way towards efficient solutions and a deeper understanding of convex optimization theory. The talk also underscores the significance of sampling in optimization, integration, learning, and rounding, showcasing its pivotal role in modern computational paradigms.
Presentation Transcript
Recent Progress On Sampling Problem
Yin Tat Lee (MSR/UW), Santosh Vempala (Gatech)
My Dream
Tell the complexity of a convex problem by looking at the formula.
Example: Minimum Cost Flow Problem. This is a linear program in which each row has two non-zeros. It can be solved in Õ(m√n) time. [LS14] (Previously: Õ(m√m) for a graph with m edges and n vertices.)
My Dream
Tell the complexity of a convex problem by looking at the formula.
Example: Submodular Minimization. Fundamental in combinatorial optimization; worth 2 Fulkerson prizes. Here f satisfies diminishing returns, i.e. f(S ∪ {i}) − f(S) ≥ f(T ∪ {i}) − f(T) for all S ⊆ T and i ∉ T. f can be extended to a convex function on [0,1]ⁿ, and a subgradient of f can be computed in n² time. It can be solved in Õ(n³). [LSW15] (Previously: Õ(n⁵).)
Algorithmic Convex Geometry
To describe a formula, we need some operations. Given a convex set K, we have the following operations:
- Membership(x): check if x ∈ K.
- Separation(x): assert x ∈ K, or find a hyperplane separating x and K.
- Width(c): compute min over x ∈ K of cᵀx.
- Optimize(c): compute argmin over x ∈ K of cᵀx.
- Sample(g): sample according to g(x)·1_K(x) (assume g is logconcave).
- Integrate(g): compute ∫_K g(x) dx (assume g is logconcave).
Theorem: They are all equivalent via polynomial-time algorithms.
One of the major sources of polynomial-time algorithms!
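For a concrete feel, here is how three of these operations look for an explicit polytope. This is an illustrative sketch only (the toy set and helper names are mine, not from the talk); Width is approximated by enumerating the vertices of the box rather than solving an LP.

```python
import numpy as np

# Toy polytope K = {x : Ax <= b}; here K = [-1, 1]^2.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.array([1.0, 1.0, 1.0, 1.0])

def membership(x):
    """Membership(x): check if x is in K."""
    return bool(np.all(A @ x <= b))

def separation(x):
    """Separation(x): None if x is in K, else a row a_i of A with
    a_i . x > b_i, i.e. a hyperplane separating x from K."""
    viol = A @ x - b
    i = int(np.argmax(viol))
    return None if viol[i] <= 0 else A[i]

def width(c):
    """Width(c): min of c.x over K, approximated here by scanning the
    four vertices of the box (a crude stand-in for an LP solve)."""
    verts = [np.array([sx, sy]) for sx in (-1.0, 1.0) for sy in (-1.0, 1.0)]
    return min(float(c @ v) for v in verts)
```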
Algorithmic Convex Geometry
Traditionally viewed as impractical; now we have an efficient version of the ellipsoid method.
Why these operations? For any convex f, we can define the dual f*(y) = max over x of (yᵀx − f(x)), and the operations pair up under this duality.
Progress: we are getting tight polynomial equivalences between the four on the left.
[Diagram: equivalences in convex optimization among Membership (f(x)), Separation (∇f(x)), Width (f*(c)), Optimization (∇f*(c)), Sample (~e^(−f)), and Integration (∫e^(−f)).]
Problem: Sampling
Input: a convex set K.
Output: a sample from the uniform distribution on K.
Generalized problem:
Input: a logconcave distribution d.
Output: a sample according to d.
Why? Sampling is useful for optimization, integration/counting, learning, and rounding. It is the best known way to minimize a convex function with a noisy value oracle, and the only known way to compute the volume of a convex set.
Non-trivial application: Convex Bandit Game (Sébastien Bubeck, Ronen Eldan)
For each round t = 1, 2, …, T, the player:
- Adversary selects a convex loss function ℓ_t.
- Player chooses (possibly randomly) x_t from the unit ball in n dimensions, based on past observations.
- Player receives the loss/observation ℓ_t(x_t) ∈ [0,1]. Nothing else about ℓ_t is revealed!
Measure performance by regret: R_T = Σ_t ℓ_t(x_t) − min over x of Σ_t ℓ_t(x), i.e. against the best fixed action in hindsight.
The gold standard is getting Õ(√T): namely, n^1000 √T is better than √n · T^(2/3).
There is a good fixed action, but we only learn one point each iteration, and the adversary can give confusing information!
Non-trivial application: Convex Bandit Game (Sébastien Bubeck, Ronen Eldan)
Same game as above. After a decade of research, we have regret R_T = n^10.5 √T. (The first algorithm that is simultaneously polynomial time and √T-regret.)
How to Input the Set
Oracle setting: a membership oracle answering YES/NO to "x ∈ K?", plus a ball x₀ + rB such that x₀ + rB ⊆ K ⊆ x₀ + poly(n)·r·B.
Explicit setting: K given explicitly, such as polytopes, spectrahedrons, …
In this talk, we focus on the polytope {Ax ≤ b} (m = # constraints).
Outline
Oracle setting:
- Introduce the ball walk
- The KLS conjecture and its related conjectures
- Main result
Explicit setting (the originally promised talk):
- Introduce the geodesic walk
- Bound the # of iterations
- Bound the cost per iteration
Sampling Problem
Input: a convex set K with a membership oracle.
Output: a sample from the uniform distribution on K.
Conjectured lower bound: n².
Generalized problem: given a logconcave distribution d, sample x from d.
Conjectured Optimal Algorithm: Ball Walk
At x, pick a random y from x + δBₙ; if y is in K, go to y; otherwise, sample again.
This walk may get trapped on one side if the set is not convex.
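A minimal sketch of the ball walk in Python (the membership test, step size, and iteration count are illustrative choices, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

def ball_walk(membership, x, delta, steps):
    """Ball walk: from x, propose y uniform in the ball x + delta*B_n and
    move only if y stays in the body; proposals outside are rejected
    (we 'sample again', i.e. stay put for that step)."""
    n = len(x)
    for _ in range(steps):
        # uniform point in the unit ball: Gaussian direction, radius ~ U^(1/n)
        d = rng.standard_normal(n)
        d *= rng.random() ** (1.0 / n) / np.linalg.norm(d)
        y = x + delta * d
        if membership(y):
            x = y
    return x

# toy body: the unit cube [0,1]^n, started at its center
inside_cube = lambda z: bool(np.all((z >= 0) & (z <= 1)))
x = ball_walk(inside_cube, np.full(5, 0.5), delta=0.1, steps=1000)
```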
Isoperimetric Constant
For any subset S of K, we define the isoperimetric constant φ_K by
φ_K = min over S of Area(∂S ∩ K) / min(vol S, vol K∖S).
φ_K large: hard to cut the set. φ_K small: easy to cut the set.
Theorem: Given a random point in K, we can generate another in Õ(n·log(1/ε) / (δ²φ_K²)) iterations of the ball walk, where δ is the step size.
The larger φ_K or δ, the better the walk mixes; but δ cannot be too large, otherwise the failure probability is ~1.
Isoperimetric Constant of a Convex Set
Note that φ_K is not affine invariant and can be arbitrarily small: for example, an L × 1 × … × 1 box has φ_K ≈ 1/L. However, you can renormalize K so that Cov(K) = I.
Definition: K is isotropic if it has mean 0 and Cov(K) = I.
Theorem: If δ < 0.001/√n, the ball walk stays inside the set with constant probability.
Theorem: Given a random point in an isotropic K, we can generate another in Õ(n²·log(1/ε) / φ_K²) iterations.
To make the body isotropic, we can sample the body to estimate its covariance.
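The rounding step at the end can be sketched directly: given (approximately uniform) samples from the body, estimate mean and covariance and apply the inverse square root of the covariance. The sample source below is a stretched box standing in for an actual sampler; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_isotropic(samples):
    """Return the affine map T(x) = C^(-1/2) (x - mu) that puts the sampled
    body in (approximately) isotropic position: mean 0, covariance I."""
    mu = samples.mean(axis=0)
    C = np.cov(samples.T)
    # inverse square root of the covariance via eigendecomposition
    w, V = np.linalg.eigh(C)
    C_inv_half = V @ np.diag(w ** -0.5) @ V.T
    return lambda x: C_inv_half @ (x - mu)

# toy "body": uniform samples from the stretched box [0,10] x [0,1]
pts = rng.random((20000, 2)) * np.array([10.0, 1.0])
T = make_isotropic(pts)
iso = np.array([T(p) for p in pts])
```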
KLS Conjecture
Kannan-Lovász-Simonovits Conjecture: for any isotropic convex K, φ_K = Ω(1).
If this is true, the ball walk takes O*(n²) iterations for isotropic K (matching the believed information-theoretic lower bound).
To get a tight reduction from membership to sampling, it suffices to prove the KLS conjecture.
KLS Conjecture and Its Related Conjectures
Slicing conjecture: any unit-volume convex set K has a slice with volume Ω(1).
Thin-shell conjecture: for isotropic convex K, Var(‖x‖₂) = O(1).
Generalized Lévy concentration: for a logconcave distribution p and 1-Lipschitz f with Ef = 0, P(|f(x)| > t) ≤ exp(−Ω(t)).
Essentially, these ask whether all convex sets look like ellipsoids.
Main Result
What if we cut the body by spheres only? This is the thin-shell question (σ = the standard deviation of ‖x‖ for isotropic K):
[Lovász-Simonovits 93] φ_K = Ω(n^(−1/2)).
[Klartag 2006] σ = O(n^(1/2) / log^(1/2) n).
[Fleury, Guédon, Paouris 2006] σ = O(n^(1/2) / log^(1/6) n), up to log log n factors.
[Klartag 2006] σ = O(1)·n^0.4.
[Fleury 2010] σ = O(1)·n^0.375.
[Guédon, Milman 2010] σ = O(1)·n^0.333.
[Eldan 2012] φ_K = Ω̃(1/σ), hence φ_K = Ω̃(1)·n^(−0.333).
[Lee, Vempala 2016] φ_K = Ω(1)·n^(−0.25).
In particular, we have Õ(n^2.5) mixing for the ball walk.
Do you know a better way to bound the mixing time of the ball walk?
Outline
Oracle setting:
- Introduce the ball walk
- The KLS conjecture and its related conjectures
- Main result
Explicit setting:
- Introduce the geodesic walk
- Bound the # of iterations
- Bound the cost per iteration
Problem: Sampling
Input: a polytope {Ax ≤ b} with m constraints and n variables.
Output: a sample from the uniform distribution on K.

             KN09 Dikin walk | LV16 Ball walk | LV16 Geodesic walk
Iterations   mn              | n^2.5          | mn^0.75
Time/Iter    mn^1.38         | mn             | mn^1.38

The time per iteration is the cost of matrix inversion. The geodesic walk is the first sub-quadratic algorithm.
How does nature mix particles?
Brownian motion. It works for sampling on ℝⁿ; however, a convex set has a boundary.
Option 1) Reflect when you hit the boundary. However, this needs tiny steps for discretization.
How does nature mix particles?
Option 2) Remove the boundary by blowing up. However, this requires explicit polytopes.
Blowing Up?
The distortion turns the hard constraint into a soft one.
[Figure: the uniform distribution on the original polytope [0,1] becomes a non-uniform distribution on the real line after blowing up.]
Enter Riemannian Manifolds
An n-dimensional manifold M is an n-dimensional surface. Each point p has a tangent space T_pM of dimension n, the local linear approximation of M at p; tangents of curves in M lie in T_pM. The inner product in T_pM depends on p: ⟨u,v⟩_p.
Informally, you can think of it as assigning a unit ball to every point.
Enter Riemannian Manifolds
Each point p has a linear tangent space T_pM. The inner product in T_pM depends on p: ⟨u,v⟩_p.
The length of a curve c: [0,1] → M is L(c) = ∫₀¹ ‖c′(t)‖_{c(t)} dt.
The distance d(x,y) is the infimum of lengths over all paths in M between x and y.
Generalized Ball Walk
At x, pick a random y from D_x, where D_x = {y : ‖y − x‖_x ≤ 1}.
Hessian Manifold
Hessian manifold: a subset of ℝⁿ with the inner product defined by ⟨u,v⟩_x = uᵀ∇²φ(x)v.
For the polytope Ax ≤ b, we use the log barrier function φ(x) = −Σ_{i=1}^m log(b_i − a_iᵀx), where the slack s_i(x) = b_i − a_iᵀx is the distance from x to constraint i.
∇²φ(x) blows up when x is close to the boundary, so our walk slows down near the boundary.
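The log-barrier metric is fully explicit, so it is easy to compute. A small Python sketch (the interval example is mine): the local norm is small deep inside the polytope and blows up near the boundary.

```python
import numpy as np

def log_barrier_hessian(A, b, x):
    """Hessian of phi(x) = -sum_i log(b_i - a_i.x): equals A^T S^(-2) A,
    where S = diag of slacks s_i = b_i - a_i.x."""
    s = b - A @ x
    assert np.all(s > 0), "x must be strictly inside the polytope"
    return A.T @ np.diag(s ** -2.0) @ A

def local_norm(A, b, x, v):
    """||v||_x = sqrt(v^T H(x) v): the norm defining the unit ball D_x."""
    H = log_barrier_hessian(A, b, x)
    return float(np.sqrt(v @ H @ v))

# the interval [0,1] as a 1-d polytope: -x <= 0 and x <= 1
A = np.array([[-1.0], [1.0]])
b = np.array([0.0, 1.0])
```

At x = 0.5 both slacks are 0.5, so H = 4 + 4 = 8 and ‖1‖_x = √8; at x = 0.99 the norm is far larger.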
Suggested Algorithm
At x, pick a random y from D_x, where D_x = {y : ‖y − x‖_x ≤ 1} is induced by the log barrier. (D_x is called the Dikin ellipsoid.)
This doesn't work! The walk converges to the boundary, since the volume of the boundary in this metric is +∞.
[Figure: a random walk on the real line and the corresponding Hessian manifold of the original polytope.]
Getting the Uniform Distribution
Lemma: If p(x → y) = p(y → x), then the stationary distribution is uniform.
To make a Markov chain p symmetric, we use the Metropolis filter: p̂(x → y) = min(p(x → y), p(y → x)) for y ≠ x.
To implement it, we sample y according to p(x → ·); if p(y → x) ≥ p(x → y), go to y; else go to y with probability p(y → x)/p(x → y), and stay at x otherwise.
Dikin Walk
At x, pick a random y from D_x; if x ∉ D_y, reject y; else accept y with probability min(1, vol(D_x)/vol(D_y)). [Figure copied from KN09]
[KN09] proved it takes Õ(mn) steps. Compare with the Õ(n^2.5) bound for the oracle setting.
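A sketch of a Dikin-style walk on a small polytope, assuming the standard determinant formula vol(D_x)/vol(D_y) = √(det H(y)/det H(x)) for ellipsoid volumes. The Gaussian proposal is a common variant of the walk and the constants are arbitrary, so this is an illustration of the acceptance rule above, not the exact KN09 algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)

def hessian(A, b, x):
    """Log-barrier Hessian A^T S^(-2) A at a strictly interior point x."""
    s = b - A @ x
    return A.T @ np.diag(s ** -2.0) @ A

def dikin_step(A, b, x, r=0.4):
    """One step: propose y ~ N(x, (r^2/n) H(x)^(-1)), reject if y leaves the
    polytope or x is not in D_y (asymmetry), then Metropolis-correct by
    vol(D_x)/vol(D_y) = sqrt(det H(y) / det H(x))."""
    n = len(x)
    Hx = hessian(A, b, x)
    L = np.linalg.cholesky(np.linalg.inv(Hx))
    y = x + (r / np.sqrt(n)) * (L @ rng.standard_normal(n))
    if np.any(A @ y >= b):            # proposal left the polytope
        return x
    Hy = hessian(A, b, y)
    if (x - y) @ Hy @ (x - y) > 1.0:  # x not in D_y: reject
        return x
    accept = min(1.0, np.sqrt(np.linalg.det(Hy) / np.linalg.det(Hx)))
    return y if rng.random() < accept else x

# run on the square [0,1]^2
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.array([1.0, 1.0, 0.0, 0.0])
x = np.full(2, 0.5)
for _ in range(500):
    x = dikin_step(A, b, x)
```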
Dikin Walk and Its Limitation
The Dikin ellipsoid is fully contained in K.
Idea: pick the next step y from a blown-up Dikin ellipsoid. We can afford to blow up by ~√(n/log n): WHP y ∈ K.
But in high dimension, vol(D_x) is not that smooth (worst case: [0,1]ⁿ), and any larger step makes the success probability exponentially small!
[0,1]ⁿ is the worst case for the ball walk, hit-and-run, and the Dikin walk.
Going Back to Brownian Motion
The walk is not symmetric in the space: it has a tendency to move towards the center.
[Figure: the corresponding Hessian manifold and the original polytope.]
Taking the step size to 0, the Dikin walk becomes a stochastic differential equation:
dx_t = μ(x_t) dt + σ(x_t) dW_t,
where σ(x_t) = (∇²φ(x_t))^(−1/2) and μ(x_t) is the drift towards the center.
What is the drift? The Fokker-Planck equation
The probability distribution of the SDE dx_t = μ(x_t) dt + σ(x_t) dW_t is given by
∂p(t,x)/∂t = −∂/∂x (μ(x) p(t,x)) + (1/2) ∂²/∂x² (σ²(x) p(t,x)).
To make the stationary distribution constant, we need
−∂μ/∂x + (1/2) ∂²σ²/∂x² = 0.
Hence, we have μ(x) = (1/2) d(σ²)/dx.
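The stationarity condition can be checked numerically in one dimension: with μ = (σ²)′/2, the Fokker-Planck right-hand side vanishes for a constant density, while dropping the drift leaves a large residual. The σ below is an arbitrary smooth positive test function, not the log-barrier one; this is a toy verification, not part of the talk.

```python
import numpy as np

# Grid on an interior interval; np.gradient gives centered differences.
x = np.linspace(0.1, 0.9, 801)
h = x[1] - x[0]
sigma2 = 1.0 + 0.5 * np.sin(3 * x)       # sigma^2(x), any smooth choice
mu = np.gradient(sigma2, h) / 2.0        # the claimed drift: (sigma^2)'/2
p = np.ones_like(x)                      # candidate stationary density

# Fokker-Planck RHS: -(mu p)' + (1/2)(sigma^2 p)''
rhs = -np.gradient(mu * p, h) + 0.5 * np.gradient(np.gradient(sigma2 * p, h), h)
max_resid = np.max(np.abs(rhs[10:-10]))  # ignore less-accurate endpoints

# same computation with zero drift: uniform is NOT stationary
rhs_bad = 0.5 * np.gradient(np.gradient(sigma2 * p, h), h)
max_resid_bad = np.max(np.abs(rhs_bad[10:-10]))
```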
A New Walk
A new walk: x⁺ = x + h μ(x) + √h σ(x) w with w ~ N(0, I).
It doesn't make sense: the straight-line step in ℝⁿ ignores the underlying geometry.
Exponential Map
The exponential map exp_p: T_pM → M is defined as exp_p(v) = γ_v(1), where γ_v is the unique geodesic (locally shortest path) from p with initial velocity v.
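On the unit sphere the exponential map has a closed form, which makes the definition concrete: geodesics are great circles. The sphere is only an illustration; the talk's manifold is the Hessian manifold of the log barrier, where geodesics must be computed numerically.

```python
import numpy as np

def sphere_exp(p, v):
    """Exponential map on the unit sphere S^(n-1): follow the great circle
    from p with initial (tangent) velocity v for unit time:
    exp_p(v) = cos(|v|) p + sin(|v|) v/|v|."""
    t = np.linalg.norm(v)
    if t < 1e-12:
        return p.copy()
    return np.cos(t) * p + np.sin(t) * v / t

p = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, np.pi / 2, 0.0])   # quarter turn towards e2
q = sphere_exp(p, v)                  # lands at (approximately) e2
```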
Geodesic Walk
A new walk: x⁺ = exp_x((h/2) μ(x) + √h w) with w ~ N(0, I).
However, this walk has discretization error, so we apply a Metropolis filter afterwards. Since our walk is complicated, the filter is super complicated.
Any way to avoid using the filter?
Outline
Oracle setting:
- Introduce the ball walk
- The KLS conjecture and its related conjectures
- Main result
Explicit setting (the originally promised talk):
- Introduce the geodesic walk
- Bound the # of iterations
- Bound the cost per iteration
Geodesic Walk
A new walk: x⁺ = exp_x((h/2) μ(x) + √h w) with w ~ N(0, I).
A geodesic is better than a straight line: 1) it extends infinitely; 2) it gives a massive cancellation.
Key Lemma 1: Provable Long Geodesic
A straight line is only defined until it hits the boundary; a geodesic is defined for all time.
Thm [LV16]: For the manifold induced by the log barrier, a random geodesic γ starting from x remains well-behaved for time t = O(n^(1/4)).
Remark: If the central path in interior point methods had this property, we would have an m^(5/4)-time algorithm for MaxFlow!
Key Lemma 2: Massive Cancellation
Consider an SDE on the 1-dimensional real line (NOT a manifold): dx_t = μ(x_t) dt + σ(x_t) dW_t.
How good is the Euler method, namely x_h ≈ x_0 + μ(x_0) h + σ(x_0) W_h?
By Taylor expansion, we have x_h = x_0 + μ(x_0) h + σ(x_0) W_h + (1/2) σ′(x_0) σ(x_0) (W_h² − h) + O(h^1.5).
If σ′(x_0) ≠ 0, the error is O(h). If σ′(x_0) = 0, the error is O(h^1.5).
For the geodesic walk, σ′(x_0) = 0 (Christoffel symbols vanish in normal coordinates).
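The two error orders can be seen numerically on a toy SDE with a known solution: dx = x dW (so σ(x) = x and σ′ ≠ 0), whose exact one-step solution is x_h = x₀ exp(W_h − h/2). The single-step Euler error scales like h, while adding the correction (1/2)σσ′(W_h² − h) (the Milstein term) improves it to h^1.5. This toy check is mine, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(3)

def one_step_errors(h, n=200000):
    """Mean single-step error of Euler vs Milstein for dx = x dW, x(0) = 1,
    against the exact solution x_h = exp(W_h - h/2)."""
    x0 = 1.0
    W = np.sqrt(h) * rng.standard_normal(n)
    exact = x0 * np.exp(W - h / 2)
    euler = x0 + x0 * W
    # Milstein adds (1/2) sigma sigma' (W^2 - h) = (x0/2)(W^2 - h)
    milstein = euler + 0.5 * x0 * (W ** 2 - h)
    return np.mean(np.abs(exact - euler)), np.mean(np.abs(exact - milstein))

e1, m1 = one_step_errors(0.01)
e2, m2 = one_step_errors(0.0025)   # 4x smaller step
# e1/e2 should be near 4 (order h); m1/m2 near 8 (order h^1.5)
```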
Convergence Theorem
Thm [LV16]: For the log barrier, the geodesic walk mixes in Õ(mn^0.75) steps.
Thm [LV16]: For the log barrier on [0,1]ⁿ, it mixes in Õ(n^(1/3)) steps.
(The best bound for the ball walk, hit-and-run, and the Dikin walk is Õ(n²) steps for [0,1]ⁿ.)
Are higher-order methods for SDEs used in MCMC? Our walk is similar to the Milstein method.
Outline
Oracle setting:
- Introduce the ball walk
- The KLS conjecture and its related conjectures
- Main result
Explicit setting (the originally promised talk):
- Introduce the geodesic walk
- Bound the # of iterations
- Bound the cost per iteration
How to Implement the Algorithm
Can we simply do a Taylor expansion? In high dimension, it may take nᵏ time to compute the k-th derivatives.
In the tangent plane at x:
1. Pick w ~ N(0, I), i.e. a standard Gaussian in T_xM.
2. Compute y = exp_x((h/2) μ(x) + √h w).
3. Accept with probability min(1, p(y → x)/p(x → y)).
How do we compute the geodesic and the rejection probability? We need high accuracy for the rejection probability due to the directedness of the walk. The geodesic is given by the geodesic equation; the probability is given by the Jacobi field.
Collocation Method for ODEs
A weakly polynomial time algorithm for some ODEs.
Consider the ODE x′(t) = F(t, x(t)) with x(0) = x₀.
Given a degree-d polynomial p and distinct points t₁, t₂, …, t_d, let T(p) be the unique degree-d polynomial q such that q′(t) = F(t, p(t)) for t ∈ {t₁, t₂, …, t_d} and q(0) = p(0).
Lem [LV16]: T(p) is well defined. If the t_i are Chebyshev points on [0,1], then Lip(T(p)) = O(Lip(p)).
Thm [LV16]: If Lip(F) ≤ 0.001, we can find a fixed point of T efficiently.
Collocation Method for ODEs
A weakly polynomial time algorithm for some ODEs.
Consider the ODE x′(t) = F(t, x(t)) with x(0) = x₀.
Thm [LV16]: Suppose that Lip(F) ≤ 0.001 and there is a degree-d polynomial q such that ‖q′ − F(·, q)‖ ≤ ε. Then we can find an x̂ with ‖x̂ − x‖ ≤ O(ε log²(1/ε)) using O(d log(d/ε)) evaluations of F.
Remark: No need to compute F′! In general, the runtime scales as d · Lip(F)^O(1) instead.
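A minimal sketch of the collocation idea in Python (the toy ODE, degree, and iteration count are my choices, not from [LV16]): represent the iterate by its values at Chebyshev points, interpolate the derivative values F(t, p(t)) by a polynomial, integrate it to get the next iterate, and repeat. When F has a small Lipschitz constant in x, the map is a contraction and the fixed point approximates the ODE solution.

```python
import numpy as np

def collocation_solve(f, y0, d=8, iters=30):
    """Solve y' = f(t, y), y(0) = y0 on [0,1] by fixed-point collocation:
    interpolate the derivative values at d Chebyshev points by a
    degree-(d-1) polynomial, then integrate (constant chosen so q(0) = y0)."""
    # Chebyshev points mapped to [0, 1]
    t = (1 - np.cos(np.pi * (np.arange(d) + 0.5) / d)) / 2
    y = np.full(d, y0, dtype=float)            # current iterate at the nodes
    for _ in range(iters):
        coeffs = np.polyfit(t, f(t, y), d - 1)  # interpolant of f(t, p(t))
        integ = np.polyint(coeffs)              # antiderivative, integ(0) = 0
        y = y0 + np.polyval(integ, t)           # next iterate q at the nodes
    return lambda s: y0 + np.polyval(integ, s)

# test problem: y' = -0.5 y, y(0) = 1, exact solution exp(-0.5 t)
sol = collocation_solve(lambda t, y: -0.5 * y, 1.0)
```

Here the Lipschitz constant in y is 0.5 (much larger than the 0.001 in the theorem, but still contracting on [0,1]), and a degree-7 polynomial already tracks exp(−t/2) to high accuracy.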
How can I bound the k-th derivatives?
For a function of one variable, we can estimate the k-th derivatives easily.
Idea: reduce estimating derivatives of a general function to the one-variable case: majorize the Taylor coefficients of each piece by those of a scalar function, and combine majorants with calculus rules (e.g. products of majorants majorize products, and 1/f has a majorant when f(0) ≠ 0).
Implementation Theorem
Using the trick above, we show the geodesic can be approximated by an O(1)-degree polynomial; hence the collocation method finds it in O(1) steps.
Thm [LV16]: For small enough step size, one step of the geodesic walk can be implemented in matrix multiplication time. For the hypercube, degree O(1) suffices.
Questions
We have no background in numerical ODEs/SDEs or Riemannian geometry, so the running time should be easily improvable.
How can we avoid the filtering step?
Is there a way to tell whether a walk has mixed or not? (I.e., even if we cannot prove KLS, the algorithm could still stop early.)
Are higher-order SDE methods useful in MCMC?
Any other suggestions/heuristics for sampling on convex sets?