Unconstrained Minimization in Convex Optimization

CSE203B Convex Optimization, Chapter 9

Explore Chapter 9 on Unconstrained Minimization in Convex Optimization, covering Taylor's expansion and bounds, descent methods, Newton's method, and the necessary and sufficient condition for optimality, with scalar and vector cases and worked examples.

  • Optimization
  • Convex
  • Minimization
  • Taylor
  • Bounds




Presentation Transcript


  1. CSE203B Convex Optimization, Chapter 9: Unconstrained Minimization. CK Cheng, Dept. of Computer Science and Engineering, University of California, San Diego.

  2. Chapter 9 Unconstrained Minimization. Outline: Introduction; Taylor's Expansion & Bounds; Descent Methods; Newton's Method; Summary.

  3. Introduction
     Problem: min f(x), where f: Rⁿ → R is convex and twice continuously differentiable.
     Theorem: a necessary and sufficient condition for a point x* to be optimal is ∇f(x*) = 0.
     Remark: keyword: Taylor's expansion.

  4. Taylor's Expansion & Bounds: Scalar Case
     (1) Scalar case: f(x) = f(x₀) + f′(x₀)(x − x₀) + ½ f″(z)(x − x₀)², for some z on the segment [x, x₀].
     We simplify the notation to f(x) = (a/2)(x − x₀)² + b(x − x₀) + c, with a = f″(z), b = f′(x₀), c = f(x₀).
     For fixed a, b, and x₀, the optimal solution can be derived from f′(x*) = 0: a(x* − x₀) + b = 0, so x* − x₀ = −b/a.
     a. How far from opt.: x* − x₀ = −b/a.
     b. How much difference from opt.: f(x₀) − f(x*) = c − ((a/2)(b/a)² − b²/a + c) = b²/(2a).

  5. Taylor's Expansion & Bounds: Example
     f(x) = x² + 4x + 1. For the format f(x) = (a/2)x² + bx + c, we have a = 2, b = 4, c = 1. Let x₀ = 0; then:
     a. How far? x* − x₀ = −b/a = −2.
     b. How much? f(x₀) − f(x*) = b²/(2a) = 4.
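A quick numerical check of these two formulas (a sketch in Python, not part of the slides), using the same example and x₀ = 0:

    # Scalar-case check for f(x) = x^2 + 4x + 1, i.e. a = 2, b = 4, c = 1 in f(x) = (a/2)x^2 + bx + c.
    a, b, c = 2.0, 4.0, 1.0
    f = lambda x: 0.5 * a * x**2 + b * x + c

    x0 = 0.0
    x_star = x0 - b / a            # stationarity: a*(x* - x0) + b = 0
    print(x_star - x0)             # -2.0  ("how far from opt.")
    print(f(x0) - f(x_star))       # 4.0   ("how much difference from opt.": b^2 / (2a))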

  6. Taylor's Expansion & Bounds: Bounds
     (2) Vector case. Assumption A: ∇²f(x) is bounded, i.e., mI ⪯ ∇²f(x) ⪯ MI.
     Theorem A: with p* = f(x*) the optimal value, we have the following bounds:
     (1) f(x) − p* ≥ (1/(2M)) ‖∇f(x)‖₂²
     (2) f(x) − p* ≤ (1/(2m)) ‖∇f(x)‖₂²
     (3) ‖x − x*‖₂ ≤ (2/m) ‖∇f(x)‖₂
     (4) ‖x − x*‖₂ ≤ (1/m) ‖∇f(x)‖₂
     Proof: by Taylor's expansion and Assumption A,
     f(x) + ∇f(x)ᵀ(y − x) + (m/2)‖y − x‖₂² ≤ f(y) ≤ f(x) + ∇f(x)ᵀ(y − x) + (M/2)‖y − x‖₂².

  7. Taylor's Expansion & Bounds: Bounds
     Proof of (3): f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) + (m/2)‖y − x‖₂² (Taylor's expansion + Assumption A).
     Setting y = x* and applying Cauchy–Schwarz, p* = f(x*) ≥ f(x) − ‖∇f(x)‖₂ ‖x* − x‖₂ + (m/2)‖x* − x‖₂².
     We shift f(x) to the left-hand side: 0 ≥ p* − f(x) ≥ −‖∇f(x)‖₂ ‖x* − x‖₂ + (m/2)‖x* − x‖₂².
     Shifting (m/2)‖x* − x‖₂² to the left, we have
     1. (m/2)‖x − x*‖₂ ≤ ‖∇f(x)‖₂
     2. ‖x − x*‖₂ ≤ (2/m)‖∇f(x)‖₂

  8. Taylor's Expansion & Bounds: Bounds
     Proof of (2): f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) + (m/2)‖y − x‖₂² (Taylor's expansion + Assumption A).
     Minimizing the right-hand side over y (minimizer y = x − (1/m)∇f(x)) gives f(y) ≥ f(x) − (1/(2m))‖∇f(x)‖₂².
     Thus, we have p* ≥ f(x) − (1/(2m))‖∇f(x)‖₂².
     Therefore f(x) − p* ≤ (1/(2m))‖∇f(x)‖₂².

  9. Taylor's Expansion & Bounds: Bounds
     Proof of (1): f(y) ≤ f(x) + ∇f(x)ᵀ(y − x) + (M/2)‖y − x‖₂² (Taylor's expansion + Assumption A).
     Minimization with y: let y = x − (1/M)∇f(x); we have
     f(x − (1/M)∇f(x)) ≤ f(x) − (1/M)‖∇f(x)‖₂² + (1/(2M))‖∇f(x)‖₂² = f(x) − (1/(2M))‖∇f(x)‖₂².
     Shifting the terms on the left and right, and using p* ≤ f(x − (1/M)∇f(x)), we have
     (1/(2M))‖∇f(x)‖₂² ≤ f(x) − p*.

  10. Taylor's Expansion & Bounds: Bounds
     Proof of (4): f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) + (m/2)‖y − x‖₂² (Taylor's expansion + Assumption A).
     (i) Let x = x*; since ∇f(x*) = 0, we can write the above as f(y) ≥ p* + (m/2)‖y − x*‖₂², or f(y) − p* ≥ (m/2)‖y − x*‖₂².
     (ii) From (2), we have (1/(2m))‖∇f(y)‖₂² ≥ f(y) − p*.
     (iii) From (i) & (ii), we have (1/(2m))‖∇f(y)‖₂² ≥ (m/2)‖y − x*‖₂².
     Therefore, we have ‖y − x*‖₂ ≤ (1/m)‖∇f(y)‖₂.

  11. Taylor's Expansion & Bounds
     Remark:
     (1) If ‖∇f(x)‖₂ ≤ (2mε)^(1/2), we have f(x) − p* ≤ (1/(2m))‖∇f(x)‖₂² ≤ ε.
     (2) The bounds can be used to design algorithms and to prove their convergence.
     (3) If M/m is very large (e.g., 10¹⁰), the bounds become very loose, which hurts the efficiency of gradient descent approaches.
     (4) A quadratic objective with a sparse matrix A, f(x) = ½xᵀAx + bᵀx + c, is a preferred formulation in terms of algorithm efficiency.
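To make Theorem A and Remark (1) concrete, the following sketch (not from the slides) checks the bounds numerically on a randomly generated strongly convex quadratic, taking m and M as the extreme eigenvalues of the Hessian:

    # Check Theorem A on f(x) = 0.5*x.T@A@x + b.T@x with A symmetric positive definite.
    import numpy as np

    rng = np.random.default_rng(0)
    Q = rng.standard_normal((3, 3))
    A = Q @ Q.T + 0.5 * np.eye(3)          # Hessian, guaranteed positive definite
    b = rng.standard_normal(3)

    f = lambda x: 0.5 * x @ A @ x + b @ x
    grad = lambda x: A @ x + b
    m, M = np.linalg.eigvalsh(A)[[0, -1]]  # mI <= Hessian <= MI

    x_star = np.linalg.solve(A, -b)        # exact minimizer
    p_star = f(x_star)

    x = rng.standard_normal(3)             # arbitrary test point
    g2 = np.linalg.norm(grad(x))**2
    print(g2 / (2*M) <= f(x) - p_star <= g2 / (2*m))            # suboptimality bounds (1), (2)
    print(np.linalg.norm(x - x_star) <= (2/m) * np.sqrt(g2))    # distance bound (3)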

  12. II. Descent Methods
     Given a convex, twice continuously differentiable function f(x) and an initial point x ∈ dom f:
     Repeat
       1. Determine a descent direction Δx (i.e., ∇f(x)ᵀΔx < 0).
       2. Line search: choose a step size t > 0.
       3. Update x := x + tΔx.
     Until a stopping criterion is met.
     Exact line search: t = argmin_{t>0} f(x + tΔx).
     Backtracking line search (α ∈ (0, 1/2), β ∈ (0, 1)): start at t = 1 and repeat t := βt until f(x + tΔx) < f(x) + αt ∇f(x)ᵀΔx.
     Stopping criterion: ‖∇f(x)‖₂ ≤ η = (2mε)^(1/2) (Theorem A (2)).
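A minimal Python sketch of this loop (an assumed implementation, not the course's code), using the negative gradient as the descent direction and backtracking line search; the quadratic at the end is just an arbitrary test function:

    import numpy as np

    def gradient_descent(f, grad, x0, alpha=0.3, beta=0.7, eps=1e-8, max_iter=1000):
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) <= eps:           # stopping criterion ||grad f(x)||_2 <= eps
                break
            dx = -g                                # descent direction: grad f(x)^T dx < 0
            t = 1.0
            while f(x + t * dx) >= f(x) + alpha * t * g @ dx:
                t *= beta                          # backtrack until sufficient decrease
            x = x + t * dx
        return x

    # Example: minimize f(x) = 0.5*(x1^2 + 10*x2^2)
    f = lambda x: 0.5 * (x[0]**2 + 10 * x[1]**2)
    grad = lambda x: np.array([x[0], 10 * x[1]])
    print(gradient_descent(f, grad, [10.0, 1.0]))  # approaches (0, 0)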

  13. II. Descent Methods: Example
     Problem: min f(x) = ½(x₁² + γx₂²), γ > 0, with starting point x⁰ = (γ, 1), so ∇f(x⁰) = (γ, γ).
     1. With exact line search, x¹ = (γ, 1) − t(γ, γ) = (γ(1 − t), 1 − tγ) and ∇f(x¹) = (γ(1 − t), γ(1 − tγ)).
        To optimize f(x¹) with respect to the variable t, set df(x¹)/dt = −γ²(1 − t) − γ²(1 − tγ) = 0, which gives t = 2/(1 + γ).
        Hence x¹ = (γ(γ − 1)/(γ + 1), (1 − γ)/(1 + γ)); for γ = 10, x¹ = (10 · 9/11, −9/11).
     2. We repeat the process to step k: x^(k) = (γ((γ − 1)/(γ + 1))^k, (−(γ − 1)/(γ + 1))^k), so
        f(x^(k)) = (γ(γ + 1)/2)((γ − 1)/(γ + 1))^(2k), i.e., f(x^(k))/f(x⁰) = ((1 − 1/γ)/(1 + 1/γ))^(2k).
     3. Equal-potential plot: contour figure of the iterates (not reproduced in this transcript).
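The closed-form iterates can be checked by running exact-line-search gradient descent directly (a sketch, not from the slides; for a quadratic with Hessian H the exact step is t = gᵀg / (gᵀHg)):

    import numpy as np

    gamma = 10.0
    H = np.diag([1.0, gamma])              # Hessian of f(x) = 0.5*(x1^2 + gamma*x2^2)
    x = np.array([gamma, 1.0])             # x^0 = (gamma, 1)

    for k in range(5):
        q = (gamma - 1) / (gamma + 1)
        closed_form = np.array([gamma * q**k, (-q)**k])
        print(k, x, closed_form)           # the two agree at every step
        g = H @ x                          # gradient
        t = (g @ g) / (g @ H @ g)          # exact line search step for a quadratic
        x = x - t * g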

  14. II. Descent Methods: Descent for Various Norms
     1. Problem: min f(x).
     2. For each iteration, we take the steepest descent step in terms of a given norm:
        min ∇f(x)ᵀΔx  s.t. ‖Δx‖ ≤ 1.
     3. We show the step for (i) the quadratic norm and (ii) the L1 norm.

  15. II. Descent Methods: Descent for the Quadratic Norm
     1. Problem: min f(x).
     2. For each iteration, we take the steepest descent step in terms of the quadratic norm:
        min ∇f(x)ᵀΔx  s.t. ‖Δx‖_P ≤ 1, where ‖z‖_P = ‖P^(1/2) z‖₂ = (zᵀPz)^(1/2), P ∈ S^n_{++}.
     Lagrangian: L(Δx, λ) = ∇f(x)ᵀΔx + λ(‖Δx‖_P − 1), λ ≥ 0.
     We can derive Δx_nsd = −(∇f(x)ᵀP⁻¹∇f(x))^(−1/2) P⁻¹∇f(x), or (unnormalized) Δx_sd = −P⁻¹∇f(x).

  16. II. Descent Methods: Descent for the Quadratic Norm
     The coordinate change has an effect on the descent direction.
     Example: min f(x) = ½xᵀPx + qᵀx, P ∈ S^n_{++}, under the affine transform y = P^(1/2) x.

  17. II. Descent Methods: Descent for the L1 Norm
     1. Problem: min f(x).
     2. For each iteration, we take the steepest descent step in terms of the L1 norm:
        min ∇f(x)ᵀΔx  s.t. ‖Δx‖₁ ≤ 1.
     Lagrangian: L(Δx, λ) = ∇f(x)ᵀΔx + λ(‖Δx‖₁ − 1), λ ≥ 0.
     We can derive Δx_nsd = −sign(∂f(x)/∂x_i) e_i, where i is an index for which |∂f(x)/∂x_i| = ‖∇f(x)‖_∞,
     or (unnormalized) Δx_sd = −(∂f(x)/∂x_i) e_i.
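The two steepest-descent directions above can be computed in a few lines; this is a sketch (the helper names sd_quadratic and sd_l1 are made up, not from the slides):

    import numpy as np

    def sd_quadratic(grad_x, P):
        # Steepest descent step for the quadratic norm ||z||_P: dx_sd = -P^{-1} grad f(x).
        return -np.linalg.solve(P, grad_x)

    def sd_l1(grad_x):
        # Steepest descent step for the L1 norm: move along one coordinate axis.
        i = np.argmax(np.abs(grad_x))          # index with |grad f(x)_i| = ||grad f(x)||_inf
        dx = np.zeros_like(grad_x)
        dx[i] = -grad_x[i]
        return dx

    g = np.array([3.0, -1.0])
    P = np.array([[2.0, 0.0], [0.0, 8.0]])
    print(sd_quadratic(g, P))   # [-1.5, 0.125]
    print(sd_l1(g))             # [-3., 0.]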

  18. Gradient Descent Method: Convergence Analysis (exact line search, Δx = −∇f(x))
     A. f(x − t∇f(x)) ≤ f(x) − t‖∇f(x)‖₂² + (Mt²/2)‖∇f(x)‖₂²; minimizing the right-hand side over t (t = 1/M) gives
        f(x⁺) ≤ f(x) − (1/(2M))‖∇f(x)‖₂².
     B. Subtracting p* from both sides: f(x⁺) − p* ≤ f(x) − p* − (1/(2M))‖∇f(x)‖₂².
     C. From B, since ‖∇f(x)‖₂² ≥ 2m(f(x) − p*), we have f(x⁺) − p* ≤ (f(x) − p*)(1 − m/M).
     D. We can conclude from A & C that f(x^(k)) − p* ≤ c^k (f(x^(0)) − p*), where c = 1 − m/M < 1.
        To achieve f(x^(k)) − p* ≤ ε, we need k ≥ log((f(x^(0)) − p*)/ε) / log(1/c) iterations.

  19. Gradient Descent Method: Convergence Analysis
     log(1/c) = −log(1 − m/M) ≈ m/M for large M/m, so the required number of iterations grows roughly as (M/m) · log((f(x^(0)) − p*)/ε).
     Remark: when M/m > 100 the method can be very slow.
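A small illustration (not from the slides) of how the iteration bound scales with M/m; the numbers below use an assumed initial gap f(x^(0)) − p* = 100 and target accuracy ε = 10⁻⁶:

    import math

    gap0, eps = 100.0, 1e-6                 # assumed initial gap and target accuracy
    for cond in (10, 100, 10_000):          # condition number M/m
        c = 1 - 1 / cond
        k_exact = math.log(gap0 / eps) / math.log(1 / c)
        k_approx = cond * math.log(gap0 / eps)
        print(cond, round(k_exact), round(k_approx))
    # The bound grows roughly linearly in M/m, so M/m > 100 already
    # means thousands of iterations for modest accuracy.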

  20. Newton Step
     Use the 2nd-order Taylor approximation: f(x + v) ≈ f(x) + ∇f(x)ᵀv + ½ vᵀ∇²f(x) v.
     We would like ∇_v f(x + v) = 0: ∇f(x) + ∇²f(x) v = 0. Thus v = −∇²f(x)⁻¹∇f(x), and
     f(x + v) ≈ f(x) − ∇f(x)ᵀ∇²f(x)⁻¹∇f(x) + ½ ∇f(x)ᵀ∇²f(x)⁻¹∇f(x) = f(x) − ½ ∇f(x)ᵀ∇²f(x)⁻¹∇f(x).
     Algorithm: input x ∈ dom f, tolerance ε > 0. Repeat:
     1. Δx_nt := −∇²f(x)⁻¹∇f(x), λ² := ∇f(x)ᵀ∇²f(x)⁻¹∇f(x).
     2. Quit if λ²/2 ≤ ε.
     3. Line search: choose a step size t.
     4. x := x + t Δx_nt.
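A minimal Python sketch of this iteration (an assumed implementation, reusing the backtracking rule from slide 12; the smooth convex test function at the end is an arbitrary example, not from the slides):

    import numpy as np

    def newton(f, grad, hess, x0, alpha=0.3, beta=0.7, eps=1e-10, max_iter=50):
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            g, H = grad(x), hess(x)
            dx = -np.linalg.solve(H, g)            # Newton step: -(hess f)^{-1} grad f
            lam2 = -g @ dx                         # lambda^2 = grad f^T (hess f)^{-1} grad f
            if lam2 / 2 <= eps:                    # quit when the Newton decrement is small
                break
            t = 1.0
            while f(x + t * dx) >= f(x) + alpha * t * g @ dx:
                t *= beta                          # backtracking line search
            x = x + t * dx
        return x

    # Example: f(x) = exp(x1 + x2) + exp(x1 - x2) + x1^2
    f = lambda x: np.exp(x[0] + x[1]) + np.exp(x[0] - x[1]) + x[0]**2
    grad = lambda x: np.array([np.exp(x[0]+x[1]) + np.exp(x[0]-x[1]) + 2*x[0],
                               np.exp(x[0]+x[1]) - np.exp(x[0]-x[1])])
    hess = lambda x: np.array([[np.exp(x[0]+x[1]) + np.exp(x[0]-x[1]) + 2,
                                np.exp(x[0]+x[1]) - np.exp(x[0]-x[1])],
                               [np.exp(x[0]+x[1]) - np.exp(x[0]-x[1]),
                                np.exp(x[0]+x[1]) + np.exp(x[0]-x[1])]])
    print(newton(f, grad, hess, [1.0, 1.0]))       # converges in a handful of iterations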

  21. Newton Method: Convergence Analysis
     Assumptions: S = {x ∈ dom f | f(x) ≤ f(x^(0))}; f is strongly convex on S with constant m, i.e., ∇²f(x) ⪰ mI;
     ∇²f is Lipschitz continuous on S with constant L, i.e., ‖∇²f(x) − ∇²f(y)‖₂ ≤ L‖x − y‖₂.
     Outline: there is a threshold η ∈ (0, m²/L) that splits the iterations into two cases.
     1. Damped Newton phase (t < 1): if ‖∇f(x^(k))‖₂ ≥ η, then f(x^(k+1)) − f(x^(k)) ≤ −αβη²m/M².
     2. Pure Newton phase (quadratically convergent stage, t = 1): if ‖∇f(x^(k))‖₂ < η, then
        (L/(2m²))‖∇f(x^(k+1))‖₂ ≤ ((L/(2m²))‖∇f(x^(k))‖₂)²,
        which gives ‖∇f(x^(k+l))‖₂ ≤ (2m²/L)(1/2)^(2^l) and f(x^(k+l)) − p* ≤ (2m³/L²)(1/2)^(2^(l+1)).

  22. Newton Method: Affine Invariance
     Problem: min f(x). Theorem: Newton's step is invariant to an affine transform.
     Proof: let x = Ty with T ∈ R^(n×n) nonsingular, and define g(y) = f(Ty) = f(x).
     For the x coordinate system, we have Δx_nt = −∇²f(x)⁻¹∇f(x).
     For the y coordinate system, we have:
     1. ∇g(y) = Tᵀ∇f(Ty), ∇²g(y) = Tᵀ∇²f(Ty)T.
     2. The Newton step at y is Δy_nt = −∇²g(y)⁻¹∇g(y) = −(Tᵀ∇²f(x)T)⁻¹(Tᵀ∇f(x)) = −T⁻¹∇²f(x)⁻¹∇f(x) = T⁻¹Δx_nt.
     Therefore, we have the invariance result x + Δx_nt = T(y + Δy_nt).
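A numerical check of the invariance result (the quadratic f and the transform T below are made-up examples, not from the slides):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((2, 2)); A = A @ A.T + np.eye(2)   # f(x) = 0.5 x^T A x + b^T x
    b = rng.standard_normal(2)
    T = np.array([[2.0, 1.0], [0.0, 3.0]])                     # a fixed nonsingular transform

    grad_f = lambda x: A @ x + b
    hess_f = lambda x: A
    # g(y) = f(T y):  grad g(y) = T^T grad f(T y),  hess g(y) = T^T hess f(T y) T
    grad_g = lambda y: T.T @ grad_f(T @ y)
    hess_g = lambda y: T.T @ hess_f(T @ y) @ T

    y = rng.standard_normal(2)
    x = T @ y
    dx_nt = -np.linalg.solve(hess_f(x), grad_f(x))
    dy_nt = -np.linalg.solve(hess_g(y), grad_g(y))
    print(np.allclose(x + dx_nt, T @ (y + dy_nt)))             # True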

  23. Summary
     1. Gradient Descent Method (minimization solution):
        a. Vector operations per iteration.
        b. Linear convergence rate.
     2. Newton's Method (solves the optimality equation ∇f(x) = 0):
        a. Matrix operations per iteration.
        b. Quadratic convergence rate (near the solution).
     3. Gradient Descent Method variations:
        a. Conjugate gradient method.
        b. Nesterov accelerated gradient method.
        c. Quasi-Newton methods.
