Advanced Techniques in Machine Learning


Explore topics in machine learning including support vector machines, logistic regression, gradient descent, Lasso regularization, and more. The slides cover single- and multiple-predictor settings, as well as the terminology of regularization functions and solvers.

  • Machine Learning
  • Support Vector Machines
  • Regularization
  • Gradient Descent
  • Logistic Regression




Presentation Transcript


  1. Support Vector Machine I. Jia-Bin Huang, Virginia Tech, Spring 2019, ECE-5424G / CS-5824

  2. Administrative. Please use Piazza; no emails. HW 0 grades are back; regrade requests are open for one week. HW 1 is due soon. HW 2 is on the way.

  3. Regularized logistic regression. Example features: Age, Tumor Size, with a high-order polynomial hypothesis $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_2^2 + \theta_5 x_1 x_2 + \theta_6 x_1^2 x_2 + \theta_7 x_1 x_2^2 + \cdots)$. Cost function: $J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + (1-y^{(i)})\log\left(1-h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$ Slide credit: Andrew Ng

  4. Gradient descent (regularized). Repeat { $\theta_0 := \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$ ; $\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right]$ for $j = 1, \ldots, n$ }, where the bracketed term is $\frac{\partial}{\partial\theta_j}J(\theta)$. Equivalently, $\theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$. Slide credit: Andrew Ng
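
As a concrete illustration of slides 3-4 (not part of the original deck), here is a minimal NumPy sketch of the regularized cost and one gradient-descent step; the names X, y, theta, alpha, and lam are assumptions for the example, with X assumed to carry a leading bias column $x_0 = 1$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y, lam):
    """J(theta) = -(1/m) sum[y log h + (1-y) log(1-h)] + (lam/2m) sum_{j>=1} theta_j^2."""
    m = len(y)
    h = sigmoid(X @ theta)
    data_term = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    reg_term = (lam / (2 * m)) * np.sum(theta[1:] ** 2)  # theta_0 is not regularized
    return data_term + reg_term

def gradient_step(theta, X, y, alpha, lam):
    """One simultaneous update of all parameters; theta_0 gets no weight-decay term."""
    m = len(y)
    h = sigmoid(X @ theta)
    grad = (X.T @ (h - y)) / m
    grad[1:] += (lam / m) * theta[1:]
    return theta - alpha * grad
```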

  5. $\ell_1$: Lasso regularization. $J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(\theta^T x^{(i)} - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}|\theta_j|$ LASSO: Least Absolute Shrinkage and Selection Operator

  6. Single predictor: soft thresholding. $\text{minimize}_\theta\; \frac{1}{2m}\sum_{i=1}^{m}\left(x^{(i)}\theta - y^{(i)}\right)^2 + \lambda|\theta|$. For a standardized predictor the solution is $\hat\theta = \frac{1}{m}\langle x, y\rangle + \lambda$ if $\frac{1}{m}\langle x, y\rangle < -\lambda$; $\hat\theta = 0$ if $\left|\frac{1}{m}\langle x, y\rangle\right| \le \lambda$; $\hat\theta = \frac{1}{m}\langle x, y\rangle - \lambda$ if $\frac{1}{m}\langle x, y\rangle > \lambda$. Equivalently $\hat\theta = S_\lambda\!\left(\frac{1}{m}\langle x, y\rangle\right)$, where $S_\lambda(x) = \text{sign}(x)\left(|x| - \lambda\right)_+$ is the soft-thresholding operator.
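
A minimal sketch of the soft-thresholding operator from slide 6, not from the original deck; the names x, y, lam, and m are illustrative.

```python
import numpy as np

def soft_threshold(x, lam):
    """S_lambda(x) = sign(x) * (|x| - lambda)_+ ; shrinks toward zero, exactly zero inside [-lam, lam]."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# Single standardized predictor: the lasso solution is theta_hat = S_lambda(<x, y>/m), e.g.
# theta_hat = soft_threshold(np.dot(x, y) / m, lam)
```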

  7. Multiple predictors: cyclic coordinate descent. $\text{minimize}_\theta\; \frac{1}{2m}\sum_{i=1}^{m}\left(y^{(i)} - \sum_{k}x_k^{(i)}\theta_k\right)^2 + \lambda\sum_{k}|\theta_k|$. For each $j$, update $\theta_j$ by solving $\text{minimize}_{\theta_j}\; \frac{1}{2m}\sum_{i=1}^{m}\left(r_j^{(i)} - x_j^{(i)}\theta_j\right)^2 + \lambda|\theta_j|$, where $r_j^{(i)} = y^{(i)} - \sum_{k\neq j}x_k^{(i)}\theta_k$ is the partial residual.
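
To make slide 7 concrete, here is a hedged sketch of cyclic coordinate descent for the lasso, assuming each column of X is standardized so that $\frac{1}{m}\sum_i (x_j^{(i)})^2 = 1$; the names X, y, lam, and n_iters are assumptions for the example, not from the slides.

```python
import numpy as np

def lasso_coordinate_descent(X, y, lam, n_iters=100):
    """Cycle through coordinates, soft-thresholding each against its partial residual."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        for j in range(n):
            # Partial residual: leave predictor j out of the current fit.
            r_j = y - X @ theta + X[:, j] * theta[j]
            rho = np.dot(X[:, j], r_j) / m
            theta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0)  # soft threshold
    return theta
```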

  8. L1 and L2 balls Image credit: https://web.stanford.edu/~hastie/StatLearnSparsity_files/SLS.pdf

  9. Terminology
     Regularization function | Name | Solver
     $\|\theta\|_2^2 = \sum_{j=1}^{n}\theta_j^2$ (Tikhonov regularization) | Ridge regression | Closed form
     $\|\theta\|_1 = \sum_{j=1}^{n}|\theta_j|$ | LASSO regression | Proximal gradient descent, least angle regression
     $\alpha\|\theta\|_1 + (1-\alpha)\|\theta\|_2^2$ | Elastic net regularization | Proximal gradient descent

  10. Things to remember. Overfitting: a complex model that does well on the training set but performs poorly on the testing set. Cost function: norm penalty (L1 or L2). Regularized linear regression: gradient descent with weight decay (L2 norm). Regularized logistic regression: gradient descent with weight decay (L2 norm).

  11. Support Vector Machine: cost function; large margin classification; kernels; using an SVM.

  12. Support Vector Machine: cost function; large margin classification; kernels; using an SVM.

  13. Logistic regression. $h_\theta(x) = g(\theta^T x)$, $g(z) = \frac{1}{1 + e^{-z}}$, $z = \theta^T x$. Suppose we predict $y = 1$ if $h_\theta(x) \ge 0.5$, i.e. $\theta^T x \ge 0$, and predict $y = 0$ if $h_\theta(x) < 0.5$, i.e. $\theta^T x < 0$. Slide credit: Andrew Ng

  14. Alternative view. $h_\theta(x) = g(\theta^T x)$, $g(z) = \frac{1}{1 + e^{-z}}$. If $y = 1$, we want $h_\theta(x) \approx 1$, i.e. $\theta^T x \gg 0$; if $y = 0$, we want $h_\theta(x) \approx 0$, i.e. $\theta^T x \ll 0$. Slide credit: Andrew Ng

  15. Cost function for logistic regression. $\text{Cost}(h_\theta(x), y) = -\log h_\theta(x)$ if $y = 1$; $-\log\left(1 - h_\theta(x)\right)$ if $y = 0$. (Figure: the two loss curves plotted against $h_\theta(x) \in [0, 1]$.) Slide credit: Andrew Ng

  16. Alternative view of logistic regression. $\text{Cost}(h_\theta(x), y) = -y\log h_\theta(x) - (1 - y)\log\left(1 - h_\theta(x)\right) = -y\log\frac{1}{1 + e^{-\theta^T x}} - (1 - y)\log\left(1 - \frac{1}{1 + e^{-\theta^T x}}\right)$. (Figure: $-\log\frac{1}{1 + e^{-\theta^T x}}$ for $y = 1$ and $-\log\left(1 - \frac{1}{1 + e^{-\theta^T x}}\right)$ for $y = 0$, plotted against $z = \theta^T x$.)

  17. Logistic regression (logistic loss): $\min_\theta \frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log h_\theta(x^{(i)}) - (1 - y^{(i)})\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$. Support vector machine (hinge loss): $\min_\theta \frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\text{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)})\text{cost}_0(\theta^T x^{(i)})\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$. (Figure: the hinge costs $\text{cost}_1$ and $\text{cost}_0$ overlaid on the logistic losses for $y = 1$ and $y = 0$.)
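
As an aside not in the deck, the two hinge surrogates of slide 17 are simple piecewise-linear functions of the score $z = \theta^T x$; a minimal sketch:

```python
import numpy as np

def cost1(z):
    """Loss used when y = 1: zero for z >= 1, linear penalty otherwise."""
    return np.maximum(0.0, 1.0 - z)

def cost0(z):
    """Loss used when y = 0: zero for z <= -1, linear penalty otherwise."""
    return np.maximum(0.0, 1.0 + z)
```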

  18. Optimization objective for SVM. Start from $\min_\theta \frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\text{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)})\text{cost}_0(\theta^T x^{(i)})\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$. 1) Multiply by $m$ (does not change the minimizer). 2) Set $C = \frac{1}{\lambda}$: $\min_\theta C\sum_{i=1}^{m}\left[y^{(i)}\text{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)})\text{cost}_0(\theta^T x^{(i)})\right] + \frac{1}{2}\sum_{j=1}^{n}\theta_j^2$ Slide credit: Andrew Ng

  19. Hypothesis of SVM. $\min_\theta C\sum_{i=1}^{m}\left[y^{(i)}\text{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)})\text{cost}_0(\theta^T x^{(i)})\right] + \frac{1}{2}\sum_{j=1}^{n}\theta_j^2$. Hypothesis: $h_\theta(x) = 1$ if $\theta^T x \ge 0$, and $0$ if $\theta^T x < 0$. Slide credit: Andrew Ng

  20. Support Vector Machine: cost function; large margin classification; kernels; using an SVM.

  21. Support vector machine. $\min_\theta C\sum_{i=1}^{m}\left[y^{(i)}\text{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)})\text{cost}_0(\theta^T x^{(i)})\right] + \frac{1}{2}\sum_{j=1}^{n}\theta_j^2$. If $y = 1$, we want $\theta^T x \ge 1$ (not just $\ge 0$); if $y = 0$, we want $\theta^T x \le -1$ (not just $< 0$). (Figure: $\text{cost}_1(z)$ and $\text{cost}_0(z)$.) Slide credit: Andrew Ng

  22. SVM decision boundary. $\min_\theta C\sum_{i=1}^{m}\left[y^{(i)}\text{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)})\text{cost}_0(\theta^T x^{(i)})\right] + \frac{1}{2}\sum_{j=1}^{n}\theta_j^2$. Let's say we have a very large $C$. Whenever $y^{(i)} = 1$: $\theta^T x^{(i)} \ge 1$. Whenever $y^{(i)} = 0$: $\theta^T x^{(i)} \le -1$. The problem becomes $\min_\theta \frac{1}{2}\sum_{j=1}^{n}\theta_j^2$ s.t. $\theta^T x^{(i)} \ge 1$ if $y^{(i)} = 1$ and $\theta^T x^{(i)} \le -1$ if $y^{(i)} = 0$. Slide credit: Andrew Ng

  23. SVM decision boundary: linearly separable case. (Figure: separable data in the $x_1$-$x_2$ plane with several possible separating lines.) Slide credit: Andrew Ng

  24. SVM decision boundary: linearly separable case. (Figure: the same data with the large-margin decision boundary; the gap on either side of the boundary is the margin.) Slide credit: Andrew Ng

  25. Large margin classifier in the presence of outliers. (Figure: with $C$ very large, the boundary swings to accommodate a single outlier; with $C$ not too large, it stays close to the large-margin solution.) Slide credit: Andrew Ng

  26. Vector inner product. $u = \begin{bmatrix}u_1 \\ u_2\end{bmatrix}$, $v = \begin{bmatrix}v_1 \\ v_2\end{bmatrix}$. $\|u\|$ = length of vector $u$ = $\sqrt{u_1^2 + u_2^2}$. $p$ = length of the projection of $v$ onto $u$. $u^T v = p\cdot\|u\| = u_1 v_1 + u_2 v_2$. Slide credit: Andrew Ng

  27. SVM decision boundary. $\min_\theta \frac{1}{2}\sum_{j=1}^{n}\theta_j^2 = \frac{1}{2}\left(\theta_1^2 + \theta_2^2\right) = \frac{1}{2}\left(\sqrt{\theta_1^2 + \theta_2^2}\right)^2 = \frac{1}{2}\|\theta\|^2$ s.t. $\theta^T x^{(i)} \ge 1$ if $y^{(i)} = 1$, $\theta^T x^{(i)} \le -1$ if $y^{(i)} = 0$. Simplification: $\theta_0 = 0$, $n = 2$. What is $\theta^T x^{(i)}$? Writing $p^{(i)}$ for the (signed) projection of $x^{(i)}$ onto $\theta$: $\theta^T x^{(i)} = p^{(i)}\|\theta\|$. Slide credit: Andrew Ng

  28. SVM decision boundary. Simplification: $\theta_0 = 0$, $n = 2$. $\min_\theta \frac{1}{2}\|\theta\|^2$ s.t. $p^{(i)}\|\theta\| \ge 1$ if $y^{(i)} = 1$ and $p^{(i)}\|\theta\| \le -1$ if $y^{(i)} = 0$. If the projections $p^{(1)}, p^{(2)}, \ldots$ are small, $\|\theta\|$ must be large to satisfy the constraints; if they are large, $\|\theta\|$ can be small. Minimizing $\|\theta\|$ therefore pushes toward large projections, i.e. a large margin. Slide credit: Andrew Ng

  29. Support Vector Machine: cost function; large margin classification; kernels; using an SVM.

  30. Non-linear decision boundary. Predict $y = 1$ if $\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1 x_2 + \theta_4 x_1^2 + \theta_5 x_2^2 + \cdots \ge 0$, i.e. $\theta_0 + \theta_1 f_1 + \theta_2 f_2 + \theta_3 f_3 + \cdots$ with $f_1 = x_1$, $f_2 = x_2$, $f_3 = x_1 x_2$, $\ldots$ Is there a different/better choice of the features $f_1, f_2, f_3, \ldots$? (A sketch of this explicit feature map follows.) Slide credit: Andrew Ng
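
For illustration (not from the slides), slide 30's explicit polynomial feature map might look like this; the function name poly_features is hypothetical.

```python
import numpy as np

def poly_features(x1, x2):
    """Explicit expansion f1..f5 = x1, x2, x1*x2, x1^2, x2^2; a kernel replaces this."""
    return np.array([x1, x2, x1 * x2, x1 ** 2, x2 ** 2])
```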

  31. Kernel. Given $x$, compute new features depending on proximity to landmarks $l^{(1)}, l^{(2)}, l^{(3)}$: $f_1 = \text{similarity}(x, l^{(1)})$, $f_2 = \text{similarity}(x, l^{(2)})$, $f_3 = \text{similarity}(x, l^{(3)})$. Gaussian kernel: $\text{similarity}(x, l^{(i)}) = \exp\left(-\frac{\|x - l^{(i)}\|^2}{2\sigma^2}\right)$ Slide credit: Andrew Ng
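
A sketch of slide 31's Gaussian similarity, for illustration; gaussian_similarity is a hypothetical name.

```python
import numpy as np

def gaussian_similarity(x, landmark, sigma):
    """exp(-||x - l||^2 / (2 sigma^2)): equals 1 when x sits on the landmark, decays to 0 far away."""
    return np.exp(-np.sum((x - landmark) ** 2) / (2.0 * sigma ** 2))
```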

  32. (Figure: landmarks $l^{(1)}, l^{(2)}, l^{(3)}$ in the $x_1$-$x_2$ plane.) Predict $y = 1$ if $\theta_0 + \theta_1 f_1 + \theta_2 f_2 + \theta_3 f_3 \ge 0$. Example: $\theta_0 = -0.5$, $\theta_1 = 1$, $\theta_2 = 1$, $\theta_3 = 0$, with $f_1 = \text{similarity}(x, l^{(1)})$, $f_2 = \text{similarity}(x, l^{(2)})$, $f_3 = \text{similarity}(x, l^{(3)})$. Slide credit: Andrew Ng

  33. Choosing the landmarks. Given $x$: $f_i = \text{similarity}(x, l^{(i)}) = \exp\left(-\frac{\|x - l^{(i)}\|^2}{2\sigma^2}\right)$. Predict $y = 1$ if $\theta_0 + \theta_1 f_1 + \theta_2 f_2 + \theta_3 f_3 \ge 0$. Where do we get $l^{(1)}, l^{(2)}, l^{(3)}, \ldots$? Slide credit: Andrew Ng

  34. SVM with kernels. Given $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})$, choose $l^{(1)} = x^{(1)}, l^{(2)} = x^{(2)}, \ldots, l^{(m)} = x^{(m)}$. Given example $x$: $f_1 = \text{similarity}(x, l^{(1)})$, $f_2 = \text{similarity}(x, l^{(2)})$, $\ldots$ For training example $(x^{(i)}, y^{(i)})$: $f^{(i)} = \begin{bmatrix} f_0^{(i)} \\ f_1^{(i)} \\ \vdots \\ f_m^{(i)} \end{bmatrix}$, with $f_0 = 1$. Slide credit: Andrew Ng
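
A hedged sketch of slide 34's feature construction, assuming a Gaussian kernel; the name kernel_features and the array shapes are assumptions for the example.

```python
import numpy as np

def kernel_features(X, landmarks, sigma):
    """Map each row of X to (m+1) features: f0 = 1 plus Gaussian similarities to every landmark."""
    m = X.shape[0]
    F = np.ones((m, landmarks.shape[0] + 1))  # column 0 is the intercept feature f0 = 1
    for i in range(m):
        diffs = landmarks - X[i]              # difference to every landmark at once
        F[i, 1:] = np.exp(-np.sum(diffs ** 2, axis=1) / (2.0 * sigma ** 2))
    return F

# For training, the landmarks are the training examples themselves:
# F_train = kernel_features(X_train, X_train, sigma)
```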

  35. SVM with kernels. Hypothesis: given $x$, compute features $f \in \mathbb{R}^{m+1}$; predict $y = 1$ if $\theta^T f \ge 0$. Training (original): $\min_\theta C\sum_{i=1}^{m}\left[y^{(i)}\text{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)})\text{cost}_0(\theta^T x^{(i)})\right] + \frac{1}{2}\sum_{j=1}^{n}\theta_j^2$. Training (with kernel): $\min_\theta C\sum_{i=1}^{m}\left[y^{(i)}\text{cost}_1(\theta^T f^{(i)}) + (1 - y^{(i)})\text{cost}_0(\theta^T f^{(i)})\right] + \frac{1}{2}\sum_{j=1}^{m}\theta_j^2$ (here $n = m$).

  36. SVM parameters. $C = \frac{1}{\lambda}$. Large $C$: lower bias, higher variance. Small $C$: higher bias, lower variance. $\sigma^2$: large $\sigma^2$ makes the features $f_i$ vary more smoothly (higher bias, lower variance); small $\sigma^2$ makes the features $f_i$ vary less smoothly (lower bias, higher variance). Slide credit: Andrew Ng

  37. SVM song. Video source: https://www.youtube.com/watch?v=g15bqtyidZs

  38. SVM Demo https://cs.stanford.edu/people/karpathy/svmjs/demo/

  39. Support Vector Machine: cost function; large margin classification; kernels; using an SVM.

  40. Using SVM. Use an SVM software package (e.g., liblinear, libsvm) to solve for $\theta$. Need to specify: choice of parameter $C$; choice of kernel (similarity function). Linear kernel: predict $y = 1$ if $\theta^T x \ge 0$. Gaussian kernel: $f_i = \exp\left(-\frac{\|x - l^{(i)}\|^2}{2\sigma^2}\right)$, where $l^{(i)} = x^{(i)}$; need to choose $\sigma^2$ and use proper feature scaling. Slide credit: Andrew Ng
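
Slide 40 leaves the solver to a package. As one possible route (an assumption on our part; the deck names only liblinear and libsvm), scikit-learn wraps libsvm via SVC. Here gamma plays the role of $\frac{1}{2\sigma^2}$, C is the slide's $C$, and feature scaling is folded into a pipeline as the slide advises.

```python
# A minimal sketch; X_train, y_train, X_test are assumed to exist already.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Scaling matters for the Gaussian kernel, so do it inside the pipeline.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma=0.1))
# clf.fit(X_train, y_train)
# y_pred = clf.predict(X_test)
```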

  41. Kernel (similarity) functions. Note: not all similarity functions make valid kernels. Many off-the-shelf kernels are available: polynomial kernel, string kernel, chi-square kernel, histogram intersection kernel. Slide credit: Andrew Ng

  42. Multi-class classification. Use the one-vs.-all method: train $K$ SVMs, one to distinguish $y = i$ from the rest, obtaining $\theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(K)}$; pick the class $i$ with the largest $(\theta^{(i)})^T x$. Slide credit: Andrew Ng
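
A minimal sketch of slide 42's one-vs.-all prediction rule; train_binary_svm is a hypothetical stand-in for whichever binary SVM trainer is used.

```python
import numpy as np

def one_vs_all_predict(Theta, x):
    """Theta: (K, n+1) array stacking the K learned parameter vectors; returns the best class."""
    scores = Theta @ x          # (theta^(i))^T x for every class i
    return int(np.argmax(scores))

# Training loop sketch: Theta[i] = train_binary_svm(X, (y == i).astype(int))
```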

  43. Logistic regression vs. SVMs. $n$ = number of features ($x \in \mathbb{R}^{n+1}$), $m$ = number of training examples. 1. If $n$ is large relative to $m$ ($n = 10{,}000$, $m = 10$ to $1000$): use logistic regression or an SVM without a kernel ("linear kernel"). 2. If $n$ is small and $m$ is intermediate ($n = 1$ to $1000$, $m = 10$ to $10{,}000$): use an SVM with a Gaussian kernel. 3. If $n$ is small and $m$ is large ($n = 1$ to $1000$, $m = 50{,}000+$): create/add more features, then use logistic regression or a linear SVM. A neural network is likely to work well for most of these cases, but is slower to train. Slide credit: Andrew Ng

  44. Things to remember. Cost function: $\min_\theta C\sum_{i=1}^{m}\left[y^{(i)}\text{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)})\text{cost}_0(\theta^T x^{(i)})\right] + \frac{1}{2}\sum_{j=1}^{n}\theta_j^2$. Large margin classification. Kernels. Using an SVM. (Figure: margin illustration in the $x_1$-$x_2$ plane.)
