Machine Learning Algorithms and Models Overview

 
Class Summary
 
Jia-Bin Huang
Virginia Tech
 
Spring 2019
 
ECE-5424G / CS-5824
 
 
 
 
Thank you all for participating in this class!
 
SPOT survey!
 
Please give us feedback: lectures, topics, homework, exams, office hours, Piazza
 
 
Machine learning algorithms
 
k-NN (Classification/Regression)
 
Linear regression (Regression)
 
Naïve Bayes (Classification)
 
Logistic regression (Classification)
SVM with kernels
 
SVM parameters
 
Slide credit: Andrew Ng
 
Neural network
[Diagrams: network layers 1–3 and output; "pre-activation"; a neural network learning its own features]
 
Slide credit: Andrew Ng
 
Bias / Variance Trade-off
[Plots: training error and cross-validation error (loss) vs. degree of polynomial; high bias at low degree, high variance at high degree]
 
Source: Andrew Ng
 
Bias / Variance Trade-off with Regularization
[Plots: training error and cross-validation error (loss) vs. λ; high variance at small λ, high bias at large λ]
 
Source: Andrew Ng
 
K-means algorithm
Slide credit: Andrew Ng
Expectation Maximization (EM) Algorithm
 
EM algorithm
 
Anomaly detection algorithm
 
Problem motivation
 
Collaborative filtering optimization objective
Collaborative filtering algorithm
 
Semi-supervised Learning Problem Formulation
 
Deep Semi-supervised Learning
 
 
Ensemble methods
Combine multiple classifiers to make a better one
Committees, majority vote
Weighted combinations
Can use the same or different classifiers
Boosting
Train sequentially; later predictors focus on the mistakes made by earlier ones
Boosting for classification (e.g., AdaBoost)
Use the results of earlier classifiers to know what to work on
Weight hard examples so we focus on them more
Example: Viola-Jones face detection
 
Generative models
 
 
Simple Recurrent Network
 
 
Reinforcement learning
 
 
 
 
 
 
 
Markov decision process
Q-learning
Policy gradient
 
Final exam sample questions
 
 
Conceptual questions
 
[True/False] Increasing the value of k in a k-nearest neighbor classifier will decrease its bias
[True/False] Backpropagation helps neural network training get unstuck from local minima
[True/False] Linear regression can be solved by either matrix algebra or gradient descent
[True/False] Logistic regression can be solved by either matrix algebra or gradient descent
[True/False] K-means clustering has a unique solution
[True/False] PCA has a unique solution
 
Classification/Regression
 
Given a simple dataset
 
1) Estimate the parameters
 
2) Compute training error
 
3) Compute leave-one-out cross-validation error
 
4) Compute testing error
 
Naïve Bayes
This class summary covers topics such as supervised learning, unsupervised learning, classification, clustering, regression, k-NN models, linear regression, Naïve Bayes, logistic regression, and SVM formulations. The content provides insights into key concepts, algorithms, cost functions, learning approaches, and inferences related to machine learning.

Presentation Transcript


  1. Class Summary Jia-Bin Huang Virginia Tech Spring 2019 ECE-5424G / CS-5824

  2. Thank you all for participating in this class! SPOT survey! Please give us feedback: lectures, topics, homework, exams, office hours, Piazza

  3. Machine learning algorithms
     Discrete outputs: Classification (supervised learning), Clustering (unsupervised learning)
     Continuous outputs: Regression (supervised learning), Dimensionality reduction (unsupervised learning)

  4. k-NN (Classification/Regression)
     Model: the training set $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(N)}, y^{(N)})\}$
     Cost function: None
     Learning: Do nothing
     Inference: $\hat{y} = y^{(k)}$, where $k = \arg\min_i d(x^{\text{test}}, x^{(i)})$
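A minimal NumPy sketch of the inference rule above, assuming Euclidean distance for $d$; the function name and the majority-vote handling are illustrative choices, not part of the slides.

```python
import numpy as np

def knn_predict(X_train, y_train, x_test, k=1):
    """Predict the label of x_test as the majority label of its k nearest neighbors."""
    # Distances d(x_test, x^(i)) to every training point (Euclidean here)
    dists = np.linalg.norm(X_train - x_test, axis=1)
    # Indices of the k closest training examples
    nearest = np.argsort(dists)[:k]
    # Majority vote over their labels (use the mean of y_train[nearest] for regression)
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```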

  5. Linear regression (Regression)
     Model: $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n = \theta^\top x$
     Cost function: $J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
     Learning:
     1) Gradient descent: Repeat $\{\, \theta_j \leftarrow \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \,\}$
     2) Solving the normal equation: $\theta = (X^\top X)^{-1} X^\top y$
     Inference: $h_\theta(x^{\text{test}}) = \theta^\top x^{\text{test}}$
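Both learning routes from the slide, sketched with NumPy under the assumption that the design matrix X already includes the bias column $x_0 = 1$; the function names are illustrative.

```python
import numpy as np

def fit_gradient_descent(X, y, alpha=0.01, iters=1000):
    """Gradient descent on J(theta) = 1/(2m) * sum (X theta - y)^2."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m   # (1/m) * sum (h(x^(i)) - y^(i)) x_j^(i)
        theta -= alpha * grad
    return theta

def fit_normal_equation(X, y):
    """Closed-form solution theta = (X^T X)^{-1} X^T y, computed via a linear solve."""
    return np.linalg.solve(X.T @ X, X.T @ y)
```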

  6. Naïve Bayes (Classification)
     Model: $P(y \mid x_1, x_2, \dots, x_n) \propto P(y) \prod_j P(x_j \mid y)$
     Cost function:
     Maximum likelihood estimation: $J(\theta) = \log P(\text{Data} \mid \theta)$
     Maximum a posteriori estimation: $J(\theta) = \log \left[ P(\text{Data} \mid \theta) \, P(\theta) \right]$
     Learning:
     $\pi_k = P(y = k)$
     (Discrete $x_j$) $\theta_{jk} = P(x_j \mid y = k)$
     (Continuous $x_j$) mean $\mu_{jk}$, variance $\sigma_{jk}^2$: $P(x_j \mid y = k) = \mathcal{N}(x_j; \mu_{jk}, \sigma_{jk}^2)$
     Inference: $\hat{y} = \arg\max_k P(y = k) \prod_j P(x_j^{\text{test}} \mid y = k)$
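A sketch of the continuous-feature (Gaussian) case above, assuming maximum likelihood estimates for the priors, means, and variances; the small variance floor is an added numerical safeguard, not part of the slide.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate class priors pi_k and per-class feature means/variances (MLE)."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) + 1e-9 for c in classes])
    return classes, priors, means, variances

def predict_gaussian_nb(x, classes, priors, means, variances):
    """Return argmax_k [ log P(y=k) + sum_j log N(x_j; mu_jk, sigma_jk^2) ]."""
    log_post = np.log(priors) - 0.5 * np.sum(
        np.log(2 * np.pi * variances) + (x - means) ** 2 / variances, axis=1)
    return classes[np.argmax(log_post)]
```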

  7. Logistic regression (Classification)
     Model: $h_\theta(x) = P(y = 1 \mid x_1, x_2, \dots, x_n) = \frac{1}{1 + e^{-\theta^\top x}}$
     Cost function: $J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \text{Cost}\left( h_\theta(x^{(i)}), y^{(i)} \right)$, where
     $\text{Cost}(h_\theta(x), y) = -\log h_\theta(x)$ if $y = 1$, and $-\log\left(1 - h_\theta(x)\right)$ if $y = 0$
     Learning: Gradient descent: Repeat $\{\, \theta_j \leftarrow \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \,\}$
     Inference: $h_\theta(x^{\text{test}}) = \frac{1}{1 + e^{-\theta^\top x^{\text{test}}}}$
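A gradient-descent sketch matching the cost above, again assuming X carries the bias column; note the gradient takes the same form as in linear regression, with the sigmoid applied to the scores.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_regression(X, y, alpha=0.1, iters=1000):
    """Gradient descent on the cross-entropy cost from the slide."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        h = sigmoid(X @ theta)          # h_theta(x^(i)) for all i
        grad = X.T @ (h - y) / m        # (1/m) * sum (h - y) x_j
        theta -= alpha * grad
    return theta

def predict_proba(theta, x_test):
    """Inference: P(y = 1 | x_test)."""
    return sigmoid(x_test @ theta)
```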

  8. Hard-margin SVM formulation
     $\min_\theta \frac{1}{2} \sum_{j=1}^{n} \theta_j^2$
     s.t. $\theta^\top x^{(i)} \ge 1$ if $y^{(i)} = 1$, and $\theta^\top x^{(i)} \le -1$ if $y^{(i)} = 0$
     Soft-margin SVM formulation
     $\min_\theta \frac{1}{2} \sum_{j=1}^{n} \theta_j^2 + C \sum_{i=1}^{m} \xi^{(i)}$
     s.t. $\theta^\top x^{(i)} \ge 1 - \xi^{(i)}$ if $y^{(i)} = 1$, $\theta^\top x^{(i)} \le -1 + \xi^{(i)}$ if $y^{(i)} = 0$, and $\xi^{(i)} \ge 0$
     [Figure: decision boundary and margin in the $(x_1, x_2)$ plane]

  9. SVM with kernels
     Hypothesis: Given $x$, compute features $f \in \mathbb{R}^{m+1}$; predict $y = 1$ if $\theta^\top f \ge 0$
     Training (original): $\min_\theta C \sum_{i=1}^{m} \left[ y^{(i)} \, \text{cost}_1(\theta^\top x^{(i)}) + (1 - y^{(i)}) \, \text{cost}_0(\theta^\top x^{(i)}) \right] + \frac{1}{2} \sum_{j=1}^{n} \theta_j^2$
     Training (with kernel): $\min_\theta C \sum_{i=1}^{m} \left[ y^{(i)} \, \text{cost}_1(\theta^\top f^{(i)}) + (1 - y^{(i)}) \, \text{cost}_0(\theta^\top f^{(i)}) \right] + \frac{1}{2} \sum_{j=1}^{m} \theta_j^2$
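A sketch of the kernel feature computation, assuming the Gaussian (RBF) similarity with the training examples used as landmarks, as in the lectures; $\theta$ is taken as already trained by minimizing the kernelized objective above, and the function names are illustrative.

```python
import numpy as np

def gaussian_kernel_features(x, landmarks, sigma=1.0):
    """Compute f_i = exp(-||x - l^(i)||^2 / (2 sigma^2)) for each landmark,
    with f_0 = 1 prepended as the bias feature."""
    sq_dists = np.sum((landmarks - x) ** 2, axis=1)
    f = np.exp(-sq_dists / (2.0 * sigma ** 2))
    return np.concatenate(([1.0], f))

def svm_kernel_predict(theta, x, landmarks, sigma=1.0):
    """Predict y = 1 if theta^T f >= 0, else y = 0."""
    f = gaussian_kernel_features(x, landmarks, sigma)
    return 1 if theta @ f >= 0 else 0
```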

  10. SVM parameters
      $C = \frac{1}{\lambda}$
      Large $C$: Lower bias, high variance. Small $C$: Higher bias, low variance.
      Large $\sigma^2$: features $f_i$ vary more smoothly. Higher bias, lower variance.
      Small $\sigma^2$: features $f_i$ vary less smoothly. Lower bias, higher variance.
      Slide credit: Andrew Ng

  11. Neural network
      [Diagram: inputs $x_0, x_1, x_2, x_3$ (Layer 1), hidden units $a_0^{(2)}, a_1^{(2)}, a_2^{(2)}, a_3^{(2)}$ (Layer 2), output $h_\Theta(x)$ (Layer 3)]
      Slide credit: Andrew Ng

  12. Neural network
      $a_i^{(j)}$ = activation of unit $i$ in layer $j$
      $\Theta^{(j)}$ = matrix of weights controlling the function mapping from layer $j$ to layer $j+1$
      With $s_j$ units in layer $j$ and $s_{j+1}$ units in layer $j+1$, $\Theta^{(j)}$ has size $s_{j+1} \times (s_j + 1)$
      $a_1^{(2)} = g(\Theta_{10}^{(1)} x_0 + \Theta_{11}^{(1)} x_1 + \Theta_{12}^{(1)} x_2 + \Theta_{13}^{(1)} x_3)$
      $a_2^{(2)} = g(\Theta_{20}^{(1)} x_0 + \Theta_{21}^{(1)} x_1 + \Theta_{22}^{(1)} x_2 + \Theta_{23}^{(1)} x_3)$
      $a_3^{(2)} = g(\Theta_{30}^{(1)} x_0 + \Theta_{31}^{(1)} x_1 + \Theta_{32}^{(1)} x_2 + \Theta_{33}^{(1)} x_3)$
      $h_\Theta(x) = g(\Theta_{10}^{(2)} a_0^{(2)} + \Theta_{11}^{(2)} a_1^{(2)} + \Theta_{12}^{(2)} a_2^{(2)} + \Theta_{13}^{(2)} a_3^{(2)})$
      Slide credit: Andrew Ng

  13. Neural network: "Pre-activation"
      $z^{(2)} = \begin{bmatrix} z_1^{(2)} \\ z_2^{(2)} \\ z_3^{(2)} \end{bmatrix}$, where $z_i^{(2)} = \Theta_{i0}^{(1)} x_0 + \Theta_{i1}^{(1)} x_1 + \Theta_{i2}^{(1)} x_2 + \Theta_{i3}^{(1)} x_3$ and $a_i^{(2)} = g(z_i^{(2)})$
      $h_\Theta(x) = g(\Theta_{10}^{(2)} a_0^{(2)} + \Theta_{11}^{(2)} a_1^{(2)} + \Theta_{12}^{(2)} a_2^{(2)} + \Theta_{13}^{(2)} a_3^{(2)}) = g(z^{(3)})$
      Slide credit: Andrew Ng

  14. Neural network: "Pre-activation" (vectorized)
      $a^{(1)} = x$
      $z^{(2)} = \Theta^{(1)} a^{(1)}$
      $a^{(2)} = g(z^{(2)})$
      Add $a_0^{(2)} = 1$
      $z^{(3)} = \Theta^{(2)} a^{(2)}$
      $h_\Theta(x) = a^{(3)} = g(z^{(3)})$
      Slide credit: Andrew Ng
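A minimal NumPy sketch of the vectorized forward pass on the slide, assuming one hidden layer: Theta1 maps the bias-augmented input to the hidden layer and Theta2 maps the bias-augmented hidden activations to the output; the sigmoid plays the role of $g$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagation(x, Theta1, Theta2):
    """Vectorized forward pass: z2 = Theta1 a1, a2 = g(z2), add a2_0 = 1,
    z3 = Theta2 a2, h_Theta(x) = g(z3)."""
    a1 = np.concatenate(([1.0], x))               # add bias unit x_0 = 1
    z2 = Theta1 @ a1
    a2 = np.concatenate(([1.0], sigmoid(z2)))     # add a_0^(2) = 1
    z3 = Theta2 @ a2
    return sigmoid(z3)                            # h_Theta(x) = a^(3)
```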

  15. Neural network learning its own features
      [Diagram: the hidden activations $a_1^{(2)}, a_2^{(2)}, a_3^{(2)}$ act as learned features for the output unit]
      Slide credit: Andrew Ng

  16. Bias / Variance Trade-off
      [Plot: training error and cross-validation error (loss) vs. degree of polynomial]
      Source: Andrew Ng

  17. Bias / Variance Trade-off
      [Plot: training and cross-validation loss vs. degree of polynomial; high bias at low degree, high variance at high degree]

  18. Bias / Variance Trade-off with Regularization
      [Plot: training error and cross-validation error (loss) vs. regularization parameter $\lambda$]
      Source: Andrew Ng

  19. Bias / Variance Trade-off with Regularization
      [Plot: high variance at small $\lambda$, high bias at large $\lambda$]
      Source: Andrew Ng

  20. K-means algorithm
      Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \dots, \mu_K$
      Repeat {
        Cluster assignment step: for $i = 1$ to $m$, $c^{(i)} :=$ index (from 1 to $K$) of the cluster centroid closest to $x^{(i)}$
        Centroid update step: for $k = 1$ to $K$, $\mu_k :=$ average (mean) of the points assigned to cluster $k$
      }
      Distortion: $J(c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K) = \frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - \mu_{c^{(i)}} \right\|^2$
      Slide credit: Andrew Ng
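A NumPy sketch of the two alternating steps above, assuming the centroids are initialized to K randomly chosen training points; a fixed iteration count stands in for a convergence check.

```python
import numpy as np

def kmeans(X, K, iters=100):
    """Alternate the cluster assignment and centroid update steps."""
    m = X.shape[0]
    mu = X[np.random.choice(m, K, replace=False)].astype(float)  # random init
    for _ in range(iters):
        # Cluster assignment step: c^(i) = index of the closest centroid
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
        c = np.argmin(dists, axis=1)
        # Centroid update step: mu_k = mean of the points assigned to cluster k
        for k in range(K):
            if np.any(c == k):
                mu[k] = X[c == k].mean(axis=0)
    return c, mu
```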

  21. Expectation Maximization (EM) Algorithm
      Goal: find $\theta$ that maximizes the log-likelihood $\sum_i \log p(x^{(i)}; \theta)$
      $\sum_i \log p(x^{(i)}; \theta) = \sum_i \log \sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta) = \sum_i \log \sum_{z^{(i)}} Q_i(z^{(i)}) \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} \ge \sum_i \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}$
      Jensen's inequality: $f(\mathbb{E}[X]) \ge \mathbb{E}[f(X)]$ for concave $f$

  22. Expectation Maximization (EM) Algorithm
      Goal: find $\theta$ that maximizes the log-likelihood $\sum_i \log p(x^{(i)}; \theta)$
      $\sum_i \log p(x^{(i)}; \theta) \ge \sum_i \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}$
      The lower bound works for all possible sets of distributions $Q_i$
      We want a tight lower bound: $f(\mathbb{E}[X]) = \mathbb{E}[f(X)]$
      When will that happen? When $X = \mathbb{E}[X]$ with probability 1 ($X$ is a constant), i.e., $\frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} = c$

  23. How should we choose $Q_i(z^{(i)})$?
      $Q_i(z^{(i)}) \propto p(x^{(i)}, z^{(i)}; \theta)$ and $\sum_z Q_i(z) = 1$ (because it is a distribution), so
      $Q_i(z^{(i)}) = \frac{p(x^{(i)}, z^{(i)}; \theta)}{\sum_z p(x^{(i)}, z; \theta)} = \frac{p(x^{(i)}, z^{(i)}; \theta)}{p(x^{(i)}; \theta)} = p(z^{(i)} \mid x^{(i)}; \theta)$

  24. EM algorithm
      Repeat until convergence {
        (E-step) For each $i$, set $Q_i(z^{(i)}) := p(z^{(i)} \mid x^{(i)}; \theta)$ (probabilistic inference)
        (M-step) Set $\theta := \arg\max_\theta \sum_i \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}$
      }
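The slide states the generic E- and M-steps; the sketch below instantiates them for a one-dimensional mixture of K Gaussians, a concrete choice made here for illustration rather than something the slide fixes. The responsibilities r play the role of $Q_i(z^{(i)})$, and the M-step updates are the closed-form maximizers for this model.

```python
import numpy as np

def em_gmm_1d(x, K, iters=100):
    """EM for a 1-D mixture of K Gaussians."""
    m = x.shape[0]
    pi = np.full(K, 1.0 / K)                     # mixing weights
    mu = np.random.choice(x, K, replace=False)   # component means (init from data)
    var = np.full(K, x.var())                    # component variances
    for _ in range(iters):
        # E-step: responsibilities r[i, k] = p(z^(i) = k | x^(i); theta)
        lik = (np.exp(-(x[:, None] - mu) ** 2 / (2 * var))
               / np.sqrt(2 * np.pi * var))        # shape (m, K)
        r = pi * lik
        r /= r.sum(axis=1, keepdims=True)
        # M-step: maximize the expected complete-data log-likelihood
        Nk = r.sum(axis=0)
        pi = Nk / m
        mu = (r * x[:, None]).sum(axis=0) / Nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    return pi, mu, var
```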

  25. Anomaly detection algorithm
      1. Choose features $x_j$ that you think might be indicative of anomalous examples
      2. Fit parameters $\mu_1, \dots, \mu_n, \sigma_1^2, \dots, \sigma_n^2$:
         $\mu_j = \frac{1}{m} \sum_{i=1}^{m} x_j^{(i)}$,  $\sigma_j^2 = \frac{1}{m} \sum_{i=1}^{m} \left( x_j^{(i)} - \mu_j \right)^2$
      3. Given a new example $x$, compute $p(x) = \prod_{j=1}^{n} p(x_j; \mu_j, \sigma_j^2)$
         Anomaly if $p(x) < \epsilon$
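Steps 2 and 3 of the algorithm, sketched with NumPy; the default epsilon is an arbitrary illustrative threshold rather than a value from the slide.

```python
import numpy as np

def fit_anomaly_model(X):
    """Step 2: fit per-feature Gaussian parameters mu_j and sigma_j^2."""
    mu = X.mean(axis=0)
    sigma2 = X.var(axis=0)
    return mu, sigma2

def is_anomaly(x, mu, sigma2, epsilon=1e-3):
    """Step 3: p(x) = prod_j N(x_j; mu_j, sigma_j^2); flag an anomaly if p(x) < epsilon."""
    p = np.prod(np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2))
    return p < epsilon
```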

  26. Problem motivation
      Movie                  Alice (1)  Bob (2)  Carol (3)  Dave (4)  x1 (romance)  x2 (action)
      Love at last               5         5        0          0         0.9           0
      Romance forever            5         ?        ?          0         1.0           0.01
      Cute puppies of love       ?         4        0          ?         0.99          0
      Nonstop car chases         0         0        5          4         0.1           1.0
      Swords vs. karate          0         0        5          ?         0             0.9

  27. Problem motivation
      Movie                  Alice (1)  Bob (2)  Carol (3)  Dave (4)  x1 (romance)  x2 (action)
      Love at last               5         5        0          0          ?             ?
      Romance forever            5         ?        ?          0          ?             ?
      Cute puppies of love       ?         4        0          ?          ?             ?
      Nonstop car chases         0         0        5          4          ?             ?
      Swords vs. karate          0         0        5          ?          ?             ?
      Given user parameters $\theta^{(1)} = (0, 5, 0)^\top$, $\theta^{(2)} = (0, 5, 0)^\top$, $\theta^{(3)} = (0, 0, 5)^\top$, $\theta^{(4)} = (0, 0, 5)^\top$, infer the movie features $x^{(1)} = \,?$

  28. Collaborative filtering optimization objective
      Given $x^{(1)}, \dots, x^{(n_m)}$, estimate $\theta^{(1)}, \dots, \theta^{(n_u)}$:
      $\min_{\theta^{(1)}, \dots, \theta^{(n_u)}} \frac{1}{2} \sum_{j=1}^{n_u} \sum_{i: r(i,j)=1} \left( \theta^{(j)\top} x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2$
      Given $\theta^{(1)}, \dots, \theta^{(n_u)}$, estimate $x^{(1)}, \dots, x^{(n_m)}$:
      $\min_{x^{(1)}, \dots, x^{(n_m)}} \frac{1}{2} \sum_{i=1}^{n_m} \sum_{j: r(i,j)=1} \left( \theta^{(j)\top} x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$
      Minimize over $x^{(1)}, \dots, x^{(n_m)}$ and $\theta^{(1)}, \dots, \theta^{(n_u)}$ simultaneously:
      $J = \frac{1}{2} \sum_{(i,j): r(i,j)=1} \left( \theta^{(j)\top} x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2$

  29. Collaborative filtering algorithm
      Initialize $x^{(1)}, \dots, x^{(n_m)}, \theta^{(1)}, \dots, \theta^{(n_u)}$ to small random values
      Minimize $J(x^{(1)}, \dots, x^{(n_m)}, \theta^{(1)}, \dots, \theta^{(n_u)})$ using gradient descent (or an advanced optimization algorithm). For every $i = 1, \dots, n_m$ and $j = 1, \dots, n_u$:
      $x_k^{(i)} := x_k^{(i)} - \alpha \left( \sum_{j: r(i,j)=1} \left( \theta^{(j)\top} x^{(i)} - y^{(i,j)} \right) \theta_k^{(j)} + \lambda x_k^{(i)} \right)$
      $\theta_k^{(j)} := \theta_k^{(j)} - \alpha \left( \sum_{i: r(i,j)=1} \left( \theta^{(j)\top} x^{(i)} - y^{(i,j)} \right) x_k^{(i)} + \lambda \theta_k^{(j)} \right)$
      For a user with parameters $\theta$ and a movie with (learned) features $x$, predict a star rating of $\theta^\top x$
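A vectorized NumPy sketch of the update rules above, where X stacks the movie feature vectors $x^{(i)}$ as rows and Theta stacks the user parameter vectors $\theta^{(j)}$ as rows. The masking by R (so that only observed ratings contribute) is how the per-element sums on the slide are assumed to be implemented here; the hyperparameter values are illustrative.

```python
import numpy as np

def collaborative_filtering(Y, R, n_features, alpha=0.005, lam=0.1, iters=500):
    """Gradient descent on the joint objective J(x, theta).
    Y: (n_movies, n_users) ratings; R[i, j] = 1 where a rating exists."""
    n_movies, n_users = Y.shape
    X = np.random.randn(n_movies, n_features) * 0.01      # movie features x^(i)
    Theta = np.random.randn(n_users, n_features) * 0.01   # user parameters theta^(j)
    for _ in range(iters):
        E = (X @ Theta.T - Y) * R          # errors, only where r(i, j) = 1
        X_grad = E @ Theta + lam * X
        Theta_grad = E.T @ X + lam * Theta
        X -= alpha * X_grad
        Theta -= alpha * Theta_grad
    return X, Theta

# Predicted star rating for user j on movie i: X[i] @ Theta[j]
```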

  30. Semi-supervised Learning Problem Formulation
      Labeled data: $D_L = \{ (x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m_L)}, y^{(m_L)}) \}$
      Unlabeled data: $D_U = \{ x^{(1)}, x^{(2)}, \dots, x^{(m_U)} \}$
      Goal: learn a hypothesis $h$ (e.g., a classifier) that has small error

  31. Deep Semi-supervised Learning

  32. Ensemble methods
      Combine multiple classifiers to make a better one
        Committees, majority vote
        Weighted combinations
        Can use the same or different classifiers
      Boosting
        Train sequentially; later predictors focus on the mistakes made by earlier ones
      Boosting for classification (e.g., AdaBoost)
        Use the results of earlier classifiers to know what to work on
        Weight hard examples so we focus on them more
        Example: Viola-Jones face detection
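A hedged sketch of the AdaBoost scheme described above, assuming labels in {-1, +1}; `weak_learner` is a hypothetical stand-in for whatever base classifier (e.g., a decision stump) is fit on the weighted data, and is not defined on the slide.

```python
import numpy as np

def adaboost(X, y, weak_learner, T=50):
    """AdaBoost: reweight hard examples so later learners focus on them."""
    m = X.shape[0]
    w = np.full(m, 1.0 / m)              # start with uniform example weights
    learners, alphas = [], []
    for _ in range(T):
        h = weak_learner(X, y, w)        # fit a weak classifier to weighted data
        pred = h(X)
        err = np.sum(w * (pred != y))    # weighted training error
        if err >= 0.5:                   # no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - err) / (err + 1e-12))
        # Up-weight the examples this learner got wrong, then renormalize
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        learners.append(h)
        alphas.append(alpha)
    # Final classifier: weighted vote of the weak learners
    return lambda Xq: np.sign(sum(a * h(Xq) for a, h in zip(alphas, learners)))
```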

  33. Generative models

  34. Simple Recurrent Network

  35. Reinforcement learning
      Markov decision process
      Q-learning
      Policy gradient
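A minimal sketch of the tabular Q-learning update, one of the methods listed above, assuming discrete states and actions indexed into a table Q; the learning rate and discount values are illustrative.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning update:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q
```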

  36. Final exam sample questions

  37. Conceptual questions
      [True/False] Increasing the value of k in a k-nearest neighbor classifier will decrease its bias
      [True/False] Backpropagation helps neural network training get unstuck from local minima
      [True/False] Linear regression can be solved by either matrix algebra or gradient descent
      [True/False] Logistic regression can be solved by either matrix algebra or gradient descent
      [True/False] K-means clustering has a unique solution
      [True/False] PCA has a unique solution

  38. Classification/Regression
      Given a simple dataset:
      1) Estimate the parameters
      2) Compute training error
      3) Compute leave-one-out cross-validation error
      4) Compute testing error

  39. Naïve Bayes
      Compute the individual probabilities
      Compute $P(Y = 1 \mid X_1 = 0, X_2 = 1, X_3 = 1)$ using a Naïve Bayes classifier
