Overview of Unsupervised Learning in Machine Learning
This presentation covers various topics in unsupervised learning, including clustering, expectation maximization, Gaussian mixture models, dimensionality reduction, anomaly detection, and recommender systems. It also previews advanced supervised learning techniques, ensemble methods, structured prediction, and reinforcement learning, providing a comprehensive overview of the field of machine learning.
Presentation Transcript
Clustering Jia-Bin Huang Virginia Tech Spring 2019 ECE-5424G / CS-5824
Administrative HW 2 grades are out. HW 3: March 27. Started grading the midterm; results expected to be out by March 27. Final project questions? Feedback please! https://goo.gl/forms/A6NSq5yMpkuWEMR43
Where are we? (Supervised Learning) K-Nearest Neighbor Linear Regression Naive Bayes Logistic Regression Regularization Support Vector Machines Neural Networks Diagnosing ML systems
What's next? (Unsupervised Learning) Clustering Expectation Maximization (EM) and Gaussian Mixture Models (GMM) Dimensionality reduction Anomaly detection Recommender systems
What's next? (Advanced topics) Supervised Learning: Ensemble methods, Structured prediction, Semi-supervised learning. Generative Models: Probabilistic Graphical Models, Sequence Prediction Models. Reinforcement Learning: Markov Decision Process, Q-learning, Policy gradient
Clustering Intro to unsupervised learning K-means algorithm Optimization objective Initialization and the number of clusters Hierarchical clustering
Supervised learning
(Figure: labeled examples plotted in the (x_1, x_2) plane)
Training set: {(x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(m), y^(m))}
Slide credit: Andrew Ng
Unsupervised learning
(Figure: unlabeled examples plotted in the (x_1, x_2) plane)
Training set: {x^(1), x^(2), x^(3), ..., x^(m)}
Slide credit: Andrew Ng
Clustering Intro to unsupervised learning K-means algorithm Optimization objective Initialization and the number of clusters Hierarchical clustering
K-means algorithm
Input:
  K (number of clusters)
  Training set {x^(1), x^(2), x^(3), ..., x^(m)}, with x^(i) ∈ ℝ^n (note: drop the x_0 = 1 convention)
Slide credit: Andrew Ng
K-means algorithm
Randomly initialize K cluster centroids μ_1, μ_2, ..., μ_K ∈ ℝ^n
Repeat {
  Cluster assignment step: for i = 1 to m, set c^(i) := index (from 1 to K) of the cluster centroid closest to x^(i)
  Centroid update step: for k = 1 to K, set μ_k := average (mean) of the points assigned to cluster k
}
Slide credit: Andrew Ng
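A minimal NumPy sketch of these two alternating steps (the function name, data matrix X, iteration count, and seeding are illustrative, not from the slides):

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """K-means on an (m, n) data matrix X; returns assignments c and centroids mu."""
    rng = np.random.default_rng(seed)
    # Randomly initialize K centroids (here: K distinct training examples)
    mu = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # Cluster assignment step: c[i] = index of the centroid closest to x^(i)
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)  # shape (m, K)
        c = dists.argmin(axis=1)
        # Centroid update step: mu_k = mean of the points assigned to cluster k
        mu = np.array([X[c == k].mean(axis=0) if np.any(c == k) else mu[k]
                       for k in range(K)])
    return c, mu
```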
Clustering Intro to unsupervised learning K-means algorithm Optimization objective Initialization and the number of clusters Hierarchical clustering
K-means optimization objective
c^(i) = index of the cluster (1, 2, ..., K) to which example x^(i) is currently assigned
μ_k = cluster centroid k (μ_k ∈ ℝ^n)
μ_{c^(i)} = cluster centroid of the cluster to which example x^(i) has been assigned
Example: if c^(i) = 5, then μ_{c^(i)} = μ_5
Optimization objective:
  J(c^(1), ..., c^(m), μ_1, ..., μ_K) = (1/m) Σ_{i=1}^{m} ||x^(i) - μ_{c^(i)}||^2
  min over c^(1), ..., c^(m), μ_1, ..., μ_K of J(c^(1), ..., c^(m), μ_1, ..., μ_K)
Slide credit: Andrew Ng
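A short helper computing this cost J, a sketch assuming X is the (m, n) data matrix, c the assignment vector, and mu the (K, n) centroid matrix from the snippet above:

```python
import numpy as np

def distortion(X, c, mu):
    # J = (1/m) * sum_i ||x^(i) - mu_{c^(i)}||^2
    return np.mean(np.sum((X - mu[c]) ** 2, axis=1))
```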
K-means algorithm (as minimizing J)
Randomly initialize K cluster centroids μ_1, μ_2, ..., μ_K ∈ ℝ^n
Repeat {
  Cluster assignment step: for i = 1 to m, set c^(i) := index (from 1 to K) of the cluster centroid closest to x^(i)
    (minimizes J(c^(1), ..., c^(m), μ_1, ..., μ_K) with respect to the assignments, holding the centroids fixed)
  Centroid update step: for k = 1 to K, set μ_k := average (mean) of the points assigned to cluster k
    (minimizes J(c^(1), ..., c^(m), μ_1, ..., μ_K) with respect to the centroids, holding the assignments fixed)
}
Slide credit: Andrew Ng
Objective based clustering
Input: K (number of clusters), training set {x^(1), x^(2), x^(3), ..., x^(m)}, distance/dissimilarity measure d(x^(i), x^(j))
Goal: output a partition of the data
K-means: Euclidean distance
K-median: absolute distance
K-center: minimize the maximum radius
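As an illustration of how the choice of distance changes the algorithm, the centroid update differs: the mean minimizes summed squared Euclidean distances (K-means), while the coordinate-wise median minimizes summed absolute distances (a K-median-style update). A small hypothetical helper:

```python
import numpy as np

def update_centroid(points, objective="k-means"):
    # Mean: optimal for the squared Euclidean (K-means) objective.
    # Coordinate-wise median: optimal for the absolute-distance (K-median) objective.
    return points.mean(axis=0) if objective == "k-means" else np.median(points, axis=0)
```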
Clustering Intro to unsupervised learning K-means algorithm Optimization objective Initialization and the number of clusters Hierarchical clustering
Random initialization
Randomly pick K training examples.
Set μ_1, μ_2, ..., μ_K equal to these K examples.
Slide credit: Andrew Ng
Local optima Slide credit: Andrew Ng
Local optima Slide credit: Maria-Florina Balcan
1) Multiple random initialization
For i = 1 to 100 {
  Randomly initialize K-means
  Run K-means. Get c^(1), ..., c^(m), μ_1, ..., μ_K
  Compute the cost function (distortion) J(c^(1), ..., c^(m), μ_1, ..., μ_K)
}
Pick the clustering that gave the lowest cost J(c^(1), ..., c^(m), μ_1, ..., μ_K)
Slide credit: Andrew Ng
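In code this amounts to rerunning K-means with different seeds and keeping the lowest-distortion result; a sketch reusing the hypothetical kmeans and distortion helpers from the earlier snippets, with X again standing in for the data matrix:

```python
import numpy as np

best_J, best = np.inf, None
for seed in range(100):
    c, mu = kmeans(X, K=3, seed=seed)   # one random restart
    J = distortion(X, c, mu)            # cost (distortion) of this run
    if J < best_J:
        best_J, best = J, (c, mu)       # keep the best clustering so far
```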
2) Furthest point heuristic
Choose μ_1 arbitrarily (or at random).
For j = 2 to K: pick μ_j among the data points x^(1), x^(2), ..., x^(m) that is farthest from the previously chosen centers μ_1, μ_2, ..., μ_{j-1}.
Slide credit: Maria-Florina Balcan
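A possible NumPy sketch of this heuristic (function name and seeding are illustrative):

```python
import numpy as np

def furthest_point_init(X, K, seed=0):
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]   # choose mu_1 at random
    for _ in range(2, K + 1):
        # Distance from every point to its nearest already-chosen center
        D = np.min(np.linalg.norm(X[:, None, :] - np.array(centers)[None, :, :], axis=2), axis=1)
        centers.append(X[D.argmax()])     # take the farthest point as the next center
    return np.array(centers)
```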
3) K-means++ initialization
Interpolate between random and furthest point initialization.
Idea: let D(x) be the distance between a point x and its nearest center. Choose the next center with probability proportional to D^2(x).
Choose μ_1 arbitrarily (or at random).
For j = 2 to K: pick μ_j among the data points x^(1), x^(2), ..., x^(m) according to the distribution Pr(μ_j = x^(i)) ∝ D^2(x^(i)) = min_{k < j} ||x^(i) - μ_k||^2
[Arthur and Vassilvitskii 2007]
3) K-means++ initialization
Interpolate between random and furthest point initialization.
Idea: let D(x) be the distance between a point x and its nearest center. Choose the next center with probability proportional to D^α(x).
  α = 0: random sampling
  α = ∞: furthest point
  α = 2: K-means++
scikit-learn uses K-means++ as its default initialization.
[Arthur and Vassilvitskii 2007]
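A sketch of the D^2 sampling rule (the α = 2 case), under the same illustrative conventions as the earlier snippets:

```python
import numpy as np

def kmeanspp_init(X, K, seed=0):
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]   # choose mu_1 at random
    for _ in range(2, K + 1):
        # D(x): distance from each point to its nearest already-chosen center
        D = np.min(np.linalg.norm(X[:, None, :] - np.array(centers)[None, :, :], axis=2), axis=1)
        probs = D ** 2 / np.sum(D ** 2)   # sample the next center proportional to D^2(x)
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.array(centers)
```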
How to choose K? 1) Elbow method 2) Try multiple values of K and use performance on a downstream task for selection
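For the elbow method, one would sweep K and plot the distortion, looking for the point where the curve flattens. A sketch reusing the hypothetical kmeans and distortion helpers above, with X again assumed to be the data matrix:

```python
import matplotlib.pyplot as plt

Ks = list(range(1, 11))
costs = []
for K in Ks:
    c, mu = kmeans(X, K, seed=0)
    costs.append(distortion(X, c, mu))

plt.plot(Ks, costs, marker='o')   # look for the "elbow" where J stops dropping quickly
plt.xlabel('K')
plt.ylabel('distortion J')
plt.show()
```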
Clustering Intro to unsupervised learning K-means algorithm Optimization objective Initialization and the number of clusters Hierarchical clustering
Hierarchical Clustering A hierarchy might be more natural. Different users might care about different levels of granularity, or even different prunings. Slide credit: Maria-Florina Balcan
Hierarchical Clustering
Top-down (divisive): partition the data into two groups (e.g., 2-means), then recursively cluster each group.
Bottom-up (agglomerative): start with every point in its own cluster, then repeatedly merge the closest two clusters. Different definitions of "closest" give different algorithms.
Slide credit: Maria-Florina Balcan
Bottom-up (agglomerative)
Have a distance measure on pairs of objects: d(x, x') = distance between x and x'.
Single linkage: dist(A, B) = min_{x ∈ A, x' ∈ B} d(x, x')
Complete linkage: dist(A, B) = max_{x ∈ A, x' ∈ B} d(x, x')
Average linkage: dist(A, B) = average_{x ∈ A, x' ∈ B} d(x, x')
Ward's method: dist(A, B) = (|A||B| / (|A| + |B|)) ||mean(A) - mean(B)||^2
Slide credit: Maria-Florina Balcan
Bottom-up (agglomerative)
Single linkage: dist(A, B) = min_{x ∈ A, x' ∈ B} d(x, x'). At any time, the distance between any two points in a connected component is < r.
Complete linkage: dist(A, B) = max_{x ∈ A, x' ∈ B} d(x, x'). Keeps the maximum cluster diameter as small as possible at any level.
Ward's method: dist(A, B) = (|A||B| / (|A| + |B|)) ||mean(A) - mean(B)||^2. Merges the two clusters such that the increase in the k-means cost is as small as possible. Works well in practice.
Slide credit: Maria-Florina Balcan
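These linkage choices correspond to the method argument of SciPy's hierarchical clustering routines; a brief illustrative usage (the random data matrix and the choice of 3 clusters are assumptions, not from the slides):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.default_rng(0).normal(size=(100, 2))  # illustrative 2-D data

# method can be 'single', 'complete', 'average', or 'ward',
# matching the linkage definitions above
Z = linkage(X, method='ward')
labels = fcluster(Z, t=3, criterion='maxclust')      # cut the dendrogram into 3 clusters
```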
Things to remember Intro to unsupervised learning K-means algorithm Optimization objective Initialization and the number of clusters Hierarchical clustering