Overview of Unsupervised Learning in Machine Learning

 
Clustering
 
Jia-Bin Huang
Virginia Tech
 
Spring 2019
 
ECE-5424G / CS-5824
 
Administrative
 
HW 2 grades out.
 
HW 3 March 27.
 
Started grading Midterm. Results expected to be out by March 27.

Final project questions?

Feedback please! https://goo.gl/forms/A6NSq5yMpkuWEMR43
 
 
 
 
Where are we? (Supervised Learning)
 
K-Nearest Neighbor
Linear Regression
Naive Bayes
Logistic Regression
Regularization
Support Vector Machines
Neural Networks
Diagnosing ML systems
 
What’s next? (Unsupervised Learning)
 
Clustering
Expectation Maximization (EM) and Gaussian Mixture Models (GMM)
Dimensionality reduction
Anomaly Detection
Recommender systems
 
What’s next? (Advanced topics)
 
Supervised Learning
Ensemble methods
Structured prediction
Semi-supervised learning
Generative Models
Probabilistic Graphical Models
Sequence Prediction Models
Reinforcement Learning
Markov Decision Process, Q-learning, Policy gradient
 
Clustering
 
Intro to unsupervised learning
K-means algorithm
Optimization objective
Initialization and the number of clusters
Hierarchical clustering
 
Clustering
 
Intro to unsupervised learning
K-means algorithm
Optimization objective
Initialization and the number of clusters
Hierarchical clustering
Supervised learning

Training set: $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})\}$

Slide credit: Andrew Ng
 
Unsupervised learning

Training set: $\{x^{(1)}, x^{(2)}, x^{(3)}, \ldots, x^{(m)}\}$ (no labels)

Slide credit: Andrew Ng
 
 
 
 
Clustering
 
Intro to unsupervised learning
K-means algorithm
Optimization objective
Initialization and the number of clusters
Hierarchical clustering
 
 
 
[Figures: K-means illustrated step by step on example data: centroids are initialized, each point is assigned to its nearest centroid, centroids move to the mean of their assigned points, and the process repeats until convergence. Slide credit: Andrew Ng]
 
K-means algorithm

Input:
$K$ (number of clusters)
Training set $\{x^{(1)}, x^{(2)}, x^{(3)}, \ldots, x^{(m)}\}$, with $x^{(i)} \in \mathbb{R}^n$ (note: drop the $x_0 = 1$ convention)

Slide credit: Andrew Ng

K-means algorithm

Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \ldots, \mu_K \in \mathbb{R}^n$
Repeat {
Cluster assignment step: for $i = 1$ to $m$, set $c^{(i)} :=$ index (from 1 to $K$) of the cluster centroid closest to $x^{(i)}$
Centroid update step: for $k = 1$ to $K$, set $\mu_k :=$ average (mean) of the points assigned to cluster $k$
}

Slide credit: Andrew Ng
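
To make the two alternating steps concrete, here is a minimal NumPy sketch (not from the slides; the function name `kmeans`, the convergence test, and the empty-cluster handling are my own choices):

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Plain K-means. X: (m, n) data matrix. Returns assignments c and centroids mu."""
    rng = np.random.default_rng(seed)
    # Random initialization: pick K distinct training examples as initial centroids
    mu = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # Cluster assignment step: c[i] = index of the centroid closest to x^(i)
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)  # shape (m, K)
        c = dists.argmin(axis=1)
        # Centroid update step: mu_k = mean of the points assigned to cluster k
        # (a cluster that lost all its points keeps its old centroid)
        new_mu = np.array([X[c == k].mean(axis=0) if np.any(c == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):  # centroids stopped moving: converged
            break
        mu = new_mu
    return c, mu
```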
 
Clustering
 
Intro to unsupervised learning
K-means algorithm
Optimization objective
Initialization and the number of clusters
Hierarchical clustering
K-means optimization objective

$c^{(i)}$ = index of the cluster ($1, 2, \ldots, K$) to which example $x^{(i)}$ is currently assigned
$\mu_k$ = cluster centroid $k$ ($\mu_k \in \mathbb{R}^n$)
$\mu_{c^{(i)}}$ = cluster centroid of the cluster to which example $x^{(i)}$ has been assigned
Example: if $c^{(i)} = 5$, then $\mu_{c^{(i)}} = \mu_5$

Optimization objective (distortion):
$J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K) = \frac{1}{m} \sum_{i=1}^{m} \lVert x^{(i)} - \mu_{c^{(i)}} \rVert^2$
$\min_{c^{(1)}, \ldots, c^{(m)},\, \mu_1, \ldots, \mu_K} J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K)$

Slide credit: Andrew Ng

K-means algorithm

The cluster assignment step minimizes $J$ with respect to $c^{(1)}, \ldots, c^{(m)}$ (holding $\mu_1, \ldots, \mu_K$ fixed); the centroid update step minimizes $J$ with respect to $\mu_1, \ldots, \mu_K$ (holding the assignments fixed). Each iteration therefore never increases $J$.

Slide credit: Andrew Ng
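
Given assignments and centroids, the distortion is cheap to evaluate; a small helper under the same assumed setup as the `kmeans` sketch above:

```python
import numpy as np

def distortion(X, c, mu):
    # J = (1/m) * sum_i ||x^(i) - mu_{c^(i)}||^2
    return np.mean(np.sum((X - mu[c]) ** 2, axis=1))
```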
 
Objective-based clustering

Input: $K$ (number of clusters), training set $\{x^{(1)}, x^{(2)}, x^{(3)}, \ldots, x^{(m)}\}$, and a distance/dissimilarity measure $d(x_i, x_j)$
Goal: output a partition of the data
K-means: Euclidean distance
K-median: absolute distance
K-center: minimize the maximum radius
 
Clustering
 
Intro to unsupervised learning
K-means algorithm
Optimization objective
Initialization and the number of clusters
Hierarchical clustering
 
Random initialization

Randomly pick $K$ training examples and set $\mu_1, \mu_2, \ldots, \mu_K$ equal to those $K$ examples.

Slide credit: Andrew Ng
 
Local optima

K-means always converges, but only to a local optimum of the distortion $J$, which depends on the initialization; an unlucky initialization can split one true cluster across several centroids and merge others, yielding a visibly poor clustering.

Slide credit: Andrew Ng
Slide credit: Maria-Florina Balcan
 
1) Multiple random initialization

For $i = 1$ to 100 {
Randomly initialize K-means
Run K-means; get $c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K$
Compute the cost function (distortion) $J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K)$
}
Pick the clustering that gave the lowest cost $J$

Slide credit: Andrew Ng
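
scikit-learn's `KMeans` runs this restart loop internally through its `n_init` parameter; a brief example (the data and parameter values are placeholders, not from the slides):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(300, 2))  # placeholder data

# 100 random restarts; scikit-learn keeps the run with the lowest cost
km = KMeans(n_clusters=3, init="random", n_init=100).fit(X)
print(km.inertia_)  # sum of squared distances to the closest centroids (= m * J)
```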
 
2) Furthest Point Heuristic

Choose $\mu_1$ arbitrarily (or at random).
For $j = 2$ to $K$: pick $\mu_j$ among the data points $x^{(1)}, x^{(2)}, \ldots, x^{(m)}$ that is farthest from the previously chosen $\mu_1, \mu_2, \ldots, \mu_{j-1}$.
This spreads the centers out, but it is sensitive to outliers: a single far-away point will always be chosen as a center.

Slide credit: Maria-Florina Balcan
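
A short sketch of the heuristic (the name `furthest_point_init` is mine):

```python
import numpy as np

def furthest_point_init(X, K, seed=0):
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]  # mu_1: a random data point
    for _ in range(2, K + 1):
        # distance from each point to its nearest already-chosen center
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])  # next center: the farthest point
    return np.array(centers)
```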
 
3) K-means++ initialization

Interpolates between random and furthest-point initialization.
Idea: let $D(x)$ be the distance between a point $x$ and its nearest already-chosen center. Choose the next center with probability proportional to $D^2(x)$.

Choose $\mu_1$ arbitrarily (or at random).
For $j = 2$ to $K$: pick $\mu_j$ among the data points $x^{(1)}, x^{(2)}, \ldots, x^{(m)}$ according to the distribution
$\Pr(\mu_j = x^{(i)}) \propto \min_{j' < j} \lVert x^{(i)} - \mu_{j'} \rVert^2$

More generally, sampling proportional to $D^{\alpha}(x)$ unifies all three schemes:
$\alpha = 0$: random sampling
$\alpha = \infty$: furthest point
$\alpha = 2$: K-means++

scikit-learn uses K-means++ as its default initialization.

[Arthur and Vassilvitskii 2007]
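
A sketch of the $D^2$ sampling loop (the name `kmeans_pp_init` is mine; in practice you would rely on scikit-learn's built-in `KMeans(init="k-means++")`):

```python
import numpy as np

def kmeans_pp_init(X, K, seed=0):
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]  # mu_1: a random data point
    for _ in range(2, K + 1):
        # D(x)^2: squared distance from each point to its nearest chosen center
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        idx = rng.choice(len(X), p=d2 / d2.sum())  # sample proportional to D^2
        centers.append(X[idx])
    return np.array(centers)
```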
How to choose K?

1) Elbow method: plot the distortion $J$ as a function of the number of clusters $K$ and pick the $K$ at the "elbow" of the curve, where adding more clusters stops reducing $J$ rapidly (see the sketch after this list; in practice the elbow is often ambiguous).

2) Try multiple values of $K$ and use performance on a downstream task for selection.
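
A sketch of the elbow curve using scikit-learn (placeholder data; the range of $K$ values is an arbitrary choice):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(300, 2))  # placeholder data
Ks = range(1, 11)
J = [KMeans(n_clusters=k, n_init=10).fit(X).inertia_ for k in Ks]

plt.plot(list(Ks), J, marker="o")
plt.xlabel("number of clusters K")
plt.ylabel("distortion (inertia)")
plt.show()
```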
 
Clustering
 
Intro to unsupervised learning
K-means algorithm
Optimization objective
Initialization and the number of clusters
Hierarchical clustering
 
Hierarchical Clustering

[Figure: a cluster tree (dendrogram) offering clusterings at several levels of granularity.]

A hierarchy might be more natural.
Different users might care about different levels of granularity, or even different prunings of the cluster tree.

Slide credit: Maria-Florina Balcan
 
Hierarchical Clustering
 
Top-down (divisive)
Partition the data into two groups (e.g., with 2-means)
Recursively cluster each group

Bottom-up (agglomerative)
Start with every point in its own cluster.
Repeatedly merge the "closest" two clusters.
Different definitions of "closest" give different algorithms.
 
Slide credit: Maria-Florina Balcan
 
Bottom-up (agglomerative)

Start from a distance measure on pairs of objects, $d(x, x')$ = distance between $x$ and $x'$. Common cluster-level distances:
Single linkage: $\mathrm{dist}(A, B) = \min_{x \in A,\, x' \in B} d(x, x')$
Complete linkage: $\mathrm{dist}(A, B) = \max_{x \in A,\, x' \in B} d(x, x')$
Average linkage: $\mathrm{dist}(A, B) = \mathrm{average}_{x \in A,\, x' \in B}\, d(x, x')$
Ward's method: $\mathrm{dist}(A, B) = \frac{|A||B|}{|A| + |B|} \lVert \mathrm{mean}(A) - \mathrm{mean}(B) \rVert^2$

Slide credit: Maria-Florina Balcan

Bottom-up (agglomerative)

Single linkage: at any time, any two points in the same connected component are joined by a chain of links, each shorter than the current merge radius $r$.
Complete linkage: keeps the maximum cluster diameter as small as possible at every level.
Ward's method: merges the two clusters for which the increase in the k-means cost is as small as possible; works well in practice.

Slide credit: Maria-Florina Balcan
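
A small example with SciPy's agglomerative tools (the linkage method and the cut into 3 clusters are illustrative choices):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.default_rng(0).normal(size=(50, 2))  # placeholder data
Z = linkage(X, method="ward")  # also: "single", "complete", "average"
labels = fcluster(Z, t=3, criterion="maxclust")  # cut the dendrogram into 3 clusters
```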
 
Things to remember
 
Intro to unsupervised learning
K-means algorithm
Optimization objective
Initialization and the number of clusters
Hierarchical clustering
