
Clustering Techniques for Data Analysis
Explore the world of clustering in data analysis through K-means clustering, gene expression analysis, and derivative clustering. Learn about Lloyd's algorithm, computational problems, and pseudo-pseudocode for organizing data into clusters efficiently. Dive into the visual representation of data with plotting techniques.
Presentation Transcript
Lloyd's Algorithm: K-Means Clustering
Gene Expression. Susumu Ohno: whole-genome duplications. The expression of genes can be measured over time. Identifying which genes are expressed at a given moment can help determine their function.
Grouping. Genes are grouped by the derivative of their expression over time, so the data must be clustered by derivative.
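As a rough illustration of grouping by derivative, the sketch below approximates the derivative of each expression profile with forward differences. The gene names, expression values, and the helper `finite_differences` are made up for this example; the slides do not say how the derivative was computed.

```python
def finite_differences(series, times=None):
    """Approximate the derivative of one expression profile by forward differences."""
    if times is None:
        # Assumption: measurements are evenly spaced, one time unit apart.
        times = list(range(len(series)))
    return [
        (series[i + 1] - series[i]) / (times[i + 1] - times[i])
        for i in range(len(series) - 1)
    ]

# Hypothetical expression levels for three genes at four time points.
expression = {
    "gene_a": [1.0, 2.0, 4.0, 8.0],
    "gene_b": [5.0, 6.0, 8.0, 12.0],
    "gene_c": [9.0, 7.0, 4.0, 0.0],
}

derivatives = {gene: finite_differences(levels) for gene, levels in expression.items()}
```

Here `gene_a` and `gene_b` have different absolute levels but identical derivative vectors, so clustering on derivatives would place them together, while `gene_c` (decreasing) would fall elsewhere.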
Clustering Problems. Cluster d data points into k clusters such that each point is closer to the points in its own cluster than to those of any other cluster. Real data is usually not that clearly organized.
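To make the clustering condition concrete, here is a minimal Python sketch that checks it for a given labeling. It takes one possible reading of the slide: for every point, the farthest point in its own cluster must be strictly nearer than the nearest point in any other cluster. The function name `is_well_separated` and the sample points are assumptions made for illustration.

```python
import math

def is_well_separated(points, labels):
    """Return True if every point is strictly closer to all points in its
    own cluster than to any point in a different cluster."""
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points)
                if j != i and labels[j] == labels[i]]
        other = [math.dist(p, q) for j, q in enumerate(points)
                 if labels[j] != labels[i]]
        # Singleton clusters or a single cluster satisfy the condition trivially.
        if same and other and max(same) >= min(other):
            return False
    return True

# Made-up 2-D points: three near the origin, two far away.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
good_labels = [0, 0, 0, 1, 1]
bad_labels = [0, 0, 1, 1, 1]  # (1, 0) is placed in the wrong cluster
```

With these points, `good_labels` satisfies the condition and `bad_labels` does not. Real data rarely passes such a strict test, which is why the algorithm minimizes a distance-based score instead.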
Lloyd's Algorithm. Assign each point to the cluster whose center is nearest. Set each cluster's center of gravity as its new center. Repeat until the centers no longer change. The goal is to minimize the squared error distortion.
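The slide names the squared error distortion without defining it. The sketch below assumes a common definition: the mean, over all points, of the squared Euclidean distance to the nearest center. The function name and sample data are illustrative only.

```python
import math

def squared_error_distortion(points, centers):
    """Mean squared Euclidean distance from each point to its nearest center
    (an assumed definition; the slides do not spell one out)."""
    return sum(
        min(math.dist(p, c) for c in centers) ** 2 for p in points
    ) / len(points)

# Four made-up points on a line, scored against two candidate sets of centers.
points = [(0, 0), (2, 0), (10, 0), (12, 0)]
centroid_centers = [(1, 0), (11, 0)]   # centers of gravity of the two pairs
endpoint_centers = [(0, 0), (12, 0)]   # centers placed on the outer points
```

With these values the centroid-based centers give a distortion of 1.0 and the endpoint-based centers give 2.0, which is consistent with moving each center to its cluster's center of gravity.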
The Computational Problem. Input: a matrix of points, each with m dimensions, and the desired number of clusters k. Output: the points organized into k clusters that minimize the distance from each point to its cluster center, plus a visual representation of the data.
Pseudo-pseudocode.
1. Arbitrarily assign k centers.
2. Assign points to k clusters, minimizing Euclidean distance from center.
3. Assign each cluster's center of gravity as its new center.
4. Repeat steps 2 and 3 until the algorithm converges.
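The steps above can be sketched in plain Python as follows. This is a minimal illustration, not the presenter's implementation: the function name `lloyd_kmeans`, the `seed` and `max_iter` parameters, the choice of initial centers by random sampling from the data, and the handling of empty clusters are all assumptions.

```python
import math
import random

def lloyd_kmeans(points, k, seed=0, max_iter=100):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Step 1: arbitrarily pick k distinct data points as initial centers.
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(max_iter):
        # Step 2: assign each point to the nearest center (Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # Step 3: move each center to its cluster's center of gravity.
        new_centers = []
        for j, cluster in enumerate(clusters):
            if cluster:
                new_centers.append(
                    tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
                )
            else:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(centers[j])
        # Step 4: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

# Made-up 2-D data with two visibly separate groups.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, clusters = lloyd_kmeans(data, k=2)
```

On this small data set the two groups of three points are recovered. In general, Lloyd's algorithm converges to a local optimum that depends on the initial centers, so it is commonly run from several starting points.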