K-Means is an iterative algorithm that partitions data into K distinct clusters by minimizing the within-cluster sum of squares (inertia).
The Algorithm:
- Initialization: Place K centroids randomly or using K-Means++ (more strategic placement)
- Assignment Step: Assign each data point to the nearest centroid, forming K clusters
- Update Step: Recalculate the position of each centroid as the mean of all points in its cluster
- Iteration: Repeat Assignment and Update steps until convergence or max iterations
Key Concepts:
- Inertia: Sum of squared distances from each point to its assigned centroid (lower is better)
- Silhouette Score: Measures how similar points are to their own cluster compared to other clusters (higher is better)
- Elbow Method: Plots inertia against different K values to find the "elbow point" where adding more clusters yields diminishing returns
Limitations:
- Assumes clusters are roughly spherical and equal in size
- Sensitive to initial centroid placement
- Requires specifying K in advance (hence the elbow method)
- Can be trapped in local minima