Interactive Hierarchical Clustering Visualization

Cluster 1 Cluster 2 Cluster 3 Cluster 4

Controls

Data Generation

Number of Points:

Distribution:

True Clusters:

Cluster Separation: 5.0

Hierarchical Clustering Parameters

Linkage Method:

Distance Metric:

Number of Clusters:

Visualization Options

Show Dendrogram

Show Cluster Centroids

Show True Clusters

Algorithm Stats

Current Step: 0

Current Clusters: 0

Last Merged Distance: -

Status: Not Started

Dendrogram

How Hierarchical Clustering Works

Hierarchical Clustering builds a hierarchy of clusters by either a bottom-up (agglomerative) or top-down (divisive) approach. This visualization focuses on the more common agglomerative method.

Agglomerative Clustering Algorithm:

Start: Each data point begins as its own cluster
Merge: At each step, the two most similar clusters are merged into a single cluster
Update: Recalculate distances between the new cluster and all other clusters
Repeat: Continue merging clusters until only one cluster remains (or until desired cluster count is reached)

Linkage Methods:

Single Linkage: Distance between the closest points of two clusters (minimum distance)
Complete Linkage: Distance between the farthest points of two clusters (maximum distance)
Average Linkage: Average distance between all pairs of points in the two clusters
Ward's Method: Minimizes the increase in variance for the clusters being merged

Dendrogram:

A tree diagram that shows the hierarchical relationship between clusters
Horizontal position of merge points represents the distance at which clusters were merged
Cutting the dendrogram at a particular height produces a specific number of clusters

Advantages:

No need to specify the number of clusters in advance
Produces a dendrogram that provides insights into data structure
Can identify hierarchical relationships between clusters

Limitations:

Computationally intensive for large datasets (O(n²) space complexity, O(n³) time complexity in worst case)
Sensitive to noise and outliers
Different linkage criteria can yield very different results