
Hierarchical Clustering Visualization

[Interactive demo: controls for data generation, clustering parameters, and visualization options; live algorithm stats (current step, cluster count, last merged distance, status); and a dendrogram panel.]

How Hierarchical Clustering Works

Hierarchical Clustering builds a hierarchy of clusters by either a bottom-up (agglomerative) or top-down (divisive) approach. This visualization focuses on the more common agglomerative method.

Agglomerative Clustering Algorithm:

  1. Start: Each data point begins as its own cluster
  2. Merge: At each step, the two most similar clusters are merged into a single cluster
  3. Update: Recalculate distances between the new cluster and all other clusters
  4. Repeat: Continue merging clusters until only one cluster remains (or until desired cluster count is reached)
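The four steps above can be sketched as a short, deliberately unoptimized Python implementation using single linkage. The function name and parameters here are illustrative, not part of the visualization:

```python
import numpy as np

def agglomerative(points, target_clusters=1):
    """Naive agglomerative clustering with single linkage.

    Each point starts as its own cluster; at every step the two
    closest clusters are merged until target_clusters remain.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []  # records (cluster_a, cluster_b, merge_distance)

    def single_linkage(a, b):
        # Minimum pairwise distance between the two clusters.
        return min(np.linalg.norm(points[i] - points[j])
                   for i in a for j in b)

    while len(clusters) > target_clusters:
        # Find the closest pair of clusters (the "Merge" step).
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda p: single_linkage(clusters[p[0]], clusters[p[1]]))
        d = single_linkage(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], d))
        # Merge cluster j into cluster i, then drop j ("Update").
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merges

pts = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
final, merges = agglomerative(pts, target_clusters=2)
```

Recomputing the linkage inside the loop makes the cost explicit: every step scans all cluster pairs, which is why the naive algorithm scales poorly.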

Linkage Methods:

  • Single Linkage: Distance between the closest points of two clusters (minimum distance)
  • Complete Linkage: Distance between the farthest points of two clusters (maximum distance)
  • Average Linkage: Average distance between all pairs of points in the two clusters
  • Ward's Method: Merges the pair of clusters whose union gives the smallest increase in total within-cluster variance
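The four linkage criteria can be compared directly on two small hypothetical clusters. The point sets here are made up for illustration; Ward's value is the variance increase written in its usual closed form:

```python
import numpy as np

# Two hypothetical 1-D-ish clusters embedded in 2-D.
A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[3.0, 0.0], [5.0, 0.0]])

# All pairwise distances between points of A and points of B.
pair = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)

single   = pair.min()    # closest pair
complete = pair.max()    # farthest pair
average  = pair.mean()   # mean over all pairs

# Ward: increase in within-cluster variance if A and B were merged,
# (|A||B| / (|A|+|B|)) * ||centroid_A - centroid_B||^2.
cA, cB = A.mean(axis=0), B.mean(axis=0)
ward = (len(A) * len(B) / (len(A) + len(B))) * np.sum((cA - cB) ** 2)
```

For these points the pairwise distances are {3, 5, 2, 4}, so single, complete, and average linkage give 2.0, 5.0, and 3.5 respectively, and each criterion would rank candidate merges differently.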

Dendrogram:

  • A tree diagram that shows the hierarchical relationship between clusters
  • The position of each merge point along the distance axis represents the distance at which the two clusters were merged
  • Cutting the dendrogram at a particular distance produces a specific number of clusters
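Cutting the tree can be demonstrated with SciPy's hierarchical-clustering utilities. The data below are hypothetical (two well-separated synthetic blobs), and the distance threshold of 3.0 is chosen to fall in the gap between within-blob and between-blob merge distances:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical toy data: two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.2, (10, 2)),
               rng.normal(5.0, 0.2, (10, 2))])

# Build the full merge hierarchy with average linkage.
Z = linkage(X, method="average")

# "Cut" the tree: either at a distance threshold ...
labels_by_distance = fcluster(Z, t=3.0, criterion="distance")
# ... or by asking for a specific number of clusters.
labels_by_count = fcluster(Z, t=2, criterion="maxclust")
```

Both cuts recover the two blobs here; the `distance` criterion corresponds to drawing a line across the dendrogram at that height, while `maxclust` searches for the cut height that yields the requested cluster count.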

Advantages:

  • No need to specify the number of clusters in advance
  • Produces a dendrogram that provides insights into data structure
  • Can identify hierarchical relationships between clusters

Limitations:

  • Computationally intensive for large datasets (O(n²) space complexity, O(n³) time complexity in worst case)
  • Sensitive to noise and outliers
  • Different linkage criteria can yield very different results