DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is an algorithm that groups together points that are closely packed, while marking points in low-density regions as outliers.
The Algorithm:
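- Core Points: Find all points that have at least minPts points within distance ε (epsilon); in the original formulation this count includes the point itself
- Cluster Formation: Expand each cluster from a core point by adding every point that is density-reachable from it, that is, reachable through a chain of core points each within ε of the next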
- Border Points: Points that are within ε of a core point but don't have enough neighbors themselves
- Noise Points: Points that are neither core nor border points
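The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the names `dbscan`, `region_query`, and `NOISE` are made up for this example, neighbors are found by brute force, and `min_pts` is assumed to count the point itself.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label used here for outliers (an assumption of this sketch)

def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks outliers."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled as a border point later
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is core: keep expanding
    return labels
```

For example, `dbscan([(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)], eps=1.5, min_pts=3)` puts the four corner points in one cluster and labels `(5, 5)` as noise. An explicit queue is used instead of recursion so that large clusters do not hit Python's recursion limit.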
Key Advantages:
- Does not require specifying the number of clusters in advance
- Can find arbitrarily shaped clusters
- Robust to outliers, which are labeled as noise rather than forced into a cluster
- Only has two parameters: ε and minPts
Limitations:
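- Struggles with clusters of varying densities, because a single ε and minPts apply to the whole dataset
- Sensitive to parameter choices; small changes in ε can merge or split clusters
- Not effective when clusters are very close: clusters separated by a gap smaller than roughly ε tend to be merged into one
- Can be computationally intensive for large datasets: without a spatial index, neighbor searches need O(n²) distance computations in the worst case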
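The cost of neighbor searches can often be reduced with a spatial index. The sketch below uses a uniform grid with cell size ε, so a query only inspects the cells adjacent to the query point. It is one possible approach under stated assumptions, not the method of any particular library: `build_grid` and `region_query_grid` are hypothetical names, and because it visits 3^d cells it is only practical in low dimensions.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor

def build_grid(points, eps):
    """Bucket point indices by the eps-sized grid cell they fall into."""
    grid = defaultdict(list)
    for i, p in enumerate(points):
        grid[tuple(floor(c / eps) for c in p)].append(i)
    return grid

def region_query_grid(points, grid, i, eps):
    """Indices within eps of points[i], checking only adjacent cells."""
    cell = tuple(floor(c / eps) for c in points[i])
    found = []
    # Any point within eps lies in the same cell or one of its neighbors.
    for offset in product((-1, 0, 1), repeat=len(cell)):
        neighbor_cell = tuple(c + o for c, o in zip(cell, offset))
        for j in grid.get(neighbor_cell, ()):
            if dist(points[i], points[j]) <= eps:
                found.append(j)
    return found
```

The grid is built once per dataset and reused for every query, so each query touches only nearby points instead of the whole dataset.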
Parameter Selection:
- ε (Epsilon): The radius within which to search for neighbors; commonly chosen from the knee of a sorted k-distance plot
- minPts: A common rule of thumb is 2×dimensions (e.g., 4 for 2D data), with larger values for noisy data
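A common heuristic for ε, the sorted k-distance plot, can be sketched as follows: compute each point's distance to its k-th nearest neighbor, sort those distances in descending order, and pick ε near the knee of the curve. The function name `k_distances` is made up for this example, and how k relates to minPts (minPts or minPts - 1) varies between sources, so treat the choice of k as an assumption.

```python
from math import dist  # Euclidean distance, Python 3.8+

def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbor, sorted descending."""
    result = []
    for i, p in enumerate(points):
        # Distances to every other point, nearest first (brute force).
        d = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(d[k - 1])
    return sorted(result, reverse=True)
```

Points on the steep left part of the curve have unusually distant neighbors and are likely noise; a value of ε just below the knee keeps the dense points together while leaving those outliers unclustered.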