Interactive UMAP Visualization

Original High-Dimensional Data

Partial view of high-dimensional data (only 3 dimensions shown)

UMAP Projection

Low-dimensional UMAP embedding of the data

Controls

Data Generation

Number of Points:

Distribution:

Original Dimensions:

Noise Level: 0.1

UMAP Parameters

n_neighbors:

min_dist:

0.10

n_components:

Distance Metric:

Visualization Options

Show k-NN Graph

Animate Layout Process

Color by Clusters

Point Size: 4.0

UMAP Stats

Current Step: Not Started

Iteration: -

Status: Not Started

Nearest Neighbor Graph

k-Nearest Neighbor graph used by UMAP

How UMAP Works

Uniform Manifold Approximation and Projection (UMAP) is a dimensionality reduction technique that preserves both local and global structure of high-dimensional data. It's particularly effective for visualization and clustering tasks.

The UMAP Algorithm:

Construct Neighborhood Graph: Build a weighted k-nearest neighbor graph representing the high-dimensional manifold structure
Compute Fuzzy Topological Representation: Transform the neighborhood graph into a fuzzy topological representation using local connectivity patterns
Optimize Low-Dimensional Layout: Find a low-dimensional embedding that preserves the topological structure using stochastic gradient descent
Refine Embedding: Iteratively adjust the low-dimensional coordinates to minimize the difference between high and low-dimensional fuzzy topological representations

Key Parameters:

n_neighbors: Controls how UMAP balances local versus global structure. Lower values (2-15) emphasize local structure, higher values (50-200) preserve more global structure
min_dist: Controls how tightly UMAP packs points together. Smaller values create tighter, more clustered embeddings
metric: The distance function used to measure similarity between points
n_components: The dimensionality of the output embedding

Applications:

Visualization of high-dimensional data
Pre-processing for machine learning algorithms
Feature extraction and selection
Discovering clusters and patterns in complex datasets
Single-cell RNA sequencing analysis

Advantages over other techniques:

Better preservation of both local and global structure than t-SNE
More faithful representation of distances than PCA
Faster computation than t-SNE for large datasets
Theoretical foundation in Riemannian geometry and algebraic topology
Works well across various domains and data types