
t-Distributed Stochastic Neighbor Embedding (t-SNE)

Figure: Original High-Dimensional Data (partial view; only 3 dimensions shown) alongside the t-SNE Projection (the low-dimensional embedding of the data).


Figure: Probability Distributions (comparison of the high-dimensional and low-dimensional pairwise probabilities).

How t-SNE Works

t-Distributed Stochastic Neighbor Embedding (t-SNE) is a dimensionality reduction technique that visualizes high-dimensional data by giving each datapoint a location in a 2D or 3D map. It excels at preserving local structure, making it ideal for revealing clusters and patterns.

The t-SNE Algorithm:

  1. Compute Pairwise Affinities: Convert pairwise distances between points in the high-dimensional space into conditional probabilities, so that closer points receive higher probabilities
  2. Initialize Low-Dimensional Embedding: Start from a random arrangement of points in the lower-dimensional space
  3. Compute Low-Dimensional Affinities: Calculate analogous probabilities in the low-dimensional space using a Student t-distribution, whose tails are heavier than a Gaussian's
  4. Minimize KL Divergence: Use gradient descent to minimize the Kullback-Leibler divergence between the high- and low-dimensional probability distributions
  5. Apply Early Exaggeration: During the first iterations, multiply the high-dimensional probabilities by a constant so that clusters form tight, well-separated groups early in the optimization

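The five steps above can be sketched in NumPy. This is a toy illustration under simplifying assumptions, not the demo's implementation: it uses a single fixed Gaussian bandwidth `sigma` rather than a per-point perplexity search, and the function name and constants are illustrative.

```python
import numpy as np

def tsne_sketch(X, n_components=2, n_iter=500, learning_rate=10.0,
                early_exaggeration=4.0, sigma=1.0, seed=0):
    """Toy t-SNE following the five steps; fixed bandwidth for brevity."""
    n = X.shape[0]

    # Step 1: Gaussian affinities in the high-dimensional space
    D = np.square(X[:, None, :] - X[None, :, :]).sum(-1)
    P = np.exp(-D / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)
    P /= P.sum(axis=1, keepdims=True)      # conditional p_{j|i}
    P = (P + P.T) / (2.0 * n)              # symmetrized joint p_{ij}
    P = np.maximum(P, 1e-12)

    # Step 2: small random initial embedding
    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=1e-2, size=(n, n_components))
    velocity = np.zeros_like(Y)

    for it in range(n_iter):
        # Step 5: exaggerate P during the first 100 iterations
        P_eff = P * early_exaggeration if it < 100 else P

        # Step 3: Student-t (1 d.o.f.) affinities in the embedding
        num = 1.0 / (1.0 + np.square(Y[:, None, :] - Y[None, :, :]).sum(-1))
        np.fill_diagonal(num, 0.0)
        Q = np.maximum(num / num.sum(), 1e-12)

        # Step 4: gradient of KL(P || Q), descended with momentum
        W = (P_eff - Q) * num
        grad = 4.0 * (np.diag(W.sum(axis=1)) - W) @ Y
        momentum = 0.5 if it < 250 else 0.8
        velocity = momentum * velocity - learning_rate * grad
        Y += velocity
    return Y
```

On two well-separated Gaussian blobs this sketch recovers two distinct groups in 2D; a production implementation would add the perplexity search, adaptive gains, and a Barnes-Hut or FFT approximation for large n.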
Key Parameters:

  • Perplexity: Balances attention to local versus global structure (typically 5-50). Roughly the effective number of nearest neighbors that influence each point
  • Learning Rate: Controls the step size in gradient descent. If it is too low, points may compress into a dense cloud; if it is too high, the embedding may look like a "ball" of roughly equidistant points
  • Early Exaggeration: Multiplier for initial iterations that helps separate clusters
  • Number of Iterations: More iterations can improve the embedding quality but at computational cost

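Perplexity is worth unpacking: for each point, t-SNE binary-searches the Gaussian bandwidth until the entropy of that point's conditional distribution matches log2 of the chosen perplexity. A hedged NumPy sketch of that search (function name illustrative; `beta` is the precision 1/(2*sigma^2)):

```python
import numpy as np

def binary_search_beta(d, target_perplexity, tol=1e-5, max_iter=50):
    """Find beta = 1/(2*sigma^2) for one point so that the perplexity
    2**H of p_{j|i} over its squared distances d matches the target."""
    beta, beta_min, beta_max = 1.0, 0.0, np.inf
    for _ in range(max_iter):
        p = np.exp(-d * beta)
        p /= p.sum()
        perp = 2.0 ** (-(p * np.log2(p + 1e-12)).sum())
        if abs(perp - target_perplexity) < tol:
            break
        if perp > target_perplexity:     # too flat: raise beta (shrink sigma)
            beta_min = beta
            beta = beta * 2.0 if np.isinf(beta_max) else (beta + beta_max) / 2.0
        else:                            # too peaky: lower beta (widen sigma)
            beta_max = beta
            beta = (beta + beta_min) / 2.0
    return beta
```

A smaller target perplexity forces a peakier distribution (larger beta), which is why low perplexity emphasizes only the nearest neighbors of each point.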
Applications:

  • Visualizing high-dimensional data in 2D or 3D
  • Exploring clustering structure in unlabeled data
  • Revealing patterns in image, text, and bioinformatics data
  • Feature extraction for downstream machine learning tasks

Differences from Other Techniques:

  • Compared to PCA: t-SNE is non-linear and better preserves local structure, but PCA preserves global structure better
  • Compared to UMAP: t-SNE focuses more on preserving local structure, while UMAP better balances local and global structure
  • Compared to MDS: t-SNE uses probability distributions instead of directly preserving distances
  • Advantages: Creates visually appealing plots that intuitively reveal clusters
  • Limitations: Sensitive to hyperparameters, computationally intensive, and can produce different results on multiple runs
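The PCA contrast can be made concrete: PCA learns an explicit linear map that can also project new, unseen points, whereas standard t-SNE optimizes the embedding coordinates directly and provides no such out-of-sample map. A minimal NumPy sketch of the PCA side (variable names illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
mean = X.mean(axis=0)
Xc = X - mean                          # center the data

# PCA via SVD: rows of Vt are the principal directions
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt[:2]                    # top-2 directions, an explicit linear map
embedding = Xc @ components.T          # (100, 2) linear projection

# The same map projects a new point; t-SNE has no reusable map like this
x_new = rng.normal(size=(1, 10))
y_new = (x_new - mean) @ components.T  # (1, 2)
```

This reusability (and the variance-ordering of the components) is what "preserves global structure" buys you; t-SNE trades it away for a non-linear fit to local neighborhoods.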