Linear Regression Visualization

Plot legend: Data Points, Regression Line, True Relationship, Residuals

Controls

Data Generation

Linear Regression

Visualization Options

Regression Stats

True Relationship: y = 2x + 1
Fitted Equation: y = ?
MSE (Mean Squared Error): -
R² Value: -
Sum of Squared Residuals: -
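
The fitted equation in the panel above comes from an ordinary least squares fit. As a rough sketch of how such a fit could be computed for data drawn from the true relationship y = 2x + 1, the following uses the closed-form slope and intercept for a single predictor. The function name `fit_line`, the noise level, the seed, and the sample size are assumptions made for illustration, not values taken from the visualization.

```python
import random

def fit_line(xs, ys):
    """Closed-form OLS for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

rng = random.Random(0)  # fixed seed (assumed) so runs are repeatable
xs = [i / 10 for i in range(100)]  # 100 evenly spaced x values (assumed)
ys = [2 * x + 1 + rng.gauss(0, 1.0) for x in xs]  # noise std 1.0 is an assumption

slope, intercept = fit_line(xs, ys)
print(f"Fitted Equation: y = {slope:.2f}x + {intercept:.2f}")
```

With noise added, the fitted coefficients should land near 2 and 1 rather than exactly on them; the gap shrinks as the number of points grows or the noise level drops.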

How Linear Regression Works (And When It Fails)

Linear Regression Assumptions:

  • Linearity: The relationship between X and Y is linear
  • Homoscedasticity: The variance of the residuals is the same for every value of X
  • Independence: Observations are independent of each other
  • No influential outliers: Extreme values can have a disproportionate influence on the fitted line

Common Failure Cases:

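  • Heteroscedastic Data: When the residual variance changes with X (for example, grows as X increases), OLS estimates stay unbiased but are no longer efficient, and the usual standard errors become unreliable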
  • Non-linear Relationships: Linear models can't capture curvature (e.g., quadratic patterns)
  • Outliers: Extreme values pull the line toward them, distorting the fit
  • Clustered Data: Distinct groups of points may call for separate models or an additional feature that identifies the group
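
Two of the failure cases above, outliers and non-linear relationships, can be reproduced with a few lines of code. The sketch below is illustrative only: the helper names `fit_line` and `r_squared`, the corrupted value of 100, and the choice of y = x² as the curved pattern are all assumptions, not settings from the visualization.

```python
def fit_line(xs, ys):
    """Closed-form OLS for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(xs, ys, slope, intercept):
    """1 - SS_res / SS_tot for the line y = slope * x + intercept."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Outlier case: noise-free points on y = 2x + 1, then one corrupted value.
xs = list(range(11))
clean = [2 * x + 1 for x in xs]
with_outlier = clean[:]
with_outlier[-1] = 100  # hypothetical bad reading; the true value is 21

slope_clean, intercept_clean = fit_line(xs, clean)
slope_out, intercept_out = fit_line(xs, with_outlier)
print("clean fit:   ", slope_clean, intercept_clean)
print("with outlier:", slope_out, intercept_out)

# Non-linear case: y = x^2 sampled symmetrically around zero.
xq = list(range(-5, 6))
yq = [x * x for x in xq]
slope_q, intercept_q = fit_line(xq, yq)
r2_q = r_squared(xq, yq, slope_q, intercept_q)
print("quadratic fit:", slope_q, intercept_q, "R^2 =", r2_q)
```

A single corrupted point at the edge of the X range is enough to pull the slope well away from 2, because points far from the mean of X have the most leverage. For the symmetric quadratic, the best straight line is flat, so the fit explains essentially none of the variance even though Y is fully determined by X.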

Metrics:

  • Mean Squared Error (MSE): Average of squared residuals (lower is better)
  • R² Value: Proportion of variance explained by the model (higher is better, max 1.0)
  • Sum of Squared Residuals: Sum of the squared vertical distances between the points and the line (equal to MSE times the number of points)
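
The three metrics above can be computed directly from the observed values and the line's predictions. This is a minimal sketch: the function name `regression_metrics` and the four sample points are made up for illustration, and it assumes MSE divides by the number of points n (some texts divide by n - 2 instead).

```python
def regression_metrics(ys, preds):
    """Return SSR, MSE, and R^2 for observed ys and predicted preds."""
    residuals = [y - p for y, p in zip(ys, preds)]
    ssr = sum(r * r for r in residuals)  # sum of squared residuals
    mse = ssr / len(ys)                  # average squared residual (divides by n)
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)  # total variation around the mean
    r2 = 1 - ssr / ss_tot  # undefined (division by zero) if all ys are equal
    return {"ssr": ssr, "mse": mse, "r2": r2}

# Hypothetical observations at x = 0, 1, 2, 3, scored against y = 2x + 1.
ys = [1.0, 3.5, 4.5, 7.0]
preds = [2 * x + 1 for x in range(4)]

metrics = regression_metrics(ys, preds)
print(metrics)
```

Here the residuals are 0, 0.5, -0.5, and 0, so the sum of squared residuals is 0.5 and the MSE is 0.125. R² compares that 0.5 against the 18.5 units of total variation around the mean of Y, which gives a value close to 1.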

Try different data distributions to see how well linear regression performs in each scenario!