Linear regression (spec model) #
Defines linear regression as a dot product plus bias (one output):
y = wᵀ x + b
We aim to stay close to PyTorch's mental model:
- torch.nn.Linear(in_features, out_features=1) for the forward pass,
- torch.nn.functional.mse_loss(..., reduction="mean") for the MSE objective,
- an SGD-style parameter update step (as in torch.optim.SGD) for training.
This file is a spec: it states the math (forward + VJPs) with shapes tracked by the type system. It prioritizes clarity and explicit derivatives over performance, and it does not include the closed-form normal-equations solution.
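For orientation, here is a minimal PyTorch sketch of that mental model; the shapes and hyperparameters are illustrative, not taken from the spec:

```python
import torch
import torch.nn.functional as F

in_features = 3                                   # illustrative input dimension
model = torch.nn.Linear(in_features, 1)           # forward pass: y = x @ wᵀ + b
opt = torch.optim.SGD(model.parameters(), lr=0.01)

xb = torch.randn(8, in_features)                  # a batch of 8 inputs
yb = torch.randn(8, 1)                            # matching targets

pred = model(xb)                                  # batched forward pass
loss = F.mse_loss(pred, yb, reduction="mean")     # MSE objective
loss.backward()                                   # autograd stands in for the spec's explicit VJPs
opt.step()                                        # SGD-style parameter update
opt.zero_grad()
```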
Parameters for a single-output linear regression model.
PyTorch analogy: the weights and bias fields correspond to nn.Linear(inDim, 1).weight and
nn.Linear(inDim, 1).bias, but with shapes tracked in the tensor type.
- weights : Tensor α (Shape.dim inDim Shape.scalar)
  The weight vector w.
- bias : Tensor α Shape.scalar
  The scalar bias b.
Forward pass for linear regression: y = wᵀ x + b.
Batched forward pass, applied independently to each input row.
VJP contribution for weights: dL/dw = x * (dL/dy) (scalar-times-vector scaling).
VJP contribution for bias: dL/db = dL/dy.
VJP contribution for input: dL/dx = w * (dL/dy).
Full backward pass returning (dW, db, dX) in that order.
Batched backward pass.
This sums parameter gradients over the batch; input gradients stay per-example. When the incoming dL/dy already carries the 1/batch factor (as the MSE gradient below does), the summed result matches PyTorch's behavior for reduction="mean".
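As a sanity check, the three VJP formulas and the sum-over-batch aggregation can be verified against PyTorch's autograd; the shapes and seed below are illustrative:

```python
import torch

torch.manual_seed(0)
batch, in_dim = 4, 3
w = torch.randn(in_dim, requires_grad=True)
b = torch.randn((), requires_grad=True)
x = torch.randn(batch, in_dim, requires_grad=True)
dLdy = torch.randn(batch)                  # an arbitrary upstream gradient

y = x @ w + b                              # batched forward, shape (batch,)
y.backward(dLdy)                           # VJP against the upstream gradient

# Hand-computed VJPs from the spec, parameters aggregated over the batch:
dW = (x * dLdy.unsqueeze(1)).sum(dim=0)    # sum_i x_i * (dL/dy)_i
db = dLdy.sum()                            # sum_i (dL/dy)_i
dX = dLdy.unsqueeze(1) * w                 # row i: w * (dL/dy)_i, per-example

assert torch.allclose(w.grad, dW)
assert torch.allclose(b.grad, db)
assert torch.allclose(x.grad, dX)
```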
Mean Squared Error loss (MSE).
PyTorch analogy: F.mse_loss(predictions, target, reduction="mean").
Note: the batch ≠ 0 hypothesis avoids dividing by zero.
Gradient of MSE w.r.t. predictions: d/dy (mean (y - t)^2) = (2/batch) * (y - t).
This is only meaningful when batch > 0 (callers typically already carry batch ≠ 0).
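A quick numerical check of this formula against autograd (illustrative shapes):

```python
import torch

batch = 5
y = torch.randn(batch, requires_grad=True)
t = torch.randn(batch)

loss = ((y - t) ** 2).mean()               # MSE with reduction="mean"
loss.backward()

# Spec formula: dL/dy = (2 / batch) * (y - t)
assert torch.allclose(y.grad, (2.0 / batch) * (y - t))
```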
One gradient-descent training step for linear regression.
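The spec does not spell the update rule out at this point; assuming the standard rule used by torch.optim.SGD (without momentum), a step looks like:

```python
import torch

lr = 0.1                                   # illustrative learning rate
w = torch.tensor([0.5, -1.0, 2.0])         # current weights (illustrative)
b = torch.tensor(0.3)                      # current bias (illustrative)
dW = torch.tensor([0.1, 0.0, -0.2])        # dL/dw from the backward pass
db = torch.tensor(0.05)                    # dL/db from the backward pass

w = w - lr * dW                            # w ← w − lr · dL/dw
b = b - lr * db                            # b ← b − lr · dL/db
```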
OpSpec wrapper for linear regression.
This is useful when composing the op within a spec-level automatic-differentiation (AD) development.
R-squared (coefficient of determination) for model evaluation.
PyTorch analogy: there is no single built-in for R² in core PyTorch; this matches the standard
definition 1 - SS_res / SS_tot.
Note: if SS_tot = 0 (targets are constant), this divides by zero. Many libraries treat that
as a special case; this spec keeps the plain formula.
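A direct transcription of that definition (hypothetical helper name; like the spec, it divides by zero on constant targets):

```python
import torch

def r_squared(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    ss_res = ((target - pred) ** 2).sum()            # residual sum of squares
    ss_tot = ((target - target.mean()) ** 2).sum()   # total sum of squares
    return 1.0 - ss_res / ss_tot                     # undefined when ss_tot == 0
```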
Ridge regression forward pass.
Regularization changes the objective, not the raw prediction function, so the forward pass is identical to ordinary linear regression.
Reference: Hoerl and Kennard, "Ridge Regression: Biased Estimation for Nonorthogonal Problems" (1970). https://doi.org/10.1080/00401706.1970.10488634
Ridge loss: MSE plus lambda * ||w||_2^2.
Ridge gradient w.r.t. weights.
This is the usual batched gradient plus the derivative of lambda * ||w||_2^2, which contributes
2 * lambda * w.
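The 2 * lambda * w contribution can be checked numerically; everything below except the formulas themselves is illustrative:

```python
import torch

torch.manual_seed(0)
lam = 0.1
batch, in_dim = 4, 3
w = torch.randn(in_dim, requires_grad=True)
b = torch.randn((), requires_grad=True)
x = torch.randn(batch, in_dim)
t = torch.randn(batch)

loss = ((x @ w + b - t) ** 2).mean() + lam * (w ** 2).sum()
loss.backward()

# Usual batched MSE gradient plus the penalty's 2 * lam * w term:
mse_grad_w = (2.0 / batch) * (x.T @ (x @ w + b - t))
assert torch.allclose(w.grad, mse_grad_w + 2 * lam * w)
```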
Soft-thresholding operator (often written S_λ): S_λ(z) = sign(z) * max(|z| - λ, 0). Used in proximal-gradient updates for the L1 penalty.
Reference: Tibshirani, "Regression Shrinkage and Selection via the Lasso" (1996). https://doi.org/10.1111/j.2517-6161.1996.tb02080.x
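A minimal sketch of the operator, plus one proximal-gradient (ISTA-style) weight update built from it; grad_w and lr are illustrative placeholders:

```python
import torch

def soft_threshold(z: torch.Tensor, lam: float) -> torch.Tensor:
    # S_lam(z) = sign(z) * max(|z| - lam, 0)
    return torch.sign(z) * torch.clamp(z.abs() - lam, min=0.0)

w = torch.tensor([1.5, -0.3, 0.05, -2.0])
soft_threshold(w, 0.5)                     # ≈ [1.0, 0.0, 0.0, -1.5]

# One proximal-gradient step for the lasso: gradient step on the MSE part,
# then soft-threshold with lr * lam.
lr, lam = 0.1, 0.5
grad_w = torch.tensor([0.2, -0.1, 0.0, 0.3])   # placeholder MSE gradient
w = soft_threshold(w - lr * grad_w, lr * lam)
```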
Lasso forward pass (same raw prediction function as ordinary linear regression).
Lasso loss: MSE plus lambda * ||w||_1.
Elastic net loss: a convex combination of L1 and L2 penalties.
Reference: Zou and Hastie, "Regularization and Variable Selection via the Elastic Net" (2005). https://doi.org/10.1111/j.1467-9868.2005.00503.x
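The spec's exact parameterization isn't shown here; a common convention (e.g. scikit-learn's l1_ratio) writes the penalty as below, where alpha = 1 recovers the lasso penalty and alpha = 0 the ridge penalty:

```python
import torch

def elastic_net_penalty(w: torch.Tensor, lam: float, alpha: float) -> torch.Tensor:
    # Convex combination of the L1 and squared-L2 penalties.
    return lam * (alpha * w.abs().sum() + (1.0 - alpha) * (w ** 2).sum())
```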
Polynomial features #
Polynomial regression can be expressed as linear regression on a fixed feature expansion
φ(x) = [x, x^2, ..., x^degree] (per input coordinate). We keep this as a lightweight helper,
then reuse linear_regression_forward_spec on the expanded input.
Expand a length-inDim input vector into polynomial features up to degree.
This expansion does not include a constant feature (the model bias already plays that role).
Forward pass for polynomial regression: expand features, then run linear regression.
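A sketch of the expansion and the composed forward pass; the grouped-by-degree ordering of the expanded features is an assumption, and the spec may interleave per coordinate instead:

```python
import torch

def poly_features(x: torch.Tensor, degree: int) -> torch.Tensor:
    # [x, x^2, ..., x^degree], with no constant feature (the bias covers it).
    return torch.cat([x ** d for d in range(1, degree + 1)])

x = torch.tensor([2.0, 3.0])
phi = poly_features(x, 3)                  # tensor([2., 3., 4., 9., 8., 27.])

w = torch.randn(phi.numel())               # weights over the expanded features
b = torch.randn(())
y = w @ phi + b                            # linear regression on φ(x)
```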