Linear regression (spec model) #
Defines linear regression as a dot product plus bias (one output):
y = wᵀ x + b
We aim to stay close to PyTorch's mental model:
- torch.nn.Linear(in_features, out_features=1) for the forward pass,
- torch.nn.functional.mse_loss(..., reduction="mean") for the MSE objective,
- an SGD-style parameter update step (as in torch.optim.SGD) for training.
This file is a spec: it states the math (forward + VJPs) with shapes tracked by the type system. It prioritizes clarity and explicit derivatives over performance, and it does not include the closed-form normal-equations solution.
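For orientation, here is a minimal PyTorch sketch of that mental model; the shapes and hyperparameters are illustrative, not taken from the spec:

```python
import torch
import torch.nn.functional as F

in_features = 3                                   # illustrative input dimension
model = torch.nn.Linear(in_features, 1)           # forward pass: y = x @ wᵀ + b
opt = torch.optim.SGD(model.parameters(), lr=0.01)

xb = torch.randn(8, in_features)                  # a batch of 8 inputs
yb = torch.randn(8, 1)                            # matching targets

pred = model(xb)                                  # batched forward pass
loss = F.mse_loss(pred, yb, reduction="mean")     # MSE objective
loss.backward()                                   # autograd stands in for the spec's explicit VJPs
opt.step()                                        # SGD-style parameter update
opt.zero_grad()
```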
Parameters for a single-output linear regression model.
PyTorch analogy: the weights and bias fields correspond to nn.Linear(inDim, 1).weight and
nn.Linear(inDim, 1).bias, but with shapes tracked in the tensor type.
- weights : Tensor α (Shape.dim inDim Shape.scalar)
  The weight vector w.
- bias : Tensor α Shape.scalar
  The scalar bias b.
Forward pass for linear regression: y = wᵀ x + b.
Batched forward pass, applied independently to each input row.
VJP contribution for weights: dL/dw = x * (dL/dy) (scalar-times-vector scaling).
VJP contribution for bias: dL/db = dL/dy.
VJP contribution for input: dL/dx = w * (dL/dy).
Full backward pass returning (dW, db, dX) in that order.
Batched backward pass.
This sums parameter gradients over the batch; input gradients stay per-example. When the incoming dL/dy already carries the 1/batch factor (as the MSE gradient below does), the summed result matches PyTorch's behavior for reduction="mean".
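As a sanity check, the three VJP formulas and the sum-over-batch aggregation can be verified against PyTorch's autograd; the shapes and seed below are illustrative:

```python
import torch

torch.manual_seed(0)
batch, in_dim = 4, 3
w = torch.randn(in_dim, requires_grad=True)
b = torch.randn((), requires_grad=True)
x = torch.randn(batch, in_dim, requires_grad=True)
dLdy = torch.randn(batch)                  # an arbitrary upstream gradient

y = x @ w + b                              # batched forward, shape (batch,)
y.backward(dLdy)                           # VJP against the upstream gradient

# Hand-computed VJPs from the spec, parameters aggregated over the batch:
dW = (x * dLdy.unsqueeze(1)).sum(dim=0)    # sum_i x_i * (dL/dy)_i
db = dLdy.sum()                            # sum_i (dL/dy)_i
dX = dLdy.unsqueeze(1) * w                 # row i: w * (dL/dy)_i, per-example

assert torch.allclose(w.grad, dW)
assert torch.allclose(b.grad, db)
assert torch.allclose(x.grad, dX)
```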
Mean Squared Error loss (MSE).
PyTorch analogy: F.mse_loss(predictions, target, reduction="mean").
Note: the batch ≠ 0 hypothesis avoids dividing by zero.
Gradient of MSE w.r.t. predictions: d/dy (mean (y - t)^2) = (2/batch) * (y - t).
This is only meaningful when batch > 0 (callers typically already carry batch ≠ 0).
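A quick numerical check of this formula against autograd (illustrative shapes):

```python
import torch

batch = 5
y = torch.randn(batch, requires_grad=True)
t = torch.randn(batch)

loss = ((y - t) ** 2).mean()               # MSE with reduction="mean"
loss.backward()

# Spec formula: dL/dy = (2 / batch) * (y - t)
assert torch.allclose(y.grad, (2.0 / batch) * (y - t))
```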
One gradient-descent training step for linear regression.
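The spec does not spell the update rule out at this point; assuming the standard rule used by torch.optim.SGD (without momentum), a step looks like:

```python
import torch

lr = 0.1                                   # illustrative learning rate
w = torch.tensor([0.5, -1.0, 2.0])         # current weights (illustrative)
b = torch.tensor(0.3)                      # current bias (illustrative)
dW = torch.tensor([0.1, 0.0, -0.2])        # dL/dw from the backward pass
db = torch.tensor(0.05)                    # dL/db from the backward pass

w = w - lr * dW                            # w ← w − lr · dL/dw
b = b - lr * db                            # b ← b − lr · dL/db
```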
OpSpec wrapper for linear regression.
This is useful when composing the op within a spec-level automatic-differentiation (AD) development.
R-squared (coefficient of determination) for model evaluation.
PyTorch analogy: there is no single built-in for R² in core PyTorch; this matches the standard
definition 1 - SS_res / SS_tot.
Note: if SS_tot = 0 (targets are constant), this divides by zero. Many libraries treat that
as a special case; this spec keeps the plain formula.
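A direct transcription of that definition (hypothetical helper name; like the spec, it divides by zero on constant targets):

```python
import torch

def r_squared(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    ss_res = ((target - pred) ** 2).sum()            # residual sum of squares
    ss_tot = ((target - target.mean()) ** 2).sum()   # total sum of squares
    return 1.0 - ss_res / ss_tot                     # undefined when ss_tot == 0
```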
Ridge regression forward pass.
Regularization changes the objective, not the raw prediction function, so the forward pass is identical to ordinary linear regression.
Reference: Hoerl and Kennard, "Ridge Regression: Biased Estimation for Nonorthogonal Problems" (1970). https://doi.org/10.1080/00401706.1970.10488634
Ridge loss: MSE plus lambda * ||w||_2^2.
Ridge gradient w.r.t. weights.
This is the usual batched gradient plus the derivative of lambda * ||w||_2^2, which contributes
2 * lambda * w.
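The 2 * lambda * w contribution can be checked numerically; everything below except the formulas themselves is illustrative:

```python
import torch

torch.manual_seed(0)
lam = 0.1
batch, in_dim = 4, 3
w = torch.randn(in_dim, requires_grad=True)
b = torch.randn((), requires_grad=True)
x = torch.randn(batch, in_dim)
t = torch.randn(batch)

loss = ((x @ w + b - t) ** 2).mean() + lam * (w ** 2).sum()
loss.backward()

# Usual batched MSE gradient plus the penalty's 2 * lam * w term:
mse_grad_w = (2.0 / batch) * (x.T @ (x @ w + b - t))
assert torch.allclose(w.grad, mse_grad_w + 2 * lam * w)
```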
Soft-thresholding operator (often written S_λ): S_λ(z) = sign(z) * max(|z| - λ, 0). Used in proximal-gradient updates for the L1 penalty.
Reference: Tibshirani, "Regression Shrinkage and Selection via the Lasso" (1996). https://doi.org/10.1111/j.2517-6161.1996.tb02080.x
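A minimal sketch of the operator, plus one proximal-gradient (ISTA-style) weight update built from it; grad_w and lr are illustrative placeholders:

```python
import torch

def soft_threshold(z: torch.Tensor, lam: float) -> torch.Tensor:
    # S_lam(z) = sign(z) * max(|z| - lam, 0)
    return torch.sign(z) * torch.clamp(z.abs() - lam, min=0.0)

w = torch.tensor([1.5, -0.3, 0.05, -2.0])
soft_threshold(w, 0.5)                     # ≈ [1.0, 0.0, 0.0, -1.5]

# One proximal-gradient step for the lasso: gradient step on the MSE part,
# then soft-threshold with lr * lam.
lr, lam = 0.1, 0.5
grad_w = torch.tensor([0.2, -0.1, 0.0, 0.3])   # placeholder MSE gradient
w = soft_threshold(w - lr * grad_w, lr * lam)
```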
Lasso forward pass (same raw prediction function as ordinary linear regression).
Lasso loss: MSE plus lambda * ||w||_1.
Elastic net loss: a convex combination of L1 and L2 penalties.
Reference: Zou and Hastie, "Regularization and Variable Selection via the Elastic Net" (2005). https://doi.org/10.1111/j.1467-9868.2005.00503.x
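The spec's exact parameterization isn't shown here; a common convention (e.g. scikit-learn's l1_ratio) writes the penalty as below, where alpha = 1 recovers the lasso penalty and alpha = 0 the ridge penalty:

```python
import torch

def elastic_net_penalty(w: torch.Tensor, lam: float, alpha: float) -> torch.Tensor:
    # Convex combination of the L1 and squared-L2 penalties.
    return lam * (alpha * w.abs().sum() + (1.0 - alpha) * (w ** 2).sum())
```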
Polynomial features #
Polynomial regression can be expressed as linear regression on a fixed feature expansion
φ(x) = [x, x^2, ..., x^degree] (per input coordinate). We keep this as a lightweight helper,
then reuse linear_regression_forward_spec on the expanded input.
Expand a length-inDim input vector into polynomial features up to degree.
This expansion does not include a constant feature (the model bias already plays that role).
Forward pass for polynomial regression: expand features, then run linear regression.
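A sketch of the expansion and the composed forward pass; the grouped-by-degree ordering of the expanded features is an assumption, and the spec may interleave per coordinate instead:

```python
import torch

def poly_features(x: torch.Tensor, degree: int) -> torch.Tensor:
    # [x, x^2, ..., x^degree], with no constant feature (the bias covers it).
    return torch.cat([x ** d for d in range(1, degree + 1)])

x = torch.tensor([2.0, 3.0])
phi = poly_features(x, 3)                  # tensor([2., 3., 4., 9., 8., 27.])

w = torch.randn(phi.numel())               # weights over the expanded features
b = torch.randn(())
y = w @ phi + b                            # linear regression on φ(x)
```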