Support Vector Machines (spec models) #
This file provides a small linear SVM baseline with explicit gradients.
PyTorch mental model:
- scoring function: `score = X @ w + b` (like `nn.Linear(p, 1)` without an activation),
- loss: hinge loss on signed labels `y ∈ {−1, +1}`: `loss_i = max(0, 1 - y_i * score_i)`,
- optimization: a small deterministic gradient descent loop (not an optimized solver).
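The Lean spec is not PyTorch code, but as a rough analogy only, the mental model corresponds to a loop like the following sketch (shapes, data, and hyperparameters here are made up for illustration):

```python
# Hypothetical PyTorch analogue of the spec model; not the Lean definitions themselves.
import torch

n, p = 8, 3
X = torch.randn(n, p)
y = (torch.randint(0, 2, (n,)) * 2 - 1).float()       # signed labels in {-1, +1}

w = torch.zeros(p, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

for _ in range(100):                                   # small deterministic GD loop
    score = X @ w + b                                  # like nn.Linear(p, 1), no activation
    loss = torch.clamp(1 - y * score, min=0).mean()    # mean hinge loss
    loss.backward()
    with torch.no_grad():
        w -= 0.1 * w.grad
        b -= 0.1 * b.grad
        w.grad.zero_()
        b.grad.zero_()
```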
There are two "layers" in this file:
- `LinearSVM`: the clean mathematical model + objective + backward pass (VJP-style gradients);
- `fitLinearSVM` / `predict`: a small training + prediction wrapper used by smoke tests and demos.
Note on naming: classic SVM literature often uses a parameter C that weights the hinge term.
In this file, fitLinearSVM takes a parameter named C, but we use it as the L2
regularization strength (the lambda in LinearSVM.backward) to keep the baseline small.
References:
- Cortes and Vapnik, "Support-Vector Networks", 1995.
- Vapnik, "The Nature of Statistical Learning Theory", 1995/1998.
Linear SVM (primal) #
Linear SVM parameters: a weight vector w and bias b.
We intentionally keep "training hyperparameters" (regularization strength, learning rate, etc.) out of the parameter record; those are choices about an optimizer, not part of the model itself.
- `w : Spec.Tensor α (Spec.Shape.dim p Spec.Shape.scalar)`: the weight vector `w`.
- `b : α`: the bias `b`.
Decision function f(x) = w·x + b.
Batch decision values for X : (n×p).
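For intuition, the two decision functions amount to the following NumPy-style sketch (array shapes assumed; this is not the `Spec.Tensor` API):

```python
# Illustrative NumPy analogue of the decision functions.
import numpy as np

def decision(w, b, x):
    """f(x) = w·x + b for a single example x of shape (p,)."""
    return float(w @ x + b)

def decision_batch(w, b, X):
    """Row-wise decision values for X of shape (n, p)."""
    return X @ w + b
```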
Hinge loss per example: ℓ_i = max(0, 1 - y_i * f(x_i)).
We write it using if rather than max to make the "active-set" logic explicit.
Mean hinge loss over a dataset.
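The if-based formulation is equivalent to the usual max form; a small Python sketch of both the per-example and the mean version (plain floats and lists assumed for clarity):

```python
# Per-example hinge loss written with an explicit branch, mirroring the if-based spec.
def hinge(y_i, score_i):
    margin = y_i * score_i
    return 1.0 - margin if margin < 1.0 else 0.0   # "active" exactly when margin < 1

def mean_hinge(y, scores):
    return sum(hinge(yi, si) for yi, si in zip(y, scores)) / len(y)
```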
L2-regularized SVM objective (primal, soft-margin style).
We use the common "½λ‖w‖² + mean hinge" form.
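Spelled out in NumPy terms (an illustrative sketch, not the spec code), the objective reads:

```python
# 0.5 * lam * ||w||^2 + mean hinge, over a batch X (n, p) with signed labels y (n,).
import numpy as np

def objective(w, b, X, y, lam):
    margins = y * (X @ w + b)
    hinge = np.maximum(0.0, 1.0 - margins)
    return 0.5 * lam * float(w @ w) + float(hinge.mean())
```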
Backward pass #
For the objective
L(w,b) = ½λ‖w‖² + (1/n) Σ max(0, 1 - y_i (w·x_i + b))
the gradients are:
∂L/∂w = λ w + (1/n) Σ [margin_i < 1] * (-y_i x_i)
∂L/∂b = (1/n) Σ [margin_i < 1] * (-y_i)
We also return ∂L/∂X because it is sometimes useful for sensitivity analysis.
PyTorch analogy: this is what autograd would compute for
0.5*λ*||w||^2 + mean(relu(1 - y*(X@w+b))), except we write it out explicitly.
Backward/VJP for the linear SVM objective.
Returns (dw, db, dX) where:
- `dw` : ∂L/∂w
- `db` : ∂L/∂b
- `dX` : ∂L/∂X (sometimes useful for sensitivity analysis)
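As a concrete reference for the formulas above, here is a NumPy sketch of the same gradients (the shapes and the `backward` name here are assumptions, not the spec's signature):

```python
# Explicit gradients of 0.5*lam*||w||^2 + mean hinge, matching the formulas above.
import numpy as np

def backward(w, b, X, y, lam):
    n = X.shape[0]
    margins = y * (X @ w + b)
    active = (margins < 1.0).astype(X.dtype)   # indicator [margin_i < 1]
    dscore = -(active * y) / n                 # gradient of the mean hinge w.r.t. score_i
    dw = lam * w + X.T @ dscore                # ∂L/∂w
    db = dscore.sum()                          # ∂L/∂b
    dX = np.outer(dscore, w)                   # ∂L/∂X (row i is dscore_i * w)
    return dw, db, dX
```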
A Small Training Wrapper (Gradient Descent) #
The LinearSVM definitions above are enough for "spec math".
For demos/tests, it is convenient to package a trained parameter pair together with a simple
predictor, so we provide:
- `SVM`: a small record holding `(weights, bias)` and a heuristic support-vector index tensor,
- `fitLinearSVM`: deterministic gradient descent using `LinearSVM.backward`,
- `predict`: sign prediction as `±1`.
Small trained SVM bundle for demos/tests.
This is not a full SMO-style solver; it is a deterministic gradient-descent baseline that is useful as a reference model in the TorchLean spec layer.
- `weights : Spec.Tensor α (Spec.Shape.dim p Spec.Shape.scalar)`: normal vector `w` of the separating hyperplane.
- `bias : α`: bias/intercept term `b`.
- `supportVectorIndices : Spec.Tensor ℕ (Spec.Shape.dim n Spec.Shape.scalar)`: heuristic support-vector indices (approximate: margin near 1).
Heuristic support-vector index extractor.
We mark an example as a "support vector" if its margin is close to 1. This is only meant for
introspection and demos (it is not used by the optimizer).
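A NumPy sketch of this heuristic (the tolerance value is an assumption for illustration):

```python
# "Support vectors" picked as examples whose margin is approximately 1.
import numpy as np

def support_vector_indices(w, b, X, y, tol=1e-3):
    margins = y * (X @ w + b)
    return np.flatnonzero(np.abs(margins - 1.0) <= tol)
```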
Fit a linear SVM by deterministic gradient descent on the primal objective.
Parameters:
- `learning_rate`: gradient step size
- `C`: regularization strength (treated as `lambda`)
- `iterations`: number of GD steps
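The training loop is a plain gradient-descent recurrence on the objective above; a self-contained NumPy sketch under the same parameter names (the default values shown are illustrative assumptions):

```python
# Deterministic GD on the primal objective; C is used as the L2 strength (lambda).
import numpy as np

def fit_linear_svm(X, y, learning_rate=0.1, C=0.01, iterations=200):
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(iterations):
        margins = y * (X @ w + b)
        dscore = -((margins < 1.0).astype(X.dtype) * y) / n
        w -= learning_rate * (C * w + X.T @ dscore)   # ∂L/∂w step
        b -= learning_rate * dscore.sum()             # ∂L/∂b step
    return w, b
```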
Predict signed labels ±1 for a batch X using the learned hyperplane.
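Prediction is just the sign of the decision value; for example (mapping a decision value of exactly 0 to +1 is an assumption here):

```python
# Signed-label prediction in {-1, +1} from the learned hyperplane.
import numpy as np

def predict(w, b, X):
    return np.where(X @ w + b >= 0.0, 1, -1)
```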
Linear kernel: k(x, y) = x·y.
Polynomial kernel: k(x, y) = (x·y + c)^degree (naive power for generic α).
RBF kernel: k(x, y) = exp(-gamma * ||x - y||^2).
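The three kernels correspond to the following NumPy one-liners (default hyperparameter values are illustrative assumptions):

```python
# Kernel functions over plain 1-D NumPy vectors.
import numpy as np

def linear_kernel(x, y):
    return float(x @ y)

def poly_kernel(x, y, c=1.0, degree=3):
    return (float(x @ y) + c) ** degree

def rbf_kernel(x, y, gamma=1.0):
    diff = x - y
    return float(np.exp(-gamma * float(diff @ diff)))
```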