3.2. Autograd Walkthrough
Autograd is the runtime bridge between executable model code and gradient-based training. The TorchLean version is deliberately close to the PyTorch mental model, but it makes the important objects visible.
PyTorch hides many details behind loss.backward(). TorchLean exposes the pieces:
- the function or model being differentiated,
- the input and parameter shapes,
- the seed for a vector-Jacobian product when one is needed,
- the gradient tensors returned by the runtime.
3.2.1. A Scalar Function
The smallest case is a scalar-valued tensor function.
import NN

open Spec
open Tensor
open NN.Tensor
open NN.API

def sumsq : autograd.fn1.Fn (Shape.Vec 2) Shape.scalar :=
  fun x => do
    let y ← nn.functional.square x
    nn.functional.mean y

def demo : IO Unit := do
  let x : Tensor Float (Shape.Vec 2) := tensorND! [2] [0.5, -1.2]
  let g ← autograd.fn1.grad (α := Float) sumsq x
  IO.println s!"grad = {Spec.pretty g}"
The shape tells us that sumsq consumes a vector of length two and returns a scalar. Because the
output is scalar, autograd.fn1.grad returns a tensor with the same shape as the input.
For this example, the mathematical gradient is:
\nabla_x \operatorname{mean}(x^2) = \tfrac{2}{n}\, x
and here n = 2, so the gradient is exactly x: the returned tensor has the same two coordinates as the input.
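Written out for the two coordinates used in the demo, the arithmetic is:
\operatorname{mean}(x^2) = \tfrac{1}{2}\left(x_1^2 + x_2^2\right), \qquad \frac{\partial}{\partial x_j}\operatorname{mean}(x^2) = \frac{2 x_j}{2} = x_j
so at x = (0.5, -1.2) the printed gradient is (0.5, -1.2).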
3.2.2. Value And Gradient Together
For debugging it is often useful to compute the value and gradient in one call:
let (value, grad) ← autograd.fn1.valueAndGrad (α := Float) sumsq x
This mirrors the common workflow:
y = f(x)
dy_dx = grad(y, x)
TorchLean keeps both results as ordinary values. There is no hidden .grad field that must be cleared before the next step.
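As a minimal sketch, the scalar example above can be rerun through valueAndGrad; the call is the one shown in this section, and only the printing is illustrative:

-- Reuses sumsq and the input tensor from the scalar example.
def demoValueAndGrad : IO Unit := do
  let x : Tensor Float (Shape.Vec 2) := tensorND! [2] [0.5, -1.2]
  let (value, grad) ← autograd.fn1.valueAndGrad (α := Float) sumsq x
  -- value is the scalar mean(x^2); grad has the same Vec 2 shape as x.
  IO.println s!"value = {Spec.pretty value}"
  IO.println s!"grad  = {Spec.pretty grad}"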
3.2.3. Vector Outputs Need A Seed
If a function returns a vector, there is not a single gradient until we choose how to combine the output coordinates. A vector-Jacobian product supplies that choice.
let dx ← autograd.fn1.vjp (α := Float) f x seedOut
Informally:
\operatorname{vjp}(f,x,\bar y) = J_f(x)^\mathsf{T}\bar y
Here seedOut is the output cotangent. It has the same shape as the output of f, and the returned
tensor has the same shape as x.
This is exactly the operation reverse mode needs. Each local operation sends an output cotangent back to input cotangents; composing those local rules gives the full gradient.
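A hedged sketch of that shape discipline, assuming autograd.fn1.Fn accepts a Vec 2 output shape just as it accepts a scalar one (the vjp call itself is the one shown above):

-- Elementwise square: a Vec 2 → Vec 2 function, so there is no single gradient.
def squareAll : autograd.fn1.Fn (Shape.Vec 2) (Shape.Vec 2) :=
  fun x => nn.functional.square x

def demoVjp : IO Unit := do
  let x       : Tensor Float (Shape.Vec 2) := tensorND! [2] [0.5, -1.2]
  -- seedOut is the output cotangent; it has the output shape, Vec 2.
  let seedOut : Tensor Float (Shape.Vec 2) := tensorND! [2] [1.0, 0.0]
  let dx ← autograd.fn1.vjp (α := Float) squareAll x seedOut
  -- For elementwise square, J_f(x) is diagonal with entries 2 * x_i, so
  -- dx should come back as [2 * 0.5 * 1.0, 2 * (-1.2) * 0.0] = [1.0, 0.0].
  IO.println s!"dx = {Spec.pretty dx}"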
3.2.4. Model Parameters
Training usually differentiates a loss with respect to parameters, not only with respect to an input tensor. The model-level API uses the same idea, but the returned gradient has the same parameter structure as the model.
The usual calls are:
- autograd.model.gradParams for gradients of the loss with respect to parameters,
- autograd.model.gradInputs for gradients with respect to input and target tensors,
- autograd.model.valueAndGradParams when the loss value and parameter gradients are both needed.
Those declarations live in NN.API.Public, while the lower-level runtime implementation lives in NN.API.Runtime and NN.Runtime.Autograd.TorchLean.Autodiff.
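As a rough sketch only: the argument order below is an assumption, not the documented signature (check NN.API.Public for the real one), and the model, lossFn, xb, and yb bindings are placeholders for whatever the surrounding chapter's model setup provides:

-- Sketch of one training-style step; model, lossFn, xb, yb are assumed to be
-- in scope, and the valueAndGradParams argument order here is an assumption.
def trainStepSketch : IO Unit := do
  let (loss, grads) ← autograd.model.valueAndGradParams (α := Float) model lossFn xb yb
  -- grads mirrors the parameter structure of model, so each parameter can be
  -- paired with its gradient for an optimizer update.
  IO.println s!"loss = {Spec.pretty loss}"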
3.2.5. Jacobians, JVPs, And HVPs
TorchLean also exposes analysis tools beyond the basic training gradient:
- autograd.fn1.jacfwd for a forward-mode Jacobian,
- autograd.fn1.jacrev for a reverse-mode Jacobian,
- autograd.fn1.hessian for Hessians of scalar-valued functions,
- autograd.model.jvpParams for a directional derivative of a scalar loss,
- autograd.model.hvpParams for Hessian-vector products over parameters.
These are useful when a chapter talks about sensitivity, curvature, verification, or second-order diagnostics. They are also a good way to see why the typed shape discipline matters: every returned object has the shape dictated by the derivative it represents.
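A small sketch for the scalar example, assuming jacrev and hessian take the same (function, input) arguments as grad; the exact result shapes are whatever the typed signatures dictate:

-- Reuses sumsq from the scalar example.
def demoJacHess : IO Unit := do
  let x : Tensor Float (Shape.Vec 2) := tensorND! [2] [0.5, -1.2]
  let j ← autograd.fn1.jacrev  (α := Float) sumsq x
  let h ← autograd.fn1.hessian (α := Float) sumsq x
  -- For mean(x^2) on a length-2 vector, the partials are x_j and the Hessian
  -- is the 2x2 identity, so h should be constant in x.
  IO.println s!"jacobian = {Spec.pretty j}"
  IO.println s!"hessian  = {Spec.pretty h}"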
3.2.6. From Local Rules To Global Backprop
Autograd correctness has a simple informal shape.
- Each primitive operation has a forward meaning.
- Each primitive operation has a local VJP or JVP rule.
- If those local rules match the derivative of the forward operation, then the composed reverse pass computes the adjoint derivative of the whole graph.
The guide states this informally because it is the right mental model for users. The proof chapters then point to the exact Lean theorems that make the statement precise.
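In equation form, for a composition f = g \circ h the chain rule makes the reverse pass just a chain of the local VJPs:
\operatorname{vjp}(g \circ h, x, \bar y) = J_h(x)^\mathsf{T} J_g(h(x))^\mathsf{T} \bar y = \operatorname{vjp}\bigl(h,\, x,\, \operatorname{vjp}(g,\, h(x),\, \bar y)\bigr)
Applying this one composition step at a time, from the loss back to the inputs, is exactly the reverse pass.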
The main proof trail starts here:
3.2.7. A Practical Rule
Use the smallest API that matches the question.
- For a gradient of Tensor σ -> scalar, start with autograd.fn1.grad.
- For a value and gradient together, start with autograd.fn1.valueAndGrad.
- For a vector output with a chosen cotangent, start with autograd.fn1.vjp.
- For a model loss gradient with respect to parameters, start with autograd.model.gradParams.
- For minibatch training, start with train.fitLoader or the quickstart examples.
For a runnable tour, open NN.Examples.Quickstart.AutogradBasics. It prints gradients, VJPs, Jacobians, Hessian vector products, and parameter gradients in one place.