Linear layer (spec layer)
This file defines a fully‑connected layer and its gradients:
- forward: y = W x + b
- backward: ∂L/∂W, ∂L/∂b, ∂L/∂x
Definitions are purely functional and shape‑indexed, suitable for both proofs and reuse by
autograd wrappers in NN/Spec/Autograd.
Linear layer specification (pure, shape-indexed).
This is the spec-level analogue of PyTorch's torch.nn.Linear / torch.nn.functional.linear:
- weights : Tensor α (Shape.dim outDim (Shape.dim inDim Shape.scalar)), the weight matrix W (outDim × inDim).
- bias : Tensor α (Shape.dim outDim Shape.scalar), the bias vector b (length outDim).
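For orientation, a minimal self-contained Lean sketch of such a structure follows. It is only an illustration: it stands in plain Fin-indexed functions for the file's shape-indexed Tensor type, and the names LinearLayer and sumFin here are hypothetical, not the file's actual identifiers.

```lean
-- Hypothetical stand-in: a vector is `Fin n → Float`,
-- a matrix is `Fin m → Fin n → Float` (not the file's `Tensor`/`Shape` types).
structure LinearLayer (inDim outDim : Nat) where
  weights : Fin outDim → Fin inDim → Float  -- W, an outDim × inDim matrix
  bias    : Fin outDim → Float              -- b, a vector of length outDim

-- Helper used by the later sketches: sum `f` over all indices below `n`.
def sumFin (n : Nat) (f : Fin n → Float) : Float :=
  (List.range n).foldl (fun acc j => if h : j < n then acc + f ⟨j, h⟩ else acc) 0.0
```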
Unbatched forward pass: y = W x + b.
PyTorch analogue: torch.nn.functional.linear.
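Continuing the hypothetical sketch above, the unbatched forward computes each output coordinate as a dot product plus the corresponding bias entry:

```lean
-- y i = Σⱼ W i j * x j + b i   (sketch; the real definition is Tensor-based)
def linearForward {inDim outDim : Nat} (l : LinearLayer inDim outDim)
    (x : Fin inDim → Float) : Fin outDim → Float :=
  fun i => sumFin inDim (fun j => l.weights i j * x j) + l.bias i
```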
Batched forward pass (map the unbatched linear_spec over the batch axis).
- Input shape: [batch, inDim]
- Output shape: [batch, outDim]
PyTorch analogue: applying nn.Linear to a batched tensor.
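In the sketch's terms, batching is just mapping the unbatched forward over the batch index (all names hypothetical):

```lean
-- Apply the unbatched forward to each row of the batch.
def linearForwardBatched {batch inDim outDim : Nat}
    (l : LinearLayer inDim outDim)
    (xs : Fin batch → Fin inDim → Float) : Fin batch → Fin outDim → Float :=
  fun b => linearForward l (xs b)
```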
Gradient w.r.t. weights: ∂L/∂W = (∂L/∂y) ⊗ x (outer product).
This is the standard linear-layer backward formula for y = W x + b.
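In the sketch, the outer product makes entry (i, j) of the weight gradient the product of the i-th output gradient and the j-th input coordinate:

```lean
-- (∂L/∂W) i j = (∂L/∂y) i * x j   (sketch)
def gradWeights {inDim outDim : Nat}
    (gradOut : Fin outDim → Float) (x : Fin inDim → Float) :
    Fin outDim → Fin inDim → Float :=
  fun i j => gradOut i * x j
```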
Gradient w.r.t. bias: ∂L/∂b = ∂L/∂y.
Since y = W x + b, the Jacobian of y w.r.t. b is the identity.
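The corresponding sketch is simply the identity on the output gradient:

```lean
-- ∂L/∂b = ∂L/∂y, because ∂y/∂b is the identity.
def gradBias {outDim : Nat} (gradOut : Fin outDim → Float) :
    Fin outDim → Float :=
  gradOut
```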
Gradient w.r.t. input: ∂L/∂x = Wᵀ (∂L/∂y).
This is the standard "matmul by the transpose" rule for y = W x + b.
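In the sketch, multiplying by the transpose means summing over the output index rather than the input index:

```lean
-- (∂L/∂x) j = Σᵢ W i j * (∂L/∂y) i   (sketch)
def gradInput {inDim outDim : Nat}
    (w : Fin outDim → Fin inDim → Float) (gradOut : Fin outDim → Float) :
    Fin inDim → Float :=
  fun j => sumFin outDim (fun i => w i j * gradOut i)
```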
Batched derivatives (∂L/∂W, ∂L/∂b, ∂L/∂x) for a batch of size batch + 1.
This is a convenience wrapper that uses matrix operations to compute:
- d_weights = grad_outputᵀ · input
- d_bias = sum of grad_output over the batch axis
- d_input = grad_output · weights
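A sketch of the batched rules, using the hypothetical helpers above; for simplicity it takes an arbitrary batch size rather than the spec's batch + 1 encoding of a nonempty batch:

```lean
-- Per-sample weight/bias gradients are summed over the batch;
-- the input gradient is computed independently per sample.
def linearBackwardBatched {batch inDim outDim : Nat}
    (l : LinearLayer inDim outDim)
    (xs : Fin batch → Fin inDim → Float)
    (gradOut : Fin batch → Fin outDim → Float) :
    (Fin outDim → Fin inDim → Float) × (Fin outDim → Float)
      × (Fin batch → Fin inDim → Float) :=
  ( fun i j => sumFin batch (fun b => gradOut b i * xs b j)  -- grad_outputᵀ · input
  , fun i => sumFin batch (fun b => gradOut b i)             -- sum over the batch axis
  , fun b => gradInput l.weights (gradOut b) )               -- grad_output · weights
```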
Complete unbatched backward pass for a linear layer.
Returns (∂L/∂W, ∂L/∂b, ∂L/∂x) given the layer params, input x, and output gradient ∂L/∂y.
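In the sketch, the full backward just bundles the three rules:

```lean
-- Returns (∂L/∂W, ∂L/∂b, ∂L/∂x).
def linearBackward {inDim outDim : Nat}
    (l : LinearLayer inDim outDim) (x : Fin inDim → Float)
    (gradOut : Fin outDim → Float) :
    (Fin outDim → Fin inDim → Float) × (Fin outDim → Float) × (Fin inDim → Float) :=
  (gradWeights gradOut x, gradBias gradOut, gradInput l.weights gradOut)
```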
Accumulate two weight gradients by addition.
This is a small helper used by batching/training code.
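Sketched pointwise (hypothetical name):

```lean
-- Entrywise sum of two weight gradients.
def addWeightGrads {inDim outDim : Nat}
    (g₁ g₂ : Fin outDim → Fin inDim → Float) :
    Fin outDim → Fin inDim → Float :=
  fun i j => g₁ i j + g₂ i j
```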
Scale a weight gradient by a scalar factor (e.g. learning-rate adjustment).
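And the scaling helper, sketched the same way:

```lean
-- Multiply every entry of a weight gradient by the scalar `c`.
def scaleWeightGrad {inDim outDim : Nat}
    (c : Float) (g : Fin outDim → Fin inDim → Float) :
    Fin outDim → Fin inDim → Float :=
  fun i j => c * g i j
```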