TorchLean API

NN.Spec.Layers.Linear

Linear layer (spec layer)

This file defines a fully‑connected layer and its gradients.

Definitions are purely functional and shape‑indexed, suitable both for proofs and for reuse by the autograd wrappers in NN/Spec/Autograd.

structure Spec.LinearSpec (α : Type) (inDim outDim : ℕ) : Type

Linear layer specification (pure, shape-indexed).

This is the spec-level analogue of PyTorch torch.nn.Linear / torch.nn.functional.linear:

  • weights has shape [outDim, inDim],
  • bias has shape [outDim].
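
A minimal sketch of what the structure plausibly looks like, inferred from the shapes above (the field names weights and bias are assumptions read off this docstring, not confirmed source):

    structure LinearSpec (α : Type) (inDim outDim : ℕ) where
      weights : Tensor α (Shape.dim outDim (Shape.dim inDim Shape.scalar))
      bias    : Tensor α (Shape.dim outDim Shape.scalar)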
def Spec.linearSpec {α : Type} [Add α] [Mul α] [Zero α] {inDim outDim : ℕ} (m : LinearSpec α inDim outDim) (input : Tensor α (Shape.dim inDim Shape.scalar)) : Tensor α (Shape.dim outDim Shape.scalar)

Unbatched forward pass: y = W x + b.

PyTorch analogue: torch.nn.functional.linear.
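
As a cross-check on the formula, here is a minimal standalone Lean sketch over plain lists rather than the shape-indexed Tensor type; linearRef is a hypothetical name introduced here, not part of TorchLean:

    -- y_i = b_i + Σ_j W_ij * x_j, with W given row by row
    def linearRef (W : List (List Float)) (b x : List Float) : List Float :=
      (W.zip b).map fun (row, bi) =>
        (row.zip x).foldl (fun acc (wij, xj) => acc + wij * xj) bi

    -- W = [[1, 2], [3, 4]], b = [0.5, -0.5], x = [1, 1] gives [3.5, 6.5]
    #eval linearRef [[1.0, 2.0], [3.0, 4.0]] [0.5, -0.5] [1.0, 1.0]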

def Spec.linearBatchedSpec {α : Type} [Add α] [Mul α] [Zero α] {batch inDim outDim : ℕ} (m : LinearSpec α inDim outDim) (input : Tensor α (Shape.dim batch (Shape.dim inDim Shape.scalar))) : Tensor α (Shape.dim batch (Shape.dim outDim Shape.scalar))

Batched forward pass (map the unbatched linearSpec over the batch axis).

Input shape: [batch, inDim]. Output shape: [batch, outDim].

PyTorch analogue: applying nn.Linear to a batched tensor.
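
In the same list-based sketch, batching is just a map of the unbatched pass over the rows of the batch (linearBatchedRef is again a hypothetical name, reusing linearRef from above):

    -- apply the unbatched forward pass to each example in the batch
    def linearBatchedRef (W : List (List Float)) (b : List Float)
        (xs : List (List Float)) : List (List Float) :=
      xs.map (linearRef W b)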

def Spec.linearWeightsDerivSpec {α : Type} [Mul α] {inDim outDim : ℕ} (input : Tensor α (Shape.dim inDim Shape.scalar)) (grad_output : Tensor α (Shape.dim outDim Shape.scalar)) : Tensor α (Shape.dim outDim (Shape.dim inDim Shape.scalar))

Gradient w.r.t. weights: ∂L/∂W = (∂L/∂y) ⊗ x (outer product).

This is the standard linear-layer backward formula for y = W x + b.
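
A list-based sketch of the outer product, continuing the hypothetical helpers above:

    -- (g ⊗ x)_ij = g_i * x_j: one row of ∂L/∂W per output coordinate
    def outerRef (g x : List Float) : List (List Float) :=
      g.map fun gi => x.map fun xj => gi * xj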

def Spec.linearBiasDerivSpec {α : Type} {inDim outDim : ℕ} (_dW : Tensor α (Shape.dim outDim (Shape.dim inDim Shape.scalar))) (grad_output : Tensor α (Shape.dim outDim Shape.scalar)) (_input : Tensor α (Shape.dim inDim Shape.scalar)) : Tensor α (Shape.dim outDim Shape.scalar)

Gradient w.r.t. bias: ∂L/∂b = ∂L/∂y.

Since y = W x + b, the Jacobian of y with respect to b is the identity, so the output gradient passes through unchanged.

def Spec.linearInputDerivSpec {α : Type} [Add α] [Mul α] [Zero α] {inDim outDim : ℕ} (weights : Tensor α (Shape.dim outDim (Shape.dim inDim Shape.scalar))) (grad_output : Tensor α (Shape.dim outDim Shape.scalar)) : Tensor α (Shape.dim inDim Shape.scalar)

Gradient w.r.t. input: ∂L/∂x = Wᵀ (∂L/∂y).

This is the standard "matmul by the transpose" rule for y = W x + b.
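
A list-based sketch of Wᵀ (∂L/∂y) that folds over the rows of W, accumulating g_i times row i instead of materializing the transpose (same hypothetical-helper assumptions as above):

    -- (Wᵀ g)_j = Σ_i g_i * W_ij
    def inputDerivRef (W : List (List Float)) (g : List Float) : List Float :=
      let zeros := List.replicate (W.headD []).length 0.0
      (W.zip g).foldl (fun acc (row, gi) => acc.zipWith (fun a w => a + gi * w) row) zeros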

def Spec.batchLinearDerivSpec {α : Type} [Add α] [Mul α] [Zero α] {batch inDim outDim : ℕ} (weights : Tensor α (Shape.dim outDim (Shape.dim inDim Shape.scalar))) (input : Tensor α (Shape.dim (batch + 1) (Shape.dim inDim Shape.scalar))) (grad_output : Tensor α (Shape.dim (batch + 1) (Shape.dim outDim Shape.scalar))) : Tensor α (Shape.dim outDim (Shape.dim inDim Shape.scalar)) × Tensor α (Shape.dim outDim Shape.scalar) × Tensor α (Shape.dim (batch + 1) (Shape.dim inDim Shape.scalar))

Batched derivatives (∂L/∂W, ∂L/∂b, ∂L/∂x) for a batch of size batch + 1 (the + 1 guarantees the batch is non-empty).

This is a convenience wrapper that uses matrix operations to compute:

• d_weights = grad_outputᵀ · input,
• d_bias = the sum of grad_output over the batch axis,
• d_input = grad_output · weights.
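
Equivalently, these are the per-example gradients reduced over the batch; a list-based sketch with the hypothetical names addMat and batchDerivRef, reusing outerRef and inputDerivRef from above:

    -- elementwise matrix addition
    def addMat (a b : List (List Float)) : List (List Float) :=
      a.zipWith (fun ra rb => ra.zipWith (· + ·) rb) b

    -- sum ∂L/∂W and ∂L/∂b over the batch; keep ∂L/∂x per example
    def batchDerivRef (W : List (List Float)) (xs gs : List (List Float)) :
        List (List Float) × List Float × List (List Float) :=
      let dW := (gs.zip xs).foldl (fun acc (g, x) => addMat acc (outerRef g x))
        (W.map fun row => row.map fun _ => 0.0)
      let db := gs.foldl (fun acc g => acc.zipWith (· + ·) g) (W.map fun _ => 0.0)
      let dx := gs.map (inputDerivRef W)
      (dW, db, dx)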
def Spec.linearBackwardSpec {α : Type} [Add α] [Mul α] [Zero α] {inDim outDim : ℕ} (layer : LinearSpec α inDim outDim) (input : Tensor α (Shape.dim inDim Shape.scalar)) (grad_output : Tensor α (Shape.dim outDim Shape.scalar)) : Tensor α (Shape.dim outDim (Shape.dim inDim Shape.scalar)) × Tensor α (Shape.dim outDim Shape.scalar) × Tensor α (Shape.dim inDim Shape.scalar)

Complete unbatched backward pass for a linear layer.

Returns (∂L/∂W, ∂L/∂b, ∂L/∂x) given the layer parameters, the input x, and the output gradient ∂L/∂y.
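
In the list-based sketch, the full backward pass just packages the three rules above (linearBackwardRef is a hypothetical name):

    -- (∂L/∂W, ∂L/∂b, ∂L/∂x) = (g ⊗ x, g, Wᵀ g)
    def linearBackwardRef (W : List (List Float)) (x g : List Float) :
        List (List Float) × List Float × List Float :=
      (outerRef g x, g, inputDerivRef W g)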

def Spec.linearGradientAccumulateSpec {α : Type} [Add α] {inDim outDim : ℕ} (grad1 grad2 : Tensor α (Shape.dim outDim (Shape.dim inDim Shape.scalar))) : Tensor α (Shape.dim outDim (Shape.dim inDim Shape.scalar))

Accumulate two weight gradients by elementwise addition.

This is a small helper used by batching/training code.

def Spec.linearGradientScaleSpec {α : Type} [Mul α] {inDim outDim : ℕ} (grad : Tensor α (Shape.dim outDim (Shape.dim inDim Shape.scalar))) (scale_factor : α) : Tensor α (Shape.dim outDim (Shape.dim inDim Shape.scalar))

Scale a weight gradient by a scalar factor (e.g. a learning-rate adjustment).
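
Accumulation and scaling together are enough for a plain SGD step; a hypothetical list-based sketch (scaleMat and sgdStep are illustrative names, reusing addMat from above):

    -- scale every entry of a gradient matrix by c
    def scaleMat (m : List (List Float)) (c : Float) : List (List Float) :=
      m.map fun row => row.map fun a => c * a

    -- one SGD step on the weights: W ← W − lr · dW
    def sgdStep (W dW : List (List Float)) (lr : Float) : List (List Float) :=
      addMat W (scaleMat dW (-lr))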
