Batched #
Additional HasFDerivAt-level nodes for batched (3D) ops.
These are useful for MultiHeadAttention graphs where the head dimension is explicit:
- batched matrix multiplication: (h×m×n) × (h×n×p) → (h×m×p)
- batched (row-wise) softmax: h × (m×n) → h × (m×n)
All results here are spec-level over ℝ.
Split a flattened h * n vector into h “heads” of length n.
This is the vector-level analogue of reshaping (..., h*n) into (..., h, n).
It is used to define batched operations as head-wise operations.
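As a minimal Lean sketch of this reshaping, one could use Mathlib's `finProdFinEquiv`; the name `splitHeads` and the curried `Fin`-indexed representation are illustrative assumptions, not necessarily the repo's own API:

```lean
import Mathlib

-- Hypothetical sketch: view a flattened `h * n` vector as `h` heads of
-- length `n` (head-major), via `finProdFinEquiv : Fin h × Fin n ≃ Fin (h * n)`.
def splitHeads {h n : ℕ} (x : Fin (h * n) → ℝ) : Fin h → Fin n → ℝ :=
  fun i j => x (finProdFinEquiv (i, j))
```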
Flattened size of h matrices of shape m×n (row-major): h * (m * n).
Batched matmul node (head-wise): (h×m×n) × (h×n×p) → (h×m×p).
PyTorch analogue: torch.matmul with leading batch dimension h.
https://pytorch.org/docs/stable/generated/torch.matmul.html
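A spec-level sketch of the head-wise product: for each head k, an ordinary (m×n)·(n×p) matrix multiplication. The name `batchedMatmul` and the curried `Fin`-indexed encoding are assumptions for illustration, not necessarily the repo's definitions:

```lean
import Mathlib

-- Hypothetical spec: per head k, entry (i, j) is the usual dot product
-- of row i of A's k-th head with column j of B's k-th head.
def batchedMatmul {h m n p : ℕ}
    (A : Fin h → Fin m → Fin n → ℝ) (B : Fin h → Fin n → Fin p → ℝ) :
    Fin h → Fin m → Fin p → ℝ :=
  fun k i j => ∑ l : Fin n, A k i l * B k l j
```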
NodeFDerivCorrect for the batched matmul node.
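Head-wise, matmul is bilinear in (A, B), so the Fréchet derivative such a correctness statement records is the usual product rule, applied independently per head k:

```latex
D_{(A,B)}(A \cdot B)[dA, dB] = dA \cdot B + A \cdot dB,
\qquad
\bigl(dA \cdot B + A \cdot dB\bigr)_{k,i,j}
  = \sum_{l=1}^{n} dA_{k,i,l}\,B_{k,l,j} + A_{k,i,l}\,dB_{k,l,j}.
```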
Batched row-wise softmax node: apply softmax_last independently per head.
Shape: h × (m×n) → h × (m×n), where each head contains an m×n matrix and softmax is along the
last axis (size n) within each row.
PyTorch analogue: torch.nn.functional.softmax(x, dim=-1) with a leading batch dimension.
https://pytorch.org/docs/stable/generated/torch.nn.functional.softmax.html
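A spec-level sketch of the head-wise application; the names `softmaxLast` and `batchedSoftmax` are illustrative assumptions, not necessarily the repo's identifiers:

```lean
import Mathlib

-- Hypothetical spec: softmax along the last axis (size n) of each row.
noncomputable def softmaxLast {n : ℕ} (x : Fin n → ℝ) : Fin n → ℝ :=
  fun j => Real.exp (x j) / ∑ l : Fin n, Real.exp (x l)

-- Apply it independently to every row of every head.
noncomputable def batchedSoftmax {h m n : ℕ}
    (x : Fin h → Fin m → Fin n → ℝ) : Fin h → Fin m → Fin n → ℝ :=
  fun k i => softmaxLast (x k i)
```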
NodeFDerivCorrect for softmax_last in the batched/head-wise setting.
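Within a single row, writing s = softmax_last(x), the Jacobian underlying such a statement is the standard softmax derivative; across rows and heads it is block-diagonal, since each row is processed independently:

```latex
\frac{\partial s_i}{\partial x_j} = s_i \left( \delta_{ij} - s_j \right)
```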