Arithmetic and stochastic tape nodes #

Affine nodes, pointwise arithmetic, scaling, multiplication, squaring, and fixed-mask stochastic operators. Dropout appears here only in its fixed-mask deterministic form, which is the semantics that can be differentiated directly.

source

noncomputable def Proofs.Autograd.TapeNodes.affine {Γ : List Spec.Shape} {sIn sOut : Spec.Shape} (idx : Idx Γ sIn) (A : Vec sIn.size →L[ℝ ] Vec sOut.size) (b : Vec sOut.size) :

Node Γ sOut

Fixed affine node over arbitrary tensor shapes.

This is the proof-level version of a linear layer after all leading dimensions have been flattened into the vector representation used by CtxVec. It is intentionally shape-generic: a usual unbatched LinearSpec, a position-wise Transformer FFN map over a whole sequence, or any other fixed affine map can instantiate the same theorem by supplying the appropriate continuous linear map A and bias vector b.

Instances For

source

noncomputable def Proofs.Autograd.TapeNodes.affineFderiv {Γ : List Spec.Shape} {sIn sOut : Spec.Shape} (idx : Idx Γ sIn) (A : Vec sIn.size →L[ℝ ] Vec sOut.size) (b : Vec sOut.size) :

NodeFDerivCorrect (affine idx A b)

NodeFDerivCorrect for a fixed affine map over arbitrary tensor shapes.

Instances For

source

noncomputable def Proofs.Autograd.TapeNodes.add {Γ : List Spec.Shape} {s : Spec.Shape} (a b : Idx Γ s) :

Node Γ s

Add two same-shaped context entries.

Instances For

source

noncomputable def Proofs.Autograd.TapeNodes.addFderiv {Γ : List Spec.Shape} {s : Spec.Shape} (a b : Idx Γ s) :

NodeFDerivCorrect (add a b)

NodeFDerivCorrect for add (derivative is pointwise addition of the two projections).

Instances For

source

noncomputable def Proofs.Autograd.TapeNodes.sub {Γ : List Spec.Shape} {s : Spec.Shape} (a b : Idx Γ s) :

Node Γ s

Subtract two same-shaped context entries.

Instances For

source

noncomputable def Proofs.Autograd.TapeNodes.subFderiv {Γ : List Spec.Shape} {s : Spec.Shape} (a b : Idx Γ s) :

NodeFDerivCorrect (sub a b)

NodeFDerivCorrect for sub (derivative is pointwise subtraction of projections).

Instances For

source

noncomputable def Proofs.Autograd.TapeNodes.scale {Γ : List Spec.Shape} {s : Spec.Shape} (idx : Idx Γ s) (c : ℝ) :

Node Γ s

Scale a context entry by a constant scalar.

Instances For

source

noncomputable def Proofs.Autograd.TapeNodes.scaleFderiv {Γ : List Spec.Shape} {s : Spec.Shape} (idx : Idx Γ s) (c : ℝ) :

NodeFDerivCorrect (scale idx c)

NodeFDerivCorrect for scale (derivative is scaling of the projection CLM).

Instances For

source

noncomputable def Proofs.Autograd.TapeNodes.fixedMaskScale {Γ : List Spec.Shape} {s : Spec.Shape} (idx : Idx Γ s) (coeff : Vec s.size) :

Node Γ s

Apply a fixed elementwise multiplier.

This is the differentiable core of deterministic training-mode dropout: once a seed has sampled a mask, the forward pass is just x ↦ coeff ⊙ x (with coeff = mask / keepProb for inverted dropout). The randomness itself is not differentiated; it is represented by the fixed coeff.

Instances For

source

noncomputable def Proofs.Autograd.TapeNodes.fixedMaskScaleFderiv {Γ : List Spec.Shape} {s : Spec.Shape} (idx : Idx Γ s) (coeff : Vec s.size) :

NodeFDerivCorrect (fixedMaskScale idx coeff)

NodeFDerivCorrect for a fixed-mask scaling node.

The derivative is the diagonal linear map induced by coeff. This is the proof-level contract for seeded dropout after the mask has been generated: fixed seed and fixed keep probability determine coeff; gradients flow only through the input activation.

Instances For

source

noncomputable def Proofs.Autograd.TapeNodes.invertedDropoutCoeff {s : Spec.Shape} (mask : Fin s.size → Bool) (keepProb : ℝ) :

Vec s.size

Coefficients for inverted dropout from a fixed Boolean keep mask and scalar keep probability.

Instances For

source

noncomputable def Proofs.Autograd.TapeNodes.fixedInvertedDropout {Γ : List Spec.Shape} {s : Spec.Shape} (idx : Idx Γ s) (mask : Fin s.size → Bool) (keepProb : ℝ) :

Node Γ s

Fixed-mask inverted dropout node.

This is the theorem-facing form of training-mode dropout. A runtime seed may generate mask, but the derivative theorem is stated after sampling: mask and keepProb are constants, and the node is simply a fixed diagonal linear map on the activation.

Instances For

source

noncomputable def Proofs.Autograd.TapeNodes.fixedInvertedDropoutFderiv {Γ : List Spec.Shape} {s : Spec.Shape} (idx : Idx Γ s) (mask : Fin s.size → Bool) (keepProb : ℝ) :

NodeFDerivCorrect (fixedInvertedDropout idx mask keepProb)

NodeFDerivCorrect for fixed-mask inverted dropout.

Instances For

source

noncomputable def Proofs.Autograd.TapeNodes.mul {Γ : List Spec.Shape} {s : Spec.Shape} (a b : Idx Γ s) :

Node Γ s

Pointwise multiplication of two same-shaped context entries.

Instances For

source

noncomputable def Proofs.Autograd.TapeNodes.mulFderiv {Γ : List Spec.Shape} {s : Spec.Shape} (a b : Idx Γ s) :

NodeFDerivCorrect (mul a b)

NodeFDerivCorrect for mul (Hadamard product), using the product rule coordinatewise.

Instances For

source

noncomputable def Proofs.Autograd.TapeNodes.square {Γ : List Spec.Shape} {s : Spec.Shape} (idx : Idx Γ s) :

Node Γ s

Runtime square node, implemented as x ⊙ x.

Instances For

source

noncomputable def Proofs.Autograd.TapeNodes.squareFderiv {Γ : List Spec.Shape} {s : Spec.Shape} (idx : Idx Γ s) :

NodeFDerivCorrect (square idx)

Global NodeFDerivCorrect for elementwise square.

Instances For

TorchLean API

NN.Proofs.Autograd.Tape.Nodes.Arithmetic

Arithmetic and stochastic tape nodes #