Arithmetic and stochastic tape nodes
Affine nodes, pointwise arithmetic, scaling, multiplication, squaring, and fixed-mask stochastic operators. Dropout appears here only in its fixed-mask deterministic form, which is the semantics that can be differentiated directly.
Fixed affine node over arbitrary tensor shapes.
This is the proof-level version of a linear layer after all leading dimensions have been flattened into the vector representation used by CtxVec. It is intentionally shape-generic: an ordinary unbatched LinearSpec, a position-wise Transformer FFN map over a whole sequence, or any other fixed affine map can instantiate the same theorem by supplying the appropriate continuous linear map A and bias vector b.
NodeFDerivCorrect for a fixed affine map over arbitrary tensor shapes.
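As a rough illustration (not this file's actual statement, and with E and F standing in for the flattened CtxVec spaces), the underlying Mathlib fact is that a fixed affine map differentiates to its linear part:

```lean
import Mathlib

variable {E F : Type*} [NormedAddCommGroup E] [NormedSpace ℝ E]
  [NormedAddCommGroup F] [NormedSpace ℝ F]

-- The derivative of the affine map `x ↦ A x + b` is `A` itself,
-- independent of the base point `x` and of the bias `b`.
example (A : E →L[ℝ] F) (b : F) (x : E) :
    fderiv ℝ (fun v => A v + b) x = A :=
  (A.hasFDerivAt.add_const b).fderiv
```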
Add two same-shaped context entries.
NodeFDerivCorrect for add (derivative is pointwise addition of the two projections).
Subtract two same-shaped context entries.
NodeFDerivCorrect for sub (derivative is pointwise subtraction of projections).
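A minimal Mathlib-level sketch of both derivative facts, stated over an abstract normed space E rather than this development's context types:

```lean
import Mathlib

variable {E : Type*} [NormedAddCommGroup E] [NormedSpace ℝ E]

-- add: the derivative is the pointwise sum of the two projections.
example (p : E × E) :
    fderiv ℝ (fun q : E × E => q.1 + q.2) p =
      ContinuousLinearMap.fst ℝ E E + ContinuousLinearMap.snd ℝ E E :=
  (hasFDerivAt_fst.add hasFDerivAt_snd).fderiv

-- sub: the derivative is the pointwise difference of the projections.
example (p : E × E) :
    fderiv ℝ (fun q : E × E => q.1 - q.2) p =
      ContinuousLinearMap.fst ℝ E E - ContinuousLinearMap.snd ℝ E E :=
  (hasFDerivAt_fst.sub hasFDerivAt_snd).fderiv
```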
Scale a context entry by a constant scalar.
NodeFDerivCorrect for scale (derivative is scaling of the projection CLM).
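A sketch of the packaged fact, assuming the scale node is the map x ↦ c • x on a normed space:

```lean
import Mathlib

variable {E : Type*} [NormedAddCommGroup E] [NormedSpace ℝ E]

-- scale: the derivative of `x ↦ c • x` is the identity map scaled by `c`.
example (c : ℝ) (x : E) :
    fderiv ℝ (fun v => c • v) x = c • ContinuousLinearMap.id ℝ E :=
  ((hasFDerivAt_id x).const_smul c).fderiv
```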
Apply a fixed elementwise multiplier.
This is the differentiable core of deterministic training-mode dropout: once a seed has sampled a
mask, the forward pass is just x ↦ coeff ⊙ x (with coeff = mask / keepProb for inverted
dropout). The randomness itself is not differentiated; it is represented by the fixed coeff.
NodeFDerivCorrect for a fixed-mask scaling node.
The derivative is the diagonal linear map induced by coeff. This is the proof-level contract for
seeded dropout after the mask has been generated: fixed seed and fixed keep probability determine
coeff; gradients flow only through the input activation.
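A hedged sketch over the concrete shape Fin n → ℝ; diagCLM is a hypothetical name for the diagonal continuous linear map induced by coeff, not an identifier from this file. The node's forward map v ↦ coeff * v agrees definitionally with that linear map, so it is its own derivative:

```lean
import Mathlib

-- Hypothetical: the diagonal continuous linear map `v ↦ coeff ⊙ v`,
-- built coordinatewise from scaled projections.
noncomputable def diagCLM {n : ℕ} (coeff : Fin n → ℝ) :
    (Fin n → ℝ) →L[ℝ] (Fin n → ℝ) :=
  ContinuousLinearMap.pi fun i => coeff i • ContinuousLinearMap.proj i

-- The fixed-mask scaling node `v ↦ coeff * v` is this linear map
-- (definitionally), so it is its own derivative at every point.
example {n : ℕ} (coeff x : Fin n → ℝ) :
    HasFDerivAt (fun v => coeff * v) (diagCLM coeff) x :=
  (diagCLM coeff).hasFDerivAt
```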
Coefficients for inverted dropout from a fixed Boolean keep mask and scalar keep probability.
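A hypothetical Lean rendering of these coefficients (dropoutCoeff is an illustrative name, not from this file), with a sanity check at keep probability 1/2:

```lean
import Mathlib

-- Hypothetical: kept coordinates are rescaled by `1 / keepProb`,
-- dropped coordinates are zeroed (inverted dropout).
def dropoutCoeff {n : ℕ} (mask : Fin n → Bool) (keepProb : ℝ) :
    Fin n → ℝ :=
  fun i => if mask i then 1 / keepProb else 0

-- With keep probability 1/2, a surviving activation is doubled
-- and a dropped one is zeroed.
example : dropoutCoeff ![true, false] (1 / 2) 0 = 2 := by
  norm_num [dropoutCoeff]

example : dropoutCoeff ![true, false] (1 / 2) 1 = 0 := by
  norm_num [dropoutCoeff]
```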
Fixed-mask inverted dropout node.
This is the theorem-facing form of training-mode dropout. A runtime seed may be used to sample mask, but the derivative theorem is stated after sampling: mask and keepProb are constants, and the node is simply a fixed diagonal linear map on the activation.
NodeFDerivCorrect for fixed-mask inverted dropout.
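Putting the two sketches above together (reusing the hypothetical diagCLM and dropoutCoeff): after sampling, the dropout node is the fixed-mask scaling node with coeff = dropoutCoeff mask keepProb, so the fixed diagonal map is its own derivative:

```lean
-- Assumes the `diagCLM` and `dropoutCoeff` sketches from the blocks above.
example {n : ℕ} (mask : Fin n → Bool) (keepProb : ℝ) (x : Fin n → ℝ) :
    HasFDerivAt (fun v => dropoutCoeff mask keepProb * v)
      (diagCLM (dropoutCoeff mask keepProb)) x :=
  (diagCLM (dropoutCoeff mask keepProb)).hasFDerivAt
```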
Pointwise multiplication of two same-shaped context entries.
NodeFDerivCorrect for mul (Hadamard product), using the product rule coordinatewise.
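A one-coordinate sketch of the product rule the instance applies, stated over ℝ with Mathlib's HasFDerivAt.mul:

```lean
import Mathlib

-- mul: at `p`, the derivative of `(x, y) ↦ x * y` sends a tangent
-- `(dx, dy)` to `p.1 * dy + p.2 * dx`.
example (p : ℝ × ℝ) :
    HasFDerivAt (fun q : ℝ × ℝ => q.1 * q.2)
      (p.1 • ContinuousLinearMap.snd ℝ ℝ ℝ +
        p.2 • ContinuousLinearMap.fst ℝ ℝ ℝ) p :=
  hasFDerivAt_fst.mul hasFDerivAt_snd
```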
Runtime square node, implemented as x ⊙ x.
Global NodeFDerivCorrect for elementwise square.
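A one-coordinate sketch: pointwise, x ⊙ x differentiates to multiplication by 2 * x, which is the product rule with both factors equal:

```lean
import Mathlib

-- square: in each coordinate, `y ↦ y * y` has derivative `2 * x` at `x`.
example (x : ℝ) : HasDerivAt (fun y : ℝ => y * y) (2 * x) x := by
  simpa [two_mul] using (hasDerivAt_id x).mul (hasDerivAt_id x)
```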