TorchLean API

NN.Proofs.Autograd.Tape.Nodes.Arithmetic

Arithmetic and stochastic tape nodes #

Affine nodes, pointwise arithmetic, scaling, multiplication, squaring, and fixed-mask stochastic operators. Dropout appears here only in its fixed-mask deterministic form, which is the semantics that can be differentiated directly.

noncomputable def Proofs.Autograd.TapeNodes.affine {Γ : List Spec.Shape} {sIn sOut : Spec.Shape} (idx : Idx Γ sIn) (A : Vec sIn.size →L[ℝ] Vec sOut.size) (b : Vec sOut.size) :
Node Γ sOut

Fixed affine node over arbitrary tensor shapes.

This is the proof-level version of a linear layer after all leading dimensions have been flattened into the vector representation used by CtxVec. It is intentionally shape-generic: an ordinary unbatched LinearSpec, a position-wise Transformer FFN applied across a whole sequence, or any other fixed affine map can instantiate the same theorem by supplying the appropriate continuous linear map A and bias vector b.

Instances For
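
As a sanity check on what affineFderiv asserts, the underlying calculus fact can be stated directly against Mathlib's fderiv. This is a minimal sketch on bare functions, not tape nodes; every name in it is Mathlib's rather than TorchLean's:

import Mathlib

-- A fixed affine map `x ↦ A x + b` is its own best linear approximation,
-- so its Fréchet derivative at every point is `A`; `affineFderiv` packages
-- this fact at the node level.
example {E F : Type*} [NormedAddCommGroup E] [NormedSpace ℝ E]
    [NormedAddCommGroup F] [NormedSpace ℝ F]
    (A : E →L[ℝ] F) (b : F) (x : E) :
    fderiv ℝ (fun v => A v + b) x = A :=
  (A.hasFDerivAt.add_const b).fderiv
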
noncomputable def Proofs.Autograd.TapeNodes.affineFderiv {Γ : List Spec.Shape} {sIn sOut : Spec.Shape} (idx : Idx Γ sIn) (A : Vec sIn.size →L[ℝ] Vec sOut.size) (b : Vec sOut.size) :
NodeFDerivCorrect (affine idx A b)

NodeFDerivCorrect for a fixed affine map over arbitrary tensor shapes.

Instances For
noncomputable def Proofs.Autograd.TapeNodes.add {Γ : List Spec.Shape} {s : Spec.Shape} (a b : Idx Γ s) :
Node Γ s

Add two same-shaped context entries.

Instances For
noncomputable def Proofs.Autograd.TapeNodes.addFderiv {Γ : List Spec.Shape} {s : Spec.Shape} (a b : Idx Γ s) :
NodeFDerivCorrect (add a b)

NodeFDerivCorrect for add (the derivative is the pointwise addition of the two projections).

Instances For
noncomputable def Proofs.Autograd.TapeNodes.sub {Γ : List Spec.Shape} {s : Spec.Shape} (a b : Idx Γ s) :
Node Γ s

Subtract two same-shaped context entries.

Instances For
noncomputable def Proofs.Autograd.TapeNodes.subFderiv {Γ : List Spec.Shape} {s : Spec.Shape} (a b : Idx Γ s) :
NodeFDerivCorrect (sub a b)

NodeFDerivCorrect for sub (the derivative is the pointwise subtraction of the two projections).

Instances For
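
Both of these correctness nodes reduce to linearity of the Fréchet derivative. A minimal Mathlib-level sketch of the two rules, stated on bare functions rather than tape nodes:

import Mathlib

-- The coordinate-level facts behind `addFderiv` and `subFderiv`: the
-- derivative of a pointwise sum (resp. difference) is the sum (resp.
-- difference) of the derivatives.
example {E F : Type*} [NormedAddCommGroup E] [NormedSpace ℝ E]
    [NormedAddCommGroup F] [NormedSpace ℝ F]
    {f g : E → F} {x : E}
    (hf : DifferentiableAt ℝ f x) (hg : DifferentiableAt ℝ g x) :
    fderiv ℝ (fun y => f y + g y) x = fderiv ℝ f x + fderiv ℝ g x ∧
      fderiv ℝ (fun y => f y - g y) x = fderiv ℝ f x - fderiv ℝ g x :=
  ⟨fderiv_add hf hg, fderiv_sub hf hg⟩
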
noncomputable def Proofs.Autograd.TapeNodes.scale {Γ : List Spec.Shape} {s : Spec.Shape} (idx : Idx Γ s) (c : ℝ) :
Node Γ s

Scale a context entry by a constant scalar.

Instances For
noncomputable def Proofs.Autograd.TapeNodes.scaleFderiv {Γ : List Spec.Shape} {s : Spec.Shape} (idx : Idx Γ s) (c : ℝ) :
NodeFDerivCorrect (scale idx c)

NodeFDerivCorrect for scale (the derivative is the projection CLM scaled by c).

Instances For
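
The corresponding Mathlib fact, again sketched on bare functions: scaling by a fixed scalar commutes with taking the derivative.

import Mathlib

-- The rule behind `scaleFderiv`: `fderiv` of `c • f` is `c • fderiv f`.
example {E F : Type*} [NormedAddCommGroup E] [NormedSpace ℝ E]
    [NormedAddCommGroup F] [NormedSpace ℝ F]
    {f : E → F} {x : E} (hf : DifferentiableAt ℝ f x) (c : ℝ) :
    fderiv ℝ (fun y => c • f y) x = c • fderiv ℝ f x :=
  fderiv_const_smul hf c
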
noncomputable def Proofs.Autograd.TapeNodes.fixedMaskScale {Γ : List Spec.Shape} {s : Spec.Shape} (idx : Idx Γ s) (coeff : Vec s.size) :
Node Γ s

Apply a fixed elementwise multiplier.

This is the differentiable core of deterministic training-mode dropout: once a seed has sampled a mask, the forward pass is just x ↦ coeff ⊙ x (with coeff = mask / keepProb for inverted dropout). The randomness itself is not differentiated; it is represented by the fixed coeff.

Instances For
noncomputable def Proofs.Autograd.TapeNodes.fixedMaskScaleFderiv {Γ : List Spec.Shape} {s : Spec.Shape} (idx : Idx Γ s) (coeff : Vec s.size) :
NodeFDerivCorrect (fixedMaskScale idx coeff)

NodeFDerivCorrect for a fixed-mask scaling node.

The derivative is the diagonal linear map induced by coeff. This is the proof-level contract for seeded dropout after the mask has been generated: a fixed seed and a fixed keep probability determine coeff, and gradients flow only through the input activation.

Instances For
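
A one-coordinate illustration of this contract, assuming nothing beyond Mathlib: m stands for a single entry of coeff, and the claim is that multiplication by that fixed entry is its own derivative.

import Mathlib

-- One diagonal entry of the fixed-mask Jacobian: multiplying by a fixed
-- constant `m` has derivative `m` at every point.
example (m x : ℝ) : deriv (fun y : ℝ => m * y) x = m := by
  simpa using ((hasDerivAt_id x).const_mul m).deriv
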
noncomputable def Proofs.Autograd.TapeNodes.invertedDropoutCoeff {s : Spec.Shape} (mask : Fin s.size → Bool) (keepProb : ℝ) :
Vec s.size

Coefficients for inverted dropout from a fixed Boolean keep mask and a scalar keep probability.

Instances For
noncomputable def Proofs.Autograd.TapeNodes.fixedInvertedDropout {Γ : List Spec.Shape} {s : Spec.Shape} (idx : Idx Γ s) (mask : Fin s.size → Bool) (keepProb : ℝ) :
Node Γ s

Fixed-mask inverted dropout node.

This is the theorem-facing form of training-mode dropout. A runtime seed may generate mask, but the derivative theorem is stated after sampling: mask and keepProb are constants, and the node is simply a fixed diagonal linear map on the activation.

Instances For
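
A self-contained sketch of the coefficient computation these two definitions rely on. dropoutCoeffModel is a hypothetical stand-in written against plain Fin n → ℝ vectors, not the TorchLean definition:

import Mathlib

-- Kept coordinates are rescaled by `1 / keepProb`, dropped ones are zeroed,
-- so the masked activation keeps its expected value.
def dropoutCoeffModel {n : ℕ} (mask : Fin n → Bool) (keepProb : ℝ) :
    Fin n → ℝ :=
  fun i => if mask i then 1 / keepProb else 0

-- A kept coordinate with keep probability 1/2 is scaled by 2.
example : dropoutCoeffModel (n := 1) (fun _ => true) (1 / 2) 0 = 2 := by
  norm_num [dropoutCoeffModel]
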
noncomputable def Proofs.Autograd.TapeNodes.fixedInvertedDropoutFderiv {Γ : List Spec.Shape} {s : Spec.Shape} (idx : Idx Γ s) (mask : Fin s.size → Bool) (keepProb : ℝ) :
NodeFDerivCorrect (fixedInvertedDropout idx mask keepProb)

NodeFDerivCorrect for fixed-mask inverted dropout.

Instances For
noncomputable def Proofs.Autograd.TapeNodes.mul {Γ : List Spec.Shape} {s : Spec.Shape} (a b : Idx Γ s) :
Node Γ s

Pointwise multiplication of two same-shaped context entries.

Instances For
noncomputable def Proofs.Autograd.TapeNodes.mulFderiv {Γ : List Spec.Shape} {s : Spec.Shape} (a b : Idx Γ s) :
NodeFDerivCorrect (mul a b)

NodeFDerivCorrect for mul (the Hadamard product), applying the product rule coordinatewise.

Instances For
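
The coordinatewise fact it appeals to is the standard product rule; a Mathlib-level sketch on scalar-valued functions:

import Mathlib

-- Product rule behind `mulFderiv`, one coordinate at a time.
example {E : Type*} [NormedAddCommGroup E] [NormedSpace ℝ E]
    {f g : E → ℝ} {x : E}
    (hf : DifferentiableAt ℝ f x) (hg : DifferentiableAt ℝ g x) :
    fderiv ℝ (fun y => f y * g y) x
      = f x • fderiv ℝ g x + g x • fderiv ℝ f x :=
  fderiv_mul hf hg
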
noncomputable def Proofs.Autograd.TapeNodes.square {Γ : List Spec.Shape} {s : Spec.Shape} (idx : Idx Γ s) :
Node Γ s

Runtime square node, implemented as x ⊙ x.

Instances For

noncomputable def Proofs.Autograd.TapeNodes.squareFderiv {Γ : List Spec.Shape} {s : Spec.Shape} (idx : Idx Γ s) :
NodeFDerivCorrect (square idx)

Global NodeFDerivCorrect for elementwise square.

Instances For
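
Since square is implemented as x ⊙ x, its derivative reduces coordinatewise to the scalar product rule; a minimal Mathlib check of that scalar fact:

import Mathlib

-- Squaring via self-multiplication differentiates to `2 * x`, matching the
-- diagonal of the node's Jacobian.
example (x : ℝ) : deriv (fun y : ℝ => y * y) x = 2 * x := by
  simpa [two_mul] using ((hasDerivAt_id x).mul (hasDerivAt_id x)).deriv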