TorchLean API

NN.Spec.Layers.Linear

Linear layer (spec layer)

This file defines a fully‑connected layer and its gradients.

Definitions are purely functional and shape‑indexed, suitable both for proofs and for reuse by the autograd wrappers in NN/Spec/Autograd.

structure Spec.LinearSpec (α : Type) (inDim outDim : ℕ) : Type

Linear layer specification (pure, shape-indexed).

This is the spec-level analogue of PyTorch torch.nn.Linear / torch.nn.functional.linear:

  • weights has shape [outDim, inDim],
  • bias has shape [outDim].
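
A minimal sketch of what the structure plausibly looks like, inferred from the shapes above (the field names weights and bias are assumptions read off this docstring, not confirmed source):

    structure LinearSpec (α : Type) (inDim outDim : ℕ) where
      weights : Tensor α (Shape.dim outDim (Shape.dim inDim Shape.scalar))
      bias    : Tensor α (Shape.dim outDim Shape.scalar)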
def Spec.linearSpec {α : Type} [Add α] [Mul α] [Zero α] {inDim outDim : ℕ} (m : LinearSpec α inDim outDim) (input : Tensor α (Shape.dim inDim Shape.scalar)) : Tensor α (Shape.dim outDim Shape.scalar)

Unbatched forward pass: y = W x + b.

PyTorch analogue: torch.nn.functional.linear.
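
As a cross-check on the formula, here is a minimal standalone Lean sketch over plain lists rather than the shape-indexed Tensor type; linearRef is a hypothetical name introduced here, not part of TorchLean:

    -- y_i = b_i + Σ_j W_ij * x_j, with W given row by row
    def linearRef (W : List (List Float)) (b x : List Float) : List Float :=
      (W.zip b).map fun (row, bi) =>
        (row.zip x).foldl (fun acc (wij, xj) => acc + wij * xj) bi

    -- W = [[1, 2], [3, 4]], b = [0.5, -0.5], x = [1, 1] gives [3.5, 6.5]
    #eval linearRef [[1.0, 2.0], [3.0, 4.0]] [0.5, -0.5] [1.0, 1.0]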

def Spec.linearBatchedSpec {α : Type} [Add α] [Mul α] [Zero α] {batch inDim outDim : ℕ} (m : LinearSpec α inDim outDim) (input : Tensor α (Shape.dim batch (Shape.dim inDim Shape.scalar))) : Tensor α (Shape.dim batch (Shape.dim outDim Shape.scalar))

Batched forward pass (map the unbatched linearSpec over the batch axis).

Input shape: [batch, inDim]. Output shape: [batch, outDim].

PyTorch analogue: applying nn.Linear to a batched tensor.
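
In the same list-based sketch, batching is just a map of the unbatched pass over the rows of the batch (linearBatchedRef is again a hypothetical name, reusing linearRef from above):

    -- apply the unbatched forward pass to each example in the batch
    def linearBatchedRef (W : List (List Float)) (b : List Float)
        (xs : List (List Float)) : List (List Float) :=
      xs.map (linearRef W b)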

def Spec.linearWeightsDerivSpec {α : Type} [Mul α] {inDim outDim : ℕ} (input : Tensor α (Shape.dim inDim Shape.scalar)) (grad_output : Tensor α (Shape.dim outDim Shape.scalar)) : Tensor α (Shape.dim outDim (Shape.dim inDim Shape.scalar))

Gradient w.r.t. weights: ∂L/∂W = (∂L/∂y) ⊗ x (outer product).

This is the standard linear-layer backward formula for y = W x + b.
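
A list-based sketch of the outer product, continuing the hypothetical helpers above:

    -- (g ⊗ x)_ij = g_i * x_j: one row of ∂L/∂W per output coordinate
    def outerRef (g x : List Float) : List (List Float) :=
      g.map fun gi => x.map fun xj => gi * xj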

def Spec.linearBiasDerivSpec {α : Type} {inDim outDim : ℕ} (_dW : Tensor α (Shape.dim outDim (Shape.dim inDim Shape.scalar))) (grad_output : Tensor α (Shape.dim outDim Shape.scalar)) (_input : Tensor α (Shape.dim inDim Shape.scalar)) : Tensor α (Shape.dim outDim Shape.scalar)

Gradient w.r.t. bias: ∂L/∂b = ∂L/∂y.

Since y = W x + b, the Jacobian of y with respect to b is the identity, so the output gradient passes through unchanged.

def Spec.linearInputDerivSpec {α : Type} [Add α] [Mul α] [Zero α] {inDim outDim : ℕ} (weights : Tensor α (Shape.dim outDim (Shape.dim inDim Shape.scalar))) (grad_output : Tensor α (Shape.dim outDim Shape.scalar)) : Tensor α (Shape.dim inDim Shape.scalar)

Gradient w.r.t. input: ∂L/∂x = Wᵀ (∂L/∂y).

This is the standard "matmul by the transpose" rule for y = W x + b.
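
A list-based sketch of Wᵀ (∂L/∂y) that folds over the rows of W, accumulating g_i times row i instead of materializing the transpose (same hypothetical-helper assumptions as above):

    -- (Wᵀ g)_j = Σ_i g_i * W_ij
    def inputDerivRef (W : List (List Float)) (g : List Float) : List Float :=
      let zeros := List.replicate (W.headD []).length 0.0
      (W.zip g).foldl (fun acc (row, gi) => acc.zipWith (fun a w => a + gi * w) row) zeros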

def Spec.batchLinearDerivSpec {α : Type} [Add α] [Mul α] [Zero α] {batch inDim outDim : ℕ} (weights : Tensor α (Shape.dim outDim (Shape.dim inDim Shape.scalar))) (input : Tensor α (Shape.dim (batch + 1) (Shape.dim inDim Shape.scalar))) (grad_output : Tensor α (Shape.dim (batch + 1) (Shape.dim outDim Shape.scalar))) : Tensor α (Shape.dim outDim (Shape.dim inDim Shape.scalar)) × Tensor α (Shape.dim outDim Shape.scalar) × Tensor α (Shape.dim (batch + 1) (Shape.dim inDim Shape.scalar))

Batched derivatives (∂L/∂W, ∂L/∂b, ∂L/∂x) for a batch of size batch + 1 (the + 1 guarantees the batch is non-empty).

This is a convenience wrapper that uses matrix operations to compute:

• d_weights = grad_outputᵀ · input,
• d_bias = the sum of grad_output over the batch axis,
• d_input = grad_output · weights.
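
Equivalently, these are the per-example gradients reduced over the batch; a list-based sketch with the hypothetical names addMat and batchDerivRef, reusing outerRef and inputDerivRef from above:

    -- elementwise matrix addition
    def addMat (a b : List (List Float)) : List (List Float) :=
      a.zipWith (fun ra rb => ra.zipWith (· + ·) rb) b

    -- sum ∂L/∂W and ∂L/∂b over the batch; keep ∂L/∂x per example
    def batchDerivRef (W : List (List Float)) (xs gs : List (List Float)) :
        List (List Float) × List Float × List (List Float) :=
      let dW := (gs.zip xs).foldl (fun acc (g, x) => addMat acc (outerRef g x))
        (W.map fun row => row.map fun _ => 0.0)
      let db := gs.foldl (fun acc g => acc.zipWith (· + ·) g) (W.map fun _ => 0.0)
      let dx := gs.map (inputDerivRef W)
      (dW, db, dx)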
def Spec.linearBackwardSpec {α : Type} [Add α] [Mul α] [Zero α] {inDim outDim : ℕ} (layer : LinearSpec α inDim outDim) (input : Tensor α (Shape.dim inDim Shape.scalar)) (grad_output : Tensor α (Shape.dim outDim Shape.scalar)) : Tensor α (Shape.dim outDim (Shape.dim inDim Shape.scalar)) × Tensor α (Shape.dim outDim Shape.scalar) × Tensor α (Shape.dim inDim Shape.scalar)

Complete unbatched backward pass for a linear layer.

Returns (∂L/∂W, ∂L/∂b, ∂L/∂x) given the layer parameters, the input x, and the output gradient ∂L/∂y.
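
In the list-based sketch, the full backward pass just packages the three rules above (linearBackwardRef is a hypothetical name):

    -- (∂L/∂W, ∂L/∂b, ∂L/∂x) = (g ⊗ x, g, Wᵀ g)
    def linearBackwardRef (W : List (List Float)) (x g : List Float) :
        List (List Float) × List Float × List Float :=
      (outerRef g x, g, inputDerivRef W g)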

def Spec.linearGradientAccumulateSpec {α : Type} [Add α] {inDim outDim : ℕ} (grad1 grad2 : Tensor α (Shape.dim outDim (Shape.dim inDim Shape.scalar))) : Tensor α (Shape.dim outDim (Shape.dim inDim Shape.scalar))

Accumulate two weight gradients by elementwise addition.

This is a small helper used by batching/training code.

def Spec.linearGradientScaleSpec {α : Type} [Mul α] {inDim outDim : ℕ} (grad : Tensor α (Shape.dim outDim (Shape.dim inDim Shape.scalar))) (scale_factor : α) : Tensor α (Shape.dim outDim (Shape.dim inDim Shape.scalar))

Scale a weight gradient by a scalar factor (e.g. a learning-rate adjustment).
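
Accumulation and scaling together are enough for a plain SGD step; a hypothetical list-based sketch (scaleMat and sgdStep are illustrative names, reusing addMat from above):

    -- scale every entry of a gradient matrix by c
    def scaleMat (m : List (List Float)) (c : Float) : List (List Float) :=
      m.map fun row => row.map fun a => c * a

    -- one SGD step on the weights: W ← W − lr · dW
    def sgdStep (W dW : List (List Float)) (lr : Float) : List (List Float) :=
      addMat W (scaleMat dW (-lr))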
