TorchLean API

NN.Spec.Layers.Rnn

RNN (spec layer) #

Defines a vanilla RNN cell and sequence semantics, along with BPTT-style gradients.

This is the recurrent core that TorchLean builds on: a single-step cell (rnnCellSpec), a time-major sequence unroll (rnnSequenceSpec), a batched variant (rnnBatchedSpec), and the corresponding backward passes (rnnCellBackwardSpec, rnnSequenceBackwardSpec).

PyTorch analogy: torch.nn.RNNCell and torch.nn.RNN with the tanh nonlinearity.

Common shape aliases #

We use these aliases pervasively in the spec layer so that signatures read like the math: InputVector, HiddenVector, WeightMatrix, SequenceTensor, and BatchedTensor.

PyTorch note: torch.nn.RNN can be configured as batch-first or time-first. In the spec layer we standardize on time-major (seqLen outermost) because it matches recursive definitions and proofs.

@[reducible, inline]
abbrev Spec.InputVector (α : Type) (inputSize : ℕ) : Type

Shape alias: length-inputSize input vector.

@[reducible, inline]
abbrev Spec.HiddenVector (α : Type) (hiddenSize : ℕ) : Type

Shape alias: length-hiddenSize hidden-state vector.

@[reducible, inline]
abbrev Spec.WeightMatrix (α : Type) (hiddenSize inputSize : ℕ) : Type

Shape alias: (hiddenSize × inputSize) dense weight matrix.

@[reducible, inline]
abbrev Spec.SequenceTensor (α : Type) (seqLen : ℕ) (shape : Shape) : Type

Shape alias: time-major sequence of length seqLen with per-step shape shape.

@[reducible, inline]
abbrev Spec.BatchedTensor (α : Type) (batchSize : ℕ) (shape : Shape) : Type

Shape alias: batch of batchSize tensors with per-item shape shape.

structure Spec.RNNSpec (α : Type) (inputSize hiddenSize : ℕ) : Type

RNN cell parameters.

We use a single weight matrix applied to a concatenated vector [x_t; h_{t-1}]:

h_t = tanh(W [x_t; h_{t-1}] + b).

This is equivalent to the common split-parameter form:

h_t = tanh(W_ih x_t + W_hh h_{t-1} + b),

just packaged to reuse the same tensor primitives elsewhere in TorchLean.
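
To spell out the equivalence (a sketch; reading the packed W column-wise as a block W_ih acting on x_t followed by a block W_hh acting on h_{t-1} is an assumption about the layout):

\[
W \begin{bmatrix} x_t \\ h_{t-1} \end{bmatrix} + b
  \;=\; \begin{bmatrix} W_{ih} & W_{hh} \end{bmatrix}
        \begin{bmatrix} x_t \\ h_{t-1} \end{bmatrix} + b
  \;=\; W_{ih}\, x_t + W_{hh}\, h_{t-1} + b .
\]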

def Spec.rnnCellSpec {α : Type} [Context α] {inputSize hiddenSize : ℕ} (rnn : RNNSpec α inputSize hiddenSize) (input : InputVector α inputSize) (hidden : HiddenVector α hiddenSize) :
    HiddenVector α hiddenSize

Single RNN cell forward pass.

Math: h_t = tanh(W [x_t; h_{t-1}] + b).

PyTorch analogy: RNNCell(input, hidden) with tanh nonlinearity.
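
A minimal type-level usage sketch (hypothetical: the concrete sizes are made up, and it assumes a scalar type α with a Context α instance and an open Spec namespace; only the signature above is used):

  -- One forward step for a 3-dimensional input and a 4-dimensional hidden state.
  example {α : Type} [Context α]
      (rnn : RNNSpec α 3 4) (x : InputVector α 3) (h : HiddenVector α 4) :
      HiddenVector α 4 :=
    rnnCellSpec rnn x h  -- h' = tanh (W [x; h] + b)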

def Spec.rnnCellBackwardSpec {α : Type} [Context α] {inputSize hiddenSize : ℕ} (rnn : RNNSpec α inputSize hiddenSize) (input : InputVector α inputSize) (prev_hidden hidden grad_hidden : HiddenVector α hiddenSize) :
    InputVector α inputSize × HiddenVector α hiddenSize × WeightMatrix α hiddenSize (inputSize + hiddenSize) × HiddenVector α hiddenSize

Backward/VJP for a single RNN cell.

Inputs:

• x_t, h_{t-1},
• the cached forward output h_t (so we can write tanh' in terms of h_t),
• an upstream gradient dL/dh_t.

Outputs:

• dL/dx_t, dL/dh_{t-1}, and parameter gradients (dL/dW, dL/db).
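
Written out for one step (a standard tanh-cell VJP sketch consistent with the description above; ⊙ is the elementwise product and [x_t; h_{t-1}] the concatenation):

\[
\begin{aligned}
dz_t &= \big(1 - h_t \odot h_t\big) \odot \frac{\partial L}{\partial h_t}
  && \text{(using } \tanh'(z_t) = 1 - h_t^2\text{)} \\
\frac{\partial L}{\partial W} &= dz_t \, [x_t; h_{t-1}]^\top,
  \qquad \frac{\partial L}{\partial b} = dz_t, \\
\Big[\tfrac{\partial L}{\partial x_t};\; \tfrac{\partial L}{\partial h_{t-1}}\Big] &= W^\top dz_t .
\end{aligned}
\]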
def Spec.rnnSequenceSpec {α : Type} [Context α] {seqLen inputSize hiddenSize : ℕ} (rnn : RNNSpec α inputSize hiddenSize) (inputs : SequenceTensor α seqLen (Shape.dim inputSize Shape.scalar)) (initial_hidden : HiddenVector α hiddenSize) :
    SequenceTensor α seqLen (Shape.dim hiddenSize Shape.scalar)

Unroll an RNN over seqLen steps (time-major).

Returns the sequence of hidden states [h_0, ..., h_{seqLen-1}].
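
A type-level sketch of the unroll (hypothetical sizes; same assumptions as the cell example above):

  -- 10 time steps of 3-dimensional inputs, 4-dimensional hidden state.
  example {α : Type} [Context α]
      (rnn : RNNSpec α 3 4)
      (xs : SequenceTensor α 10 (Shape.dim 3 Shape.scalar))
      (h0 : HiddenVector α 4) :
      SequenceTensor α 10 (Shape.dim 4 Shape.scalar) :=
    rnnSequenceSpec rnn xs h0  -- time-major [h_0, ..., h_9]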

@[irreducible]
def Spec.rnnSequenceSpec.process_sequence {α : Type} [Context α] {seqLen inputSize hiddenSize : ℕ} (rnn : RNNSpec α inputSize hiddenSize) (inputs : SequenceTensor α seqLen (Shape.dim inputSize Shape.scalar)) (t : ℕ) (prev_hidden : HiddenVector α hiddenSize) :
    HiddenVector α hiddenSize × List (HiddenVector α hiddenSize)
def Spec.rnnBatchedSpec {α : Type} [Context α] {batchSize seqLen inputSize hiddenSize : ℕ} (rnn : RNNSpec α inputSize hiddenSize) (inputs : BatchedTensor α batchSize (Shape.dim seqLen (Shape.dim inputSize Shape.scalar))) (initial_hidden : BatchedTensor α batchSize (Shape.dim hiddenSize Shape.scalar)) :
    BatchedTensor α batchSize (Shape.dim seqLen (Shape.dim hiddenSize Shape.scalar))

Batched RNN forward pass (maps rnnSequenceSpec over the batch dimension).
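
The type nesting makes the layout explicit: batch outermost, then time, then features. A hypothetical instantiation (same assumptions as the earlier sketches):

  -- Batch of 8 sequences, each with 10 steps of 3-dimensional inputs.
  example {α : Type} [Context α]
      (rnn : RNNSpec α 3 4)
      (xs : BatchedTensor α 8 (Shape.dim 10 (Shape.dim 3 Shape.scalar)))
      (h0 : BatchedTensor α 8 (Shape.dim 4 Shape.scalar)) :
      BatchedTensor α 8 (Shape.dim 10 (Shape.dim 4 Shape.scalar)) :=
    rnnBatchedSpec rnn xs h0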

def Spec.rnnWeightsDerivSpec {α : Type} [Context α] {seqLen inputSize hiddenSize : ℕ} (inputs : SequenceTensor α seqLen (Shape.dim inputSize Shape.scalar)) (hiddens grad_outputs : SequenceTensor α seqLen (Shape.dim hiddenSize Shape.scalar)) :
    WeightMatrix α hiddenSize (inputSize + hiddenSize)

Gradient w.r.t. weights from a full unroll, given per-step preactivation gradients.

This is a lightweight helper for analyses that already have preactivation gradients. It assumes:

• the initial hidden state is 0, and
• grad_outputs[t] is already dL/dz_t (preactivation gradient).

For end-to-end BPTT from dL/dh_t, prefer rnnSequenceBackwardSpec.
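
Under those assumptions the accumulated gradient is (a sketch, with h_{-1} = 0 and dz_t = grad_outputs[t]):

\[
\frac{\partial L}{\partial W} \;=\; \sum_{t=0}^{\mathrm{seqLen}-1} dz_t \, [x_t; h_{t-1}]^\top .
\]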

@[irreducible]
def Spec.rnnWeightsDerivSpec.accumulate_grads {α : Type} [Context α] {seqLen inputSize hiddenSize : ℕ} (inputs : SequenceTensor α seqLen (Shape.dim inputSize Shape.scalar)) (hiddens grad_outputs : SequenceTensor α seqLen (Shape.dim hiddenSize Shape.scalar)) (t : ℕ) (acc : WeightMatrix α hiddenSize (inputSize + hiddenSize)) :
    WeightMatrix α hiddenSize (inputSize + hiddenSize)
def Spec.rnnBiasDerivSpec {α : Type} [Context α] {seqLen hiddenSize : ℕ} (grad_outputs : SequenceTensor α seqLen (Shape.dim hiddenSize Shape.scalar)) (h : seqLen ≠ 0) :
    HiddenVector α hiddenSize

Gradient w.r.t. bias from per-step preactivation gradients.

This is sum_t dL/dz_t over the sequence dimension.

def Spec.rnnSequenceBackwardSpec {α : Type} [Context α] {seqLen inputSize hiddenSize : ℕ} (rnn : RNNSpec α inputSize hiddenSize) (inputs : SequenceTensor α seqLen (Shape.dim inputSize Shape.scalar)) (initial_hidden : HiddenVector α hiddenSize) (hiddens grad_hiddens : SequenceTensor α seqLen (Shape.dim hiddenSize Shape.scalar)) :
    WeightMatrix α hiddenSize (inputSize + hiddenSize) × HiddenVector α hiddenSize × SequenceTensor α seqLen (Shape.dim inputSize Shape.scalar) × HiddenVector α hiddenSize

Full BPTT backward pass through an RNN sequence.

This is the spec-level version of what PyTorch autograd computes for nn.RNN when unrolled:

• we walk time in reverse,
• accumulate parameter gradients,
• and compute gradients for each input step plus the initial hidden state.

Diagram: forward unroll + BPTT (vanilla RNN) #

One step (forward):

  x_t          h_{t-1}
   |              |
   +--- concat ---+
          |
      z_t = W · [x_t; h_{t-1}] + b
          |
      h_t = tanh(z_t)

Unrolled over time (forward):

  h_{-1} = initial_hidden

  x0 -> [cell] -> h0 -> [cell] -> h1 -> ... -> [cell] -> h_{T-1}
          ^               ^                       ^
     uses h_{-1}       uses h0               uses h_{T-2}

Backprop through time (reverse):

At each time step we combine two sources of gradient for h_t:

• the gradient coming from the loss that touches h_t directly (grad_hiddens[t]),
• plus the gradient flowing "from the future" through the recurrence (dHidden_next).

Then we push total_grad through the single-step VJP (rnnCellBackwardSpec), producing:

• dInput_t and dHidden_prev,
• and parameter gradients dW_t, db_t, which are accumulated across time (see the sketch below).
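
Schematically, one reverse step looks like this (a sketch; names follow the prose above, with dh_t^future the gradient arriving from step t+1 and the rnn argument omitted):

\[
\begin{aligned}
\bar{h}_t &= \texttt{grad\_hiddens}[t] + dh_t^{\text{future}}, \\
(dx_t,\; dh_{t-1},\; dW_t,\; db_t) &= \mathrm{rnnCellBackwardSpec}(x_t,\, h_{t-1},\, h_t,\, \bar{h}_t), \\
dW \mathrel{+}&= dW_t, \qquad db \mathrel{+}= db_t, \qquad dh_{t-1}^{\text{future}} := dh_{t-1} .
\end{aligned}
\]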
@[irreducible]
def Spec.rnnSequenceBackwardSpec.backward_step {α : Type} [Context α] {seqLen inputSize hiddenSize : ℕ} (rnn : RNNSpec α inputSize hiddenSize) (inputs : SequenceTensor α seqLen (Shape.dim inputSize Shape.scalar)) (initial_hidden : HiddenVector α hiddenSize) (hiddens grad_hiddens : SequenceTensor α seqLen (Shape.dim hiddenSize Shape.scalar)) (t : ℕ) (h_t : t ≤ seqLen) (dHidden_next : HiddenVector α hiddenSize) (acc_inputs : List (InputVector α inputSize)) (acc_W : WeightMatrix α hiddenSize (inputSize + hiddenSize)) (acc_b : HiddenVector α hiddenSize) :
    List (InputVector α inputSize) × HiddenVector α hiddenSize × WeightMatrix α hiddenSize (inputSize + hiddenSize) × HiddenVector α hiddenSize