RNN (spec layer) #
Defines a vanilla RNN cell and sequence semantics, along with BPTT-style gradients.
This is the recurrent core that TorchLean builds on:
- a single-step cell (`rnnCellSpec`),
- an explicit unrolling over time (`rnnSequenceSpec`),
- and a reverse-time VJP (`rnnSequenceBackwardSpec`).
PyTorch analogy:
`rnnCellSpec` corresponds to `torch.nn.RNNCell` with `nonlinearity="tanh"`; `rnnSequenceSpec` corresponds to `torch.nn.RNN` unrolled over `seqLen`.
References #
- Elman, "Finding Structure in Time" (1990): https://crl.ucsd.edu/~elman/Papers/fsit.pdf
- PyTorch `RNNCell`: https://docs.pytorch.org/docs/stable/generated/torch.nn.RNNCell.html
- PyTorch `RNN`: https://docs.pytorch.org/docs/stable/generated/torch.nn.RNN.html
Common shape aliases #
We use these aliases pervasively in the spec layer so that signatures read like the math:
- `InputVector α inputSize` is a length-`inputSize` vector,
- `HiddenVector α hiddenSize` is a length-`hiddenSize` vector,
- `WeightMatrix α hiddenSize inputSize` is a `(hiddenSize × inputSize)` matrix,
- `SequenceTensor α seqLen s` is a time-major sequence of length `seqLen`.
PyTorch note: `torch.nn.RNN` accepts either time-major input (the default) or batch-major input (`batch_first=True`). In the spec layer we standardize on time-major (`seqLen` outermost) because it matches the recursive definitions and proofs.
Shape alias: length-`inputSize` input vector.
Shape alias: length-`hiddenSize` hidden-state vector.
Shape alias: `(hiddenSize × inputSize)` dense weight matrix.
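The sketches on this page use plain `List Float` stand-ins for these aliases rather than the actual spec-layer carriers; the helper names `dot` and `matVec` below are ours, introduced only for illustration:

```lean
-- List-based stand-ins used by the sketches on this page (not the
-- spec-layer types). `dot` and `matVec` are hypothetical helpers.
def dot (a b : List Float) : Float :=
  (List.zipWith (· * ·) a b).foldl (· + ·) 0.0

def matVec (W : List (List Float)) (v : List Float) : List Float :=
  W.map (fun row => dot row v)
```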
RNN cell parameters.
We use a single weight matrix applied to a concatenated vector [x_t; h_{t-1}]:
h_t = tanh(W [x_t; h_{t-1}] + b).
This is equivalent to the common split-parameter form:
h_t = tanh(W_ih x_t + W_hh h_{t-1} + b),
just packaged to reuse the same tensor primitives elsewhere in TorchLean.
- `weights : WeightMatrix α hiddenSize (inputSize + hiddenSize)`: the packed weight matrix `W`.
- `bias : HiddenVector α hiddenSize`: the bias vector `b`.
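To see why the packed and split forms agree, note that each row of `W` splits at `inputSize` into an input block and a recurrent block, so `W [x; h] = W_ih x + W_hh h` holds row by row. A small check on the list stand-ins (the names `wIh`/`wHh` are ours):

```lean
-- Packed vs split: dot row (x ++ h) = dot (row.take inputSize) x
--                                   + dot (row.drop inputSize) h.
#eval
  let W := [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  -- 2 × (1 + 2): inputSize = 1
  let x := [0.5]
  let h := [1.0, -1.0]
  let packed := matVec W (x ++ h)
  let wIh := W.map (·.take 1)
  let wHh := W.map (·.drop 1)
  let split := List.zipWith (· + ·) (matVec wIh x) (matVec wHh h)
  (packed, split)  -- both are [-0.5, 1.0]
```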
Single RNN cell forward pass.
Math:
h_t = tanh(W [x_t; h_{t-1}] + b).
PyTorch analogy: RNNCell(input, hidden) with tanh nonlinearity.
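A minimal sketch of the forward step on the list stand-ins (`rnnCell` is our name for it, not necessarily the spec's):

```lean
-- h_t = tanh(W [x_t; h_{t-1}] + b)
def rnnCell (W : List (List Float)) (b x hPrev : List Float) : List Float :=
  (List.zipWith (· + ·) (matVec W (x ++ hPrev)) b).map Float.tanh

-- Tiny usage example: inputSize = 1, hiddenSize = 2.
#eval rnnCell [[0.1, 0.2, 0.0], [0.0, 0.1, 0.3]] [0.0, 0.0] [1.0] [0.5, -0.5]
```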
Backward/VJP for a single RNN cell.
Inputs:
- x_t and h_{t-1},
- the cached forward output h_t (so we can write tanh' in terms of h_t),
- an upstream gradient dL/dh_t.

Outputs:
- dL/dx_t and dL/dh_{t-1},
- the parameter gradients (dL/dW, dL/db).
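A sketch of this VJP on the same stand-ins. The key step is writing tanh' via the cached output, tanh'(z_t) = 1 - h_t²; `outer`, `vecMat`, and `rnnCellBackward` are our names:

```lean
-- dz = dh ⊙ (1 - h_t²); then dW = dz ⊗ [x; h_{t-1}], db = dz,
-- and d[x; h_{t-1}] = Wᵀ dz, split back into dx and dh_{t-1}.
def outer (u v : List Float) : List (List Float) :=
  u.map (fun ui => v.map (ui * ·))

def vecMat (dz : List Float) (W : List (List Float)) : List Float :=
  -- Wᵀ · dz, computed as a sum of W's rows scaled by the entries of dz
  (List.zipWith (fun d row => row.map (d * ·)) dz W).foldl
    (fun acc r => List.zipWith (· + ·) acc r)
    (List.replicate ((W.headD []).length) 0.0)

def rnnCellBackward (W : List (List Float)) (x hPrev hT dhT : List Float) :
    List Float × List Float × List (List Float) × List Float :=
  let dz := List.zipWith (fun g h => g * (1.0 - h * h)) dhT hT
  let dxh := vecMat dz W
  (dxh.take x.length, dxh.drop x.length, outer dz (x ++ hPrev), dz)
```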
Unroll an RNN over seqLen steps (time-major).
Returns the sequence of hidden states [h_0, ..., h_{seqLen-1}].
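Sketch of the unroll on the stand-ins, threading the hidden state through time (time-major: `xs` lists the per-step inputs; `rnnSequence` is our name):

```lean
-- Returns [h_0, ..., h_{T-1}], starting the recurrence from h0.
def rnnSequence (W : List (List Float)) (b h0 : List Float)
    (xs : List (List Float)) : List (List Float) :=
  (xs.foldl
    (fun (acc : List (List Float) × List Float) x =>
      let h := rnnCell W b x acc.2
      (acc.1 ++ [h], h))
    ([], h0)).1
```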
Batched RNN forward pass (maps rnnSequenceSpec over the batch dimension).
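On the stand-ins this is just a map over the batch dimension:

```lean
-- Batched forward: apply the sequence unroll to each element of the batch.
def rnnBatch (W : List (List Float)) (b h0 : List Float)
    (batch : List (List (List Float))) : List (List (List Float)) :=
  batch.map (rnnSequence W b h0)
```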
Gradient w.r.t. weights from a full unroll, given per-step preactivation gradients.
This is a lightweight helper for analyses that already have preactivation gradients. It assumes:
- the initial hidden state is 0,
- and grad_outputs[t] is already dL/dz_t (the preactivation gradient).
For end-to-end BPTT from dL/dh_t, prefer rnnSequenceBackwardSpec.
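A sketch of this helper under its stated assumptions (h_{-1} = 0, and `dzs[t]` already holds dL/dz_t); we recompute the hidden states with the forward sketch to obtain each h_{t-1}:

```lean
-- dW = Σ_t dz_t ⊗ [x_t; h_{t-1}], with h_{-1} = 0 as this helper assumes.
def weightGradFromPreacts (W : List (List Float)) (b : List Float)
    (xs dzs : List (List Float)) : List (List Float) :=
  let h0 := List.replicate b.length 0.0
  let hPrevs := h0 :: (rnnSequence W b h0 xs).dropLast  -- [h_{-1}, ..., h_{T-2}]
  let zero := List.replicate b.length
    (List.replicate ((xs.headD []).length + b.length) 0.0)
  ((xs.zip hPrevs).zip dzs).foldl
    (fun acc ((x, hPrev), dz) =>
      List.zipWith (List.zipWith (· + ·)) acc (outer dz (x ++ hPrev)))
    zero
```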
Gradient w.r.t. bias from per-step preactivation gradients.
This is sum_t dL/dz_t over the sequence dimension.
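Sketch: a running sum of the per-step preactivation gradients:

```lean
-- db = Σ_t dz_t
def biasGradFromPreacts (hiddenSize : Nat) (dzs : List (List Float)) : List Float :=
  dzs.foldl (List.zipWith (· + ·)) (List.replicate hiddenSize 0.0)
```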
Full BPTT backward pass through an RNN sequence.
This is the spec-level version of what PyTorch autograd computes for nn.RNN when unrolled:
- we walk time in reverse,
- accumulate parameter gradients,
- and compute gradients for each input step plus the initial hidden state.
Diagram: forward unroll + BPTT (vanilla RNN) #
One step (forward):
    x_t          h_{t-1}
     |              |
     +--- concat ---+
             |
     z_t = W · [x_t; h_{t-1}] + b
             |
     h_t = tanh(z_t)
Unrolled over time (forward):
    h_{-1} = the given initial hidden state

    x_0 -> [cell] -> h_0 -> [cell] -> h_1 -> ... -> [cell] -> h_{T-1}
             ^                ^                        ^
         uses h_{-1}       uses h_0               uses h_{T-2}
Backprop through time (reverse):
At each time step we combine two sources of gradient for h_t:
- the gradient coming from the loss that touches h_t directly (grad_hiddens[t]),
- plus the gradient flowing "from the future" through the recurrence (dHidden_next).

Then we push total_grad through the single-step VJP (rnn_cell_backward_spec), producing:

- dInput_t and dHidden_prev,
- and the parameter gradients dW_t, db_t, which are accumulated across time.
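Putting the pieces together, here is a sketch of the full reverse pass on the stand-ins (`rnnBPTT` and all intermediate names are ours): at each step the direct loss gradient is added to the gradient arriving from the future, pushed through the single-step VJP sketch, and the parameter gradients are accumulated.

```lean
-- Full BPTT. Returns (per-step input grads, grad for the initial hidden
-- state, accumulated dW, accumulated db).
def rnnBPTT (W : List (List Float)) (b h0 : List Float)
    (xs gradHiddens : List (List Float)) :
    List (List Float) × List Float × List (List Float) × List Float :=
  let hs := rnnSequence W b h0 xs
  let hPrevs := h0 :: hs.dropLast
  let zeroW := List.replicate b.length
    (List.replicate ((xs.headD []).length + b.length) 0.0)
  (((xs.zip hPrevs).zip (hs.zip gradHiddens)).reverse).foldl
    (fun acc step =>
      match acc, step with
      | (dxs, dhNext, dW, db), ((x, hPrev), (h, gh)) =>
        -- total gradient on h_t: direct loss term + recurrence term
        let total := List.zipWith (· + ·) gh dhNext
        let (dx, dhPrev, dWt, dbt) := rnnCellBackward W x hPrev h total
        (dx :: dxs, dhPrev,
         List.zipWith (List.zipWith (· + ·)) dW dWt,
         List.zipWith (· + ·) db dbt))
    (([] : List (List Float)), List.replicate b.length 0.0,
     zeroW, List.replicate b.length 0.0)
```

Because we walk the zipped steps in reverse and cons each `dx`, the per-step input gradients come out in forward time order, and the final `dhNext` is exactly the gradient for the initial hidden state.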