RNN (spec layer) #
Defines a vanilla RNN cell and sequence semantics, along with BPTT-style gradients.
This is the recurrent core that TorchLean builds on:
- a single-step cell (`rnnCellSpec`),
- an explicit unrolling over time (`rnnSequenceSpec`),
- and a reverse-time VJP (`rnnSequenceBackwardSpec`).
PyTorch analogy:
`rnnCellSpec` corresponds to `torch.nn.RNNCell` with `nonlinearity="tanh"`; `rnnSequenceSpec` corresponds to `torch.nn.RNN` unrolled over `seqLen`.
References #
- Elman, "Finding Structure in Time" (1990): https://crl.ucsd.edu/~elman/Papers/fsit.pdf
- PyTorch `RNNCell`: https://docs.pytorch.org/docs/stable/generated/torch.nn.RNNCell.html
- PyTorch `RNN`: https://docs.pytorch.org/docs/stable/generated/torch.nn.RNN.html
Common shape aliases #
We use these aliases pervasively in the spec layer so that signatures read like the math:
- `InputVector α inputSize` is a length-`inputSize` vector,
- `HiddenVector α hiddenSize` is a length-`hiddenSize` vector,
- `WeightMatrix α hiddenSize inputSize` is a `(hiddenSize × inputSize)` matrix,
- `SequenceTensor α seqLen s` is a time-major sequence of length `seqLen`.
PyTorch note: `torch.nn.RNN` accepts either time-major input (the default) or batch-major input (`batch_first=True`). In the spec layer we standardize on time-major (`seqLen` outermost) because it matches the recursive definitions and proofs.
Shape alias: length-`inputSize` input vector.
Shape alias: length-`hiddenSize` hidden-state vector.
Shape alias: `(hiddenSize × inputSize)` dense weight matrix.
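The sketches on this page use plain `List Float` stand-ins for these aliases rather than the actual spec-layer carriers; the helper names `dot` and `matVec` below are ours, introduced only for illustration:

```lean
-- List-based stand-ins used by the sketches on this page (not the
-- spec-layer types). `dot` and `matVec` are hypothetical helpers.
def dot (a b : List Float) : Float :=
  (List.zipWith (· * ·) a b).foldl (· + ·) 0.0

def matVec (W : List (List Float)) (v : List Float) : List Float :=
  W.map (fun row => dot row v)
```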
RNN cell parameters.
We use a single weight matrix applied to a concatenated vector [x_t; h_{t-1}]:
h_t = tanh(W [x_t; h_{t-1}] + b).
This is equivalent to the common split-parameter form:
h_t = tanh(W_ih x_t + W_hh h_{t-1} + b),
just packaged to reuse the same tensor primitives elsewhere in TorchLean.
- `weights : WeightMatrix α hiddenSize (inputSize + hiddenSize)`: the packed weight matrix `W`.
- `bias : HiddenVector α hiddenSize`: the bias vector `b`.
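To see why the packed and split forms agree, note that each row of `W` splits at `inputSize` into an input block and a recurrent block, so `W [x; h] = W_ih x + W_hh h` holds row by row. A small check on the list stand-ins (the names `wIh`/`wHh` are ours):

```lean
-- Packed vs split: dot row (x ++ h) = dot (row.take inputSize) x
--                                   + dot (row.drop inputSize) h.
#eval
  let W := [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  -- 2 × (1 + 2): inputSize = 1
  let x := [0.5]
  let h := [1.0, -1.0]
  let packed := matVec W (x ++ h)
  let wIh := W.map (·.take 1)
  let wHh := W.map (·.drop 1)
  let split := List.zipWith (· + ·) (matVec wIh x) (matVec wHh h)
  (packed, split)  -- both are [-0.5, 1.0]
```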
Single RNN cell forward pass.
Math:
h_t = tanh(W [x_t; h_{t-1}] + b).
PyTorch analogy: RNNCell(input, hidden) with tanh nonlinearity.
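A minimal sketch of the forward step on the list stand-ins (`rnnCell` is our name for it, not necessarily the spec's):

```lean
-- h_t = tanh(W [x_t; h_{t-1}] + b)
def rnnCell (W : List (List Float)) (b x hPrev : List Float) : List Float :=
  (List.zipWith (· + ·) (matVec W (x ++ hPrev)) b).map Float.tanh

-- Tiny usage example: inputSize = 1, hiddenSize = 2.
#eval rnnCell [[0.1, 0.2, 0.0], [0.0, 0.1, 0.3]] [0.0, 0.0] [1.0] [0.5, -0.5]
```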
Backward/VJP for a single RNN cell.
Inputs:
- x_t and h_{t-1},
- the cached forward output h_t (so we can write tanh' in terms of h_t),
- an upstream gradient dL/dh_t.

Outputs:
- dL/dx_t and dL/dh_{t-1},
- the parameter gradients (dL/dW, dL/db).
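A sketch of this VJP on the same stand-ins. The key step is writing tanh' via the cached output, tanh'(z_t) = 1 - h_t²; `outer`, `vecMat`, and `rnnCellBackward` are our names:

```lean
-- dz = dh ⊙ (1 - h_t²); then dW = dz ⊗ [x; h_{t-1}], db = dz,
-- and d[x; h_{t-1}] = Wᵀ dz, split back into dx and dh_{t-1}.
def outer (u v : List Float) : List (List Float) :=
  u.map (fun ui => v.map (ui * ·))

def vecMat (dz : List Float) (W : List (List Float)) : List Float :=
  -- Wᵀ · dz, computed as a sum of W's rows scaled by the entries of dz
  (List.zipWith (fun d row => row.map (d * ·)) dz W).foldl
    (fun acc r => List.zipWith (· + ·) acc r)
    (List.replicate ((W.headD []).length) 0.0)

def rnnCellBackward (W : List (List Float)) (x hPrev hT dhT : List Float) :
    List Float × List Float × List (List Float) × List Float :=
  let dz := List.zipWith (fun g h => g * (1.0 - h * h)) dhT hT
  let dxh := vecMat dz W
  (dxh.take x.length, dxh.drop x.length, outer dz (x ++ hPrev), dz)
```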
Unroll an RNN over seqLen steps (time-major).
Returns the sequence of hidden states [h_0, ..., h_{seqLen-1}].
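Sketch of the unroll on the stand-ins, threading the hidden state through time (time-major: `xs` lists the per-step inputs; `rnnSequence` is our name):

```lean
-- Returns [h_0, ..., h_{T-1}], starting the recurrence from h0.
def rnnSequence (W : List (List Float)) (b h0 : List Float)
    (xs : List (List Float)) : List (List Float) :=
  (xs.foldl
    (fun (acc : List (List Float) × List Float) x =>
      let h := rnnCell W b x acc.2
      (acc.1 ++ [h], h))
    ([], h0)).1
```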
Batched RNN forward pass (maps rnnSequenceSpec over the batch dimension).
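On the stand-ins this is just a map over the batch dimension:

```lean
-- Batched forward: apply the sequence unroll to each element of the batch.
def rnnBatch (W : List (List Float)) (b h0 : List Float)
    (batch : List (List (List Float))) : List (List (List Float)) :=
  batch.map (rnnSequence W b h0)
```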
Gradient w.r.t. weights from a full unroll, given per-step preactivation gradients.
This is a lightweight helper for analyses that already have preactivation gradients. It assumes:
- the initial hidden state is 0,
- and grad_outputs[t] is already dL/dz_t (the preactivation gradient).
For end-to-end BPTT from dL/dh_t, prefer rnnSequenceBackwardSpec.
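A sketch of this helper under its stated assumptions (h_{-1} = 0, and `dzs[t]` already holds dL/dz_t); we recompute the hidden states with the forward sketch to obtain each h_{t-1}:

```lean
-- dW = Σ_t dz_t ⊗ [x_t; h_{t-1}], with h_{-1} = 0 as this helper assumes.
def weightGradFromPreacts (W : List (List Float)) (b : List Float)
    (xs dzs : List (List Float)) : List (List Float) :=
  let h0 := List.replicate b.length 0.0
  let hPrevs := h0 :: (rnnSequence W b h0 xs).dropLast  -- [h_{-1}, ..., h_{T-2}]
  let zero := List.replicate b.length
    (List.replicate ((xs.headD []).length + b.length) 0.0)
  ((xs.zip hPrevs).zip dzs).foldl
    (fun acc ((x, hPrev), dz) =>
      List.zipWith (List.zipWith (· + ·)) acc (outer dz (x ++ hPrev)))
    zero
```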
Gradient w.r.t. bias from per-step preactivation gradients.
This is sum_t dL/dz_t over the sequence dimension.
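Sketch: a running sum of the per-step preactivation gradients:

```lean
-- db = Σ_t dz_t
def biasGradFromPreacts (hiddenSize : Nat) (dzs : List (List Float)) : List Float :=
  dzs.foldl (List.zipWith (· + ·)) (List.replicate hiddenSize 0.0)
```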
Full BPTT backward pass through an RNN sequence.
This is the spec-level version of what PyTorch autograd computes for nn.RNN when unrolled:
- we walk time in reverse,
- accumulate parameter gradients,
- and compute gradients for each input step plus the initial hidden state.
Diagram: forward unroll + BPTT (vanilla RNN) #
One step (forward):
    x_t          h_{t-1}
     |              |
     +--- concat ---+
             |
     z_t = W · [x_t; h_{t-1}] + b
             |
     h_t = tanh(z_t)
Unrolled over time (forward):
    h_{-1} = the given initial hidden state

    x_0 -> [cell] -> h_0 -> [cell] -> h_1 -> ... -> [cell] -> h_{T-1}
             ^                ^                        ^
         uses h_{-1}       uses h_0               uses h_{T-2}
Backprop through time (reverse):
At each time step we combine two sources of gradient for h_t:
- the gradient coming from the loss that touches h_t directly (grad_hiddens[t]),
- plus the gradient flowing "from the future" through the recurrence (dHidden_next).

Then we push total_grad through the single-step VJP (rnn_cell_backward_spec), producing:

- dInput_t and dHidden_prev,
- and the parameter gradients dW_t, db_t, which are accumulated across time.
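Putting the pieces together, here is a sketch of the full reverse pass on the stand-ins (`rnnBPTT` and all intermediate names are ours): at each step the direct loss gradient is added to the gradient arriving from the future, pushed through the single-step VJP sketch, and the parameter gradients are accumulated.

```lean
-- Full BPTT. Returns (per-step input grads, grad for the initial hidden
-- state, accumulated dW, accumulated db).
def rnnBPTT (W : List (List Float)) (b h0 : List Float)
    (xs gradHiddens : List (List Float)) :
    List (List Float) × List Float × List (List Float) × List Float :=
  let hs := rnnSequence W b h0 xs
  let hPrevs := h0 :: hs.dropLast
  let zeroW := List.replicate b.length
    (List.replicate ((xs.headD []).length + b.length) 0.0)
  (((xs.zip hPrevs).zip (hs.zip gradHiddens)).reverse).foldl
    (fun acc step =>
      match acc, step with
      | (dxs, dhNext, dW, db), ((x, hPrev), (h, gh)) =>
        -- total gradient on h_t: direct loss term + recurrence term
        let total := List.zipWith (· + ·) gh dhNext
        let (dx, dhPrev, dWt, dbt) := rnnCellBackward W x hPrev h total
        (dx :: dxs, dhPrev,
         List.zipWith (List.zipWith (· + ·)) dW dWt,
         List.zipWith (· + ·) db dbt))
    (([] : List (List Float)), List.replicate b.length 0.0,
     zeroW, List.replicate b.length 0.0)
```

Because we walk the zipped steps in reverse and cons each `dx`, the per-step input gradients come out in forward time order, and the final `dhNext` is exactly the gradient for the initial hidden state.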