LSTM (spec layer) #
TorchLean provides a small LSTM specification that is:
- explicit about shapes (so common dimension mistakes are caught early),
- explicit about the gate math (so gradients are inspectable and proofs can refer to the equations),
- close in spirit to the way PyTorch documents nn.LSTMCell / nn.LSTM.
References (math + PyTorch behavior) #
- Hochreiter, Schmidhuber, "Long Short-Term Memory" (Neural Computation, 1997). Free PDF: http://www.bioinf.jku.at/publications/older/2604.pdf
- PyTorch LSTMCell equations: https://docs.pytorch.org/docs/stable/generated/torch.nn.LSTMCell.html
- PyTorch LSTM equations: https://docs.pytorch.org/docs/stable/generated/torch.nn.LSTM.html
Notes on parameterization #
Many libraries expose two matrices per gate (W_ih and W_hh) and add them.
In this spec we use a single matrix applied to a concatenated vector [x_t; h_{t-1}].
It's the same computation, just packaged to reuse TorchLean's tensor building blocks.
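As a quick sketch of the equivalence for the forget gate (the block split below is the assumed packing; the other gates are analogous):

W_f [x_t; h_{t-1}] + b_f = W_f^ih x_t + W_f^hh h_{t-1} + b_f,   where W_f = [W_f^ih | W_f^hh].

PyTorch additionally splits each bias into b_ih + b_hh; folding them into a single b_f changes nothing mathematically.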
Parameters for an LSTM cell, with one (hiddenSize × (inputSize + hiddenSize)) matrix per gate.
This corresponds to the usual (W_ih, W_hh) parameterization in libraries like PyTorch, but we
package it as a single matrix applied to [x_t; h_{t-1}] to reuse TorchLean's tensor building
blocks.
- forget_weights : WeightMatrix α hiddenSize (inputSize + hiddenSize)
  Forget-gate weights for f_t = sigmoid(W_f [x_t; h_{t-1}] + b_f).
- forget_bias : HiddenVector α hiddenSize
  Forget-gate bias.
- input_weights : WeightMatrix α hiddenSize (inputSize + hiddenSize)
  Input-gate weights for i_t = sigmoid(W_i [x_t; h_{t-1}] + b_i).
- input_bias : HiddenVector α hiddenSize
  Input-gate bias.
- candidate_weights : WeightMatrix α hiddenSize (inputSize + hiddenSize)
  Candidate/cell-proposal weights for g_t = tanh(W_g [x_t; h_{t-1}] + b_g).
- candidate_bias : HiddenVector α hiddenSize
  Candidate/cell-proposal bias.
- output_weights : WeightMatrix α hiddenSize (inputSize + hiddenSize)
  Output-gate weights for o_t = sigmoid(W_o [x_t; h_{t-1}] + b_o).
- output_bias : HiddenVector α hiddenSize
  Output-gate bias.
LSTM recurrent state: hidden vector h_t and cell vector c_t.
- cell : HiddenVector α hiddenSize
  Internal memory/cell state c_t.
One LSTM cell step: update (h_{t-1}, c_{t-1}) to (h_t, c_t) given x_t and the parameters.
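For reference, the full step in the notation used by the parameter descriptions above (the same equations PyTorch documents for nn.LSTMCell; ⊙ is the elementwise product):

f_t = sigmoid(W_f [x_t; h_{t-1}] + b_f)
i_t = sigmoid(W_i [x_t; h_{t-1}] + b_i)
g_t = tanh(W_g [x_t; h_{t-1}] + b_g)
o_t = sigmoid(W_o [x_t; h_{t-1}] + b_o)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t
h_t = o_t ⊙ tanh(c_t)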
Run an LSTM cell over a length-seqLen input sequence, returning outputs and final state.
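A minimal, self-contained sketch of the unrolling, generic in the state/input/output types (runSequence, step, and the List-based packaging are illustrative choices, not TorchLean's actual API):

```lean
-- Hypothetical sketch: fold one cell step over the time dimension,
-- collecting each step's output in time order and returning the final state.
def runSequence {State Input Output : Type}
    (step : State → Input → State × Output)
    (s0 : State) (xs : List Input) : List Output × State :=
  xs.foldl
    (fun (acc : List Output × State) x =>
      let (s', y) := step acc.2 x
      (acc.1 ++ [y], s'))
    ([], s0)
```

A fixed-length spec would index the sequence by seqLen instead of using List, which is where the explicit shapes pay off.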
Batched wrapper around lstmSequenceSpec (runs one sequence per batch element).
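In the same illustrative style, batching is just running one independent sequence per batch element (the names below are assumptions, not the spec's):

```lean
-- Hypothetical sketch: apply a single-sequence runner (e.g. the `runSequence`
-- sketch above) to each batch element's own initial state and inputs.
def runBatched {State Input Output : Type}
    (run : State → List Input → List Output × State)
    (batch : List (State × List Input)) : List (List Output × State) :=
  batch.map (fun (s0, xs) => run s0 xs)
```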
Forward pass for one LSTM cell that also returns the gate activations.
This is the spec analogue of the "saved tensors" that a runtime will keep for backward.
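Concretely, the extra values are the four gate vectors, since the backward equations below consume exactly those. A hedged sketch of such a record (field names are illustrative, not the spec's):

```lean
-- Hypothetical "saved tensors" record for one time step.
structure SavedGates (Vec : Type) where
  forget    : Vec  -- f_t
  input     : Vec  -- i_t
  candidate : Vec  -- g_t
  output    : Vec  -- o_t
```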
Backward pass (VJP) for a single LSTM cell.
Inputs:
- the parameters lstm,
- the input x_t, the previous state (h_{t-1}, c_{t-1}), and the current state (h_t, c_t),
- the gate activations from the forward pass,
- upstream gradients for both h_t and c_t.
Outputs:
- gradients w.r.t. x_t and the previous state,
- gradients for each parameter tensor.
PyTorch mental model: this is what autograd computes for nn.LSTMCell when unrolled in time.
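For readers who want the equations, this is the standard per-cell VJP implied by the forward equations above (a textbook derivation, not copied from the spec; dz denotes the loss gradient w.r.t. z, ⊙ is the elementwise product):

do_t = dh_t ⊙ tanh(c_t)
dc'_t = dc_t + dh_t ⊙ o_t ⊙ (1 - tanh(c_t)^2)   (total cell gradient)
df_t = dc'_t ⊙ c_{t-1},   di_t = dc'_t ⊙ g_t,   dg_t = dc'_t ⊙ i_t,   dc_{t-1} = dc'_t ⊙ f_t
da_f = df_t ⊙ f_t ⊙ (1 - f_t),   da_i = di_t ⊙ i_t ⊙ (1 - i_t)
da_g = dg_t ⊙ (1 - g_t^2),       da_o = do_t ⊙ o_t ⊙ (1 - o_t)
dW_* = da_* [x_t; h_{t-1}]^T,    db_* = da_*
d[x_t; h_{t-1}] = W_f^T da_f + W_i^T da_i + W_g^T da_g + W_o^T da_o

The last vector is then split into dx_t and dh_{t-1}.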
Backprop through time (BPTT) for the whole sequence.
This function recomputes and stores the forward intermediates (gates and states) internally, then walks backward through time, accumulating parameter gradients and input gradients. This matches the usual PyTorch training structure, with the save-vs-recompute choice made explicit.
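In the same illustrative, generic style as the earlier sketches (the names and the List-based packaging are assumptions, not the spec's API), the overall shape is a forward sweep that stores per-step intermediates followed by a reverse fold:

```lean
-- Hypothetical BPTT skeleton: recompute/store forward intermediates,
-- then walk backward through time threading the accumulated gradients.
def bptt {State Input Saved Grad : Type}
    (forwardStep  : State → Input → State × Saved)
    (backwardStep : Saved → Grad → Grad)
    (s0 : State) (xs : List Input) (g0 : Grad) : Grad :=
  -- Forward sweep: collect the per-step saved values in time order.
  let saved : List Saved :=
    (xs.foldl
      (fun (acc : List Saved × State) x =>
        let (s', sv) := forwardStep acc.2 x
        (acc.1 ++ [sv], s'))
      ([], s0)).1
  -- Backward sweep: fold over the saved values in reverse time order.
  saved.foldr backwardStep g0
```

Here Grad stands for the bundle of upstream state gradients plus the running parameter/input gradients; backwardStep would apply the per-cell VJP above and accumulate into that bundle.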