GRU models (spec) #
TorchLean provides GRU layers/cells in NN.Spec.Layers.Gru. This file builds models on top of
that layer API: common compositions, heads, and a couple of end-to-end forward/backward routines.
Higher‑level GRU architectures built from module specs (SpecChain):
- sequence‑to‑sequence outputs,
- classifier heads (many‑to‑one),
- multi‑layer compositions.
GRU cell equations are in NN/Spec/Layers/Gru.lean; this file is primarily “wiring”.
References:
- Cho et al. (2014), "Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation" (introduces GRU): https://arxiv.org/abs/1406.1078
- Chung et al. (2014), "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling" (GRU variants/ablation): https://arxiv.org/abs/1412.3555
- PyTorch nn.GRUCell docs: https://docs.pytorch.org/docs/stable/generated/torch.nn.GRUCell.html
- PyTorch nn.GRU docs: https://pytorch.org/docs/stable/generated/torch.nn.GRU.html
PyTorch analogy: this corresponds to wiring torch.nn.GRU with linear heads and pooling over time
(e.g. last hidden state for classification).
A sequence-to-sequence GRU model, written as a SpecChain.
Pipeline:
GRU(seqLen, inputSize → hiddenSize) then Linear applied at each timestep.
PyTorch analogy: nn.GRU(..., batch_first=False) followed by an nn.Linear on the output sequence.
Instances For
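A minimal PyTorch sketch of this wiring (illustrative only: the sizes are arbitrary, and the code is the PyTorch analogue, not the TorchLean API):

```python
import torch
import torch.nn as nn

seq_len, input_size, hidden_size, output_size = 7, 8, 16, 4

gru = nn.GRU(input_size, hidden_size)   # batch_first=False: (seq, batch, feature)
head = nn.Linear(hidden_size, output_size)

x = torch.randn(seq_len, 1, input_size)
ys, h_n = gru(x)    # ys: one hidden vector per timestep, (seq_len, 1, hidden_size)
out = head(ys)      # Linear applied at each timestep: (seq_len, 1, output_size)
```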
A many-to-one GRU classifier (use the last hidden state, then a linear head).
PyTorch analogy: run nn.GRU over the sequence and feed the last output/hidden state into
nn.Linear(hiddenSize, numClasses).
Instances For
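The many-to-one output shape is the only difference from the sequence-to-sequence model above; a PyTorch sketch (arbitrary sizes, not the TorchLean API):

```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=8, hidden_size=16)
classifier = nn.Linear(16, 3)   # hiddenSize -> numClasses

x = torch.randn(7, 1, 8)        # (seq, batch, feature)
_, h_n = gru(x)                 # final hidden state: (1, 1, 16)
logits = classifier(h_n[-1])    # one logit vector per sequence: (1, 3)
```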
A 2-layer GRU stack (sequence-to-sequence), followed by a per-timestep linear head.
Instances For
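In PyTorch the same two-layer stack is usually spelled with num_layers=2 rather than composing two modules; a sketch:

```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=8, hidden_size=16, num_layers=2)  # stacked recurrent core
head = nn.Linear(16, 4)                                   # per-timestep linear head

x = torch.randn(7, 1, 8)
ys, _ = gru(x)      # output stream of the top layer
out = head(ys)      # (7, 1, 4)
```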
A simple GRU language-model style pipeline:
Linear as a lightweight embedding/projection, then GRU, then a per-timestep projection back to
vocabSize.
PyTorch analogy: embedding (often nn.Embedding), nn.GRU, and nn.Linear(hiddenSize, vocabSize).
We use LinearSpec here as a spec-friendly stand-in for a one-hot embedding matrix.
Instances For
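A PyTorch sketch of the same pipeline, using a bias-free nn.Linear on one-hot rows as the embedding stand-in (illustrative only; a real model would typically use nn.Embedding):

```python
import torch
import torch.nn as nn

vocab_size, hidden_size, seq_len = 10, 16, 7

embed = nn.Linear(vocab_size, hidden_size, bias=False)  # one-hot embedding stand-in
gru = nn.GRU(hidden_size, hidden_size)
project = nn.Linear(hidden_size, vocab_size)

tokens = torch.randint(vocab_size, (seq_len,))
one_hot = torch.eye(vocab_size)[tokens].unsqueeze(1)    # (seq, 1, vocab)
hs, _ = gru(embed(one_hot))
logits = project(hs)                                    # (seq, 1, vocab)
```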
Record-style model specs #
The SpecChain builders above are the most uniform way to assemble models in TorchLean.
This section uses small record types with explicit forward functions. It is useful when you want to talk about a particular architecture directly (e.g. encoder-decoder), or when you need to carry extra per-model parameters (e.g. a dropout rate) without building a full module stack.
Bundle of parameters for a single-layer GRU model with a linear output head.
This is a direct record representation (as opposed to the SpecChain representation above).
- gru : GRUSpec α inputSize hiddenSize
gru.
- output_layer : LinearSpec α hiddenSize outputSize
output layer.
Instances For
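In PyTorch terms, the record style is a plain parameter container plus a free-standing forward function, rather than a chained module; a sketch (the names mirror the record fields, nothing here is the TorchLean API):

```python
import torch
import torch.nn as nn
from dataclasses import dataclass

@dataclass
class SimpleGRUModelParams:            # analogue of the record: parameters only
    gru: nn.GRUCell
    output_layer: nn.Linear

def step(m: SimpleGRUModelParams, x_t, h_prev):
    h_t = m.gru(x_t, h_prev)           # one recurrent step
    return m.output_layer(h_t), h_t    # (y_t, h_t)

m = SimpleGRUModelParams(nn.GRUCell(8, 16), nn.Linear(16, 4))
y_t, h_t = step(m, torch.randn(1, 8), torch.zeros(1, 16))
```

Keeping the forward function outside the record is what lets one parameter bundle support several forward routines, as the attention-style bundle below exploits.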
Bundle of parameters for a multi-layer GRU model.
The first layer consumes inputSize, and all subsequent layers consume hiddenSize.
- first_layer : GRUSpec α inputSize hiddenSize
first layer.
- output_layer : LinearSpec α hiddenSize outputSize
output layer.
Instances For
Bundle of parameters for a many-to-one GRU classifier.
The classifier head is applied to the final hidden state.
- gru : GRUSpec α inputSize hiddenSize
gru.
- classifier : LinearSpec α hiddenSize numClasses
classifier.
Instances For
Bundle of parameters for a many-to-many GRU generator (language-model style).
This includes an embedding-style linear map, a recurrent core, and an output projection back to the vocabulary.
- embedding : LinearSpec α vocabSize hiddenSize
embedding.
- gru : GRUSpec α hiddenSize hiddenSize
gru.
- output_projection : LinearSpec α hiddenSize vocabSize
output projection.
Instances For
Bundle of parameters for a bidirectional GRU model with an output head.
The head consumes the concatenation of forward and backward hidden states.
PyTorch analogue: nn.GRU(..., bidirectional=True) plus a linear projection.
- forward_gru : GRUSpec α inputSize hiddenSize
forward gru.
- backward_gru : GRUSpec α inputSize hiddenSize
backward gru.
- output_layer : LinearSpec α (hiddenSize + hiddenSize) outputSize
output layer.
Instances For
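PyTorch's built-in bidirectional flag yields exactly the concatenated 2 × hiddenSize stream that the head consumes; a sketch (arbitrary sizes):

```python
import torch
import torch.nn as nn

bigru = nn.GRU(input_size=8, hidden_size=16, bidirectional=True)
head = nn.Linear(2 * 16, 4)     # consumes [forward ; backward] per timestep

x = torch.randn(7, 1, 8)
ys, _ = bigru(x)                # ys: (7, 1, 32)
out = head(ys)                  # (7, 1, 4)
```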
Bundle of parameters for a stacked GRU language model with deterministic dropout.
This model uses a list of GRU layers (all with hiddenSize input/output) and applies
dropout_inference_spec scaling between the GRU stack and the output projection.
- embedding : LinearSpec α vocabSize hiddenSize
embedding.
- gru_layers : List (GRUSpec α hiddenSize hiddenSize)
gru layers.
- output_projection : LinearSpec α hiddenSize vocabSize
output projection.
- dropout_rate : α
dropout rate.
Instances For
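The exact semantics of dropout_inference_spec live in the dropout spec file; assuming it is the usual deterministic inference-time rule (scale activations by 1 - rate instead of random masking), the PyTorch-style analogue is:

```python
import torch
import torch.nn as nn

vocab, hidden, rate = 10, 16, 0.5

embed = nn.Linear(vocab, hidden, bias=False)
layers = [nn.GRU(hidden, hidden) for _ in range(2)]   # the GRU stack
project = nn.Linear(hidden, vocab)

x = torch.eye(vocab)[torch.randint(vocab, (7,))].unsqueeze(1)
hs = embed(x)
for gru in layers:
    hs, _ = gru(hs)
hs = hs * (1.0 - rate)   # assumed deterministic scaling between stack and projection
logits = project(hs)
```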
Bundle of parameters for a GRU encoder-decoder model (seq2seq).
This uses separate embeddings and GRU cores for encoder and decoder, plus an output projection.
PyTorch analogue: an encoder nn.GRU and a decoder nn.GRU with teacher forcing.
- encoder_embedding : LinearSpec α inputVocabSize hiddenSize
encoder embedding.
- encoder_gru : GRUSpec α hiddenSize hiddenSize
encoder gru.
- decoder_embedding : LinearSpec α outputVocabSize hiddenSize
decoder embedding.
- decoder_gru : GRUSpec α hiddenSize hiddenSize
decoder gru.
- output_projection : LinearSpec α hiddenSize outputVocabSize
output projection.
Instances For
One-step forward for SimpleGRUModel.
Input: (x_t, h_{t-1}). Output: (y_t, h_t).
Instances For
Sequence forward for SimpleGRUModel (time-major).
Returns (outputs, final_hidden).
PyTorch analogy: run nn.GRU over the sequence, then apply nn.Linear at each timestep.
Instances For
Forward pass for a GRUClassifier (many-to-one).
This runs the GRU over the input sequence and applies the classifier head to the final hidden state.
Instances For
Forward pass for a GRUGenerator (many-to-many).
This applies an embedding linear map to each token vector, runs the GRU, and projects each hidden state back into vocabulary space.
Instances For
Forward pass for a bidirectional GRU model (time-major).
This runs a forward GRU on the sequence, a backward GRU on the reversed sequence, concatenates the two hidden streams per timestep, and applies an output head.
Instances For
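The same routine in PyTorch, built from two unidirectional GRUs rather than the bidirectional flag (a sketch; note the backward stream must be flipped back so timesteps align before concatenation):

```python
import torch
import torch.nn as nn

fwd, bwd = nn.GRU(8, 16), nn.GRU(8, 16)
head = nn.Linear(2 * 16, 4)

x = torch.randn(7, 1, 8)                 # time-major
hf, _ = fwd(x)                           # forward hidden stream
hb, _ = bwd(torch.flip(x, dims=[0]))     # backward GRU on the reversed sequence
hb = torch.flip(hb, dims=[0])            # re-align timesteps
out = head(torch.cat([hf, hb], dim=-1))  # per-timestep concat, then output head
```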
Forward pass for a MultiLayerGRUModel (stacked GRU layers).
This runs the first layer on the input sequence, then threads the resulting hidden stream through each additional hidden layer, and finally applies the output head per timestep.
Instances For
Forward pass for GRULanguageModel (teacher forcing, time-major).
This runs the embedding, then a stack of GRU layers with provided initial hiddens, applies
deterministic dropout scaling (dropout_inference_spec), and projects to vocabulary logits.
Instances For
Encoder-decoder forward pass (GRU encoder + GRU decoder).
This is a small reference architecture:
- encode src_tokens into a final hidden state,
- decode tgt_tokens starting from that hidden state (teacher forcing),
- project decoder states into output-vocabulary logits.
PyTorch analogy: nn.GRU encoder + nn.GRU decoder with a linear output projection.
Instances For
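A PyTorch sketch of the whole routine (arbitrary sizes; one-hot rows through bias-free linears stand in for the embeddings, matching the spec's LinearSpec convention):

```python
import torch
import torch.nn as nn

src_vocab, tgt_vocab, hidden = 10, 12, 16

enc_embed = nn.Linear(src_vocab, hidden, bias=False)
enc_gru = nn.GRU(hidden, hidden)
dec_embed = nn.Linear(tgt_vocab, hidden, bias=False)
dec_gru = nn.GRU(hidden, hidden)
project = nn.Linear(hidden, tgt_vocab)

src = torch.eye(src_vocab)[torch.randint(src_vocab, (6,))].unsqueeze(1)
tgt = torch.eye(tgt_vocab)[torch.randint(tgt_vocab, (4,))].unsqueeze(1)

_, h = enc_gru(enc_embed(src))           # encode src into a final hidden state
dec_hs, _ = dec_gru(dec_embed(tgt), h)   # teacher forcing: gold targets as inputs
logits = project(dec_hs)                 # (tgt_len, 1, tgt_vocab)
```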
Backward pass for SimpleGRUModel (full BPTT, gate-aware).
This assumes you already ran a forward pass that saved:
- the hidden states (hidden_states),
- the GRU intermediates (reset_gates, update_gates, new_candidates, reset_hiddens).
Those intermediates can be produced using Spec.gru_extract_intermediate_values from
NN.Spec.Layers.Gru.
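For orientation, the standard GRU cell equations (the form used by PyTorch's nn.GRUCell; NN/Spec/Layers/Gru.lean is the authoritative source for the TorchLean variant) pair up with the saved intermediates roughly as follows, where reset_hiddens presumably names the reset-scaled hidden term inside the candidate:

```latex
\begin{aligned}
r_t &= \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{t-1} + b_{hr}) && \text{(reset\_gates)}\\
z_t &= \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{t-1} + b_{hz}) && \text{(update\_gates)}\\
n_t &= \tanh\bigl(W_{in} x_t + b_{in} + r_t \odot (W_{hn} h_{t-1} + b_{hn})\bigr) && \text{(new\_candidates)}\\
h_t &= (1 - z_t) \odot n_t + z_t \odot h_{t-1}
\end{aligned}
```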
Return values:
- gradients for GRU parameters (reset/update/new weights + biases),
- gradients for the output linear layer (weights + bias),
- gradient for each timestep input.
Instances For
Attention-style GRU model bundle.
This record defines the parameters for an encoder/decoder GRU with learned attention scores. Forward passes can choose additive, dot-product, or domain-specific attention semantics while sharing this typed parameter bundle.
- encoder_gru : GRUSpec α inputSize hiddenSize
encoder gru.
decoder gru.
- attention_weights : LinearSpec α (hiddenSize + hiddenSize) 1
attention weights.
- output_layer : LinearSpec α hiddenSize outputSize
output layer.
Instances For
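Since attention_weights scores a (hiddenSize + hiddenSize) concatenation down to a single value, one plausible reading is concat-style scoring of (decoder state, encoder state) pairs; a hedged PyTorch sketch of that choice:

```python
import torch
import torch.nn as nn

hidden, src_len = 16, 6
score = nn.Linear(2 * hidden, 1)      # scores one [decoder ; encoder] pair

enc_states = torch.randn(src_len, hidden)   # one encoder state per source step
dec_state = torch.randn(hidden)             # current decoder state

pairs = torch.cat([dec_state.expand(src_len, hidden), enc_states], dim=-1)
weights = torch.softmax(score(pairs).squeeze(-1), dim=0)   # (src_len,)
context = weights @ enc_states        # attention-weighted encoder summary
```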
Bundle of parameters for a residual GRU model.
This includes a projection from input space to hidden space so the input can be added as a residual to the GRU hidden stream.
- gru : GRUSpec α inputSize hiddenSize
gru.
- residual_projection : LinearSpec α inputSize hiddenSize
residual projection.
- output_layer : LinearSpec α hiddenSize outputSize
output layer.
Instances For
Forward pass for ResidualGRUModel.
This runs the GRU, adds a projected version of the input as a residual connection, and applies the output head per timestep.
Instances For
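A PyTorch sketch of the residual wiring (arbitrary sizes; the projection exists purely so the input can be added in hidden space):

```python
import torch
import torch.nn as nn

gru = nn.GRU(8, 16)
residual_projection = nn.Linear(8, 16)   # lifts the input into hidden space
head = nn.Linear(16, 4)

x = torch.randn(7, 1, 8)
hs, _ = gru(x)
hs = hs + residual_projection(x)   # residual: projected input added per timestep
out = head(hs)                     # (7, 1, 4)
```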
Package SimpleGRUModel as an NNModuleSpec.
This is used to plug the spec model into the common module pipeline. The export_func.toPyTorch
field is documentation-oriented and indicates the intended PyTorch analogue.
Instances For
Package GRUClassifier as an NNModuleSpec.
PyTorch analogue: nn.GRU feeding a nn.Linear classifier head.
Instances For
Package BiGRUModel as an NNModuleSpec.
PyTorch analogue: nn.GRU(..., bidirectional=True) feeding a per-timestep linear head.
Instances For
Package GRUGenerator as an NNModuleSpec.
PyTorch analogue: GRU language model (nn.GRU + vocabulary projection) producing a sequence of
logits.