TorchLean API

NN.Spec.Module.GruModels

GRU models (spec) #

TorchLean provides GRU layers/cells in NN.Spec.Layers.Gru. This file builds models on top of that layer API: common compositions, heads, and a couple of end-to-end forward/backward routines.

It defines higher-level GRU architectures as module specs (SpecChain). The GRU cell equations live in NN/Spec/Layers/Gru.lean; this file is primarily wiring.

PyTorch analogy: this corresponds to wiring torch.nn.GRU with linear heads and pooling over time (e.g. taking the last hidden state for classification).
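
For reference, here is a sketch of the textbook GRU cell in the concatenated-input formulation (the authoritative definitions are the Lean specs in NN/Spec/Layers/Gru.lean; gate naming and update conventions there may differ):

$$
\begin{aligned}
r_t &= \sigma\left(W_r\,[x_t;\,h_{t-1}] + b_r\right) \\
z_t &= \sigma\left(W_z\,[x_t;\,h_{t-1}] + b_z\right) \\
\tilde h_t &= \tanh\left(W_n\,[x_t;\,r_t \odot h_{t-1}] + b_n\right) \\
h_t &= (1 - z_t) \odot \tilde h_t + z_t \odot h_{t-1}
\end{aligned}
$$

The concatenated-input weight shapes match the hiddenSize × (inputSize + hiddenSize) gradient shapes returned by Spec.simpleGruBackward below.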

def Spec.simpleGruModelSpec {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] {seqLen inputSize hiddenSize outputSize : ℕ} (gru_spec : GRUSpec α inputSize hiddenSize) (linearSpec : LinearSpec α hiddenSize outputSize) :
ModSpec.SpecChain α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar)) (Shape.dim seqLen (Shape.dim outputSize Shape.scalar))

A sequence-to-sequence GRU model, written as a SpecChain.

Pipeline: a GRU over seqLen timesteps (inputSize → hiddenSize), then a Linear head applied at each timestep.

PyTorch analogy: nn.GRU(..., batch_first=False) followed by an nn.Linear on the output sequence.
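
A minimal usage sketch (hypothetical parameter bundles `gru` and `head`, illustrative sizes; assumes the auxiliary names resolve under `open Spec`):

```lean
open Spec in
example {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2]
    (gru : GRUSpec α 16 32) (head : LinearSpec α 32 4) :
    ModSpec.SpecChain α
      (Shape.dim 10 (Shape.dim 16 Shape.scalar))
      (Shape.dim 10 (Shape.dim 4 Shape.scalar)) :=
  -- GRU(16 → 32) over 10 timesteps, then Linear(32 → 4) at each timestep;
  -- seqLen/inputSize/hiddenSize/outputSize are inferred from the target type.
  simpleGruModelSpec gru head
```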

def Spec.gruClassifierModelSpec {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] {seqLen inputSize hiddenSize numClasses : ℕ} (gru_spec : GRUSpec α inputSize hiddenSize) (classifier_spec : LinearSpec α hiddenSize numClasses) (h : seqLen ≠ 0) :
ModSpec.SpecChain α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar)) (Shape.dim numClasses Shape.scalar)

A many-to-one GRU classifier (use the last hidden state, then a linear head).

PyTorch analogy: run nn.GRU over the sequence and feed the last output/hidden state into nn.Linear(hiddenSize, numClasses).
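
A usage sketch (hypothetical bundles; `by decide` discharges the side condition `10 ≠ 0`):

```lean
section
open Spec
variable {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2]
variable (gru : GRUSpec α 16 32) (cls : LinearSpec α 32 4)

-- A 4-way classifier over length-10 sequences of 16-dimensional inputs.
#check gruClassifierModelSpec (seqLen := 10) gru cls (by decide)
end
```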

def Spec.multilayerGruSpec {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] {seqLen inputSize hiddenSize outputSize : ℕ} (gru1_spec : GRUSpec α inputSize hiddenSize) (gru2_spec : GRUSpec α hiddenSize hiddenSize) (linearSpec : LinearSpec α hiddenSize outputSize) :
ModSpec.SpecChain α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar)) (Shape.dim seqLen (Shape.dim outputSize Shape.scalar))

A 2-layer GRU stack (sequence-to-sequence), followed by a per-timestep linear head.
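
A usage sketch with hypothetical bundles, mirroring the simpleGruModelSpec example above:

```lean
open Spec in
example {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2]
    (g1 : GRUSpec α 16 32) (g2 : GRUSpec α 32 32) (head : LinearSpec α 32 4) :
    ModSpec.SpecChain α
      (Shape.dim 10 (Shape.dim 16 Shape.scalar))
      (Shape.dim 10 (Shape.dim 4 Shape.scalar)) :=
  -- Two stacked GRUs (16 → 32, then 32 → 32), then Linear(32 → 4) per timestep.
  multilayerGruSpec g1 g2 head
```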

def Spec.gruLanguageModelSpec {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] {seqLen vocabSize hiddenSize : ℕ} (embedding_spec : LinearSpec α vocabSize hiddenSize) (gru_spec : GRUSpec α hiddenSize hiddenSize) (output_spec : LinearSpec α hiddenSize vocabSize) :
ModSpec.SpecChain α (Shape.dim seqLen (Shape.dim vocabSize Shape.scalar)) (Shape.dim seqLen (Shape.dim vocabSize Shape.scalar))

A simple GRU language-model-style pipeline: a Linear as a lightweight embedding/projection, then a GRU, then a per-timestep projection back to vocabSize.

PyTorch analogy: embedding (often nn.Embedding), nn.GRU, and nn.Linear(hiddenSize, vocabSize). We use LinearSpec here as a spec-friendly stand-in for a one-hot embedding matrix.
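
A usage sketch (hypothetical bundles; 256 is an illustrative vocabulary size):

```lean
section
open Spec
variable {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2]
variable (emb : LinearSpec α 256 64) (gru : GRUSpec α 64 64)
variable (proj : LinearSpec α 64 256)

-- Embedding (256 → 64), GRU core (64 → 64), projection back to 256 logits.
#check gruLanguageModelSpec (seqLen := 32) emb gru proj
end
```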

Record-style model specs #

The SpecChain builders above are the most uniform way to assemble models in TorchLean.

This section instead uses small record types with explicit forward functions. It is useful when you want to talk about a particular architecture directly (e.g. encoder-decoder), or when you need to carry extra per-model parameters (e.g. a dropout rate) without building a full module stack.

structure Spec.SimpleGRUModel (α : Type) (inputSize hiddenSize outputSize : ℕ) : Type

Bundle of parameters for a single-layer GRU model with a linear output head.

This is a direct record representation (as opposed to the SpecChain representation above).

• gru : GRUSpec α inputSize hiddenSize

  The GRU core parameters.

• output_layer : LinearSpec α hiddenSize outputSize

  The linear output head.
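
A construction sketch: record-style models are assembled directly from their fields (hypothetical bundles `gru` and `head`):

```lean
open Spec in
example {α : Type} (gru : GRUSpec α 16 32) (head : LinearSpec α 32 4) :
    SimpleGRUModel α 16 32 4 :=
  { gru := gru, output_layer := head }
```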
structure Spec.MultiLayerGRUModel (α : Type) (inputSize hiddenSize outputSize numLayers : ℕ) : Type

Bundle of parameters for a multi-layer GRU model.

The first layer consumes inputSize, and all subsequent layers consume hiddenSize.

• first_layer : GRUSpec α inputSize hiddenSize

  The first GRU layer (inputSize → hiddenSize).

• hidden_layers (i : Fin (numLayers - 1)) : GRUSpec α hiddenSize hiddenSize

  The remaining hidden-to-hidden GRU layers.

• output_layer : LinearSpec α hiddenSize outputSize

  The linear output head.

structure Spec.GRUClassifier (α : Type) (inputSize hiddenSize numClasses : ℕ) : Type

Bundle of parameters for a many-to-one GRU classifier.

The classifier head is applied to the final hidden state.

• gru : GRUSpec α inputSize hiddenSize

  The GRU core parameters.

• classifier : LinearSpec α hiddenSize numClasses

  The linear classifier head.

structure Spec.GRUGenerator (α : Type) (vocabSize hiddenSize : ℕ) : Type

Bundle of parameters for a many-to-many GRU generator (language-model style).

This includes an (embedding) linear map, a recurrent core, and an output projection back to the vocabulary.

• embedding : LinearSpec α vocabSize hiddenSize

  The embedding/projection into hidden space.

• gru : GRUSpec α hiddenSize hiddenSize

  The GRU core parameters.

• output_projection : LinearSpec α hiddenSize vocabSize

  The projection back to vocabulary logits.

structure Spec.BiGRUModel (α : Type) (inputSize hiddenSize outputSize : ℕ) : Type

Bundle of parameters for a bidirectional GRU model with an output head.

The head consumes the concatenation of forward and backward hidden states. PyTorch analogue: nn.GRU(..., bidirectional=true) plus a linear projection.

• forward_gru : GRUSpec α inputSize hiddenSize

  The forward-direction GRU.

• backward_gru : GRUSpec α inputSize hiddenSize

  The backward-direction GRU (run over the reversed sequence).

• output_layer : LinearSpec α (hiddenSize + hiddenSize) outputSize

  The output head over concatenated hidden states.

structure Spec.GRULanguageModel (α : Type) (vocabSize hiddenSize : ℕ) : Type

Bundle of parameters for a stacked GRU language model with deterministic dropout.

This model uses a list of GRU layers (all with hiddenSize input/output) and applies dropout_inference_spec scaling between the GRU stack and the output projection.

• embedding : LinearSpec α vocabSize hiddenSize

  The embedding/projection into hidden space.

• gru_layers : List (GRUSpec α hiddenSize hiddenSize)

  The stacked GRU layers.

• output_projection : LinearSpec α hiddenSize vocabSize

  The projection back to vocabulary logits.

• dropout_rate : α

  The rate used for deterministic dropout scaling.

structure Spec.GRUEncoderDecoder (α : Type) (inputVocabSize hiddenSize outputVocabSize : ℕ) : Type

Bundle of parameters for a GRU encoder-decoder model (seq2seq).

This uses separate embeddings and GRU cores for the encoder and decoder, plus an output projection. PyTorch analogue: an encoder nn.GRU and a decoder nn.GRU with teacher forcing.

• encoder_embedding : LinearSpec α inputVocabSize hiddenSize

  The encoder's embedding into hidden space.

• encoder_gru : GRUSpec α hiddenSize hiddenSize

  The encoder GRU core.

• decoder_embedding : LinearSpec α outputVocabSize hiddenSize

  The decoder's embedding into hidden space.

• decoder_gru : GRUSpec α hiddenSize hiddenSize

  The decoder GRU core.

• output_projection : LinearSpec α hiddenSize outputVocabSize

  The projection into output-vocabulary logits.
def Spec.simpleGruForward {α : Type} [Context α] {inputSize hiddenSize outputSize : ℕ} (model : SimpleGRUModel α inputSize hiddenSize outputSize) (input : Tensor α (Shape.dim inputSize Shape.scalar)) (hidden : Tensor α (Shape.dim hiddenSize Shape.scalar)) :
Tensor α (Shape.dim outputSize Shape.scalar) × Tensor α (Shape.dim hiddenSize Shape.scalar)

One-step forward for SimpleGRUModel.

Input: (x_t, h_{t-1}). Output: (y_t, h_t).

def Spec.simpleGruSequenceForward {α : Type} [Context α] {seqLen inputSize hiddenSize outputSize : ℕ} (model : SimpleGRUModel α inputSize hiddenSize outputSize) (inputs : Tensor α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar))) (initial_hidden : Tensor α (Shape.dim hiddenSize Shape.scalar)) (h : 0 < seqLen) :
Tensor α (Shape.dim seqLen (Shape.dim outputSize Shape.scalar)) × Tensor α (Shape.dim hiddenSize Shape.scalar)

Sequence forward for SimpleGRUModel (time-major).

Returns (outputs, final_hidden).

PyTorch analogy: run nn.GRU over the sequence, then apply nn.Linear at each timestep.
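
A call sketch (hypothetical model and tensors; `by decide` proves 0 < 10):

```lean
section
open Spec
variable {α : Type} [Context α]
variable (model : SimpleGRUModel α 16 32 4)
variable (xs : Tensor α (Shape.dim 10 (Shape.dim 16 Shape.scalar)))
variable (h0 : Tensor α (Shape.dim 32 Shape.scalar))

-- Returns (outputs : 10 × 4 tensor, final_hidden : 32-vector).
#check simpleGruSequenceForward model xs h0 (by decide)
end
```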
def Spec.gruClassifierForward {α : Type} [Context α] {seqLen inputSize hiddenSize numClasses : ℕ} (model : GRUClassifier α inputSize hiddenSize numClasses) (inputs : Tensor α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar))) (initial_hidden : Tensor α (Shape.dim hiddenSize Shape.scalar)) (h : 0 < seqLen) :
Tensor α (Shape.dim numClasses Shape.scalar)

Forward pass for a GRUClassifier (many-to-one).

This runs the GRU over the input sequence and applies the classifier head to the final hidden state.

def Spec.gruGeneratorForward {α : Type} [Context α] {seqLen vocabSize hiddenSize : ℕ} (model : GRUGenerator α vocabSize hiddenSize) (input_tokens : Tensor α (Shape.dim seqLen (Shape.dim vocabSize Shape.scalar))) (initial_hidden : Tensor α (Shape.dim hiddenSize Shape.scalar)) (h : 0 < seqLen) :
Tensor α (Shape.dim seqLen (Shape.dim vocabSize Shape.scalar)) × Tensor α (Shape.dim hiddenSize Shape.scalar)

Forward pass for a GRUGenerator (many-to-many).

This applies an embedding linear map to each token vector, runs the GRU, and projects each hidden state back into vocabulary space.
def Spec.bigruForward {α : Type} [Context α] {seqLen inputSize hiddenSize outputSize : ℕ} (model : BiGRUModel α inputSize hiddenSize outputSize) (inputs : Tensor α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar))) (forward_hidden backward_hidden : Tensor α (Shape.dim hiddenSize Shape.scalar)) :
Tensor α (Shape.dim seqLen (Shape.dim outputSize Shape.scalar))

Forward pass for a bidirectional GRU model (time-major).

This runs a forward GRU on the sequence, a backward GRU on the reversed sequence, concatenates the two hidden streams per timestep, and applies an output head.
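
A call sketch (hypothetical model and tensors; note the separate initial hiddens for the two directions):

```lean
section
open Spec
variable {α : Type} [Context α]
variable (model : BiGRUModel α 16 32 4)
variable (xs : Tensor α (Shape.dim 10 (Shape.dim 16 Shape.scalar)))
variable (hf hb : Tensor α (Shape.dim 32 Shape.scalar))

-- Per-timestep outputs computed from concatenated forward/backward states.
#check bigruForward model xs hf hb
end
```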
def Spec.multilayerGruForward {α : Type} [Context α] {seqLen inputSize hiddenSize outputSize numLayers : ℕ} (model : MultiLayerGRUModel α inputSize hiddenSize outputSize numLayers) (inputs : Tensor α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar))) (initial_hiddens : Fin numLayers → Tensor α (Shape.dim hiddenSize Shape.scalar)) (h : numLayers > 0) (h2 : 0 < seqLen) :
Tensor α (Shape.dim seqLen (Shape.dim outputSize Shape.scalar)) × (Fin numLayers → Tensor α (Shape.dim hiddenSize Shape.scalar))

Forward pass for a MultiLayerGRUModel (stacked GRU layers).

This runs the first layer on the input sequence, then threads the resulting hidden stream through each additional hidden layer, and finally applies the output head per timestep.

@[irreducible]
def Spec.multilayerGruForward.process_hidden_layers {α : Type} [Context α] {seqLen inputSize hiddenSize outputSize numLayers : ℕ} (model : MultiLayerGRUModel α inputSize hiddenSize outputSize numLayers) (h : numLayers > 0) (h2 : 0 < seqLen) (layer : ℕ) (layer_input : Tensor α (Shape.dim seqLen (Shape.dim hiddenSize Shape.scalar))) (hiddens : Fin numLayers → Tensor α (Shape.dim hiddenSize Shape.scalar)) :
Tensor α (Shape.dim seqLen (Shape.dim hiddenSize Shape.scalar)) × (Fin numLayers → Tensor α (Shape.dim hiddenSize Shape.scalar))

def Spec.gruLmForward {α : Type} [Context α] {seqLen vocabSize hiddenSize : ℕ} (model : GRULanguageModel α vocabSize hiddenSize) (input_tokens : Tensor α (Shape.dim seqLen (Shape.dim vocabSize Shape.scalar))) (initial_hiddens : List (Tensor α (Shape.dim hiddenSize Shape.scalar))) (h : 0 < seqLen) :
Tensor α (Shape.dim seqLen (Shape.dim vocabSize Shape.scalar)) × List (Tensor α (Shape.dim hiddenSize Shape.scalar))

Forward pass for GRULanguageModel (teacher forcing, time-major).

This runs the embedding, then a stack of GRU layers with the provided initial hiddens, applies deterministic dropout scaling (dropout_inference_spec), and projects to vocabulary logits.

def Spec.gruLmForward.process_gru_layers {α : Type} [Context α] {seqLen hiddenSize : ℕ} (layers : List (GRUSpec α hiddenSize hiddenSize)) (hiddens : List (Tensor α (Shape.dim hiddenSize Shape.scalar))) (layer_input : Tensor α (Shape.dim seqLen (Shape.dim hiddenSize Shape.scalar))) (h : 0 < seqLen) :
Tensor α (Shape.dim seqLen (Shape.dim hiddenSize Shape.scalar)) × List (Tensor α (Shape.dim hiddenSize Shape.scalar))
def Spec.gruEncoderDecoderForward {α : Type} [Context α] {srcSeqLen tgtSeqLen inputVocabSize hiddenSize outputVocabSize : ℕ} (model : GRUEncoderDecoder α inputVocabSize hiddenSize outputVocabSize) (src_tokens : Tensor α (Shape.dim srcSeqLen (Shape.dim inputVocabSize Shape.scalar))) (tgt_tokens : Tensor α (Shape.dim tgtSeqLen (Shape.dim outputVocabSize Shape.scalar))) (encoder_hidden _decoder_hidden : Tensor α (Shape.dim hiddenSize Shape.scalar)) (h : 0 < srcSeqLen) (h2 : 0 < tgtSeqLen) :
Tensor α (Shape.dim tgtSeqLen (Shape.dim outputVocabSize Shape.scalar)) × Tensor α (Shape.dim hiddenSize Shape.scalar) × Tensor α (Shape.dim hiddenSize Shape.scalar)

Encoder-decoder forward pass (GRU encoder + GRU decoder).

This is a small reference architecture:

• encode src_tokens into a final hidden state,
• decode tgt_tokens starting from that hidden state (teacher forcing),
• project decoder states into output-vocabulary logits.

PyTorch analogy: nn.GRU encoder + nn.GRU decoder with a linear output projection.
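
A call sketch (hypothetical model and token tensors; source and target lengths may differ):

```lean
section
open Spec
variable {α : Type} [Context α]
variable (model : GRUEncoderDecoder α 100 64 120)
variable (src : Tensor α (Shape.dim 7 (Shape.dim 100 Shape.scalar)))
variable (tgt : Tensor α (Shape.dim 9 (Shape.dim 120 Shape.scalar)))
variable (hEnc hDec : Tensor α (Shape.dim 64 Shape.scalar))

-- Returns (target logits, final encoder hidden, final decoder hidden).
#check gruEncoderDecoderForward model src tgt hEnc hDec (by decide) (by decide)
end
```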
def Spec.simpleGruBackward {α : Type} [Context α] {seqLen inputSize hiddenSize outputSize : ℕ} (model : SimpleGRUModel α inputSize hiddenSize outputSize) (inputs : Tensor α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar))) (hidden_states : Tensor α (Shape.dim seqLen (Shape.dim hiddenSize Shape.scalar))) (grad_outputs : Tensor α (Shape.dim seqLen (Shape.dim outputSize Shape.scalar))) (reset_gates update_gates new_candidates _reset_hiddens : Tensor α (Shape.dim seqLen (Shape.dim hiddenSize Shape.scalar))) (h : seqLen ≠ 0) :
Tensor α (Shape.dim hiddenSize (Shape.dim (inputSize + hiddenSize) Shape.scalar)) × Tensor α (Shape.dim hiddenSize Shape.scalar) × Tensor α (Shape.dim hiddenSize (Shape.dim (inputSize + hiddenSize) Shape.scalar)) × Tensor α (Shape.dim hiddenSize Shape.scalar) × Tensor α (Shape.dim hiddenSize (Shape.dim (inputSize + hiddenSize) Shape.scalar)) × Tensor α (Shape.dim hiddenSize Shape.scalar) × Tensor α (Shape.dim outputSize (Shape.dim hiddenSize Shape.scalar)) × Tensor α (Shape.dim outputSize Shape.scalar) × Tensor α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar))

Backward pass for SimpleGRUModel (full BPTT, gate-aware).

This assumes you already ran a forward pass that saved:

• hidden_states,
• the GRU intermediates (reset_gates, update_gates, new_candidates, reset_hiddens).

Those intermediates can be produced using Spec.gru_extract_intermediate_values from NN.Spec.Layers.Gru.

Return values:

• gradients for the GRU parameters (reset/update/new weights and biases),
• gradients for the output linear layer (weights and bias),
• the gradient for each timestep input.
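
A wiring sketch (hypothetical tensors standing in for the values saved by the forward pass; the exact signature of Spec.gru_extract_intermediate_values lives in NN.Spec.Layers.Gru and is not shown here):

```lean
section
open Spec
variable {α : Type} [Context α]
variable (model : SimpleGRUModel α 16 32 4)
variable (inputs : Tensor α (Shape.dim 10 (Shape.dim 16 Shape.scalar)))
variable (hidden_states reset_gates update_gates new_candidates reset_hiddens :
    Tensor α (Shape.dim 10 (Shape.dim 32 Shape.scalar)))
variable (grad_outputs : Tensor α (Shape.dim 10 (Shape.dim 4 Shape.scalar)))

-- Gate-aware BPTT over the saved intermediates; `by decide` proves 10 ≠ 0.
#check simpleGruBackward model inputs hidden_states grad_outputs
    reset_gates update_gates new_candidates reset_hiddens (by decide)
end
```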
structure Spec.AttentionGRUModel (α : Type) (inputSize hiddenSize outputSize : ℕ) : Type

Attention-style GRU model bundle.

This record defines the parameters for an encoder/decoder GRU with learned attention scores. Forward passes can choose additive, dot-product, or domain-specific attention semantics while sharing this typed parameter bundle.

• encoder_gru : GRUSpec α inputSize hiddenSize

  The encoder GRU core.

• decoder_gru : GRUSpec α (inputSize + hiddenSize) hiddenSize

  The decoder GRU core (consumes the input concatenated with a context vector).

• attention_weights : LinearSpec α (hiddenSize + hiddenSize) 1

  The learned attention-scoring map.

• output_layer : LinearSpec α hiddenSize outputSize

  The linear output head.

structure Spec.ResidualGRUModel (α : Type) (inputSize hiddenSize outputSize : ℕ) : Type

Bundle of parameters for a residual GRU model.

This includes a projection from input space to hidden space so the input can be added as a residual to the GRU hidden stream.

• gru : GRUSpec α inputSize hiddenSize

  The GRU core parameters.

• residual_projection : LinearSpec α inputSize hiddenSize

  The projection of the input into hidden space for the residual connection.

• output_layer : LinearSpec α hiddenSize outputSize

  The linear output head.

def Spec.residualGruForward {α : Type} [Context α] {seqLen inputSize hiddenSize outputSize : ℕ} (model : ResidualGRUModel α inputSize hiddenSize outputSize) (inputs : Tensor α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar))) (initial_hidden : Tensor α (Shape.dim hiddenSize Shape.scalar)) (h : 0 < seqLen) :
Tensor α (Shape.dim seqLen (Shape.dim outputSize Shape.scalar)) × Tensor α (Shape.dim hiddenSize Shape.scalar)

Forward pass for ResidualGRUModel.

This runs the GRU, adds a projected version of the input as a residual connection, and applies the output head per timestep.
def Spec.simpleGRUToModuleSpec {α : Type} [Context α] {seqLen inputSize hiddenSize outputSize : ℕ} (model : SimpleGRUModel α inputSize hiddenSize outputSize) (h : 0 < seqLen) :

Package SimpleGRUModel as an NNModuleSpec.

This is used to plug the spec model into the common module pipeline. The export_func.toPyTorch field is documentation-oriented and indicates the intended PyTorch analogue.

def Spec.gruClassifierToModuleSpec {α : Type} [Context α] {seqLen inputSize hiddenSize numClasses : ℕ} (model : GRUClassifier α inputSize hiddenSize numClasses) (h : 0 < seqLen) :

Package GRUClassifier as an NNModuleSpec.

PyTorch analogue: nn.GRU feeding an nn.Linear classifier head.

def Spec.biGRUToModuleSpec {α : Type} [Context α] {seqLen inputSize hiddenSize outputSize : ℕ} (model : BiGRUModel α inputSize hiddenSize outputSize) :

Package BiGRUModel as an NNModuleSpec.

PyTorch analogue: nn.GRU(..., bidirectional=true) feeding a per-timestep linear head.

def Spec.gruGeneratorToModuleSpec {α : Type} [Context α] {seqLen vocabSize hiddenSize : ℕ} (model : GRUGenerator α vocabSize hiddenSize) (h : 0 < seqLen) :

Package GRUGenerator as an NNModuleSpec.

PyTorch analogue: GRU language model (nn.GRU + vocabulary projection) producing a sequence of logits.