TorchLean API

NN.Spec.Module.GruModels

GRU models (spec) #

TorchLean provides GRU layers/cells in NN.Spec.Layers.Gru. This file builds models on top of that layer API: common compositions, heads, and a couple of end-to-end forward/backward routines.

It defines higher-level GRU architectures as module specs (SpecChain). The GRU cell equations live in NN/Spec/Layers/Gru.lean; this file is primarily wiring.

PyTorch analogy: this corresponds to wiring torch.nn.GRU with linear heads and pooling over time (e.g. taking the last hidden state for classification).
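
For reference, here is a sketch of the textbook GRU cell in the concatenated-input formulation (the authoritative definitions are the Lean specs in NN/Spec/Layers/Gru.lean; gate naming and update conventions there may differ):

$$
\begin{aligned}
r_t &= \sigma\left(W_r\,[x_t;\,h_{t-1}] + b_r\right) \\
z_t &= \sigma\left(W_z\,[x_t;\,h_{t-1}] + b_z\right) \\
\tilde h_t &= \tanh\left(W_n\,[x_t;\,r_t \odot h_{t-1}] + b_n\right) \\
h_t &= (1 - z_t) \odot \tilde h_t + z_t \odot h_{t-1}
\end{aligned}
$$

The concatenated-input weight shapes match the hiddenSize × (inputSize + hiddenSize) gradient shapes returned by Spec.simpleGruBackward below.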

def Spec.simpleGruModelSpec {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] {seqLen inputSize hiddenSize outputSize : ℕ} (gru_spec : GRUSpec α inputSize hiddenSize) (linearSpec : LinearSpec α hiddenSize outputSize) :
ModSpec.SpecChain α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar)) (Shape.dim seqLen (Shape.dim outputSize Shape.scalar))

A sequence-to-sequence GRU model, written as a SpecChain.

Pipeline: a GRU over seqLen timesteps (inputSize → hiddenSize), then a Linear head applied at each timestep.

PyTorch analogy: nn.GRU(..., batch_first=False) followed by an nn.Linear on the output sequence.
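
A minimal usage sketch (hypothetical parameter bundles `gru` and `head`, illustrative sizes; assumes the auxiliary names resolve under `open Spec`):

```lean
open Spec in
example {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2]
    (gru : GRUSpec α 16 32) (head : LinearSpec α 32 4) :
    ModSpec.SpecChain α
      (Shape.dim 10 (Shape.dim 16 Shape.scalar))
      (Shape.dim 10 (Shape.dim 4 Shape.scalar)) :=
  -- GRU(16 → 32) over 10 timesteps, then Linear(32 → 4) at each timestep;
  -- seqLen/inputSize/hiddenSize/outputSize are inferred from the target type.
  simpleGruModelSpec gru head
```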

def Spec.gruClassifierModelSpec {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] {seqLen inputSize hiddenSize numClasses : ℕ} (gru_spec : GRUSpec α inputSize hiddenSize) (classifier_spec : LinearSpec α hiddenSize numClasses) (h : seqLen ≠ 0) :
ModSpec.SpecChain α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar)) (Shape.dim numClasses Shape.scalar)

A many-to-one GRU classifier (use the last hidden state, then a linear head).

PyTorch analogy: run nn.GRU over the sequence and feed the last output/hidden state into nn.Linear(hiddenSize, numClasses).
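
A usage sketch (hypothetical bundles; `by decide` discharges the side condition `10 ≠ 0`):

```lean
section
open Spec
variable {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2]
variable (gru : GRUSpec α 16 32) (cls : LinearSpec α 32 4)

-- A 4-way classifier over length-10 sequences of 16-dimensional inputs.
#check gruClassifierModelSpec (seqLen := 10) gru cls (by decide)
end
```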

def Spec.multilayerGruSpec {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] {seqLen inputSize hiddenSize outputSize : ℕ} (gru1_spec : GRUSpec α inputSize hiddenSize) (gru2_spec : GRUSpec α hiddenSize hiddenSize) (linearSpec : LinearSpec α hiddenSize outputSize) :
ModSpec.SpecChain α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar)) (Shape.dim seqLen (Shape.dim outputSize Shape.scalar))

A 2-layer GRU stack (sequence-to-sequence), followed by a per-timestep linear head.
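
A usage sketch with hypothetical bundles, mirroring the simpleGruModelSpec example above:

```lean
open Spec in
example {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2]
    (g1 : GRUSpec α 16 32) (g2 : GRUSpec α 32 32) (head : LinearSpec α 32 4) :
    ModSpec.SpecChain α
      (Shape.dim 10 (Shape.dim 16 Shape.scalar))
      (Shape.dim 10 (Shape.dim 4 Shape.scalar)) :=
  -- Two stacked GRUs (16 → 32, then 32 → 32), then Linear(32 → 4) per timestep.
  multilayerGruSpec g1 g2 head
```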

def Spec.gruLanguageModelSpec {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] {seqLen vocabSize hiddenSize : ℕ} (embedding_spec : LinearSpec α vocabSize hiddenSize) (gru_spec : GRUSpec α hiddenSize hiddenSize) (output_spec : LinearSpec α hiddenSize vocabSize) :
ModSpec.SpecChain α (Shape.dim seqLen (Shape.dim vocabSize Shape.scalar)) (Shape.dim seqLen (Shape.dim vocabSize Shape.scalar))

A simple GRU language-model-style pipeline: a Linear as a lightweight embedding/projection, then a GRU, then a per-timestep projection back to vocabSize.

PyTorch analogy: embedding (often nn.Embedding), nn.GRU, and nn.Linear(hiddenSize, vocabSize). We use LinearSpec here as a spec-friendly stand-in for a one-hot embedding matrix.
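
A usage sketch (hypothetical bundles; 256 is an illustrative vocabulary size):

```lean
section
open Spec
variable {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2]
variable (emb : LinearSpec α 256 64) (gru : GRUSpec α 64 64)
variable (proj : LinearSpec α 64 256)

-- Embedding (256 → 64), GRU core (64 → 64), projection back to 256 logits.
#check gruLanguageModelSpec (seqLen := 32) emb gru proj
end
```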

Record-style model specs #

The SpecChain builders above are the most uniform way to assemble models in TorchLean.

This section instead uses small record types with explicit forward functions. It is useful when you want to talk about a particular architecture directly (e.g. encoder-decoder), or when you need to carry extra per-model parameters (e.g. a dropout rate) without building a full module stack.

structure Spec.SimpleGRUModel (α : Type) (inputSize hiddenSize outputSize : ℕ) : Type

Bundle of parameters for a single-layer GRU model with a linear output head.

This is a direct record representation (as opposed to the SpecChain representation above).

• gru : GRUSpec α inputSize hiddenSize

  The GRU core parameters.

• output_layer : LinearSpec α hiddenSize outputSize

  The linear output head.
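
A construction sketch: record-style models are assembled directly from their fields (hypothetical bundles `gru` and `head`):

```lean
open Spec in
example {α : Type} (gru : GRUSpec α 16 32) (head : LinearSpec α 32 4) :
    SimpleGRUModel α 16 32 4 :=
  { gru := gru, output_layer := head }
```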
structure Spec.MultiLayerGRUModel (α : Type) (inputSize hiddenSize outputSize numLayers : ℕ) : Type

Bundle of parameters for a multi-layer GRU model.

The first layer consumes inputSize, and all subsequent layers consume hiddenSize.

• first_layer : GRUSpec α inputSize hiddenSize

  The first GRU layer (inputSize → hiddenSize).

• hidden_layers (i : Fin (numLayers - 1)) : GRUSpec α hiddenSize hiddenSize

  The remaining hidden-to-hidden GRU layers.

• output_layer : LinearSpec α hiddenSize outputSize

  The linear output head.

structure Spec.GRUClassifier (α : Type) (inputSize hiddenSize numClasses : ℕ) : Type

Bundle of parameters for a many-to-one GRU classifier.

The classifier head is applied to the final hidden state.

• gru : GRUSpec α inputSize hiddenSize

  The GRU core parameters.

• classifier : LinearSpec α hiddenSize numClasses

  The linear classifier head.

structure Spec.GRUGenerator (α : Type) (vocabSize hiddenSize : ℕ) : Type

Bundle of parameters for a many-to-many GRU generator (language-model style).

This includes an (embedding) linear map, a recurrent core, and an output projection back to the vocabulary.

• embedding : LinearSpec α vocabSize hiddenSize

  The embedding/projection into hidden space.

• gru : GRUSpec α hiddenSize hiddenSize

  The GRU core parameters.

• output_projection : LinearSpec α hiddenSize vocabSize

  The projection back to vocabulary logits.

structure Spec.BiGRUModel (α : Type) (inputSize hiddenSize outputSize : ℕ) : Type

Bundle of parameters for a bidirectional GRU model with an output head.

The head consumes the concatenation of forward and backward hidden states. PyTorch analogue: nn.GRU(..., bidirectional=true) plus a linear projection.

• forward_gru : GRUSpec α inputSize hiddenSize

  The forward-direction GRU.

• backward_gru : GRUSpec α inputSize hiddenSize

  The backward-direction GRU (run over the reversed sequence).

• output_layer : LinearSpec α (hiddenSize + hiddenSize) outputSize

  The output head over concatenated hidden states.

structure Spec.GRULanguageModel (α : Type) (vocabSize hiddenSize : ℕ) : Type

Bundle of parameters for a stacked GRU language model with deterministic dropout.

This model uses a list of GRU layers (all with hiddenSize input/output) and applies dropout_inference_spec scaling between the GRU stack and the output projection.

• embedding : LinearSpec α vocabSize hiddenSize

  The embedding/projection into hidden space.

• gru_layers : List (GRUSpec α hiddenSize hiddenSize)

  The stacked GRU layers.

• output_projection : LinearSpec α hiddenSize vocabSize

  The projection back to vocabulary logits.

• dropout_rate : α

  The rate used for deterministic dropout scaling.

structure Spec.GRUEncoderDecoder (α : Type) (inputVocabSize hiddenSize outputVocabSize : ℕ) : Type

Bundle of parameters for a GRU encoder-decoder model (seq2seq).

This uses separate embeddings and GRU cores for the encoder and decoder, plus an output projection. PyTorch analogue: an encoder nn.GRU and a decoder nn.GRU with teacher forcing.

• encoder_embedding : LinearSpec α inputVocabSize hiddenSize

  The encoder's embedding into hidden space.

• encoder_gru : GRUSpec α hiddenSize hiddenSize

  The encoder GRU core.

• decoder_embedding : LinearSpec α outputVocabSize hiddenSize

  The decoder's embedding into hidden space.

• decoder_gru : GRUSpec α hiddenSize hiddenSize

  The decoder GRU core.

• output_projection : LinearSpec α hiddenSize outputVocabSize

  The projection into output-vocabulary logits.
def Spec.simpleGruForward {α : Type} [Context α] {inputSize hiddenSize outputSize : ℕ} (model : SimpleGRUModel α inputSize hiddenSize outputSize) (input : Tensor α (Shape.dim inputSize Shape.scalar)) (hidden : Tensor α (Shape.dim hiddenSize Shape.scalar)) :
Tensor α (Shape.dim outputSize Shape.scalar) × Tensor α (Shape.dim hiddenSize Shape.scalar)

One-step forward for SimpleGRUModel.

Input: (x_t, h_{t-1}). Output: (y_t, h_t).

def Spec.simpleGruSequenceForward {α : Type} [Context α] {seqLen inputSize hiddenSize outputSize : ℕ} (model : SimpleGRUModel α inputSize hiddenSize outputSize) (inputs : Tensor α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar))) (initial_hidden : Tensor α (Shape.dim hiddenSize Shape.scalar)) (h : 0 < seqLen) :
Tensor α (Shape.dim seqLen (Shape.dim outputSize Shape.scalar)) × Tensor α (Shape.dim hiddenSize Shape.scalar)

Sequence forward for SimpleGRUModel (time-major).

Returns (outputs, final_hidden).

PyTorch analogy: run nn.GRU over the sequence, then apply nn.Linear at each timestep.
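
A call sketch (hypothetical model and tensors; `by decide` proves 0 < 10):

```lean
section
open Spec
variable {α : Type} [Context α]
variable (model : SimpleGRUModel α 16 32 4)
variable (xs : Tensor α (Shape.dim 10 (Shape.dim 16 Shape.scalar)))
variable (h0 : Tensor α (Shape.dim 32 Shape.scalar))

-- Returns (outputs : 10 × 4 tensor, final_hidden : 32-vector).
#check simpleGruSequenceForward model xs h0 (by decide)
end
```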
def Spec.gruClassifierForward {α : Type} [Context α] {seqLen inputSize hiddenSize numClasses : ℕ} (model : GRUClassifier α inputSize hiddenSize numClasses) (inputs : Tensor α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar))) (initial_hidden : Tensor α (Shape.dim hiddenSize Shape.scalar)) (h : 0 < seqLen) :
Tensor α (Shape.dim numClasses Shape.scalar)

Forward pass for a GRUClassifier (many-to-one).

This runs the GRU over the input sequence and applies the classifier head to the final hidden state.

def Spec.gruGeneratorForward {α : Type} [Context α] {seqLen vocabSize hiddenSize : ℕ} (model : GRUGenerator α vocabSize hiddenSize) (input_tokens : Tensor α (Shape.dim seqLen (Shape.dim vocabSize Shape.scalar))) (initial_hidden : Tensor α (Shape.dim hiddenSize Shape.scalar)) (h : 0 < seqLen) :
Tensor α (Shape.dim seqLen (Shape.dim vocabSize Shape.scalar)) × Tensor α (Shape.dim hiddenSize Shape.scalar)

Forward pass for a GRUGenerator (many-to-many).

This applies an embedding linear map to each token vector, runs the GRU, and projects each hidden state back into vocabulary space.
def Spec.bigruForward {α : Type} [Context α] {seqLen inputSize hiddenSize outputSize : ℕ} (model : BiGRUModel α inputSize hiddenSize outputSize) (inputs : Tensor α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar))) (forward_hidden backward_hidden : Tensor α (Shape.dim hiddenSize Shape.scalar)) :
Tensor α (Shape.dim seqLen (Shape.dim outputSize Shape.scalar))

Forward pass for a bidirectional GRU model (time-major).

This runs a forward GRU on the sequence, a backward GRU on the reversed sequence, concatenates the two hidden streams per timestep, and applies an output head.
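
A call sketch (hypothetical model and tensors; note the separate initial hiddens for the two directions):

```lean
section
open Spec
variable {α : Type} [Context α]
variable (model : BiGRUModel α 16 32 4)
variable (xs : Tensor α (Shape.dim 10 (Shape.dim 16 Shape.scalar)))
variable (hf hb : Tensor α (Shape.dim 32 Shape.scalar))

-- Per-timestep outputs computed from concatenated forward/backward states.
#check bigruForward model xs hf hb
end
```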
def Spec.multilayerGruForward {α : Type} [Context α] {seqLen inputSize hiddenSize outputSize numLayers : ℕ} (model : MultiLayerGRUModel α inputSize hiddenSize outputSize numLayers) (inputs : Tensor α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar))) (initial_hiddens : Fin numLayers → Tensor α (Shape.dim hiddenSize Shape.scalar)) (h : numLayers > 0) (h2 : 0 < seqLen) :
Tensor α (Shape.dim seqLen (Shape.dim outputSize Shape.scalar)) × (Fin numLayers → Tensor α (Shape.dim hiddenSize Shape.scalar))

Forward pass for a MultiLayerGRUModel (stacked GRU layers).

This runs the first layer on the input sequence, then threads the resulting hidden stream through each additional hidden layer, and finally applies the output head per timestep.

@[irreducible]
def Spec.multilayerGruForward.process_hidden_layers {α : Type} [Context α] {seqLen inputSize hiddenSize outputSize numLayers : ℕ} (model : MultiLayerGRUModel α inputSize hiddenSize outputSize numLayers) (h : numLayers > 0) (h2 : 0 < seqLen) (layer : ℕ) (layer_input : Tensor α (Shape.dim seqLen (Shape.dim hiddenSize Shape.scalar))) (hiddens : Fin numLayers → Tensor α (Shape.dim hiddenSize Shape.scalar)) :
Tensor α (Shape.dim seqLen (Shape.dim hiddenSize Shape.scalar)) × (Fin numLayers → Tensor α (Shape.dim hiddenSize Shape.scalar))

def Spec.gruLmForward {α : Type} [Context α] {seqLen vocabSize hiddenSize : ℕ} (model : GRULanguageModel α vocabSize hiddenSize) (input_tokens : Tensor α (Shape.dim seqLen (Shape.dim vocabSize Shape.scalar))) (initial_hiddens : List (Tensor α (Shape.dim hiddenSize Shape.scalar))) (h : 0 < seqLen) :
Tensor α (Shape.dim seqLen (Shape.dim vocabSize Shape.scalar)) × List (Tensor α (Shape.dim hiddenSize Shape.scalar))

Forward pass for GRULanguageModel (teacher forcing, time-major).

This runs the embedding, then a stack of GRU layers with the provided initial hiddens, applies deterministic dropout scaling (dropout_inference_spec), and projects to vocabulary logits.

def Spec.gruLmForward.process_gru_layers {α : Type} [Context α] {seqLen hiddenSize : ℕ} (layers : List (GRUSpec α hiddenSize hiddenSize)) (hiddens : List (Tensor α (Shape.dim hiddenSize Shape.scalar))) (layer_input : Tensor α (Shape.dim seqLen (Shape.dim hiddenSize Shape.scalar))) (h : 0 < seqLen) :
Tensor α (Shape.dim seqLen (Shape.dim hiddenSize Shape.scalar)) × List (Tensor α (Shape.dim hiddenSize Shape.scalar))
def Spec.gruEncoderDecoderForward {α : Type} [Context α] {srcSeqLen tgtSeqLen inputVocabSize hiddenSize outputVocabSize : ℕ} (model : GRUEncoderDecoder α inputVocabSize hiddenSize outputVocabSize) (src_tokens : Tensor α (Shape.dim srcSeqLen (Shape.dim inputVocabSize Shape.scalar))) (tgt_tokens : Tensor α (Shape.dim tgtSeqLen (Shape.dim outputVocabSize Shape.scalar))) (encoder_hidden _decoder_hidden : Tensor α (Shape.dim hiddenSize Shape.scalar)) (h : 0 < srcSeqLen) (h2 : 0 < tgtSeqLen) :
Tensor α (Shape.dim tgtSeqLen (Shape.dim outputVocabSize Shape.scalar)) × Tensor α (Shape.dim hiddenSize Shape.scalar) × Tensor α (Shape.dim hiddenSize Shape.scalar)

Encoder-decoder forward pass (GRU encoder + GRU decoder).

This is a small reference architecture:

• encode src_tokens into a final hidden state,
• decode tgt_tokens starting from that hidden state (teacher forcing),
• project decoder states into output-vocabulary logits.

PyTorch analogy: nn.GRU encoder + nn.GRU decoder with a linear output projection.
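
A call sketch (hypothetical model and token tensors; source and target lengths may differ):

```lean
section
open Spec
variable {α : Type} [Context α]
variable (model : GRUEncoderDecoder α 100 64 120)
variable (src : Tensor α (Shape.dim 7 (Shape.dim 100 Shape.scalar)))
variable (tgt : Tensor α (Shape.dim 9 (Shape.dim 120 Shape.scalar)))
variable (hEnc hDec : Tensor α (Shape.dim 64 Shape.scalar))

-- Returns (target logits, final encoder hidden, final decoder hidden).
#check gruEncoderDecoderForward model src tgt hEnc hDec (by decide) (by decide)
end
```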
def Spec.simpleGruBackward {α : Type} [Context α] {seqLen inputSize hiddenSize outputSize : ℕ} (model : SimpleGRUModel α inputSize hiddenSize outputSize) (inputs : Tensor α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar))) (hidden_states : Tensor α (Shape.dim seqLen (Shape.dim hiddenSize Shape.scalar))) (grad_outputs : Tensor α (Shape.dim seqLen (Shape.dim outputSize Shape.scalar))) (reset_gates update_gates new_candidates _reset_hiddens : Tensor α (Shape.dim seqLen (Shape.dim hiddenSize Shape.scalar))) (h : seqLen ≠ 0) :
Tensor α (Shape.dim hiddenSize (Shape.dim (inputSize + hiddenSize) Shape.scalar)) × Tensor α (Shape.dim hiddenSize Shape.scalar) × Tensor α (Shape.dim hiddenSize (Shape.dim (inputSize + hiddenSize) Shape.scalar)) × Tensor α (Shape.dim hiddenSize Shape.scalar) × Tensor α (Shape.dim hiddenSize (Shape.dim (inputSize + hiddenSize) Shape.scalar)) × Tensor α (Shape.dim hiddenSize Shape.scalar) × Tensor α (Shape.dim outputSize (Shape.dim hiddenSize Shape.scalar)) × Tensor α (Shape.dim outputSize Shape.scalar) × Tensor α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar))

Backward pass for SimpleGRUModel (full BPTT, gate-aware).

This assumes you already ran a forward pass that saved:

• hidden_states,
• the GRU intermediates (reset_gates, update_gates, new_candidates, reset_hiddens).

Those intermediates can be produced using Spec.gru_extract_intermediate_values from NN.Spec.Layers.Gru.

Return values:

• gradients for the GRU parameters (reset/update/new weights and biases),
• gradients for the output linear layer (weights and bias),
• the gradient for each timestep input.
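
A wiring sketch (hypothetical tensors standing in for the values saved by the forward pass; the exact signature of Spec.gru_extract_intermediate_values lives in NN.Spec.Layers.Gru and is not shown here):

```lean
section
open Spec
variable {α : Type} [Context α]
variable (model : SimpleGRUModel α 16 32 4)
variable (inputs : Tensor α (Shape.dim 10 (Shape.dim 16 Shape.scalar)))
variable (hidden_states reset_gates update_gates new_candidates reset_hiddens :
    Tensor α (Shape.dim 10 (Shape.dim 32 Shape.scalar)))
variable (grad_outputs : Tensor α (Shape.dim 10 (Shape.dim 4 Shape.scalar)))

-- Gate-aware BPTT over the saved intermediates; `by decide` proves 10 ≠ 0.
#check simpleGruBackward model inputs hidden_states grad_outputs
    reset_gates update_gates new_candidates reset_hiddens (by decide)
end
```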
structure Spec.AttentionGRUModel (α : Type) (inputSize hiddenSize outputSize : ℕ) : Type

Attention-style GRU model bundle.

This record defines the parameters for an encoder/decoder GRU with learned attention scores. Forward passes can choose additive, dot-product, or domain-specific attention semantics while sharing this typed parameter bundle.

• encoder_gru : GRUSpec α inputSize hiddenSize

  The encoder GRU core.

• decoder_gru : GRUSpec α (inputSize + hiddenSize) hiddenSize

  The decoder GRU core (consumes the input concatenated with a context vector).

• attention_weights : LinearSpec α (hiddenSize + hiddenSize) 1

  The learned attention-scoring map.

• output_layer : LinearSpec α hiddenSize outputSize

  The linear output head.

structure Spec.ResidualGRUModel (α : Type) (inputSize hiddenSize outputSize : ℕ) : Type

Bundle of parameters for a residual GRU model.

This includes a projection from input space to hidden space so the input can be added as a residual to the GRU hidden stream.

• gru : GRUSpec α inputSize hiddenSize

  The GRU core parameters.

• residual_projection : LinearSpec α inputSize hiddenSize

  The projection of the input into hidden space for the residual connection.

• output_layer : LinearSpec α hiddenSize outputSize

  The linear output head.

def Spec.residualGruForward {α : Type} [Context α] {seqLen inputSize hiddenSize outputSize : ℕ} (model : ResidualGRUModel α inputSize hiddenSize outputSize) (inputs : Tensor α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar))) (initial_hidden : Tensor α (Shape.dim hiddenSize Shape.scalar)) (h : 0 < seqLen) :
Tensor α (Shape.dim seqLen (Shape.dim outputSize Shape.scalar)) × Tensor α (Shape.dim hiddenSize Shape.scalar)

Forward pass for ResidualGRUModel.

This runs the GRU, adds a projected version of the input as a residual connection, and applies the output head per timestep.
def Spec.simpleGRUToModuleSpec {α : Type} [Context α] {seqLen inputSize hiddenSize outputSize : ℕ} (model : SimpleGRUModel α inputSize hiddenSize outputSize) (h : 0 < seqLen) :

Package SimpleGRUModel as an NNModuleSpec.

This is used to plug the spec model into the common module pipeline. The export_func.toPyTorch field is documentation-oriented and indicates the intended PyTorch analogue.

def Spec.gruClassifierToModuleSpec {α : Type} [Context α] {seqLen inputSize hiddenSize numClasses : ℕ} (model : GRUClassifier α inputSize hiddenSize numClasses) (h : 0 < seqLen) :

Package GRUClassifier as an NNModuleSpec.

PyTorch analogue: nn.GRU feeding an nn.Linear classifier head.

def Spec.biGRUToModuleSpec {α : Type} [Context α] {seqLen inputSize hiddenSize outputSize : ℕ} (model : BiGRUModel α inputSize hiddenSize outputSize) :

Package BiGRUModel as an NNModuleSpec.

PyTorch analogue: nn.GRU(..., bidirectional=true) feeding a per-timestep linear head.

def Spec.gruGeneratorToModuleSpec {α : Type} [Context α] {seqLen vocabSize hiddenSize : ℕ} (model : GRUGenerator α vocabSize hiddenSize) (h : 0 < seqLen) :

Package GRUGenerator as an NNModuleSpec.

PyTorch analogue: GRU language model (nn.GRU + vocabulary projection) producing a sequence of logits.