TorchLean API

NN.Spec.Module.LstmModels

LSTM models (spec) #

Higher‑level LSTM architectures built from module specs (SpecChain), including sequence-to-sequence models, many-to-one classifiers, stacked (multi-layer) models, bidirectional models, and language-model pipelines.

Cell equations are in NN/Spec/Layers/Lstm.lean; this file focuses on composing modules.


“Fully implemented” LSTM models #

The LSTM model layer exposes first-class model objects with forward passes, BPTT-based backward passes, and loss/gradient helpers.

This section provides those “full” APIs for the simple LSTM models in this file by reusing the gate-aware BPTT implementation in NN/Spec/Layers/Lstm.lean.

Gradient records #

structure Spec.LinearGrads (α : Type) (inDim outDim : ℕ) :

Gradients for a linear layer y = W x + b.

This is the natural gradient bundle for Spec.LinearSpec (PyTorch analogue: torch.nn.Linear), with dW matching the weight shape [outDim, inDim] and db matching [outDim].

structure Spec.LSTMGrads (α : Type) (inputSize hiddenSize : ℕ) :

Gate-wise gradients for an LSTM cell.

This matches the parameterization used by Spec.LSTMSpec (see NN/Spec/Layers/Lstm.lean): each gate has a weight matrix of shape [hiddenSize, inputSize + hiddenSize] applied to a concatenated vector, plus a bias of shape [hiddenSize].

• d_forget_weights : WeightMatrix α hiddenSize (inputSize + hiddenSize)

  Gradient w.r.t. the forget-gate weights.

• d_forget_bias : HiddenVector α hiddenSize

  Gradient w.r.t. the forget-gate bias.

• d_input_weights : WeightMatrix α hiddenSize (inputSize + hiddenSize)

  Gradient w.r.t. the input-gate weights.

• d_input_bias : HiddenVector α hiddenSize

  Gradient w.r.t. the input-gate bias.

• d_candidate_weights : WeightMatrix α hiddenSize (inputSize + hiddenSize)

  Gradient w.r.t. the candidate (cell-update) weights.

• d_candidate_bias : HiddenVector α hiddenSize

  Gradient w.r.t. the candidate bias.

• d_output_weights : WeightMatrix α hiddenSize (inputSize + hiddenSize)

  Gradient w.r.t. the output-gate weights.

• d_output_bias : HiddenVector α hiddenSize

  Gradient w.r.t. the output-gate bias.

structure Spec.SimpleLSTMModelGrads (α : Type) (inputSize hiddenSize outputSize : ℕ) :

Parameter gradients for SimpleLSTMModel.

This bundles the LSTM cell gradients and the time-distributed linear head gradients.

• lstm : LSTMGrads α inputSize hiddenSize

  Gradients for the LSTM cell parameters.

• output_layer : LinearGrads α hiddenSize outputSize

  Gradients for the linear output head.

def Spec.simpleLstmModelSpec {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] {seqLen inputSize hiddenSize outputSize : ℕ} (lstm_spec : LSTMSpec α inputSize hiddenSize) (linearSpec : LinearSpec α hiddenSize outputSize) :
    ModSpec.SpecChain α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar)) (Shape.dim seqLen (Shape.dim outputSize Shape.scalar))

Sequence-to-sequence LSTM model as a SpecChain: LSTM over time, then a per-timestep linear head.

PyTorch analogue: nn.LSTM producing an output sequence, followed by nn.Linear applied at each time step.
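
A minimal usage sketch (assuming the imports and namespaces of this file, with the signatures exactly as displayed above): the chain is built directly from the two component specs.

open Spec in
example {α : Type} [Context α] [DecidableRel (fun (x1 x2 : α) => x1 > x2)]
    {seqLen inputSize hiddenSize outputSize : ℕ}
    (cell : LSTMSpec α inputSize hiddenSize)
    (head : LinearSpec α hiddenSize outputSize) :
    ModSpec.SpecChain α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar))
      (Shape.dim seqLen (Shape.dim outputSize Shape.scalar)) :=
  -- Compose the LSTM-over-time spec with the per-timestep linear head.
  simpleLstmModelSpec cell head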

def Spec.lstmClassifierModelSpec {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] {seqLen inputSize hiddenSize numClasses : ℕ} (lstm_spec : LSTMSpec α inputSize hiddenSize) (classifier_spec : LinearSpec α hiddenSize numClasses) (h : seqLen ≠ 0) :

Many-to-one LSTM classifier as a SpecChain.

This runs an LSTM over the sequence and applies a linear classifier head to the final hidden state. PyTorch analogue: nn.LSTM + nn.Linear, taking the last output/hidden.

def Spec.multilayerLstmSpec {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] {seqLen inputSize hiddenSize outputSize : ℕ} (lstm1_spec : LSTMSpec α inputSize hiddenSize) (lstm2_spec : LSTMSpec α hiddenSize hiddenSize) (linearSpec : LinearSpec α hiddenSize outputSize) :
    ModSpec.SpecChain α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar)) (Shape.dim seqLen (Shape.dim outputSize Shape.scalar))

Two-layer LSTM stack (sequence-to-sequence), followed by a per-timestep linear head.

The second LSTM consumes the hidden stream produced by the first.

def Spec.lstmLanguageModelSpec {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] {seqLen vocabSize hiddenSize : ℕ} (embedding_spec : LinearSpec α vocabSize hiddenSize) (lstm_spec : LSTMSpec α hiddenSize hiddenSize) (output_spec : LinearSpec α hiddenSize vocabSize) :

Simple LSTM language-model pipeline as a SpecChain: embedding, LSTM core, and output projection.

In this spec layer we represent the embedding/projection as LinearSpecs (often used with one-hot token vectors). PyTorch analogue: nn.Embedding (conceptually) + nn.LSTM + nn.Linear.

structure Spec.SimpleLSTMModel (α : Type) (inputSize hiddenSize outputSize : ℕ) :

Bundle of parameters for a single-layer LSTM model with a linear output head.

This is a direct record representation (as opposed to the SpecChain representation above).

• lstm : LSTMSpec α inputSize hiddenSize

  The LSTM cell parameters.

• output_layer : LinearSpec α hiddenSize outputSize

  The linear output head.
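
A minimal construction sketch (assuming open Spec): the record is assembled directly from its two fields.

open Spec in
example {α : Type} {inputSize hiddenSize outputSize : ℕ}
    (cell : LSTMSpec α inputSize hiddenSize)
    (head : LinearSpec α hiddenSize outputSize) :
    SimpleLSTMModel α inputSize hiddenSize outputSize :=
  -- Anonymous-constructor syntax using the field names listed above.
  { lstm := cell, output_layer := head }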

structure Spec.MultiLayerLSTMModel (α : Type) (inputSize hiddenSize outputSize numLayers : ℕ) :

Bundle of parameters for a multi-layer LSTM model with a linear output head.

The first layer consumes inputSize, and all subsequent layers consume hiddenSize.

• first_layer : LSTMSpec α inputSize hiddenSize

  The first LSTM layer (consumes the input stream).

• hidden_layers (i : Fin (numLayers - 1)) : LSTMSpec α hiddenSize hiddenSize

  The remaining hidden-to-hidden LSTM layers.

• output_layer : LinearSpec α hiddenSize outputSize

  The linear output head.

structure Spec.LSTMClassifier (α : Type) (inputSize hiddenSize numClasses : ℕ) :

Bundle of parameters for a many-to-one LSTM classifier.

The classifier head is applied to the final hidden state.

• lstm : LSTMSpec α inputSize hiddenSize

  The LSTM cell parameters.

• classifier : LinearSpec α hiddenSize numClasses

  The linear classifier head.

structure Spec.LSTMGenerator (α : Type) (vocabSize hiddenSize : ℕ) :

Bundle of parameters for a many-to-many LSTM generator (language-model style).

This includes an (embedding) linear map, recurrent core, and output projection back to vocabulary.

• embedding : LinearSpec α vocabSize hiddenSize

  The (embedding) linear map from token vectors to hidden space.

• lstm : LSTMSpec α hiddenSize hiddenSize

  The recurrent core.

• output_projection : LinearSpec α hiddenSize vocabSize

  The output projection back to vocabulary space.

structure Spec.BiLSTMModel (α : Type) (inputSize hiddenSize outputSize : ℕ) :

Bundle of parameters for a bidirectional LSTM model with an output head.

The head consumes the concatenation of forward and backward hidden states. PyTorch analogue: nn.LSTM(..., bidirectional=True) plus a linear projection.

• forward_lstm : LSTMSpec α inputSize hiddenSize

  The forward-direction LSTM.

• backward_lstm : LSTMSpec α inputSize hiddenSize

  The backward-direction LSTM (run on the reversed sequence).

• output_layer : LinearSpec α (hiddenSize + hiddenSize) outputSize

  The output head over concatenated forward/backward hidden states.

structure Spec.LSTMLanguageModel (α : Type) (vocabSize hiddenSize : ℕ) :

Bundle of parameters for a stacked LSTM language model with deterministic dropout.

This model uses a list of LSTM layers (all with hiddenSize input/output) and applies a dropout_inference_spec scaling step between the recurrent stack and the output projection.

• embedding : LinearSpec α vocabSize hiddenSize

  The (embedding) linear map.

• lstm_layers : List (LSTMSpec α hiddenSize hiddenSize)

  The stacked LSTM layers (one entry per layer).

• output_projection : LinearSpec α hiddenSize vocabSize

  The projection to vocabulary logits.

• dropout_rate : α

  The dropout rate used by the deterministic dropout scaling step.

def Spec.simpleLstmForward {α : Type} [Context α] {inputSize hiddenSize outputSize : ℕ} (model : SimpleLSTMModel α inputSize hiddenSize outputSize) (input : Tensor α (Shape.dim inputSize Shape.scalar)) (state : LSTMState α hiddenSize) :
    Tensor α (Shape.dim outputSize Shape.scalar) × LSTMState α hiddenSize

One-step forward pass for SimpleLSTMModel.

Given an input vector and the previous LSTM state (hidden, cell), compute (output, new_state). PyTorch analogue: nn.LSTMCell step followed by a nn.Linear head.
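
A minimal sketch of unrolling two steps by hand, threading the recurrent state (assuming open Spec):

open Spec in
example {α : Type} [Context α] {inputSize hiddenSize outputSize : ℕ}
    (model : SimpleLSTMModel α inputSize hiddenSize outputSize)
    (x1 x2 : Tensor α (Shape.dim inputSize Shape.scalar))
    (s0 : LSTMState α hiddenSize) :
    Tensor α (Shape.dim outputSize Shape.scalar) × LSTMState α hiddenSize :=
  let (_, s1) := simpleLstmForward model x1 s0  -- first step: discard output, keep state
  simpleLstmForward model x2 s1                 -- second step consumes the updated state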

def Spec.simpleLstmSequenceForward {α : Type} [Context α] {seqLen inputSize hiddenSize outputSize : ℕ} (model : SimpleLSTMModel α inputSize hiddenSize outputSize) (inputs : Tensor α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar))) (initial_state : LSTMState α hiddenSize) :
    Tensor α (Shape.dim seqLen (Shape.dim outputSize Shape.scalar)) × LSTMState α hiddenSize

Sequence forward pass for SimpleLSTMModel.

Runs the LSTM over all timesteps (time-major), applies the output head to each hidden state, and returns (outputs, final_state).
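
A usage sketch that keeps only the output sequence and drops the final state (assuming open Spec):

open Spec in
example {α : Type} [Context α] {seqLen inputSize hiddenSize outputSize : ℕ}
    (model : SimpleLSTMModel α inputSize hiddenSize outputSize)
    (xs : Tensor α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar)))
    (s0 : LSTMState α hiddenSize) :
    Tensor α (Shape.dim seqLen (Shape.dim outputSize Shape.scalar)) :=
  (simpleLstmSequenceForward model xs s0).1  -- .2 would be the final LSTM state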

Backward pass (BPTT) for the simple LSTM sequence model #

This is the model-level analogue of Spec.lstm_sequence_backward_spec. The only extra work we do here is to backprop through the per-timestep output projection and feed its gradient into the LSTM sequence backward pass.

def Spec.LSTMModels.Internal.timeDistributedLinearBackward {α : Type} [Context α] {seqLen hiddenSize outputSize : ℕ} (layer : LinearSpec α hiddenSize outputSize) (hiddens : Tensor α (Shape.dim seqLen (Shape.dim hiddenSize Shape.scalar))) (grad_outputs : Tensor α (Shape.dim seqLen (Shape.dim outputSize Shape.scalar))) :
    LinearGrads α hiddenSize outputSize × Tensor α (Shape.dim seqLen (Shape.dim hiddenSize Shape.scalar))

Backprop through the time-distributed linear head and produce hidden-state gradients.

def Spec.simpleLstmBackward {α : Type} [Context α] {seqLen inputSize hiddenSize outputSize : ℕ} (model : SimpleLSTMModel α inputSize hiddenSize outputSize) (inputs : Tensor α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar))) (initial_state : LSTMState α hiddenSize) (grad_outputs : Tensor α (Shape.dim seqLen (Shape.dim outputSize Shape.scalar))) :
    SimpleLSTMModelGrads α inputSize hiddenSize outputSize × Tensor α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar)) × LSTMState α hiddenSize

Backward pass for simpleLstmSequenceForward.

Returns:

• parameter gradients (SimpleLSTMModelGrads)
• gradient w.r.t. the input sequence (dInputs)
• gradient w.r.t. the initial recurrent state (dInitialState)
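
A destructuring sketch of the three results (assuming open Spec):

open Spec in
example {α : Type} [Context α] {seqLen inputSize hiddenSize outputSize : ℕ}
    (model : SimpleLSTMModel α inputSize hiddenSize outputSize)
    (xs : Tensor α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar)))
    (s0 : LSTMState α hiddenSize)
    (gOut : Tensor α (Shape.dim seqLen (Shape.dim outputSize Shape.scalar))) :
    SimpleLSTMModelGrads α inputSize hiddenSize outputSize :=
  -- Keep the parameter gradients; the input and initial-state gradients are unused here.
  let (grads, _dInputs, _dState0) := simpleLstmBackward model xs s0 gOut
  grads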
def Spec.simpleLstmMseLoss {α : Type} [Context α] {seqLen inputSize hiddenSize outputSize : ℕ} (model : SimpleLSTMModel α inputSize hiddenSize outputSize) (inputs : Tensor α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar))) (targets : Tensor α (Shape.dim seqLen (Shape.dim outputSize Shape.scalar))) (initial_state : LSTMState α hiddenSize) :
    α

MSE loss for the simple LSTM sequence model.

This runs simpleLstmSequenceForward and compares the predicted output sequence against targets using mse_spec.

def Spec.simpleLstmMseGrad {α : Type} [Context α] {seqLen inputSize hiddenSize outputSize : ℕ} (model : SimpleLSTMModel α inputSize hiddenSize outputSize) (inputs : Tensor α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar))) (targets : Tensor α (Shape.dim seqLen (Shape.dim outputSize Shape.scalar))) (initial_state : LSTMState α hiddenSize) :
    α × SimpleLSTMModelGrads α inputSize hiddenSize outputSize

Compute (loss, grads) for the simple LSTM sequence model under MSE.

This is the “full training API” building block: an optimizer (SGD/Adam) can consume these grads, as in the sketch below.
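
A hypothetical training-step sketch. Note that applySgd is an assumed helper, not part of this module; it would scale each gradient record by the learning rate and subtract it fieldwise from the corresponding parameters.

open Spec in
def trainStep {α : Type} [Context α] {seqLen inputSize hiddenSize outputSize : ℕ}
    (lr : α)
    (model : SimpleLSTMModel α inputSize hiddenSize outputSize)
    (xs : Tensor α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar)))
    (ys : Tensor α (Shape.dim seqLen (Shape.dim outputSize Shape.scalar)))
    (s0 : LSTMState α hiddenSize) :
    α × SimpleLSTMModel α inputSize hiddenSize outputSize :=
  let (loss, grads) := simpleLstmMseGrad model xs ys s0
  -- `applySgd` is a hypothetical helper: model' = model - lr * grads, fieldwise.
  (loss, applySgd lr model grads)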

def Spec.lstmClassifierForward {α : Type} [Context α] {seqLen inputSize hiddenSize numClasses : ℕ} (model : LSTMClassifier α inputSize hiddenSize numClasses) (inputs : Tensor α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar))) (initial_state : LSTMState α hiddenSize) :
    Tensor α (Shape.dim numClasses Shape.scalar)

Forward pass for an LSTMClassifier (many-to-one).

This uses the final hidden state of the LSTM sequence as the classifier input.
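
A minimal call sketch producing class logits from a sequence (assuming open Spec):

open Spec in
example {α : Type} [Context α] {seqLen inputSize hiddenSize numClasses : ℕ}
    (model : LSTMClassifier α inputSize hiddenSize numClasses)
    (xs : Tensor α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar)))
    (s0 : LSTMState α hiddenSize) :
    Tensor α (Shape.dim numClasses Shape.scalar) :=
  lstmClassifierForward model xs s0  -- logits over numClasses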

Backward for the classifier head (many-to-one) #

The classifier only consumes the final hidden state. We express that by feeding a gradient sequence that is zero everywhere except the last timestep.

def Spec.lstmClassifierBackward {α : Type} [Context α] {seqLen inputSize hiddenSize numClasses : ℕ} (model : LSTMClassifier α inputSize hiddenSize numClasses) (inputs : Tensor α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar))) (initial_state : LSTMState α hiddenSize) (grad_logits : Tensor α (Shape.dim numClasses Shape.scalar)) :
    LinearGrads α hiddenSize numClasses × LSTMGrads α inputSize hiddenSize × Tensor α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar))

Backward pass for an LSTMClassifier (many-to-one).

This backprops through the classifier head, then runs an LSTM sequence backward pass where the hidden-state gradient is zero at all timesteps except the last.
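
A destructuring sketch of the three results (assuming open Spec):

open Spec in
example {α : Type} [Context α] {seqLen inputSize hiddenSize numClasses : ℕ}
    (model : LSTMClassifier α inputSize hiddenSize numClasses)
    (xs : Tensor α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar)))
    (s0 : LSTMState α hiddenSize)
    (gLogits : Tensor α (Shape.dim numClasses Shape.scalar)) :
    LSTMGrads α inputSize hiddenSize :=
  -- Keep the LSTM cell gradients; head gradients and input gradients are unused here.
  let (_headGrads, cellGrads, _dInputs) := lstmClassifierBackward model xs s0 gLogits
  cellGrads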

def Spec.lstmGeneratorForward {α : Type} [Context α] {seqLen vocabSize hiddenSize : ℕ} (model : LSTMGenerator α vocabSize hiddenSize) (input_tokens : Tensor α (Shape.dim seqLen (Shape.dim vocabSize Shape.scalar))) (initial_state : LSTMState α hiddenSize) :
    Tensor α (Shape.dim seqLen (Shape.dim vocabSize Shape.scalar)) × LSTMState α hiddenSize

Forward pass for an LSTMGenerator (many-to-many).

This applies an embedding linear map to each token vector, runs the LSTM, and projects each hidden state back into vocabulary space.

def Spec.bilstmForward {α : Type} [Context α] {seqLen inputSize hiddenSize outputSize : ℕ} (model : BiLSTMModel α inputSize hiddenSize outputSize) (inputs : Tensor α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar))) (forward_state backward_state : LSTMState α hiddenSize) :
    Tensor α (Shape.dim seqLen (Shape.dim outputSize Shape.scalar))

Forward pass for a bidirectional LSTM model (time-major).

This runs a forward LSTM on the sequence, a backward LSTM on the reversed sequence, concatenates the two hidden streams per timestep, and applies an output head.
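
A call sketch showing that the two directions take independent initial states (assuming open Spec):

open Spec in
example {α : Type} [Context α] {seqLen inputSize hiddenSize outputSize : ℕ}
    (model : BiLSTMModel α inputSize hiddenSize outputSize)
    (xs : Tensor α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar)))
    (sFwd sBwd : LSTMState α hiddenSize) :
    Tensor α (Shape.dim seqLen (Shape.dim outputSize Shape.scalar)) :=
  bilstmForward model xs sFwd sBwd  -- one initial state per direction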

def Spec.multilayerLstmForward {α : Type} [Context α] {seqLen inputSize hiddenSize outputSize numLayers : ℕ} (model : MultiLayerLSTMModel α inputSize hiddenSize outputSize numLayers) (inputs : Tensor α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar))) (initial_states : Fin numLayers → LSTMState α hiddenSize) (h : numLayers > 0) :
    Tensor α (Shape.dim seqLen (Shape.dim outputSize Shape.scalar)) × (Fin numLayers → LSTMState α hiddenSize)

Forward pass for a MultiLayerLSTMModel (stacked LSTM layers).

This runs the first layer on the input sequence, then threads the resulting hidden stream through each additional hidden layer, and finally applies the output head per timestep.
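
A sketch that starts every layer from the same initial state, passing a constant function for the Fin numLayers → LSTMState argument (assuming open Spec):

open Spec in
example {α : Type} [Context α] {seqLen inputSize hiddenSize outputSize numLayers : ℕ}
    (model : MultiLayerLSTMModel α inputSize hiddenSize outputSize numLayers)
    (xs : Tensor α (Shape.dim seqLen (Shape.dim inputSize Shape.scalar)))
    (s0 : LSTMState α hiddenSize) (h : numLayers > 0) :
    Tensor α (Shape.dim seqLen (Shape.dim outputSize Shape.scalar)) :=
  -- `fun _ => s0` gives every layer the same initial (hidden, cell) pair.
  (multilayerLstmForward model xs (fun _ => s0) h).1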

@[irreducible]
def Spec.multilayerLstmForward.process_hidden_layers {α : Type} [Context α] {seqLen inputSize hiddenSize outputSize numLayers : ℕ} (model : MultiLayerLSTMModel α inputSize hiddenSize outputSize numLayers) (h : numLayers > 0) (layer : ℕ) (layer_input : Tensor α (Shape.dim seqLen (Shape.dim hiddenSize Shape.scalar))) (states : Fin numLayers → LSTMState α hiddenSize) :
    Tensor α (Shape.dim seqLen (Shape.dim hiddenSize Shape.scalar)) × (Fin numLayers → LSTMState α hiddenSize)
def Spec.lstmLmForward {α : Type} [Context α] {seqLen vocabSize hiddenSize : ℕ} (model : LSTMLanguageModel α vocabSize hiddenSize) (input_tokens : Tensor α (Shape.dim seqLen (Shape.dim vocabSize Shape.scalar))) (initial_states : List (LSTMState α hiddenSize)) :
    Tensor α (Shape.dim seqLen (Shape.dim vocabSize Shape.scalar)) × List (LSTMState α hiddenSize)

Forward pass for LSTMLanguageModel (teacher forcing, time-major).

This runs the embedding, then a stack of LSTM layers with provided initial states, applies deterministic dropout scaling (dropout_inference_spec), and projects to vocabulary logits.
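
A sketch that supplies one initial state per stacked layer by mapping over the lstm_layers field (assuming open Spec):

open Spec in
example {α : Type} [Context α] {seqLen vocabSize hiddenSize : ℕ}
    (model : LSTMLanguageModel α vocabSize hiddenSize)
    (toks : Tensor α (Shape.dim seqLen (Shape.dim vocabSize Shape.scalar)))
    (s0 : LSTMState α hiddenSize) :
    Tensor α (Shape.dim seqLen (Shape.dim vocabSize Shape.scalar)) :=
  -- One copy of s0 per layer, so the state list matches the layer list in length.
  let states := model.lstm_layers.map (fun _ => s0)
  (lstmLmForward model toks states).1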

def Spec.lstmLmForward.process_lstm_layers {α : Type} [Context α] {seqLen hiddenSize : ℕ} (layers : List (LSTMSpec α hiddenSize hiddenSize)) (states : List (LSTMState α hiddenSize)) (layer_input : Tensor α (Shape.dim seqLen (Shape.dim hiddenSize Shape.scalar))) :
    Tensor α (Shape.dim seqLen (Shape.dim hiddenSize Shape.scalar)) × List (LSTMState α hiddenSize)
structure Spec.AttentionLSTMModel (α : Type) (inputSize hiddenSize outputSize : ℕ) :

Attention-style LSTM model bundle.

This record defines the parameters for an encoder/decoder LSTM with learned attention scores. Forward passes can choose additive, dot-product, or domain-specific attention semantics while sharing this typed parameter bundle.

• encoder_lstm : LSTMSpec α inputSize hiddenSize

  The encoder LSTM.

• decoder_lstm : LSTMSpec α (inputSize + hiddenSize) hiddenSize

  The decoder LSTM (its input dimension accommodates the input concatenated with a hiddenSize context vector).

• attention_weights : LinearSpec α (hiddenSize + hiddenSize) 1

  The learned attention-score map over a concatenated pair of hidden states.

• output_layer : LinearSpec α hiddenSize outputSize

  The output head.

def Spec.simpleLSTMToModuleSpec {α : Type} [Context α] {seqLen inputSize hiddenSize outputSize : ℕ} (model : SimpleLSTMModel α inputSize hiddenSize outputSize) :

Package SimpleLSTMModel as an NNModuleSpec.

This is used to plug the spec model into the common module pipeline. The export_func.toPyTorch field is documentation-oriented and indicates the intended PyTorch analogue.

def Spec.lstmClassifierToModuleSpec {α : Type} [Context α] {seqLen inputSize hiddenSize numClasses : ℕ} (model : LSTMClassifier α inputSize hiddenSize numClasses) :

Package LSTMClassifier as an NNModuleSpec.

PyTorch analogue: nn.LSTM feeding a nn.Linear classifier head.

def Spec.biLSTMToModuleSpec {α : Type} [Context α] {seqLen inputSize hiddenSize outputSize : ℕ} (model : BiLSTMModel α inputSize hiddenSize outputSize) :

Package BiLSTMModel as an NNModuleSpec.

PyTorch analogue: nn.LSTM(..., bidirectional=True) feeding a per-timestep linear head.
