TorchLean API

NN.Runtime.Autograd.Torch.Core.Session

Eager Session and CUDA Bridge #

Session state, CUDA upload/download conversions, parameter synchronization, and tape lifecycle helpers for the eager Torch-style runtime.

Internal Eager Session #

The eager backend keeps one mutable CPU tape, one mutable CUDA tape, the non-differentiable Nat environment, and the map from tape leaves back to trainable parameters. The public session-style API lives in Runtime.Autograd.TorchLean.Session; this module owns the lower-level state it delegates to.

CUDA Bridge (Upload/Download) #

The CUDA eager tape stores float32 device buffers (Runtime.Autograd.Cuda.Buffer) paired with a runtime Shape (Runtime.Autograd.Cuda.AnyBuffer).

The Torch eager front-end still presents the spec-level Tensor α s API, so in CUDA mode we need:

The helper namespace gives CUDA bridge conversions stable call sites and a clear boundary.

Conversions required by the eager CUDA tape path.

Instances

    Float implementation #

    @[implicit_reducible]

    Float CUDA conversions: upload/download via row-major FloatArray.

    @[implicit_reducible, instance 10]

    Generic CPU-preserving fallback for scalar types without a CUDA wire-format bridge.

    Many TorchLean sessions are scalar-polymorphic on CPU, while the eager CUDA tape stores float32 buffers. This fallback keeps those CPU instantiations usable and fails loudly if a CUDA upload is requested for a scalar type that has no declared float32 wire representation. Add a higher-priority TensorConv α instance when a scalar type should be allowed onto the CUDA tape.

    Shape helpers for CUDA kernels #

    Runtime dimension list as an Array Nat (outermost-first).

    Instances For

      axisMap as an array.

      Instances For

        Synchronize a CUDA-updated parameter back to its host tensor, if needed.

        This synchronization point is explicit. Training hot paths keep parameters resident on device; public readback APIs call this helper before exposing parameter tensors to the Lean side.

        Instances For

          Store/update the CUDA mirror of a parameter and mark the host tensor stale.

          Instances For

            Overwrite a host parameter value and invalidate any stale CUDA mirror.

            Instances For

              Internal eager session: a mutable runtime tape plus side tables.

              This is the state needed to offer a PyTorch-like API where "tensors" are opaque references and ops mutate a hidden tape stored in an IO.Ref.

              Notes:

              • tape stores values and backward closures (Runtime.Autograd.Tape).
              • paramsByLeaf remembers which tape leaf ids correspond to trainable parameters (for SGD).
              • nats stores non-differentiable Nat inputs used for indexing-like operations.
              • opts : Options

                Session options controlling backend/device/kernel behavior.

              • tape : IO.Ref (Tape α)

                CPU eager tape used when opts.useGpu = false.

              • cudaTape : IO.Ref Cuda.Tape

                CUDA eager tape used when opts.useGpu = true.

              • paramsByLeaf : IO.Ref (Std.HashMap (AnyParam α))

                Map from tape leaf ids to trainable parameter objects.

              • nats : IO.Ref (Array )

                Non-differentiable integer inputs for dynamic indexing operations.

              Instances For

                Allocate a fresh eager session with an empty tape and empty side tables.

                Instances For

                  Force-free a CUDA buffer allocation; the external finalizer is safe to call twice.

                  Instances For

                    Force-release a shape-erased CUDA buffer.

                    Instances For

                      CUDA optimizers only need gradients for parameter leaves, not for every intermediate tape node. CudaGradMap is the sparse representation used by long eager CUDA runs: keys are tape node ids for parameter leaves and values are device-resident cotangents with the same shape as that leaf.

                      Release current CUDA tape values that are not persistent parameter mirrors.

                      Eager CUDA training creates ephemeral buffers for forward values and backward workspace. Reset paths call this before discarding the current tape snapshot, while persistent parameter mirrors remain owned by their Param objects.

                      Instances For

                        Release CUDA tape values after an optimizer step.

                        Unlike releaseCudaTapeNonParamValues, this may release trainable parameter leaf buffers too. In a CUDA optimizer step, trainable parameters have already been written to fresh persistent mirrors, so the leaf buffers from the just-consumed tape are stale. Non-trainable parameter leaves still are their persistent mirrors, so we keep those cached.

                        Instances For

                          Release a dense CUDA gradient array after an optimizer has consumed it.

                          Instances For

                            Release a sparse CUDA gradient map after an optimizer has consumed it.

                            Instances For

                              Check that a shape-erased CUDA buffer has the number of elements promised by its shape.

                              Instances For

                                Make an owned copy of a CUDA buffer after checking its shape metadata.

                                Instances For

                                  Ask the native allocator to return/free pages after a large CUDA eager step.

                                  Instances For

                                    Reset the tape and side tables.

                                    PyTorch comparison: like starting a fresh forward pass where the autograd graph is discarded.

                                    Instances For
                                      def Runtime.Autograd.Torch.Internal.EagerSession.param {α : Type} (s : EagerSession α) {sh : Spec.Shape} (init : Spec.Tensor α sh) (name : Option String := none) (requiresGrad : Option Bool := none) :
                                      IO (Param α sh)

                                      Create a mutable parameter object (not yet on the tape).

                                      To record this parameter on the session tape, call use, which reads the parameter and records it as a leaf.

                                      Instances For

                                        Read back the concrete tensor value stored at a TensorRef.

                                        This is a dynamic check: we ensure the id exists on the tape and the stored shape matches sh.

                                        Instances For

                                          Record a constant leaf (non-differentiable) on the tape.

                                          PyTorch comparison: like constructing a tensor with requires_grad=False.

                                          Instances For

                                            Stop-gradient boundary.

                                            Forward semantics: identity (detach(x) = x). Backward semantics: no gradient flows to x.

                                            Instances For

                                              Deterministic U[0,1) tensor generator (seeded).

                                              Instances For

                                                Deterministic {0,1} mask generator (seeded) with a scalar keep-probability input.

                                                Instances For

                                                  Use a parameter in the tape by recording its current value as a leaf.

                                                  The returned TensorRef is the handle you pass to ops. The leaf id is stored in paramsByLeaf so optimizer steps (e.g. SGD) can update parameters after backward. PyTorch comparison: like using a torch.nn.Parameter in a forward pass (it becomes a leaf in the autograd graph).

                                                  Instances For

                                                    Record an external input tensor as a leaf on the tape.

                                                    PyTorch comparison: like introducing a tensor into the autograd graph with a chosen requires_grad flag.

                                                    Instances For

                                                      Record a non-differentiable Nat input in the session environment.

                                                      This supports ops that depend on indices/labels that should not receive gradients.

                                                      Instances For

                                                        Read a previously recorded NatRef.

                                                        Instances For

                                                          Overwrite a previously recorded NatRef.

                                                          Instances For

                                                            Convert a Tensor Nat (.dim k .scalar) to an Array Nat.

                                                            Used to stage NatVecRef values into the session environment.

                                                            Instances For

                                                              Record a non-differentiable vector of Nats in the session environment.

                                                              Returns a NatVecRef k pointing to the stored block.

                                                              Instances For

                                                                Read back the vector stored at NatVecRef k.

                                                                Instances For

                                                                  Overwrite the stored vector at NatVecRef k.

                                                                  Instances For