API Public #
PyTorch-like facade over the TorchLean API.
Most user code should be able to import NN and then work with:
- API.nn (model/layer builders)
- API.optim (optimizer configs for training)
- API.Adapters (LoRA and other model adapters)
- API.train (fit/predict helpers)
- API.Data (datasets/loaders + CSV/NPY readers)
- API.autograd (grad/vjp/jacobian helpers)
- API.rand (deterministic RNG helpers)
- API.text (tokenizers and small text-model helpers)
- API.ssl (self-supervised sample/objective helpers)
Most of the executable runtime machinery lives under API.TorchLean.*; this module collects the
pieces into a smaller, PyTorch-shaped surface under NN.API.*.
PyTorch References #
This facade is inspired by the public shape of PyTorch:
- torch.nn: https://pytorch.org/docs/stable/nn.html
- torch.nn.functional: https://pytorch.org/docs/stable/nn.functional.html
- torch.optim: https://pytorch.org/docs/stable/optim.html
- torch.utils.data: https://pytorch.org/docs/stable/data.html
TorchLean differs in two important ways:
- tensor shapes are tracked in types (many "shape bugs" become type errors),
- some scalar dtypes are proof-only (see NN.API.DType for executable dtype selection).
Recommended Import #
This is the implementation module for the public facade. New user code should usually prefer
import NN; use import NN.Entrypoint.API when you want only the PyTorch-shaped facade.
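A minimal sketch of the two import styles described above:

```lean
-- Full public surface (recommended for most user code):
import NN

-- Alternatively, only the PyTorch-shaped facade (a file would normally
-- pick one of the two):
-- import NN.Entrypoint.API
```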
Facade policy:
- nn, functional, Loss, Norm, Autodiff, Optim, Data, tlist, and sample are the intended public namespaces.
- low-level runtime composition helpers like compAny stay internal to NN.API.Runtime. Small correctness-first helpers such as batchLayerDim0 are documented as internal and may move.
Sequential model type (TorchLean Seq). This is the analogue of PyTorch nn.Sequential.
Instances For
Re-export common Seq helpers under API.nn.* so examples can stay on the public facade.
This intentionally mirrors the TorchLean names to keep the mapping obvious.
Lift a single layer definition into a sequential model.
Instances For
All explicit-seed layer constructors live under nn.pure.*.
The top-level nn.* namespace is reserved for the seeded builder API that allocates
initialization seeds automatically (PyTorch-style ergonomics).
Linear layer on the last axis (prefix-shape preserving).
PyTorch analogue: torch.nn.Linear.
See https://pytorch.org/docs/stable/generated/torch.nn.Linear.html.
Unlike the lower-level TorchLean layer constructor (which is vector-only), this public facade matches PyTorch’s convention:
- if x has shape [..., inDim], linear inDim outDim returns a model of shape [..., outDim].
The leading “prefix” dimensions are treated as a batch (they are flattened to (numel(prefix), inDim),
the affine map is applied once, and the result is reshaped back).
Instances For
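A hedged sketch of the explicit-seed constructor; only the arguments quoted elsewhere in this file are used, and any additional (implicit) shape arguments are assumed to be inferred:

```lean
-- A 784 → 10 affine map on the last axis, with explicit initialization seeds.
-- Applied to an input of shape [..., 784] it produces an output of shape [..., 10].
def dense := nn.pure.linear 784 10 (seedW := 0) (seedB := 1)
```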
Vanilla RNN layer (time-major sequence, no batch axis).
Semantics:
h_t = tanh(W [x_t; h_{t-1}] + b), with h_{-1} = 0.
This is implemented by unrolling seqLen steps using existing TorchLean ops, so it runs on both
CPU and CUDA backends.
PyTorch analogy: torch.nn.RNN(inputSize, hiddenSize, nonlinearity="tanh") with
batch_first=false, specialized to a single batch element.
Instances For
GRU layer (time-major sequence, no batch axis).
This is implemented by unrolling seqLen steps using existing TorchLean ops, so it runs on both
CPU and CUDA backends.
PyTorch analogy: torch.nn.GRU(inputSize, hiddenSize) with batch_first=false, specialized to a
single batch element.
Instances For
Trainable Mamba-style gated diagonal state-space layer.
The layer is time-major and single-batch, matching the simple rnn/gru/lstm constructors:
input (seqLen × inputSize), output (seqLen × hiddenSize). It is unrolled with differentiable
TorchLean ops, so CPU and CUDA training use the same API.
Instances For
LSTM layer (time-major sequence, no batch axis).
This is implemented by unrolling seqLen steps using existing TorchLean ops, so it runs on both
CPU and CUDA backends.
PyTorch analogy: torch.nn.LSTM(inputSize, hiddenSize) with batch_first=false, specialized to a
single batch element.
Instances For
Embedding table initialization configuration (one-hot / token-distribution inputs).
This is the TorchLean-friendly analogue of torch.nn.Embedding in the common demo setting where
token ids are represented as one-hot vectors (or soft token distributions), so lookup is a matrix
multiplication rather than integer indexing.
- seedW : ℕ
Seed for deterministic embedding-table initialization.
Initialization scheme for the embedding table.
Instances For
Embedding layer for one-hot / token-distribution inputs (no bias).
Input shape: [..., vocab]
Output shape: [..., embedDim]
PyTorch analogue: conceptually nn.Embedding(vocab, embedDim) but applied to one-hot inputs.
Instances For
Learned positional embedding configuration.
This is a trainable parameter tensor of shape (seqLen × embedDim) that is broadcast across the
leading batch dimension and added to the input.
- seedPos : ℕ
Seed for deterministic initialization.
- posInit : Runtime.Autograd.Torch.Init.Scheme
Initialization scheme for the positional embedding table.
Instances For
Add learned positional embeddings to a batched (batch × seqLen × embedDim) tensor.
PyTorch analogue: x + pos[:seqLen] where pos is a parameter table.
Instances For
Sinusoidal positional encoding configuration.
This is the classic (non-trainable) Transformer sinusoidal encoding, added to token embeddings.
startPos is an absolute-position offset (useful for KV-cache decoding).
- startPos : ℕ
Absolute position offset for the first row of the encoding table.
Instances For
Add sinusoidal positional encodings to a batched (batch × seqLen × embedDim) tensor.
Implementation:
- precompute PE : (seqLen × embedDim) at initialization time (stored as a non-trainable buffer),
- broadcast it across the leading batch axis and add to the input.
Instances For
Apply RoPE to a batched multi-head tensor (batch × numHeads × seqLen × headDim).
This matches the standard identity:
rope(x) = x * cos + rotatePairs(x) * sin
where cos/sin depend only on (pos, dim) and broadcast across (batch, numHeads).
Notes:
- This layer is differentiable (gradients flow through the rotation), but it has no trainable parameters; the precomputed cos/sin tables are stored as non-trainable buffers.
- The pure spec version is in NN.Spec.Layers.PositionalEncoding (Spec.rope_apply_heads_spec).
Instances For
Elementwise ReLU. PyTorch analogue: torch.nn.ReLU / torch.nn.functional.relu.
Instances For
Elementwise SiLU/Swish. PyTorch analogue: torch.nn.SiLU / torch.nn.functional.silu.
Instances For
Elementwise GELU. PyTorch analogue: torch.nn.GELU / torch.nn.functional.gelu.
Instances For
Elementwise sigmoid. PyTorch analogue: torch.nn.Sigmoid / torch.nn.functional.sigmoid.
Instances For
Elementwise tanh. PyTorch analogue: torch.nn.Tanh / torch.nn.functional.tanh.
Instances For
Softmax. PyTorch analogue: torch.nn.Softmax / torch.nn.functional.softmax.
Instances For
Reduce-sum to a scalar. PyTorch analogue: torch.sum.
Instances For
Flatten any tensor into a 1D vector of length size s. PyTorch analogue: torch.flatten.
Instances For
Flatten a batched tensor N × σ into a matrix N × (size σ).
PyTorch analogue: torch.flatten(x, start_dim=1).
Instances For
Flatten a batched tensor starting at dimension 1 (keep dim0).
Synonym for flattenBatch, matching PyTorch’s start_dim=1 wording.
Instances For
Dropout layer (active in train mode, identity in eval mode).
PyTorch analogue: torch.nn.Dropout.
Instances For
Convenience block: Flatten -> Linear.
This is common for "image to classifier head" demos.
Instances For
nn.functional mirrors torch.nn.functional: pure, stateless building blocks.
In TorchLean these are defined as derived ops over the small primitive Ops surface, so the same
code works on both the eager backend and the compiled backend.
PyTorch references:
- torch.nn.functional: https://pytorch.org/docs/stable/nn.functional.html
Batch Lifting #
batchDim0 n model wraps a single-example model σ → τ into a batched model
(dim n σ) → (dim n τ) by running the underlying model once per batch element.
This is a correctness-first helper used to expose PyTorch-like N×... APIs even when a primitive
only exists for the unbatched shape.
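A comment-only usage sketch of the lifting described above; the exact namespace of batchDim0 is an assumption, and singleExampleModel stands for any σ → τ model:

```lean
-- Lift a single-example model to a batch of 8 rows along dim 0:
-- def batched8 := nn.functional.batchDim0 8 singleExampleModel
-- batched8 : model over (dim 8 σ) → (dim 8 τ)
```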
Lift a single-example LayerDef σ τ to operate on a dimension-0 batch.
This is a correctness-first helper: it runs the underlying layer independently on each batch element. Prefer a primitive batched layer when one exists.
Instances For
Lift a sequential model to act pointwise on a leading dim0 batch axis.
Instances For
Note: some low-level TorchLean layers (notably conv/pool/norm) have Nat-side well-formedness
proof arguments (e.g. kH ≠ 0).
The public path is record-based specs that hide those proofs via typeclasses like NeZero,
so examples can stay PyTorch-like without relying on positional macros.
Named-field Conv2d configuration (CHW layout).
This is the public, PyTorch-like entry point for convolution in TorchLean.
PyTorch analogue: torch.nn.Conv2d.
See https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html.
- outC : ℕ
Output channels.
- kH : ℕ
Kernel height.
- kW : ℕ
Kernel width.
- stride : ℕ
Stride (shared for height/width).
- padding : ℕ
Zero-padding (shared for height/width).
- seedK : ℕ
Seed for deterministic kernel initialization.
- seedB : ℕ
Seed for deterministic bias initialization.
Initialization scheme for the kernel weights.
Instances For
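A hedged config-literal sketch using only the fields listed above; the initialization-scheme field and any fields not shown are assumed to have defaults, and the namespace is an assumption:

```lean
def convCfg : nn.Conv2d :=
  { outC := 16, kH := 3, kW := 3,
    stride := 1, padding := 1,
    seedK := 0, seedB := 1 }
```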
Named-field Conv2d configuration (CHW layout).
This is the public, PyTorch-like entry point for convolution in TorchLean.
PyTorch analogue: torch.nn.Conv2d.
See https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html.
Instances For
MaxPool2d configuration for CHW inputs.
PyTorch analogue: torch.nn.MaxPool2d.
See https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html.
Instances For
MaxPool2d with explicit nonzero kernel proofs.
Instances For
MaxPool2d over CHW inputs using NeZero to hide nonzero kernel proofs.
Instances For
MaxPool2d using NeZero to hide nonzero kernel proofs.
Instances For
Shorthand for maxPool2dWith (PyTorch-style).
Instances For
Shorthand for maxPool2dCHW (PyTorch-style).
Instances For
Max pooling over batched CHW images, using the PyTorch-style MaxPool2d config record.
Shorthand for maxPool2d.
Instances For
AvgPool2d configuration for CHW inputs.
PyTorch analogue: torch.nn.AvgPool2d.
See https://pytorch.org/docs/stable/generated/torch.nn.AvgPool2d.html.
Instances For
AvgPool2d with explicit nonzero kernel proofs.
Instances For
AvgPool2d over CHW inputs using NeZero to hide nonzero kernel proofs.
Instances For
AvgPool2d over batched NCHW inputs (shape N×C×H×W, like PyTorch).
Instances For
Shorthand for avgPool2dWith (PyTorch-style).
Instances For
Shorthand for avgPool2dCHW (PyTorch-style).
Instances For
Average pooling over batched CHW images, using the PyTorch-style AvgPool2d config record.
Shorthand for avgPool2d.
Instances For
Global average pooling over a CHW tensor.
PyTorch analogue: torch.nn.AdaptiveAvgPool2d((1, 1)) followed by flattening.
Instances For
Global average pooling over an NCHW tensor (preserves the batch dimension).
Instances For
LayerNorm configuration for batched (batch x seqLen x embedDim) tensors.
PyTorch analogue: torch.nn.LayerNorm.
See https://pytorch.org/docs/stable/generated/torch.nn.LayerNorm.html.
- seedGamma : ℕ
Seed for deterministic initialization of gamma (scale).
- seedBeta : ℕ
Seed for deterministic initialization of beta (shift).
Instances For
Layer normalization over (batch × seqLen × embedDim) tensors, with explicit positivity proofs.
This matches the common Transformer usage: normalize each token’s embedDim-vector independently,
with learnable scale/shift parameters gamma and beta.
PyTorch analogue: torch.nn.LayerNorm(embedDim) applied to a tensor of shape
(batch, seqLen, embedDim).
Most users should call nn.layerNorm, which uses NeZero to discharge the positivity proofs.
Instances For
Layer normalization over (batch × seqLen × embedDim) tensors.
This normalizes each embedDim-vector (per batch element, per sequence position), and applies
learned affine parameters gamma and beta.
PyTorch analogue: torch.nn.LayerNorm(embedDim) on a tensor shaped (batch, seqLen, embedDim).
Implementation note:
TorchLean uses NeZero to ensure seqLen and embedDim are positive, avoiding degenerate shapes.
Instances For
RMSNorm configuration for batched (batch x seqLen x embedDim) tensors.
This is a common alternative to LayerNorm in modern transformer architectures.
- seedGamma : ℕ
Seed for deterministic initialization of gamma (scale).
Instances For
RMS normalization over (batch × seqLen × embedDim) tensors, with explicit positivity proofs.
This is like LayerNorm but without mean subtraction: we scale by the root-mean-square over the
embedDim axis, and apply a learned scale gamma.
PyTorch analogue: many libraries provide an RMSNorm(embedDim) module; conceptually it is applied
to tensors shaped (batch, seqLen, embedDim).
Most users should call nn.rmsNorm, which uses NeZero to discharge the positivity proofs.
Instances For
RMS normalization over (batch × seqLen × embedDim) tensors.
This normalizes by the root-mean-square over the embedDim axis (per batch element, per position),
then applies a learned scale gamma.
Implementation note:
TorchLean uses NeZero to ensure seqLen and embedDim are positive, avoiding degenerate shapes.
Instances For
BatchNorm2d configuration (learned scale/shift).
PyTorch analogue: torch.nn.BatchNorm2d.
See https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html.
- seedGamma : ℕ
Seed for deterministic initialization of gamma (scale).
- seedBeta : ℕ
Seed for deterministic initialization of beta (shift).
Instances For
BatchNorm2d over NCHW inputs (train/eval is handled by Seq mode).
Instances For
BatchNorm2d over NCHW inputs, using NeZero to hide the positivity proofs.
PyTorch analogue: torch.nn.BatchNorm2d.
See https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html.
Instances For
InstanceNorm2d configuration (learned scale/shift).
PyTorch analogue: torch.nn.InstanceNorm2d.
See https://pytorch.org/docs/stable/generated/torch.nn.InstanceNorm2d.html.
- seedGamma : ℕ
Seed for deterministic initialization of gamma (scale).
- seedBeta : ℕ
Seed for deterministic initialization of beta (shift).
Instances For
InstanceNorm2d over NCHW inputs, using explicit positivity proofs.
PyTorch analogue: torch.nn.InstanceNorm2d.
See https://pytorch.org/docs/stable/generated/torch.nn.InstanceNorm2d.html.
Instances For
InstanceNorm2d over NCHW inputs, using NeZero to hide the positivity proofs.
PyTorch analogue: torch.nn.InstanceNorm2d.
See https://pytorch.org/docs/stable/generated/torch.nn.InstanceNorm2d.html.
Instances For
GroupNorm over NCHW inputs.
PyTorch analogue: torch.nn.GroupNorm.
See https://pytorch.org/docs/stable/generated/torch.nn.GroupNorm.html.
Instances For
Multi-head self-attention configuration.
PyTorch analogue: torch.nn.MultiheadAttention (conceptually).
See https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html.
- numHeads : ℕ
Number of attention heads.
- headDim : ℕ
Per-head embedding dimension.
- seedW : ℕ
Base seed for deterministic parameter initialization.
Instances For
Multi-head self-attention with an explicit nonzero sequence length proof.
If mask is provided, it is a boolean attention mask of shape (n × n) (e.g. causal masking).
Instances For
Multi-head self-attention using NeZero to hide the nonzero sequence length proof.
If mask is provided, it is a boolean attention mask of shape (n × n) (e.g. causal masking).
Instances For
Small set of activation choices for block builders.
PyTorch analogues:
- relu <-> torch.nn.ReLU
- gelu <-> torch.nn.GELU
- silu <-> torch.nn.SiLU
- tanh <-> torch.nn.Tanh
- sigmoid <-> torch.nn.Sigmoid
- relu : Activation
- gelu : Activation
- silu : Activation
- tanh : Activation
- sigmoid : Activation
Instances For
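A small sketch of how these constructors appear at call sites; the namespace nn.blocks.Activation is an assumption:

```lean
-- Picking the activation for a block config:
def act : nn.blocks.Activation := .gelu
-- Used as `activation := .gelu` (or `.relu`, `.silu`, ...) in the config records below.
```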
Interpret an Activation as a TorchLean layer.
Instances For
MLP (multi-layer perceptron) configuration.
This is a lightweight builder that produces a sequential stack of linear layers with activations and optional dropout.
PyTorch analogue: a hand-written nn.Sequential(Linear(...), ReLU(), ..., Linear(...)).
- activation : Activation
Activation used after each hidden linear layer.
Optional dropout probability after each activation.
- seedBase : ℕ
Base seed used to deterministically initialize all linear layers (and dropout if present).
Instances For
Internal recursion for mlp.
This builds the sequential stack stage-by-stage, threading a seed so each linear (and optional dropout) layer gets a deterministic initialization key.
Instances For
Build an MLP as a sequential stack of linear layers and activations.
This is a small "PyTorch-shaped" helper: a typical call looks like:
API.nn.blocks.mlp 784 10 { hidden := [128, 128], activation := .relu }.
Instances For
Conv2d + activation (+ optional dropout) block configuration (CHW layout).
This compact helper is used by vision examples before moving to larger curated blocks.
- conv : Conv2d
Conv hyperparameters and seeds.
- activation : Activation
Activation applied after the convolution.
Optional dropout probability after the activation.
- seedDropout : ℕ
Seed for dropout RNG (only used when dropout? is present).
Instances For
Vision blocks #
These are small, named-field building blocks intended for public examples:
- reduce seed/proof noise at call sites,
- keep composition explicit (still seq! stacking),
- provide canonical blocks users expect from PyTorch codebases.
They are intentionally conservative: the goal is readability and stable typing, not maximum coverage.
Configuration for a common vision block:
Conv2d -> BatchNorm2d -> Activation -> (optional Dropout).
This is used by conv2dNormActCHW (single-image CHW) and conv2dNormAct (batched NCHW).
We keep deterministic seed allocation explicit via seedBase so examples stay reproducible.
- conv : Conv2d
Conv hyperparameters (seeds inside this record are ignored; use seedBase).
- activation : Activation
Activation after normalization.
Optional dropout applied after the activation.
- seedBase : ℕ
Base seed for deterministic init (derived seeds are allocated in a fixed order).
Instances For
Conv2d -> BatchNorm -> Activation -> (optional Dropout), over a single CHW image (no batch axis).
Seed allocation (relative to seedBase):
- seedBase + 0, 1: conv kernel / bias
- seedBase + 2..5: BN gamma / beta / running-mean / running-var
- seedBase + 6: dropout
Instances For
Configuration for conv2dNormActPool*: a Conv2dNormAct block followed by max-pooling.
This matches the common “conv-bn-act-pool” pattern used in small CNNs.
- block : Conv2dNormAct
Conv/BN/activation/dropout block configuration.
- pool : MaxPool2d
Pooling hyperparameters (defaults to 2×2 stride-2 max pool).
Instances For
conv2dNormActCHW followed by MaxPool2dCHW.
Instances For
Conv2d -> BatchNorm2d -> Activation -> (optional Dropout), over batched image tensors (N×C×H×W).
This is the public PyTorch-like path: examples should build CNNs directly over batched images.
Instances For
conv2dNormAct followed by MaxPool2d, over batched image tensors.
Instances For
Lift residualLayer into a sequential model.
Instances For
Branching (skip connections) #
Seq is linear, but we sometimes want a PyTorch-like x |-> f(x) + g(x) block.
We expose this as a single LayerDef whose parameter list is params(f) ++ params(g) and whose
forward pass runs both programs and adds their outputs.
Combine two sequential branches into a single layer that adds their outputs.
The resulting layer runs both f and g on the same input x and returns f(x) + g(x).
Parameters are concatenated as params(f) ++ params(g).
Instances For
Combine two models with the same input/output shapes by summing their outputs.
This is a typed “residual add” helper: addBranches f g represents the model x ↦ f(x) + g(x),
and its parameter list is the concatenation of the two branches’ parameter lists.
Instances For
ResNet BasicBlock #
We provide a typed and composable ResNet-18 style BasicBlock over CHW tensors.
Key idea: we use a small canonical stride-2 formula down2 (matching GraphSpec/Models/resnet18)
so projection shortcuts typecheck cleanly without leaking Nat arithmetic at call sites.
Canonical stride-2 spatial downsampling formula used by ResNet blocks.
down2 h = (h - 1) / 2 + 1 = ceil(h / 2).
This matches the output-size formula for common stride-2 layers used in ResNet downsampling
(e.g. 3×3 conv with padding 1, or 1×1 conv with padding 0).
Instances For
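A worked instance of the formula above, as a standalone sketch (down2' is a local name for illustration, not the library's declaration):

```lean
def down2' (h : Nat) : Nat := (h - 1) / 2 + 1

#eval down2' 32   -- 16
#eval down2' 7    -- 4  (= ⌈7 / 2⌉)
#eval down2' 224  -- 112
```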
ResNet helper: 3×3 convolution with padding 1, stride 1 (shape-preserving), over CHW images.
Instances For
ResNet helper: 3×3 convolution with padding 1, stride 2 (spatial downsampling via down2),
over CHW images.
Instances For
ResNet helper: 1×1 convolution with stride 1 (shape-preserving), over CHW images.
Instances For
ResNet helper: 1×1 convolution with stride 2 (spatial downsampling via down2), over CHW
images.
Instances For
ResNet helper: 3×3 convolution over batched images (NCHW-style), preserving spatial size.
Instances For
ResNet helper: 3×3 convolution over batched images (NCHW-style), downsampling via down2.
Instances For
ResNet helper: 1×1 convolution over batched images (NCHW-style), preserving spatial size.
Instances For
ResNet helper: 1×1 convolution over batched images (NCHW-style), downsampling via down2.
Instances For
ResNet-style "basic block" configuration (CHW layout).
PyTorch reference (conceptual):
torchvision.models.resnet.BasicBlock (see https://pytorch.org/vision/stable/models/resnet.html).
- outC : ℕ
Number of output channels produced by the block.
- downsample : Bool
If true, use stride-2 downsampling + projection shortcut; otherwise preserve spatial dims.
- activation : Activation
Activation used inside the block (and after the residual addition).
- seedBase : ℕ
Base seed used to derive deterministic per-layer seeds inside the block.
Instances For
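A hedged config sketch built from the fields above; the structure name and namespace are assumptions:

```lean
def blockCfg : nn.blocks.BasicBlock :=
  { outC := 64, downsample := true, activation := .relu, seedBase := 200 }
```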
ResNet-style "basic block" configuration (CHW layout).
This public building block follows the standard ResNet basic-block pattern:
conv3x3 -> BN -> act -> conv3x3 -> BN with a residual/skip connection.
PyTorch references (for the conceptual shape):
- Torchvision ResNet:
https://pytorch.org/vision/stable/models/resnet.html
Instances For
ResNet-18 style BasicBlock over batched image tensors (N×C×H×W).
Instances For
Config record for transformerEncoderBlock.
Separating the config as a structure makes it easier to write readable examples and keep seed management deterministic.
- numHeads : ℕ
Number of attention heads.
- headDim : ℕ
Per-head embedding dimension.
- ffnHidden : ℕ
Hidden dimension of the feed-forward network.
- activation : Activation
Activation used in the feed-forward network.
Optional dropout probability for examples; none means no dropout.
- seedBase : ℕ
Base seed used to derive deterministic per-layer seeds inside the block.
Instances For
Transformer encoder block configuration.
This follows the familiar pattern:
(residual MHA) -> LayerNorm -> (residual FFN) -> LayerNorm.
PyTorch analogue:
torch.nn.TransformerEncoderLayer (https://pytorch.org/docs/stable/generated/torch.nn.TransformerEncoderLayer.html)
Instances For
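A hedged config sketch using the fields documented above; the structure name is taken from the block : TransformerEncoderBlock field below, while the namespace and the omitted dropout default are assumptions:

```lean
def encCfg : nn.blocks.TransformerEncoderBlock :=
  { numHeads := 4, headDim := 16, ffnHidden := 256,
    activation := .gelu, seedBase := 100 }
```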
Transformer encoder block.
This is transformerEncoderBlockWithMask; pass mask := ... to enable causal masking (or other
attention masks).
Instances For
Config record for transformerEncoderStack.
This builds layers copies of transformerEncoderBlock, allocating seeds in a fixed stride.
- layers : ℕ
Number of encoder blocks in the stack.
- block : TransformerEncoderBlock
Template config for each block (its seedBase is ignored; we allocate per-layer seeds).
- seedBase : ℕ
Base seed for the whole stack.
- seedStride : ℕ
Seed stride between consecutive blocks (must exceed the per-block seed footprint).
Instances For
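A matching stack config sketch; the fields come from the list above, seedStride is chosen larger than the per-block seed footprint, encCfg refers to the block sketch earlier, and the structure name/namespace are assumptions:

```lean
def stackCfg : nn.blocks.TransformerEncoderStack :=
  { layers := 6, block := encCfg, seedBase := 0, seedStride := 1000 }
```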
Internal recursion for transformerEncoderStack.
Builds remaining blocks starting at layerIdx, allocating each block's seedBase as
seedBase + layerIdx * seedStride.
Instances For
Internal recursion for transformerEncoderStack (unmasked).
This is transformerStackGoWithMask with mask := none.
Instances For
Stack cfg.layers copies of blocks.transformerEncoderBlock.
This is the TorchLean analogue of composing torch.nn.TransformerEncoderLayer into a
torch.nn.TransformerEncoder (modulo the fact that TorchLean uses Seq composition).
Instances For
Stack cfg.layers copies of blocks.transformerEncoderBlock.
This is transformerEncoderStackWithMask; pass mask := ... to enable causal masking (or other
attention masks).
Instances For
Transformer encoder followed by a flatten+linear classification head.
PyTorch analogue (roughly): nn.TransformerEncoder(...) + pooling/flattening + nn.Linear.
Instances For
Classification head: Flatten -> Linear.
This is a small convenience wrapper around nn.flattenLinear.
Instances For
Regression head: Flatten -> Linear with outDim outputs.
Instances For
Flatten(start_dim=1) -> Linear head for batched tensors.
Input: N × σ
Output: Mat N classes
Instances For
Batched regression head: Flatten(start_dim=1) -> Linear(_, outDim) producing Mat N outDim.
Instances For
Optimizer configs for the high-level training helpers.
These mirror common PyTorch optimizers (by name and default hyperparameters), but they produce a TorchLean trainer config rather than a mutable optimizer object.
PyTorch references:
- torch.optim: https://pytorch.org/docs/stable/optim.html
Optimizer hyperparameter configuration for the supervised training helpers.
We keep this small for examples and lightweight trainers. It mirrors a few common PyTorch
optimizers by name/defaults, but it does not try to cover the full option surface of
torch.optim.*.
Instances For
SGD optimizer config.
PyTorch analogue: torch.optim.SGD
(https://pytorch.org/docs/stable/generated/torch.optim.SGD.html).
Instances For
Momentum SGD optimizer config (PyTorch-style default momentum = 0.9).
This is just sgd lr momentum with a different default.
Instances For
Adam optimizer config with standard defaults.
PyTorch analogue: torch.optim.Adam
(https://pytorch.org/docs/stable/generated/torch.optim.Adam.html).
Instances For
AdamW optimizer config with standard defaults (PyTorch-style weightDecay = 0.01).
PyTorch analogue: torch.optim.AdamW
(https://pytorch.org/docs/stable/generated/torch.optim.AdamW.html).
Instances For
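Hedged sketches of a few configs; the positional sgd lr momentum form is quoted above, while whether adam/adamW take a leading learning-rate argument is an assumption:

```lean
def cfgSgd      := optim.sgd 0.01 0.0    -- plain SGD
def cfgMomentum := optim.sgd 0.01 0.9    -- SGD with momentum
-- def cfgAdam  := optim.adam 0.001      -- assumed signature; see the torch.optim.Adam analogue above
```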
Reduction mode for losses that start as elementwise tensors.
PyTorch analogy: reduction="mean" or reduction="sum".
Instances For
High-level training helpers.
This namespace is designed for executable demos: it wires together
- a model (nn.Sequential)
- a loss (regression or classification)
- an optimizer config (API.optim)
- optional LR schedules
It stays intentionally lightweight: rather than hiding everything behind a large framework, it exposes a small set of default building blocks so tutorials can focus on models and verification.
PyTorch Mapping #
These helpers correspond to the training loop code you would typically write around:
- torch.optim.*
- forward pass + loss
- loss.backward() + optimizer step
- batching via torch.utils.data.DataLoader
A supervised task is just a model plus a choice of loss.
Instances For
A fully instantiated supervised task runner.
This bundles:
- the imperative ScalarModule (parameters/buffers stored in refs),
- compiled predictors and loss functions for both .train and .eval modes (so switching mode is cheap),
- and the current mode stored in an IO.Ref.
The mode influences both operator behavior (e.g. dropout/batchnorm) and whether buffers are updated during training.
Instances For
Stateful training loop object: a Runner plus an optimizer state and a step counter.
This is the TorchLean analogue of holding a PyTorch optimizer object plus the model, ready to
step() on batches.
Instances For
Step-based training configuration for fit / fitDataset.
Fields:
- steps: number of parameter updates,
- optimizer: optimizer hyperparameters,
- scheduler: optional learning-rate schedule (applied per step),
- logEvery: progress printing frequency (0 disables logging).
Instances For
Epoch-based training configuration for fitLoader (data-loader training).
Fields:
- epochs: number of epochs (each epoch iterates once over the loader),
- optimizer: optimizer hyperparameters,
- scheduler: optional learning-rate schedule (applied per step/epoch depending on helper),
- logEvery: progress printing frequency (0 disables logging).
Instances For
Small summary returned by fit* helpers.
By default, before and after are mean loss values, but the type is polymorphic so callers can
report other scalars in the same shape.
Instances For
Most of API.train.* is just a public re-export of TorchLean.Trainer.*.
We use export (rather than rewriting 1-line forwarders) so this file stays small and avoids
duplicating implementation details at the facade layer.
Metric Artifacts #
The public training facade also exposes TorchLean's lightweight metric artifact format. This is
the local equivalent of “log scalars during a run, then inspect them later”: write a JSON
TrainLog, view it with the training widgets, or adapt the JSON to an external tracker such as
Weights & Biases.
A runner bundled with the task that created it.
This is an ergonomic wrapper around Runner α task: it remembers the dependent task, so tutorial
code can call tr.predict x, tr.fit cfg samples, etc. without repeatedly writing
(task := task).
Instances For
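A hedged end-to-end sketch assembled from the calls quoted above (tr.fit cfg samples, tr.predict x); cfg, samples, and x are assumed to be supplied by the surrounding example, so the body is shown as comments:

```lean
-- do
--   let summary ← tr.fit cfg samples   -- FitSummary with before/after mean loss
--   let yhat    ← tr.predict x         -- uses the runner's active mode (.train/.eval)
--   IO.println s!"loss: {summary.before} → {summary.after}"
```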
Bundle an existing runner with its task.
Instances For
Get the current model parameters from a bundled runner.
Instances For
Read the current mode (.train or .eval) from a bundled runner.
Instances For
Set the mode (.train or .eval) on a bundled runner.
Instances For
Switch a bundled runner to training mode.
Instances For
Switch a bundled runner to evaluation mode.
Instances For
Check whether a bundled runner is in training mode.
Instances For
Predict on one input tensor using the bundled runner's active mode.
Instances For
Predict on a list of inputs using the bundled runner's active mode.
Instances For
Mean loss over an entire dataset for a bundled runner.
Instances For
Fit a bundled runner on an explicit list of samples for a fixed number of steps.
Instances For
Fit a bundled runner on a Dataset for a fixed number of steps.
Instances For
Fit a bundled runner using a DataLoader for a fixed number of epochs.
Instances For
CLI-oriented runner entry point that passes a bundled TaskRunner to the continuation.
This mirrors train.run, but removes the need to keep threading (task := task) after
instantiation.
Instances For
Count correct predictions in a one-hot labeled batched dataset.
This is the minibatch analogue of accuracyOneHot: the task already has a leading dim0 batch axis,
so we score each row of the batch independently and accumulate totals.
Returns (correct, total) where total = batch * numBatches.
Instances For
Mean loss over an entire dataset (useful for quick before/after reports).
Instances For
Fit on an explicit list of samples for a fixed number of steps.
Instances For
Fit on a Dataset for a fixed number of steps.
Instances For
Fit using a DataLoader for a fixed number of epochs.
Instances For
Callback event fired at the end of an epoch (how many steps ran).
Instances For
Hooks for instrumenting fitLoaderBatched-style training loops.
These are lightweight by design (IO callbacks). If you want richer logging, consider building a wrapper in your own project that translates these events into structured JSON/metrics.
Called once before training starts.
Called after each training step.
- onEpochEnd : EpochEvent → IO Unit
Called after each epoch.
Called once after training finishes.
Instances For
Combine two callback collections by running them in sequence.
Instances For
∅ for callbacks: a no-op callback collection.
Build callbacks that run at the end of each epoch.
Instances For
Run an action with the runner temporarily switched to eval mode.
This is useful for "evaluate on a validation set during training" in callback-based loops.
Instances For
Mean loss for an already-instantiated scalar module over a typed minibatch loader.
This is the general streaming evaluation path used by the runtime examples. It is deliberately
not CIFAR-specific: any supervised task whose loss module consumes
[dim n σ, dim n τ] can use the same loader. The loader stores ordinary per-example samples
(x : σ, y : τ); this helper asks Data.epoch for raw minibatches and calls
Data.collateSupervised to build one shape-typed batch at a time.
Two details are important for larger examples:
- We force shuffle := false for evaluation so before/after metrics are deterministic.
- We do not call Data.BatchLoader.batchDataset, because that would materialize every collated minibatch at once. Streaming keeps the same API usable for image, sequence, and scientific ML examples where the batch tensors are much larger than small tabular datasets.
Instances For
Mean loss over a typed minibatch loader through a train.Runner.
This is the runner-facing wrapper around meanLossModuleLoader. Use it when the example is built
around train.run, task modes, and the proof-facing trainer abstraction. Use
meanLossModuleLoader directly when the example has already instantiated a runtime
TorchLean.Module.ScalarModule, which is the common fast path for CUDA demos.
Instances For
One-hot accuracy over a typed minibatch loader without materializing all collated batches.
Instances For
Train a runtime scalar module from a typed minibatch loader.
This is the shared "real epoch loop" for model examples that instantiate a module directly with
TorchLean.Module.instantiateWithOptions, including CUDA runs. It mirrors the PyTorch structure:
- create an optimizer state for the module parameters;
- for each epoch, ask the general Data.batchLoader for shuffled raw batches;
- collate each raw batch into a shape-typed (xBatch, yBatch) sample;
- report the scalar loss through callbacks;
- run forward/backward/optimizer.step through TorchLean.Module.stepWith.
The function is polymorphic in the input shape σ, target shape τ, batch size n, scalar type
α, parameter shapes, and optimizer. It is not an image-specific helper. CNN, ResNet, ViT, MLP,
sequence, operator-learning, and future model demos should all be able to use this path whenever
their supervised loss module has input shapes [dim n σ, dim n τ].
Instances For
Train from a runner-backed loader with explicit callbacks instead of inline printing in example code.
This is the proof/trainer-facing public escape hatch for PyTorch-style custom loops:
- keep the optimizer/scheduler logic in the library,
- inject logging, evaluation, and probe reporting through callbacks.
This path keeps the Runner abstraction, including task modes and scheduler support. For
CUDA-heavy tutorials that already have a TorchLean.Module.ScalarModule, prefer
fitModuleLoaderWith; both paths consume the same general API.Data.batchLoader.
Instances For
Public minibatch training path.
data.batchLoader produces a typed BatchLoader (with a type-level batch size n), and this
helper bridges from an untyped runtime loader into the typed training loop.
Instances For
Create a Stepper loop for a runner and optimizer (optionally with an LR scheduler).
This corresponds to the “inner training loop” state in typical PyTorch code: an optimizer state plus (optional) schedule state, ready to step on a batch.
Instances For
Run one optimization step on a single supervised sample (one batch).
Instances For
Run one epoch over a list of supervised samples, returning the per-step losses.
Instances For
Small Reporting Helpers (IO) #
These helpers keep tutorial code readable by factoring out common "print a loss/accuracy table"
patterns. They do not affect semantics: they only call the underlying train.* functions and
print human-facing summaries.
Convenience: mean loss on a dataset, printed with a label.
Instances For
Convenience: mean loss on a typed minibatch loader, streamed batch by batch.
Instances For
Convenience: mean loss on a typed minibatch loader for an already-instantiated runtime module.
Use this in direct CUDA/runtime examples to avoid building a Runner only for logging. The data
path is still the same public loader path: Data.batchLoader plus Data.collateSupervised.
Instances For
Report predicted classes on a list of named probes.
Each probe entry is (name, x, expectedClass).
If includeLogits := true, also prints the raw model outputs.
Instances For
Report predicted classes on a list of named probes, for a batched model.
This expects probes of the unbatched input shape σ and replicates each probe across the batch
axis, then reports the prediction for row 0.
Instances For
Convenience: mean loss + one-hot accuracy on a dataset, printed with a label.
Instances For
Batched variant of reportLossAccuracyOneHot.
Instances For
Loader variant of reportLossAccuracyOneHotBatched, streaming through minibatches.
Instances For
Model Builders and Seeding #
TorchLean keeps initialization randomness explicit so examples are reproducible.
- nn.* is the default seeded builder API: layer constructors allocate initialization seeds via nn.M (a deterministic seed stream).
- nn.pure.* contains the explicit-seed constructors (proof/reproducibility-friendly).
Typical patterns:
Explicit seeds (best for proofs / reproducibility-sensitive code):
- build with nn.pure.linear ... (seedW := ...) (seedB := ...) etc.
- compose with seq! ... / >>>
Script-style “manual seed once”: build with
- nn.manualSeed seed
- let seed ← nn.nextSeed
- let model := nn.run seed <| nn.sequential![ ... ]
Note: nn.Sequential lives in Type 2, so it cannot be returned directly from IO. We keep
model building pure by drawing a base seed in IO and then calling nn.run.
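The script-style pattern above, written out as a hedged sketch; the specific layer constructors and the separator syntax inside sequential! are placeholders (see the seeded builders documented below):

```lean
def demoBuild : IO Unit := do
  nn.manualSeed 42
  let seed ← nn.nextSeed
  let model := nn.run seed <| nn.sequential![
    nn.linear 784 128,
    nn.relu,
    nn.linear 128 10
  ]
  -- `model` must be used here: per the note above it lives in `Type 2`,
  -- so it cannot be returned from `IO`.
  pure ()
```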
PyTorch-like global seeding convenience for seeded model builders.
This sets the global seed stream used by nn.runGlobal / nn.nextSeed.
Instances For
Embedding table initialization configuration (one-hot / token-distribution inputs).
This is the TorchLean-friendly analogue of torch.nn.Embedding in the common demo setting where
token ids are represented as one-hot vectors (or soft token distributions), so lookup is a matrix
multiplication rather than integer indexing.
Instances For
Learned positional embedding configuration.
This is a trainable parameter tensor of shape (seqLen × embedDim) that is broadcast across the
leading batch dimension and added to the input.
Instances For
Sinusoidal positional encoding configuration.
This is the classic (non-trainable) Transformer sinusoidal encoding, added to token embeddings.
startPos is an absolute-position offset (useful for KV-cache decoding).
Instances For
Rotary positional embedding (RoPE) configuration.
startPos is an absolute-position offset (useful for KV-cache decoding).
Instances For
Named-field Conv2d configuration (CHW layout).
This is the public, PyTorch-like entry point for convolution in TorchLean.
PyTorch analogue: torch.nn.Conv2d.
See https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html.
Instances For
Named-field Conv2d configuration (CHW layout).
This is the public, PyTorch-like entry point for convolution in TorchLean.
PyTorch analogue: torch.nn.Conv2d.
See https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html.
Instances For
MaxPool2d configuration for CHW inputs.
PyTorch analogue: torch.nn.MaxPool2d.
See https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html.
Instances For
MaxPool2d configuration for CHW inputs.
PyTorch analogue: torch.nn.MaxPool2d.
See https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html.
Instances For
AvgPool2d configuration for CHW inputs.
PyTorch analogue: torch.nn.AvgPool2d.
See https://pytorch.org/docs/stable/generated/torch.nn.AvgPool2d.html.
Instances For
AvgPool2d configuration for CHW inputs.
PyTorch analogue: torch.nn.AvgPool2d.
See https://pytorch.org/docs/stable/generated/torch.nn.AvgPool2d.html.
Instances For
LayerNorm configuration for batched (batch x seqLen x embedDim) tensors.
PyTorch analogue: torch.nn.LayerNorm.
See https://pytorch.org/docs/stable/generated/torch.nn.LayerNorm.html.
Instances For
RMSNorm configuration for batched (batch x seqLen x embedDim) tensors.
This is a common alternative to LayerNorm in modern transformer architectures.
Instances For
BatchNorm2d configuration (learned scale/shift).
PyTorch analogue: torch.nn.BatchNorm2d.
See https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html.
Instances For
InstanceNorm2d configuration (learned scale/shift).
PyTorch analogue: torch.nn.InstanceNorm2d.
See https://pytorch.org/docs/stable/generated/torch.nn.InstanceNorm2d.html.
Instances For
Multi-head self-attention configuration.
PyTorch analogue: torch.nn.MultiheadAttention (conceptually).
See https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html.
Instances For
Global average pooling over a CHW tensor.
PyTorch analogue: torch.nn.AdaptiveAvgPool2d((1, 1)) followed by flattening.
Instances For
Global average pooling over an NCHW tensor (preserves the batch dimension).
Instances For
Seeded Builders (Default nn.*) #
For end-user code, the default nn.* layer constructors allocate initialization seeds
automatically via nn.M (a deterministic seed-stream builder).
Use nn.pure.* when you want to pass explicit seeds (proof-friendly / fully reproducible).
Seeded builder monad: a state monad over API.rand.SeedStream.
Instances For
Run a seeded builder starting from a base seed.
Instances For
Lift a pure value into the seeded builder (consumes no seeds).
Instances For
Consume one fresh seed and pass it to k.
Instances For
Elementwise ReLU. PyTorch analogue: torch.nn.ReLU / torch.nn.functional.relu.
Instances For
Elementwise SiLU/Swish. PyTorch analogue: torch.nn.SiLU / torch.nn.functional.silu.
Instances For
Elementwise GELU. PyTorch analogue: torch.nn.GELU / torch.nn.functional.gelu.
Instances For
Elementwise sigmoid. PyTorch analogue: torch.nn.Sigmoid / torch.nn.functional.sigmoid.
Instances For
Elementwise tanh. PyTorch analogue: torch.nn.Tanh / torch.nn.functional.tanh.
Instances For
Softmax. PyTorch analogue: torch.nn.Softmax / torch.nn.functional.softmax.
Instances For
Reduce-sum to a scalar. PyTorch analogue: torch.sum.
Instances For
Flatten any tensor into a 1D vector of length size s. PyTorch analogue: torch.flatten.
Instances For
Flatten a batched tensor N × σ into a matrix N × (size σ).
PyTorch analogue: torch.flatten(x, start_dim=1).
Instances For
Flatten a batched tensor starting at dimension 1 (keep dim0).
Synonym for flattenBatch, matching PyTorch’s start_dim=1 wording.
Instances For
MaxPool2d using NeZero to hide nonzero kernel proofs.
Instances For
Max pooling over batched CHW images, allocating any required initialization seeds automatically.
Shorthand for maxPool2d.
Instances For
AvgPool2d over batched NCHW inputs (shape N×C×H×W, like PyTorch).
Instances For
Average pooling over batched CHW images, allocating any required initialization seeds automatically.
Shorthand for avgPool2d.
Instances For
Linear layer on the last axis (prefix-shape preserving).
PyTorch analogue: torch.nn.Linear.
See https://pytorch.org/docs/stable/generated/torch.nn.Linear.html.
Unlike the lower-level TorchLean layer constructor (which is vector-only), this public facade matches PyTorch’s convention:
- if x has shape [..., inDim], linear inDim outDim returns a model of shape [..., outDim].
The leading “prefix” dimensions are treated as a batch (they are flattened to (numel(prefix), inDim),
the affine map is applied once, and the result is reshaped back).
Instances For
Vector-only linear layer alias.
This is shorthand for nn.linear inDim outDim at scalar prefix shape, so examples do not need to
mention pfx := Spec.Shape.scalar.
Instances For
Vanilla RNN layer (time-major sequence, no batch axis).
Semantics:
h_t = tanh(W [x_t; h_{t-1}] + b), with h_{-1} = 0.
This is implemented by unrolling seqLen steps using existing TorchLean ops, so it runs on both
CPU and CUDA backends.
PyTorch analogy: torch.nn.RNN(inputSize, hiddenSize, nonlinearity="tanh") with
batch_first=false, specialized to a single batch element.
Instances For
GRU layer (time-major sequence, no batch axis).
This is implemented by unrolling seqLen steps using existing TorchLean ops, so it runs on both
CPU and CUDA backends.
PyTorch analogy: torch.nn.GRU(inputSize, hiddenSize) with batch_first=false, specialized to a
single batch element.
Instances For
Trainable Mamba-style gated diagonal state-space layer.
The layer is time-major and single-batch, matching the simple rnn/gru/lstm constructors:
input (seqLen × inputSize), output (seqLen × hiddenSize). It is unrolled with differentiable
TorchLean ops, so CPU and CUDA training use the same API.
Instances For
LSTM layer (time-major sequence, no batch axis).
This is implemented by unrolling seqLen steps using existing TorchLean ops, so it runs on both
CPU and CUDA backends.
PyTorch analogy: torch.nn.LSTM(inputSize, hiddenSize) with batch_first=false, specialized to a
single batch element.
Instances For
BatchNorm2d over NCHW inputs, using NeZero to hide the positivity proofs.
PyTorch analogue: torch.nn.BatchNorm2d.
See https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html.
Instances For
BatchNorm over batched CHW images, allocating initialization seeds automatically.
Shorthand for batchNorm2d.
Instances For
Embedding layer for one-hot / token-distribution inputs (no bias).
Input shape: [..., vocab]
Output shape: [..., embedDim]
PyTorch analogue: conceptually nn.Embedding(vocab, embedDim) but applied to one-hot inputs.
Instances For
Add sinusoidal positional encodings to a batched (batch × seqLen × embedDim) tensor.
Implementation:
- precompute PE : (seqLen × embedDim) at initialization time (stored as a non-trainable buffer),
- broadcast it across the leading batch axis and add to the input.
Instances For
Apply RoPE to a batched multi-head tensor (batch × numHeads × seqLen × headDim).
This matches the standard identity:
rope(x) = x * cos + rotatePairs(x) * sin
where cos/sin depend only on (pos, dim) and broadcast across (batch, numHeads).
Notes:
- This layer is differentiable (gradients flow through the rotation), but it has no trainable parameters; the precomputed cos/sin tables are stored as non-trainable buffers.
- The pure spec version is in NN.Spec.Layers.PositionalEncoding (Spec.rope_apply_heads_spec).
Instances For
Add learned positional embeddings to a batched (batch × seqLen × embedDim) tensor.
PyTorch analogue: x + pos[:seqLen] where pos is a parameter table.
Instances For
Layer normalization over (batch × seqLen × embedDim) tensors.
This normalizes each embedDim-vector (per batch element, per sequence position), and applies
learned affine parameters gamma and beta.
PyTorch analogue: torch.nn.LayerNorm(embedDim) on a tensor shaped (batch, seqLen, embedDim).
Implementation note:
TorchLean uses NeZero to ensure seqLen and embedDim are positive, avoiding degenerate shapes.
Instances For
Multi-head self-attention using NeZero to hide the nonzero sequence length proof.
If mask is provided, it is a boolean attention mask of shape (n × n) (e.g. causal masking).
Instances For
Transformer encoder block.
This is transformerEncoderBlockWithMask; pass mask := ... to enable causal masking (or other
attention masks).
Instances For
Stack cfg.layers copies of blocks.transformerEncoderBlock.
This is transformerEncoderStackWithMask; pass mask := ... to enable causal masking (or other
attention masks).
Instances For
ResNet-18 style BasicBlock over batched image tensors (N×C×H×W).
Instances For
Dropout layer (active in train mode, identity in eval mode).
PyTorch analogue: torch.nn.Dropout.
Instances For
Run a seeded builder using the global seed stream set by nn.manualSeed (results in Type).
Note: model values like nn.Sequential live in Type 2, so they cannot be returned from IO.
For models, use nn.run with an explicit base seed (obtained from nn.nextSeed).
Instances For
Draw a fresh base seed from the global seed stream set by nn.manualSeed.
Instances For
Naming Convenience #
nn.run / nn.nextSeed are the core primitives, but in user code it is often clearer to read:
- “build a model from this seed” (nn.build)
- “draw a fresh init seed” (nn.freshSeed)
- “build a model using the next global init seed” (nn.withModel)
Alias for nn.nextSeed (draw a fresh base seed from the global seed stream).
Instances For
Build a model using the next global seed, then run a continuation.
Why this exists: nn.Sequential lives in Type 2, so we can't directly return a model from IO.
This helper keeps model construction pure while letting executable code avoid repeating the
nextSeed/run pattern.
Instances For
Autograd helpers (grad/vjp/jacobian) over TorchLean programs.
This namespace is conceptually similar to PyTorch autograd + functorch/torch.func:
- gradients of losses w.r.t. parameters and inputs
- VJPs and Jacobians for analysis and verification tooling
PyTorch references:
- Autograd: https://pytorch.org/docs/stable/autograd.html
- torch.func (jacfwd/jacrev, etc.): https://pytorch.org/docs/stable/func.html
Parameter list type for a given model (a TList over Seq.paramShapes).
Instances For
Loss function over a model output and a target.
This is expressed in terms of RefTy so it works uniformly for eager execution and compiled
execution.
Instances For
Pack explicit weight and bias tensors for a single Layers.linear model.
Instances For
Mean-squared error loss (mse) between yhat and y.
Instances For
Cross-entropy loss between logits and one-hot targets. PyTorch analogue: nn.CrossEntropyLoss.
Instances For
Detach the model output before feeding it into a loss.
This is useful when you want to compute a metric loss without backpropagating through it.
Instances For
Gradient of a model-loss w.r.t. the model parameters.
This is the common training use case (PyTorch analogue: loss.backward() followed by parameter
updates).
Instances For
Gradient of the loss w.r.t. the inputs (x and target).
Instances For
Convenience: gradient of the loss w.r.t. x.
Instances For
Convenience: gradient of the loss w.r.t. the target argument.
Instances For
Forward+backward result for a scalar loss built from a model output.
PyTorch comparison: this is the "compute loss + backward" payload, but with shapes tracked.
- value : Spec.Tensor α Spec.Shape.scalar
Value at the current point.
- dparams : TorchLean.Autodiff.Model.Params model α
Gradients w.r.t. parameters.
- dx : Spec.Tensor α σ
Gradient w.r.t. input.
- dtarget : Spec.Tensor α υ
Gradient w.r.t. target.
Instances For
Run loss(model(params, x), target) and compute gradients w.r.t.:
- model parameters,
- x,
- target.
This hides the CompiledScalar/argument-pack boilerplate for the common "one sample" case.
Instances For
Return just (loss_value, grad_params).
Instances For
valueAndGradParams, but convert the 0-dim loss tensor to a scalar α.
Instances For
Return (loss_value, grad_x).
Instances For
Return (loss_value, grad_target).
Instances For
Vector-Jacobian product (VJP) w.r.t. model parameters.
This is the "grad of outputs back into parameters" primitive. It is useful for custom losses or
analysis tooling when you already have a seed tensor seedOut : τ.
Instances For
VJP w.r.t. the model input.
This returns a one-element TList to match the general "inputs list" API shape.
For the common case, use vjpInput to get the tensor directly.
Instances For
Convenience wrapper: unwrap vjpInputs to return just dx.
Instances For
Reverse-mode Jacobian (jacrev) of the model output w.r.t. parameters.
Returns an array of parameter-structured gradients: one entry per output coordinate. This mirrors the usual "jacrev returns a stack of per-output gradients" shape.
Instances For
Jacobian-vector product (JVP) of a scalar loss w.r.t. parameters.
This is the directional derivative in the direction vparams.
Conceptually: d/dt loss(params + t*vparams, x, target) | t = 0.
Instances For
Hessian-vector product (HVP) of a scalar loss w.r.t. parameters.
Returns a parameter-structured tensor list of the same shape as params.
Instances For
In PyTorch terms, this is the "functorch" style: differentiate plain functions, not modules.
Type of a pure tensor function expressed in RefTy form.
This matches the calling convention expected by TorchLean.Program/autodiff compilation.
Instances For
Forward-mode Jacobian (jacfwd) for a pure tensor function.
Instances For
Hessian for a scalar-valued function.
Instances For
Vector-Jacobian product (VJP) for a pure function.
Instances For
Reverse-mode Jacobian (jacrev) of a pure tensor function.
Returns the Jacobian rows as an array of doutput/dinput tensors.
Instances For
Gradient of a scalar-valued function w.r.t. its input.
Instances For
Return (value, grad) for a scalar-valued function at x.
Instances For
valueAndGrad, but convert the 0-dim value tensor to a scalar α.