Seeded model builders #
This module reopens NN.API.nn with the PyTorch-style seeded builder API. It sits on top of the
pure builders from NN.API.Public.NN and allocates deterministic initialization seeds for users.
Model Builders and Seeding #
TorchLean keeps initialization randomness explicit so examples are reproducible.
nn.*is the default seeded builder API: layer constructors allocate initialization seeds viann.M(a deterministic seed stream).nn.pure.*contains the explicit-seed constructors (proof/reproducibility-friendly).
Typical patterns:
Explicit seeds (best for proofs / reproducibility-sensitive code):
- build with
nn.pure.linear ... (seedW := ...) (seedB := ...)etc - compose with
seq! .../>>>
- build with
Script-style “manual seed once”:
nn.manualSeed seedlet seed ← nn.nextSeedlet model := nn.run seed <| nn.Sequential![ ... ]
Note: nn.Sequential lives in Type 2, so it cannot be returned directly from IO. We keep
model building pure by drawing a base seed in IO and then calling nn.run.
PyTorch-like global seeding convenience for seeded model builders.
This sets the global seed stream used by nn.runGlobal / nn.nextSeed.
Instances For
Embedding table initialization configuration (one-hot / token-distribution inputs).
This is the TorchLean-friendly analogue of torch.nn.Embedding in the common setting where
token ids are represented as one-hot vectors (or soft token distributions), so lookup is a matrix
multiplication rather than integer indexing.
Instances For
Learned positional embedding configuration.
This is a trainable parameter tensor of shape (seqLen × embedDim) that is broadcast across the
leading batch dimension and added to the input.
Instances For
Sinusoidal positional encoding configuration.
This is the classic (non-trainable) Transformer sinusoidal encoding, added to token embeddings.
startPos is an absolute-position offset (useful for KV-cache decoding).
Instances For
Rotary positional embedding (RoPE) configuration.
startPos is an absolute-position offset (useful for KV-cache decoding).
Instances For
Named-field Conv2d configuration (CHW layout).
This is the public, PyTorch-like entry point for convolution in TorchLean.
PyTorch analogue: torch.nn.Conv2d.
See https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html.
Instances For
Named-field Conv2d configuration (CHW layout).
This is the public, PyTorch-like entry point for convolution in TorchLean.
PyTorch analogue: torch.nn.Conv2d.
See https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html.
Instances For
MaxPool2d configuration for CHW inputs.
PyTorch analogue: torch.nn.MaxPool2d.
See https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html.
Instances For
MaxPool2d configuration for CHW inputs.
PyTorch analogue: torch.nn.MaxPool2d.
See https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html.
Instances For
AvgPool2d configuration for CHW inputs.
PyTorch analogue: torch.nn.AvgPool2d.
See https://pytorch.org/docs/stable/generated/torch.nn.AvgPool2d.html.
Instances For
AvgPool2d configuration for CHW inputs.
PyTorch analogue: torch.nn.AvgPool2d.
See https://pytorch.org/docs/stable/generated/torch.nn.AvgPool2d.html.
Instances For
LayerNorm configuration for batched (batch x seqLen x embedDim) tensors.
PyTorch analogue: torch.nn.LayerNorm.
See https://pytorch.org/docs/stable/generated/torch.nn.LayerNorm.html.
Instances For
RMSNorm configuration for batched (batch x seqLen x embedDim) tensors.
This is a common alternative to LayerNorm in modern transformer architectures.
Instances For
BatchNorm2d configuration (learned scale/shift).
PyTorch analogue: torch.nn.BatchNorm2d.
See https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html.
Instances For
InstanceNorm2d configuration (learned scale/shift).
PyTorch analogue: torch.nn.InstanceNorm2d.
See https://pytorch.org/docs/stable/generated/torch.nn.InstanceNorm2d.html.
Instances For
Multi-head self-attention configuration.
PyTorch analogue: torch.nn.MultiheadAttention (conceptually).
See https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html.
Instances For
Global average pooling over a CHW tensor.
PyTorch analogue: torch.nn.AdaptiveAvgPool2d((1, 1)) followed by flattening.
Instances For
Global average pooling over an NCHW tensor (preserves the batch dimension).
Instances For
Seeded Builders (Default nn.*) #
For end-user code, the default nn.* layer constructors allocate initialization seeds
automatically via nn.M (a deterministic seed-stream builder).
Use nn.pure.* when you want to pass explicit seeds (proof-friendly / fully reproducible).
Seeded builder monad: a state monad over API.rand.SeedStream.
Instances For
Run a seeded builder starting from a base seed.
Instances For
Lift a pure value into the seeded builder (consumes no seeds).
Instances For
Consume one fresh seed and pass it to k.
Instances For
Reduce-sum to a scalar. PyTorch analogue: torch.sum.
Instances For
Flatten any tensor into a 1D vector of length size s. PyTorch analogue: torch.flatten.
Instances For
Flatten any tensor into a 1D vector of length size s. PyTorch analogue: torch.flatten.
Instances For
Flatten a batched tensor N × σ into a matrix N × (size σ).
PyTorch analogue: torch.flatten(x, start_dim=1).
Instances For
Flatten a batched tensor N × σ into a matrix N × (size σ).
PyTorch analogue: torch.flatten(x, start_dim=1).
Instances For
Flatten a batched tensor starting at dimension 1 (keep dim0).
Synonym for flattenBatch, matching PyTorch’s start_dim=1 wording.
Instances For
MaxPool2d using NeZero to hide nonzero kernel proofs.
Instances For
Max pooling over batched CHW images, allocating any required initialization seeds automatically.
Shorthand for maxPool2d.
Instances For
AvgPool2d over batched NCHW inputs (shape N×C×H×W, like PyTorch).
Instances For
Average pooling over batched CHW images, allocating any required initialization seeds automatically.
Shorthand for avgPool2d.
Instances For
Linear layer on the last axis (prefix-shape preserving).
PyTorch analogue: torch.nn.Linear.
See https://pytorch.org/docs/stable/generated/torch.nn.Linear.html.
Unlike the runtime TorchLean layer constructor (which is vector-only), this public layer constructor follows PyTorch’s convention:
- if
xhas shape[..., inDim],linear inDim outDimreturns a model of shape[..., outDim].
The leading “prefix” dimensions are treated as a batch (they are flattened to (numel(prefix), inDim),
the affine map is applied once, and the result is reshaped back).
Instances For
Linear layer on the last axis (prefix-shape preserving).
PyTorch analogue: torch.nn.Linear.
See https://pytorch.org/docs/stable/generated/torch.nn.Linear.html.
Unlike the runtime TorchLean layer constructor (which is vector-only), this public layer constructor follows PyTorch’s convention:
- if
xhas shape[..., inDim],linear inDim outDimreturns a model of shape[..., outDim].
The leading “prefix” dimensions are treated as a batch (they are flattened to (numel(prefix), inDim),
the affine map is applied once, and the result is reshaped back).
Instances For
Vector-only linear layer, specialized to the scalar prefix shape.
Instances For
Vector-only linear layer, specialized to the scalar prefix shape.
Instances For
Vanilla RNN layer (time-major sequence, no batch axis).
Semantics:
h_t = tanh(W [x_t; h_{t-1}] + b), with h_{-1} = 0.
This is implemented by unrolling seqLen steps using existing TorchLean ops, so it runs on both
CPU and CUDA backends.
PyTorch analogy: torch.nn.RNN(inputSize, hiddenSize, nonlinearity="tanh") with
batch_first=false, specialized to a single batch element.
Instances For
GRU layer (time-major sequence, no batch axis).
This is implemented by unrolling seqLen steps using existing TorchLean ops, so it runs on both
CPU and CUDA backends.
PyTorch analogy: torch.nn.GRU(inputSize, hiddenSize) with batch_first=false, specialized to a
single batch element.
Instances For
Trainable Mamba-style gated diagonal state-space layer.
The layer is time-major and single-batch, matching the simple rnn/gru/lstm constructors:
input (seqLen × inputSize), output (seqLen × hiddenSize). It is unrolled with differentiable
TorchLean ops, so CPU and CUDA training use the same API.
Instances For
LSTM layer (time-major sequence, no batch axis).
This is implemented by unrolling seqLen steps using existing TorchLean ops, so it runs on both
CPU and CUDA backends.
PyTorch analogy: torch.nn.LSTM(inputSize, hiddenSize) with batch_first=false, specialized to a
single batch element.
Instances For
BatchNorm2d over NCHW inputs, using NeZero to hide the positivity proofs.
PyTorch analogue: torch.nn.BatchNorm2d.
See https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html.
Instances For
BatchNorm over batched CHW images, allocating initialization seeds automatically.
Shorthand for batchNorm2d.
Instances For
Embedding layer for one-hot / token-distribution inputs (no bias).
Input shape: [..., vocab]
Output shape: [..., embedDim]
PyTorch analogue: conceptually nn.Embedding(vocab, embedDim) but applied to one-hot inputs.
Instances For
Add sinusoidal positional encodings to a batched (batch × seqLen × embedDim) tensor.
Implementation:
- precompute
PE : (seqLen × embedDim)at initialization time (stored as a non-trainable buffer), - broadcast it across the leading
batchaxis and add to the input.
Instances For
Apply RoPE to a batched multi-head tensor (batch × numHeads × seqLen × headDim).
This matches the standard identity:
rope(x) = x * cos + rotatePairs(x) * sin
where cos/sin depend only on (pos, dim) and broadcast across (batch, numHeads).
Notes:
- This layer is differentiable (gradients flow through the rotation), but it has no trainable
parameters; the precomputed
cos/sintables are stored as non-trainable buffers. - The pure spec version is in
NN.Spec.Layers.PositionalEncoding(Spec.rope_apply_heads_spec).
Instances For
Add learned positional embeddings to a batched (batch × seqLen × embedDim) tensor.
PyTorch analogue: x + pos[:seqLen] where pos is a parameter table.
Instances For
Layer normalization over (batch × seqLen × embedDim) tensors.
This normalizes each embedDim-vector (per batch element, per sequence position), and applies
learned affine parameters gamma and beta.
PyTorch analogue: torch.nn.LayerNorm(embedDim) on a tensor shaped (batch, seqLen, embedDim).
Implementation note:
TorchLean uses NeZero to ensure seqLen and embedDim are positive, avoiding degenerate shapes.
Instances For
Multi-head self-attention using NeZero to hide the nonzero sequence length proof.
If mask is provided, it is a boolean attention mask of shape (n × n) (e.g. causal masking).
Instances For
Transformer encoder block.
This is transformerEncoderBlockWithMask; pass mask := ... to enable causal masking (or other
attention masks).
Instances For
Stack cfg.layers copies of blocks.transformerEncoderBlock.
This is transformerEncoderStackWithMask; pass mask := ... to enable causal masking (or other
attention masks).
Instances For
ResNet-18 style BasicBlock over batched image tensors (N×C×H×W).
Instances For
Dropout layer (active in train mode, identity in eval mode).
PyTorch analogue: torch.nn.Dropout.
Instances For
Run a seeded builder using the global seed stream set by nn.manualSeed (results in Type).
Note: model values like nn.Sequential live in Type 2, so they cannot be returned from IO.
For models, use nn.run with an explicit base seed (obtained from nn.nextSeed).
Instances For
Draw a fresh base seed from the global seed stream set by nn.manualSeed.
Instances For
Naming Convenience #
nn.run / nn.nextSeed are the core primitives, but in user code it is often clearer to read:
- “build a model from this seed” (
nn.run) - “draw a fresh init seed” (
nn.freshSeed) - “build a model using the next global init seed” (
nn.withModel)
Build a model using the next global seed, then run a continuation.
nn.Sequential lives in Type 2, so executable code passes the model to a continuation rather than
returning it directly from IO.