Sequential GraphSpec #
This file defines the sequential authoring surface for GraphSpec.
The important design decision is:
- `DAG.Model` is the canonical general GraphSpec model representation.
- `Graph ps σ τ` is lightweight syntax for the common special case where the model is just a chain of layers.
So Graph is not a competing graph IR. It is a pleasant way to write:
`Linear >>> ReLU >>> Linear`
`Conv >>> ReLU >>> Pool >>> Flatten >>> Linear`
and then lower that chain to the general DAG representation when downstream tooling wants one model shape for everything.
GraphSpec as a whole is a typed DSL for describing neural-network computations, with the explicit goal of being usable in two complementary ways:
- Reference / proof semantics: interpret the graph as a pure Lean function on tensors (`Interp.spec`). This is the semantics we want to reason about: shape safety, algebraic identities, equivalence of model refactorings, etc.
- Executable semantics: compile the same graph into a backend-generic `TorchLean.Program` (`Compile.torchProgram`) so it can run on the TorchLean runtime (which can target eager or compiled execution backends).
Shapes and parameter shapes are part of the graph type. Concretely, a graph is indexed by:
- `σ τ : Shape` — input/output tensor shapes, and
- `ps : List Shape` — the ordered list of parameter tensor shapes the graph expects.
That “parameter interface” is not a convention (like “whatever state_dict() happens to return”);
it is baked into the model type. Sequential composition concatenates parameter lists, and
evaluation splits them canonically.
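As a rough illustration of what “baked into the model type” means, here is a minimal, self-contained Lean sketch. `Shape`, `Tensor`, and `Params` below are simplified stand-ins for the corresponding GraphSpec/TorchLean types (the real typed parameter list is `TList α ps`), so treat this as a picture of the indexing discipline rather than the library's definitions:

```lean
namespace ParamInterfaceSketch
-- Toy stand-ins: a shape is a dimension list, and a "tensor" is just a flat
-- buffer tagged (at the type level) with its shape.
abbrev Shape := List Nat

structure Tensor (σ : Shape) where
  data : Array Float

/-- A typed parameter list: exactly one tensor per shape in `ps`, in order
(a toy analogue of `TList α ps`). -/
inductive Params : List Shape → Type where
  | nil  : Params []
  | cons : Tensor σ → Params ps → Params (σ :: ps)

/-- A model indexed by `(ps, σ, τ)` denotes a function of this type, so
"which tensor is weight #2?" is answered by the type, not by convention. -/
def Denotation (ps : List Shape) (σ τ : Shape) : Type :=
  Params ps → Tensor σ → Tensor τ
end ParamInterfaceSketch
```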
Why do this if PyTorch already exists? #
PyTorch is excellent at running and training neural networks:
- `nn.Module` packages parameters and a `forward` method.
- Autograd records an operation tape during execution and provides gradients.
- Modern tooling can capture/transform graphs (`torch.fx`, `torch.export`) and compile them (`torch.compile`).
GraphSpec is not trying to replace any of that. Instead, it focuses on the pieces PyTorch does not give us inside Lean:
- A machine-checkable mathematical semantics we can use for proofs.
- Static shape discipline: shapes appear in the type, not as runtime asserts.
- A typed parameter interface: parameter shapes and ordering are explicit, so “which tensor is weight #2?” is not an out-of-band convention.
In practice, the expected workflow is:
- use GraphSpec to write down a model architecture in a shape-typed way,
- run it via TorchLean for concrete execution/training experiments, and
- use `Interp.spec` and proof libraries to prove properties of the same architecture.
Why do this if TorchLean already exists? #
TorchLean is the runtime and operator layer: it gives us typed tensors, a backend interface,
and executable programs (TorchLean.Program) that can run under the autograd/training runtime.
GraphSpec is the architecture/specification layer: it gives us a small typed syntax for model structure that comes with two linked meanings:
- a pure semantics (`Interp.spec`) that is amenable to proofs in Lean, and
- a compiler (`Compile.torchProgram`) that turns the same description into something runnable.
You can write models directly in TorchLean, but then the “thing you reason about” is already in
the executable world (monadic references + backend ops). For many proofs, it is much cleaner to
reason about a pure function Params → Tensor → Tensor and separately prove that compilation to
the runtime preserves that meaning.
In other words:
- TorchLean answers: “Given ops, how do we run/train them?”
- GraphSpec answers: “How do we describe models so we can both run them and prove things about them?”
Mathematical View For Sequential Chains #
For g : Graph ps σ τ, think of g as denoting a function
⟦g⟧ : Params(ps) → Tensor σ → Tensor τ.
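For example, a two-layer perceptron built from the linear and relu primitives defined below denotes
⟦linear >>> relu >>> linear⟧ [W₁, b₁, W₂, b₂] x = W₂ * relu(W₁ * x + b₁) + b₂,
since each linear node contributes its [W, b] pair to the concatenated parameter list and relu contributes none.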
In this file, that semantics is implemented by Interp.spec, and it is defined structurally:
- `⟦id⟧ params x = x`
- `⟦prim p⟧ params x = p.specFwd params x`
- `⟦g₁ >>> g₂⟧ params x`: split `params : Params(ps₁ ++ ps₂)` into `(params₁ : Params ps₁, params₂ : Params ps₂)`, then compute `⟦g₂⟧ params₂ (⟦g₁⟧ params₁ x)`.
The compiler Compile.torchProgram follows the same structure, but targets a monadic Torch
interface and expects arguments as params ++ [input] (matching TorchLean.NN.Seq.program).
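The following self-contained Lean sketch mirrors that structural definition. It is a toy, not the library code: `Shape`, `Tensor`, `Params`, `Graph`, and `spec` below stand in for the real `TList`, `Graph`, and `Interp.spec`, and the toy `prim` simply carries its spec function directly.

```lean
namespace ChainSemanticsSketch
-- Self-contained toy stand-ins (not the real library definitions).
abbrev Shape := List Nat

structure Tensor (σ : Shape) where
  data : Array Float

/-- One parameter tensor per shape in `ps`, in order (toy analogue of `TList α ps`). -/
inductive Params : List Shape → Type where
  | nil  : Params []
  | cons : Tensor σ → Params ps → Params (σ :: ps)

/-- Canonical split of a parameter list along `ps₁ ++ ps₂`. -/
def Params.split : (ps₁ : List Shape) → Params (ps₁ ++ ps₂) → Params ps₁ × Params ps₂
  | [],       ps           => (.nil, ps)
  | _ :: ps₁, .cons p rest =>
      let (l, r) := Params.split ps₁ rest
      (.cons p l, r)

/-- The sequential chain language (toy version of `Graph ps σ τ`). -/
inductive Graph : List Shape → Shape → Shape → Type where
  | id   (s : Shape) : Graph [] s s
  | seq  : Graph ps₁ σ τ → Graph ps₂ τ υ → Graph (ps₁ ++ ps₂) σ υ
  | prim : (Params ps → Tensor σ → Tensor τ) → Graph ps σ τ

/-- Structural evaluation, mirroring the clauses above (toy analogue of `Interp.spec`). -/
def Graph.spec : Graph ps σ τ → Params ps → Tensor σ → Tensor τ
  | .id _,      _,      x => x
  | .prim f,    params, x => f params x
  | .seq g₁ g₂, params, x =>
      let (params₁, params₂) := Params.split _ params
      g₂.spec params₂ (g₁.spec params₁ x)
end ChainSemanticsSketch
```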
Scope of Core.lean #
This file defines only the sequential core:
- `Primitive` — a single typed operation with both a pure spec and a TorchLean implementation.
- `Graph` — sequential composition (`>>>`) of primitives with a typed parameter list.
- `Interp.spec` — pure interpreter.
- `Compile.torchProgram` — compiler to `TorchLean.Program`.
- `LowerToDAG.Graph.toDAGTerm` / `toDAGModelZeroInit` — the bridge from chain syntax to the canonical DAG representation.
For skip connections, shared intermediates, residual adds, or other multi-input nodes, use
NN.GraphSpec.DAG directly.
Direction #
GraphSpec is intended to grow into a hygienic “write once, run/prove many” layer:
- richer primitive packs (vision, language, classical ML, …),
- richer DAG structure (limited control flow where it can be compiled),
- verified compilation passes (fusion, constant folding, layout transforms) with proofs that they preserve `Interp.spec`,
- a better parameter/initialization interface (explicit RNG threading, serialization, interop with PyTorch/ONNX exports),
- and a library of reusable theorems about common architectures (e.g. invariants of residual blocks, bounds for Lipschitz constants, etc.).
References / citations #
- PyTorch `nn.Module` and graph tooling (`torch.fx`, `torch.export`, `torch.compile`) for the practical “execution/training first” baseline.
- Automatic differentiation background: Baydin et al. (2018), “Automatic Differentiation in Machine Learning: a Survey”.
- ReLU: Nair & Hinton (2010).
Core graph language #
A primitive node in the GraphSpec language.
GraphSpec primitives package both sides of the “spec vs runtime” interface:
- a pure spec forward function (`specFwd`) used by the reference interpreter, and
- a TorchLean program (`torchProgram`) used by the compiler.
Optionally, a primitive may also provide a lowering to a TorchLean LayerDef (used to build a
TorchLean.NN.Seq for training ergonomics + deterministic parameter initialization). Not every
primitive needs this (e.g. control-flow-ish nodes kept outside the sequential layer).
Why a record?
- It lets us grow the op set by adding new primitives in new files (rather than editing a single global inductive just to extend the vocabulary).
- It keeps the “spec vs TorchLean” linkage explicit: when you add an op, you must define both interpretations side-by-side.
Type indices:
- `ps : List Shape` are the parameter tensor shapes this primitive expects, in order.
- `σ τ : Shape` are input/output tensor shapes.
- name : String
Human-readable name used mainly for debugging / error messages.
- specFwd {α : Type} [Context α] : Runtime.Autograd.Torch.TList α ps → Tensor.Tensor α σ → Tensor.Tensor α τ
Pure reference semantics (forward pass).
This is the function used by `Interp.spec`.
- torchProgram {α : Type} [Context α] [DecidableEq Shape] : Runtime.Autograd.TorchLean.Program α (ps ++ [σ]) τ
Executable TorchLean program.
The program expects its arguments as `ps ++ [σ]` (all parameters first, then the input). This convention aligns with how sequential TorchLean models (`TorchLean.NN.Seq`) expose their program interfaces.
- toLayerDefM? : Option (ℕ → { l : Runtime.Autograd.TorchLean.NN.LayerDef σ τ // l.paramShapes = ps })
Optional lowering to a TorchLean `LayerDef`.
We thread an occurrence index (`Nat`) so primitives can implement deterministic “per-layer” initialization (e.g. seed = f(index)).
- countsAsLayer : Bool
Whether encountering this primitive should increment the layer-occurrence counter.
Instances For
Graph ps σ τ is a (restricted) model that:
- takes an input tensor of shape `σ`,
- produces an output tensor of shape `τ`,
- and uses parameters whose shapes are listed in `ps` (in order).
This is a sequential (chain) graph language: the only composition operator is seq (>>>).
For sharing/skip connections, use NN.GraphSpec.DAG.
Implementation note:
- We encode the parameter list at the type level so composition automatically concatenates parameter lists (`ps := ps₁ ++ ps₂`).
- This means every graph has a canonical “ABI” for parameters: a single typed list `TList α ps`. When composing `g₁ : Graph ps₁ σ τ` and `g₂ : Graph ps₂ τ υ`, the composite graph expects parameters of shape list `ps₁ ++ ps₂`, and evaluation splits that list into the pieces needed by each subgraph.
- id
(s : Shape)
: Graph [] s s
Identity graph: passes the input through unchanged and requires no parameters.
- seq
{ps₁ ps₂ : List Shape}
{σ τ υ : Shape}
: Graph ps₁ σ τ → Graph ps₂ τ υ → Graph (ps₁ ++ ps₂) σ υ
Sequential composition. Parameter lists concatenate.
- prim
{ps : List Shape}
{σ τ : Shape}
: Primitive ps σ τ → Graph ps σ τ
Embed a single primitive node as a graph.
Instances For
Standard primitives (initial op set) #
Primitive linear layer.
Mathematical semantics (vector case):
Let x : Vec inDim, W : Mat outDim inDim, and b : Vec outDim. Then:
linear(W,b,x) = W * x + b.
This matches the standard dense layer as in PyTorch torch.nn.Linear / torch.nn.functional.linear
(up to the usual row/column convention; here the shape indices make the intended dimensions
explicit).
Type-level parameter interface:
- parameter shapes are `[Mat outDim inDim, Vec outDim]`,
- input shape is `Vec inDim`,
- output shape is `Vec outDim`.
So a graph containing a linear node forces you to supply exactly a weight matrix and bias
vector of the right shapes, and it fixes their ordering in the model’s parameter list.
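As a toy rendering of that semantics (not the library's `Spec.linear_spec`; `dot` and `linear` here are illustrative helpers over plain lists), the dense-layer math looks like:

```lean
namespace LinearSketch
/-- Dot product of two equal-length rows (toy helper). -/
def dot (xs ys : List Float) : Float :=
  (List.zipWith (· * ·) xs ys).foldl (· + ·) 0.0

/-- `linear W b x = W * x + b`: `W` is a list of `outDim` rows of length `inDim`,
`b` and the result have length `outDim`. -/
def linear (W : List (List Float)) (b x : List Float) : List Float :=
  List.zipWith (fun row bi => dot row x + bi) W b

#eval linear [[1, 0, 0], [0, 1, 1]] [0.5, -1] [2, 3, 4]  -- ≈ [2.5, 6.0]
end LinearSketch
```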
References:
- Dense layers are standard; for PyTorch behavior see the `torch.nn.Linear` documentation.
- For the semantics used by the spec interpreter, see `NN.Spec.Module.Linear` (`Spec.linear_spec`).
Initialization semantics:
- we attach a TorchLean `LayerDef` so graphs can be lowered to `TorchLean.NN.Seq`,
- and we seed `W`, `b` deterministically from the layer-occurrence index: `seedW = 2*i`, `seedB = 2*i + 1`.
The deterministic occurrence-index rule keeps end-to-end examples reproducible while preserving a single GraphSpec → TorchLean → training path.
Instances For
ReLU activation (parameter-free).
Mathematical semantics: elementwise relu(x) = max(x, 0).
This is shape-preserving and parameter-free, so its parameter list is [] and its input/output
shape indices are both s.
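A minimal sketch of the elementwise semantics (illustrative only; the library's definition is `Activation.relu_spec`):

```lean
namespace ReluSketch
/-- Scalar ReLU: `max x 0`. -/
def relu (x : Float) : Float := max x 0

/-- Elementwise ReLU over a flat buffer: shape-preserving and parameter-free. -/
def reluMap (xs : Array Float) : Array Float := xs.map relu

#eval reluMap #[-1.5, 0.0, 2.0]  -- negatives clamp to 0, the rest pass through
end ReluSketch
```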
References:
- Nair & Hinton (2010), “Rectified Linear Units Improve Restricted Boltzmann Machines”.
- Spec definition: `Activation.relu_spec` in `NN.Spec.Layers.Activation`.
Instances For
Last-axis softmax (parameter-free).
Softmax turns “logits” into a probability distribution along the last axis:
softmax(x)_i = exp(x_i) / (∑_j exp(x_j)).
In TorchLean’s spec layer, this is implemented as a genuine last-axis tensor softmax (recursing
over outer dimensions), analogous to torch.softmax(x, dim=-1) in PyTorch.
Notes:
- Softmax is not elementwise; it normalizes across an axis, so it is a canonical example of a non-pointwise activation.
- For numerical stability, practical implementations often rewrite softmax using `logsumexp`. The spec semantics here follows the dedicated `Activation.softmax_spec`.
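A single-row sketch of that computation, including the usual max-subtraction trick for stability (illustrative only; the library's `Activation.softmax_spec` additionally recurses over outer dimensions):

```lean
namespace SoftmaxSketch
/-- Softmax over one row of logits, shifting by the row maximum so that the
largest exponent is `exp 0 = 1` (the standard stability trick). -/
def softmaxRow (xs : List Float) : List Float :=
  let m    := xs.foldl max (xs.headD 0.0)
  let exps := xs.map fun x => Float.exp (x - m)
  let z    := exps.foldl (· + ·) 0.0
  exps.map (· / z)

#eval softmaxRow [1.0, 2.0, 3.0]  -- entries are positive and sum to 1 (up to rounding)
end SoftmaxSketch
```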
References:
- Spec definition: `Activation.softmax_spec` in `NN.Spec.Layers.Activation`.
- PyTorch API analogy: `torch.softmax(x, dim=-1)`.
Instances For
Graph constructor for Primitive.linear.
Instances For
Graph constructor for Primitive.relu.
Instances For
Graph constructor for Primitive.softmax.
Instances For
Lowering: sequential Graph → DAG term/model #
GraphSpec has two surface syntaxes:
- `NN.GraphSpec.Core`: a sequential DSL (`Graph` + `>>>`), ideal for pure pipelines.
- `NN.GraphSpec.DAG.Core`: a general SSA/A-normal-form term language, ideal for sharing/skip connections.
The DAG term/model language is GraphSpec’s “general graph” core: it is the representation that can express sharing and skip connections.
Sequential Graph exists because it is the clearest way to write pipelines, and it has its own
direct Spec semantics (Interp.spec) and compiler (Compile.torchProgram).
This lowering is still useful whenever you want to embed a sequential pipeline into the DAG world (e.g. to reuse DAG-only tooling, or to keep a single GraphSpec example surface that can export DAG models).
This section provides a structural lowering:
- `Graph.toDAGTerm` produces a `DAG.Term (ps ++ [σ]) τ`, i.e. a DAG term whose environment starts with the parameter list `ps` and ends with the (single) data input `σ`.
- `Graph.toDAGModelZeroInit` wraps that term into a `DAG.Model ps [σ] τ` with a simple default parameter initialization (all zeros).
Notes:
- The lowering is purely structural: it introduces `let1` binders between stages to make the sequential flow explicit in SSA form (see the sketch below).
- Each sequential `Primitive ps σ τ` is embedded as a DAG primitive op with inputs `ps ++ [σ]`. This embedding is generic: any custom GraphSpec primitive automatically becomes usable in the DAG world.
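To make “introduce `let1` binders between stages” concrete, here is a much-simplified, self-contained sketch. The toy `Expr` below is untyped and uses string-named variables, unlike the real `DAG.Term`, which is shape-indexed with an environment of `ps ++ [σ]`; the point is only the shape of the lowering.

```lean
namespace LowerSketch
/-- A toy A-normal-form expression language: variables, unary primitive
applications, and single `let` binders (a drastically simplified stand-in
for `DAG.Term`). -/
inductive Expr where
  | var  : String → Expr
  | op1  : String → String → Expr          -- primitive name, argument variable
  | let1 : String → Expr → Expr → Expr     -- let x := e₁; e₂
  deriving Repr

/-- Lower a chain of unary stages into nested `let1` binders, threading the
name of the previous stage's output (initially the data input). -/
def lowerChain : String → Nat → List String → Expr
  | cur, _, []        => .var cur
  | cur, i, op :: ops =>
      let v := s!"t{i}"
      .let1 v (.op1 op cur) (lowerChain v (i + 1) ops)

-- `Linear >>> ReLU >>> Linear` becomes
--   let t0 := Linear x; let t1 := ReLU t0; let t2 := Linear t1; t2
#eval lowerChain "x" 0 ["Linear", "ReLU", "Linear"]
end LowerSketch
```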
Lowering internals #
The definitions in this section (castTerm, toTerm, …) are internal adapters for the structural
lowering. The intended public API is Graph.toDAGTerm / Graph.toDAGModelZeroInit.
Embed a sequential GraphSpec primitive as a DAG primitive op.
The resulting op has input shapes ps ++ [σ] (parameters followed by the data input).
Instances For
Building well-typed DAG arguments for a primitive call #
Build a typed DAG.Args list from an index-based family of argument terms.
This is the bridge from “arguments as a function of Fin ins.length” to the inductive DAG.Args
encoding used by DAG.Term.op.
Instances For
Reference the ith parameter block inside a larger environment layout.
The surrounding environment is split as pre ++ ps ++ post ++ extra; this helper returns the term
that points at parameter i : Fin ps.length while keeping the full ambient environment explicit.
Instances For
Lower a unary Primitive application into the DAG term language.
Parameters are read from the middle ps segment of the ambient environment, in the same order as
the primitive's parameter ABI, and the final data input is supplied by x.
Instances For
Graph lowering #
Public API #
Initialize a parameter list by filling every tensor with zeros (useful for proofs and examples).
Instances For
Deterministic initialization for sequential graphs #
Graph.toDAGModelZeroInit is total, but its parameters are all-zero tensors, which is convenient
for proofs and shape-only examples but not representative of training setups.
For graphs whose primitives provide Primitive.toLayerDefM?, we can reuse TorchLean’s deterministic
initializers (e.g. Xavier init for linear weights) in a way that matches ToTorchLean.toSeq:
- we thread an occurrence index `i : Nat`,
- primitives with `countsAsLayer = true` increment it,
- and each primitive's `LayerDef.initParams` uses seeds derived from `i`.
We expose this as Graph.toDAGModelDetInit? : Except String (DAG.Model ...):
it fails if any primitive lacks a toLayerDefM? lowering.
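A hedged sketch of that control flow (toy types only: `Prim`, `init?`, and `detInit` below are illustrative stand-ins for `Primitive`, `toLayerDefM?`, and `Graph.toDAGModelDetInit?`, with parameters flattened to `List Float`):

```lean
namespace DetInitSketch
/-- Toy primitive record: `init?` plays the role of `toLayerDefM?`, mapping a
layer-occurrence index to that primitive's (flattened) parameters. -/
structure Prim where
  name          : String
  countsAsLayer : Bool
  init?         : Option (Nat → List Float)

/-- Thread the occurrence index through a chain, failing if any primitive lacks
an initializer (the shape of `Graph.toDAGModelDetInit?`'s error behaviour). -/
def detInit : Nat → List Prim → Except String (List (List Float))
  | _, []      => .ok []
  | i, p :: ps => do
      let some init := p.init?
        | Except.error s!"primitive {p.name} has no LayerDef lowering"
      let i' := if p.countsAsLayer then i + 1 else i
      let rest ← detInit i' ps
      return init i :: rest
end DetInitSketch
```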
Compute deterministic initialization tensors for a sequential Graph, threading a “layer
occurrence index”.
This matches ToTorchLean.toSeq’s notion of “occurrence”: only primitives with
countsAsLayer = true advance the counter.
Instances For
Convenience wrapper: start the occurrence index at 0 and discard its final value.
Instances For
Lower a sequential Graph to a DAG Model with a simple default init (all zeros).
This is mainly a convenience for GraphSpec example organization; for training-oriented init,
see NN.GraphSpec.ToTorchLean (Seq lowering) and/or provide your own initializer.
Instances For
Lower a sequential Graph to a DAG Model, using deterministic initialization.
This is the DAG analogue of ToTorchLean.toSeq’s initialization semantics: it uses each primitive’s
toLayerDefM? to obtain a TorchLean LayerDef, then reuses the LayerDef.initParams.
This returns Except String because not every primitive necessarily admits a LayerDef lowering.
Instances For
Semantics (sequential core) #
The sequential DSL (Graph with >>>) has direct semantics:
- `Interp.spec` evaluates a sequential graph as a pure function on tensors, and
- `Compile.torchProgram` compiles it to a backend-generic `TorchLean.Program`.
Even though a sequential graph is semantically a path-shaped DAG, we keep the sequential interpreter/compiler direct for two pragmatic reasons:
- Proof ergonomics. For chain graphs, definitional reduction is much simpler when we evaluate directly rather than going through an SSA lowering.
- Engineering clarity. The sequential and DAG languages have different invariants (parameter concatenation vs explicit `let1` sharing). Keeping each semantics close to its syntax makes the code easier to audit.
We still provide a structural lowering LowerToDAG.Graph.toDAGModelZeroInit so that DAG-only
tooling
can consume sequential models. The DAG path becomes the canonical execution route when a caller
wants explicit sharing together with the corresponding simp lemmas / proof infrastructure.
Split a typed parameter list for a sequential composition.
If ps = ps₁ ++ ps₂, then a value of type TList α ps can be split into the prefix parameters for
the left subgraph and the remaining parameters for the right subgraph.
Instances For
Pure Spec semantics of a sequential Graph.
Instances For
Compile a sequential Graph to a backend-generic TorchLean Program.