GraphSpec ResNet-18 #
This file defines a ResNet-18–style convolutional network using GraphSpec's general DAG IR, with BasicBlocks, projection shortcuts, shape-indexed parameters, and TorchLean compilation support.
Why This Is A DAG Model #
Classic ResNet blocks are not purely sequential: the input x flows down two paths:
- a “main” path (Conv → BN → ReLU → Conv → BN),
- a “skip” path (identity, or a learned projection when shapes change),
and then they are added. In a chain-only representation you either:
- recompute shared values, or
- add special-case “skip” combinators that complicate the core language.
GraphSpec’s DAG IR takes a different approach: it provides a small SSA-like term language
(Term + Args) that can naturally express sharing. The semantics are:
- Term.eval: a pure Spec interpreter (math-first).
- Term.compile: compilation to an executable TorchLean program.
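To make the sharing point concrete, here is a toy miniature (hypothetical Expr, not the file's shape-indexed Term/Args): one input feeding two paths under a let binder, with a pure evaluator in the role of Term.eval; Term.compile plays the analogous role for the executable semantics.

```lean
-- Toy SSA-like term language with explicit sharing via `lett`
-- (a hypothetical miniature of the real, shape-indexed Term/Args).
inductive Expr where
  | var  : Nat → Expr            -- de Bruijn index into the environment
  | add  : Expr → Expr → Expr
  | relu : Expr → Expr
  | lett : Expr → Expr → Expr    -- let v := e₁ in e₂; v may be used twice

-- Pure "spec" semantics (the role Term.eval plays in the real file).
def Expr.eval (env : List Float) : Expr → Float
  | .var i    => env.getD i 0
  | .add a b  => a.eval env + b.eval env
  | .relu a   => max 0 (a.eval env)
  | .lett e b => b.eval (e.eval env :: env)

-- Residual-style sharing: the input x feeds both paths. Outside the binder
-- x is var 0; under `lett` the bound relu is var 0 and x shifts to var 1.
def residual : Expr := .lett (.relu (.var 0)) (.add (.var 0) (.var 1))

#eval residual.eval [-2.0]  -- max 0 (-2) + (-2) = -2.0
```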
ResNet-18 here is written once, then we get:
- spec-side forward semantics (for proofs / reference),
- a backend-generic TorchLean Program (for execution / training).
Model Scope #
This is a “CHW, no batch” variant (C×H×W), matching the rest of the Spec/TorchLean vision layers. It is faithful to the core ResNet-18 structure:
- stem: 7×7 conv stride 2 padding 3, BN, ReLU, 3×3 maxpool stride 2 padding 1
- stages: [2,2,2,2] BasicBlocks with channel widths [64,128,256,512]
- first block of stages 2–4 downsamples (stride 2) and uses a 1×1 projection shortcut
- head: global average pool, linear classifier
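As a compact restatement of the stage structure (a sketch with a hypothetical name; the file spells its stages out explicitly rather than deriving them from a table):

```lean
-- (channel width, number of BasicBlocks) per stage; the first block of
-- stages 2–4 runs at stride 2 with a 1×1 projection shortcut.
def stagePlan : List (Nat × Nat) := [(64, 2), (128, 2), (256, 2), (512, 2)]
```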
Two modeling choices affect parameter allocation:
- Conv bias is included (our Conv2D spec has it), even though many PyTorch ResNets omit it.
- BatchNorm is “train-time” BN with learnable gamma/beta (no running mean/var state).
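Concretely, train-time BN normalizes with statistics computed from the current input, y = gamma * (x - mean) / sqrt(var + eps) + beta, where mean and var are taken per channel (in this batch-free CHW setting, over the H × W positions of each channel) rather than read from running averages.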
Shapes And Type-Level Arithmetic #
The main practical challenge is typing the residual adds:
- for stride=1 blocks, we need the conv output shape to be exactly CHW c h w so we can add it to the skip input x : CHW c h w,
- for stride=2 blocks, both the main path and the skip path must agree on the downsampled shape.
We solve this by defining a small family of typed primitives that cast the “raw” conv/pool output shapes into a canonical downsample formula:
down2(h) = (h - 1) / 2 + 1
This is the standard stride-2 output formula, (h + 2p - k)/2 + 1 with floor division, which collapses to the same expression for every kernel/padding combination ResNet uses:
- 7×7 s=2 p=3 → outH = (h - 1)/2 + 1
- 3×3 s=2 p=1 → outH = (h - 1)/2 + 1
- 1×1 s=2 p=0 → outH = (h - 1)/2 + 1
- 3×3 maxpool s=2 p=1 → outH = (h - 1)/2 + 1
By enforcing this “canonical” output shape, the residual add becomes definitional/typecheckable without introducing runtime reshapes.
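These agreements can be checked mechanically. A minimal sketch, assuming convOut is the usual floor-division output-size formula (hypothetical names; the file's typed primitives cast along exactly these equalities):

```lean
-- The generic floor-division output-size formula (hypothetical helper)
-- and the canonical stride-2 formula from above.
def convOut (h k s p : Nat) : Nat := (h + 2 * p - k) / s + 1
def down2 (h : Nat) : Nat := (h - 1) / 2 + 1

-- 7×7 s=2 p=3, 3×3 s=2 p=1 (conv and maxpool), and 1×1 s=2 p=0:
example (h : Nat) : convOut h 7 2 3 = down2 h := by unfold convOut down2; omega
example (h : Nat) : convOut h 3 2 1 = down2 h := by unfold convOut down2; omega
example (h : Nat) : convOut h 1 2 0 = down2 h := by unfold convOut down2; omega
```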
References / citations:
- He et al. (2016), “Deep Residual Learning for Image Recognition” (ResNet-18, BasicBlock).
- Ioffe & Szegedy (2015), “Batch Normalization…” (BN).
- Lin et al. (2013), “Network In Network” (global average pooling).
Convenient local abbreviation for channel-first image tensors (C × H × W).
Note on parameter indexing #
Inside model, the environment is Γ = params ++ [x], so parameters are accessed by index.
We avoid clever macros here. In practice, being explicit is faster to debug and (importantly) more stable under refactors: each use site carries its own proof that the index points at the expected shape.
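As a toy illustration of the pattern (hypothetical shapes; the real environment holds typed tensors, not shape lists):

```lean
-- A flat environment of parameter shapes, indexed by closed numerals.
def Γ : List (List Nat) := [[64, 3, 7, 7], [64], [64], [64]]

-- Each use site carries the proof that its index hits the expected shape;
-- it is checked at elaboration time and fails loudly if the layout shifts.
example : Γ[1]? = some [64] := by decide
```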
Canonical stride-2 downsample formula #
Canonical stride-2 output-size formula used throughout this file.
We write it once and reuse it for the stem, the downsampling residual blocks, and the max-pool so that all of those paths literally agree on the same type-level height/width expression.
Small typed primitives for ResNet typing #
Typed 7×7 stem convolution (stride 2, padding 3), cast to the canonical down2 spatial sizes.
Typed 3×3 convolution (stride 2, padding 1), cast to the canonical down2 spatial sizes.
Typed 1×1 projection convolution (stride 2), cast to the canonical down2 spatial sizes.
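The shared pattern behind all three primitives, in a self-contained toy form (Vec standing in for the real shape-indexed tensors; names hypothetical):

```lean
def down2 (h : Nat) : Nat := (h - 1) / 2 + 1

-- Toy shape-indexed type standing in for CHW tensors.
def Vec (n : Nat) : Type := Fin n → Float

-- The raw 3×3 s=2 p=1 output size equals the canonical one...
theorem out3x3s2 (h : Nat) : (h + 2 - 3) / 2 + 1 = down2 h := by
  unfold down2; omega

-- ...so the raw output can be transported to the canonical shape by a
-- proof, with no runtime reshape.
def castDown2 {h : Nat} (v : Vec ((h + 2 - 3) / 2 + 1)) : Vec (down2 h) :=
  out3x3s2 h ▸ v
```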
Parameter layout #
Parameter layout for a convolution layer: kernel first, then bias.
Parameter layout for affine batch norm: (gamma, beta).
Parameter ABI for a single BasicBlock.
When downsample = true, this includes the projection shortcut (1×1 conv + BN) parameters after
the two main-path conv/BN pairs.
Parameter ABI for one ResNet-18 stage.
A stage is two BasicBlocks. The first block downsamples exactly when inC ≠ outC; the second
always keeps the same number of channels.
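A shape-level sketch of this layout (hypothetical helper names; the file writes the flat list out explicitly, as noted below):

```lean
abbrev Shape := List Nat

-- Convolution: kernel first, then bias. Batch norm: (gamma, beta).
def convABI (cin cout k : Nat) : List Shape := [[cout, cin, k, k], [cout]]
def bnABI (c : Nat) : List Shape := [[c], [c]]

-- BasicBlock: two main-path conv/BN pairs, then the projection pair
-- (1×1 conv + BN) exactly when the block downsamples.
def blockABI (cin cout : Nat) (down : Bool) : List Shape :=
  convABI cin cout 3 ++ bnABI cout ++ convABI cout cout 3 ++ bnABI cout ++
    (if down then convABI cin cout 1 ++ bnABI cout else [])

-- Stage: the first block downsamples exactly when cin ≠ cout.
def stageABI (cin cout : Nat) : List Shape :=
  blockABI cin cout (cin != cout) ++ blockABI cout cout false

#eval (stageABI 64 128).length  -- 20 tensors for a downsampling stage
```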
Full parameter ABI for the GraphSpec ResNet-18 model.
This list is deliberately written in a flat, explicit order. The model body indexes parameters by
closed numerals, and the flat ABI keeps those index proofs predictable for simp and easy to audit.
The ResNet-18 GraphSpec ABI contains exactly 82 parameter tensors.
This is a small but useful structural theorem: the executable wrapper, deterministic initializer,
and DAG body all rely on the same flat ABI. Keeping the count as a named theorem makes accidental
parameter-layout edits visible during review instead of hiding them inside a local simp.
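For the record, the 82 decomposes as: stem (conv kernel + bias, BN gamma + beta) = 4; stage 1 (two 8-tensor blocks, no projection since 64 = 64) = 16; stages 2–4 (one 12-tensor downsampling block plus one 8-tensor block each) = 3 × 20 = 60; head (linear weight + bias) = 2; and 4 + 16 + 60 + 2 = 82.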
Deterministic initialization #
Deterministically initialize a convolution kernel (uniform) and bias (zeros).
Deterministically initialize BatchNorm parameters (gamma, beta) as (ones, zeros).
Deterministically initialize linear weights (uniform) and bias (zeros).
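A toy sketch of the scheme (hypothetical names, with a counter-based stand-in for the deterministic uniform draw, since the real generator is not shown here):

```lean
-- Deterministic pseudo-uniform values in [-0.5, 0.5), indexed by position.
def uniformAt (i : Nat) : Float := Float.ofNat (i % 7) / 7.0 - 0.5

def initUniform (n : Nat) : List Float := (List.range n).map uniformAt
def initZeros   (n : Nat) : List Float := List.replicate n 0.0
def initOnes    (n : Nat) : List Float := List.replicate n 1.0

-- Conv: (uniform kernel, zero bias); BN: (ones, zeros); Linear: (uniform, zeros).
```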
Model #
GraphSpec ResNet-18 model.
This is the public entrypoint for the DAG-authored ResNet in this directory. It packages together:
- the full typed parameter ABI (params inC numClasses),
- deterministic initialization for all 82 parameter tensors,
- and a DAG body whose pure semantics can be interpreted via DAG.Model.specFwd and compiled via DAG.Model.torchProgram.
The model is channel-first (CHW) and batch-free, matching the rest of TorchLean's vision-side
Spec and runtime layers.