GraphSpec ResNet-18 #
This file defines a ResNet-18–style convolutional network using GraphSpec's general DAG IR, with BasicBlocks, projection shortcuts, shape-indexed parameters, and TorchLean compilation support.
Why This Is A DAG Model #
Classic ResNet blocks are not purely sequential: the input x flows down two paths:
- a “main” path (Conv → BN → ReLU → Conv → BN),
- a “skip” path (identity, or a learned projection when shapes change),
and then they are added. In a chain-only representation you either:
- recompute shared values, or
- add special-case “skip” combinators that complicate the core language.
GraphSpec’s DAG IR takes a different approach: it provides a small SSA-like term language
(Term + Args) that can naturally express sharing. The semantics are:
- Term.eval: a pure Spec interpreter (math-first).
- Term.compile: compilation to an executable TorchLean program.
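To make the sharing point concrete, here is a toy miniature (hypothetical Expr, not the file's shape-indexed Term/Args): one input feeding two paths under a let binder, with a pure evaluator in the role of Term.eval; Term.compile plays the analogous role for the executable semantics.

```lean
-- Toy SSA-like term language with explicit sharing via `lett`
-- (a hypothetical miniature of the real, shape-indexed Term/Args).
inductive Expr where
  | var  : Nat → Expr            -- de Bruijn index into the environment
  | add  : Expr → Expr → Expr
  | relu : Expr → Expr
  | lett : Expr → Expr → Expr    -- let v := e₁ in e₂; v may be used twice

-- Pure "spec" semantics (the role Term.eval plays in the real file).
def Expr.eval (env : List Float) : Expr → Float
  | .var i    => env.getD i 0
  | .add a b  => a.eval env + b.eval env
  | .relu a   => max 0 (a.eval env)
  | .lett e b => b.eval (e.eval env :: env)

-- Residual-style sharing: the input x feeds both paths. Outside the binder
-- x is var 0; under `lett` the bound relu is var 0 and x shifts to var 1.
def residual : Expr := .lett (.relu (.var 0)) (.add (.var 0) (.var 1))

#eval residual.eval [-2.0]  -- max 0 (-2) + (-2) = -2.0
```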
ResNet-18 here is written once, then we get:
- spec-side forward semantics (for proofs / reference),
- a backend-generic TorchLean Program (for execution / training).
Model Scope #
This is a “CHW, no batch” variant (C×H×W), matching the rest of the Spec/TorchLean vision layers. It is faithful to the core ResNet-18 structure:
- stem: 7×7 conv stride 2 padding 3, BN, ReLU, 3×3 maxpool stride 2 padding 1
- stages: [2,2,2,2] BasicBlocks with channel widths [64,128,256,512]
- first block of stages 2–4 downsamples (stride 2) and uses a 1×1 projection shortcut
- head: global average pool, linear classifier
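As a compact restatement of the stage structure (a sketch with a hypothetical name; the file spells its stages out explicitly rather than deriving them from a table):

```lean
-- (channel width, number of BasicBlocks) per stage; the first block of
-- stages 2–4 runs at stride 2 with a 1×1 projection shortcut.
def stagePlan : List (Nat × Nat) := [(64, 2), (128, 2), (256, 2), (512, 2)]
```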
Two modeling choices affect parameter allocation:
- Conv bias is included (our Conv2D spec has it), even though many PyTorch ResNets omit it.
- BatchNorm is “train-time” BN with learnable gamma/beta (no running mean/var state).
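Concretely, train-time BN normalizes with statistics computed from the current input, y = gamma * (x - mean) / sqrt(var + eps) + beta, where mean and var are taken per channel (in this batch-free CHW setting, over the H × W positions of each channel) rather than read from running averages.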
Shapes And Type-Level Arithmetic #
The main practical challenge is typing the residual adds:
- for stride=1 blocks, we need the conv output shape to be exactly CHW c h w so we can add it to the skip input x : CHW c h w,
- for stride=2 blocks, both the main path and the skip path must agree on the downsampled shape.
We solve this by defining a small family of typed primitives that cast the “raw” conv/pool output shapes into a canonical downsample formula:
down2(h) = (h - 1) / 2 + 1
This is the standard stride-2 output formula, (h + 2p - k)/2 + 1 with floor division, which collapses to the same expression for every kernel/padding combination ResNet uses:
- 7×7 s=2 p=3 → outH = (h - 1)/2 + 1
- 3×3 s=2 p=1 → outH = (h - 1)/2 + 1
- 1×1 s=2 p=0 → outH = (h - 1)/2 + 1
- 3×3 maxpool s=2 p=1 → outH = (h - 1)/2 + 1
By enforcing this “canonical” output shape, the residual add becomes definitional/typecheckable without introducing runtime reshapes.
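These agreements can be checked mechanically. A minimal sketch, assuming convOut is the usual floor-division output-size formula (hypothetical names; the file's typed primitives cast along exactly these equalities):

```lean
-- The generic floor-division output-size formula (hypothetical helper)
-- and the canonical stride-2 formula from above.
def convOut (h k s p : Nat) : Nat := (h + 2 * p - k) / s + 1
def down2 (h : Nat) : Nat := (h - 1) / 2 + 1

-- 7×7 s=2 p=3, 3×3 s=2 p=1 (conv and maxpool), and 1×1 s=2 p=0:
example (h : Nat) : convOut h 7 2 3 = down2 h := by unfold convOut down2; omega
example (h : Nat) : convOut h 3 2 1 = down2 h := by unfold convOut down2; omega
example (h : Nat) : convOut h 1 2 0 = down2 h := by unfold convOut down2; omega
```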
References / citations:
- He et al. (2016), “Deep Residual Learning for Image Recognition” (ResNet-18, BasicBlock).
- Ioffe & Szegedy (2015), “Batch Normalization…” (BN).
- Lin et al. (2013), “Network In Network” (global average pooling).
Convenient local abbreviation for channel-first image tensors (C × H × W).
Note on parameter indexing #
Inside model, the environment is Γ = params ++ [x], so parameters are accessed by index.
We avoid clever macros here. In practice, being explicit is faster to debug and (importantly) more stable under refactors: each use site carries its own proof that the index points at the expected shape.
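As a toy illustration of the pattern (hypothetical shapes; the real environment holds typed tensors, not shape lists):

```lean
-- A flat environment of parameter shapes, indexed by closed numerals.
def Γ : List (List Nat) := [[64, 3, 7, 7], [64], [64], [64]]

-- Each use site carries the proof that its index hits the expected shape;
-- it is checked at elaboration time and fails loudly if the layout shifts.
example : Γ[1]? = some [64] := by decide
```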
Canonical stride-2 downsample formula #
Canonical stride-2 output-size formula used throughout this file.
We write it once and reuse it for the stem, the downsampling residual blocks, and the max-pool so that all of those paths literally agree on the same type-level height/width expression.
Small typed primitives for ResNet typing #
Typed 7×7 stem convolution (stride 2, padding 3), cast to the canonical down2 spatial sizes.
Typed 3×3 convolution (stride 2, padding 1), cast to the canonical down2 spatial sizes.
Typed 1×1 projection convolution (stride 2), cast to the canonical down2 spatial sizes.
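The shared pattern behind all three primitives, in a self-contained toy form (Vec standing in for the real shape-indexed tensors; names hypothetical):

```lean
def down2 (h : Nat) : Nat := (h - 1) / 2 + 1

-- Toy shape-indexed type standing in for CHW tensors.
def Vec (n : Nat) : Type := Fin n → Float

-- The raw 3×3 s=2 p=1 output size equals the canonical one...
theorem out3x3s2 (h : Nat) : (h + 2 - 3) / 2 + 1 = down2 h := by
  unfold down2; omega

-- ...so the raw output can be transported to the canonical shape by a
-- proof, with no runtime reshape.
def castDown2 {h : Nat} (v : Vec ((h + 2 - 3) / 2 + 1)) : Vec (down2 h) :=
  out3x3s2 h ▸ v
```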
Parameter layout #
Parameter layout for a convolution layer: kernel first, then bias.
Parameter layout for affine batch norm: (gamma, beta).
Parameter ABI for a single BasicBlock.
When downsample = true, this includes the projection shortcut (1×1 conv + BN) parameters after
the two main-path conv/BN pairs.
Parameter ABI for one ResNet-18 stage.
A stage is two BasicBlocks. The first block downsamples exactly when inC ≠ outC; the second
always keeps the same number of channels.
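A shape-level sketch of this layout (hypothetical helper names; the file writes the flat list out explicitly, as noted below):

```lean
abbrev Shape := List Nat

-- Convolution: kernel first, then bias. Batch norm: (gamma, beta).
def convABI (cin cout k : Nat) : List Shape := [[cout, cin, k, k], [cout]]
def bnABI (c : Nat) : List Shape := [[c], [c]]

-- BasicBlock: two main-path conv/BN pairs, then the projection pair
-- (1×1 conv + BN) exactly when the block downsamples.
def blockABI (cin cout : Nat) (down : Bool) : List Shape :=
  convABI cin cout 3 ++ bnABI cout ++ convABI cout cout 3 ++ bnABI cout ++
    (if down then convABI cin cout 1 ++ bnABI cout else [])

-- Stage: the first block downsamples exactly when cin ≠ cout.
def stageABI (cin cout : Nat) : List Shape :=
  blockABI cin cout (cin != cout) ++ blockABI cout cout false

#eval (stageABI 64 128).length  -- 20 tensors for a downsampling stage
```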
Full parameter ABI for the GraphSpec ResNet-18 model.
This list is deliberately written in a flat, explicit order. The model body indexes parameters by
closed numerals, and the flat ABI keeps those index proofs predictable for simp and easy to audit.
The ResNet-18 GraphSpec ABI contains exactly 82 parameter tensors.
This is a small but useful structural theorem: the executable wrapper, deterministic initializer,
and DAG body all rely on the same flat ABI. Keeping the count as a named theorem makes accidental
parameter-layout edits visible during review instead of hiding them inside a local simp.
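For the record, the 82 decomposes as: stem (conv kernel + bias, BN gamma + beta) = 4; stage 1 (two 8-tensor blocks, no projection since 64 = 64) = 16; stages 2–4 (one 12-tensor downsampling block plus one 8-tensor block each) = 3 × 20 = 60; head (linear weight + bias) = 2; and 4 + 16 + 60 + 2 = 82.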
Deterministic initialization #
Deterministically initialize a convolution kernel (uniform) and bias (zeros).
Deterministically initialize BatchNorm parameters (gamma, beta) as (ones, zeros).
Deterministically initialize linear weights (uniform) and bias (zeros).
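A toy sketch of the scheme (hypothetical names, with a counter-based stand-in for the deterministic uniform draw, since the real generator is not shown here):

```lean
-- Deterministic pseudo-uniform values in [-0.5, 0.5), indexed by position.
def uniformAt (i : Nat) : Float := Float.ofNat (i % 7) / 7.0 - 0.5

def initUniform (n : Nat) : List Float := (List.range n).map uniformAt
def initZeros   (n : Nat) : List Float := List.replicate n 0.0
def initOnes    (n : Nat) : List Float := List.replicate n 1.0

-- Conv: (uniform kernel, zero bias); BN: (ones, zeros); Linear: (uniform, zeros).
```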
Model #
GraphSpec ResNet-18 model.
This is the public entrypoint for the DAG-authored ResNet in this directory. It packages together:
- the full typed parameter ABI (params inC numClasses),
- deterministic initialization for all 82 parameter tensors,
- and a DAG body whose pure semantics can be interpreted via DAG.Model.specFwd and compiled via DAG.Model.torchProgram.
The model is channel-first (CHW) and batch-free, matching the rest of TorchLean's vision-side
Spec and runtime layers.