TorchLean API

Docs Home Guide Examples Graphs

NN.Spec.Layers.GlobalPooling

Global pooling (spec layer) #

Global pooling reduces spatial dimensions (H×W) either to 1×1 (retain channel axis) or to a flat vector of size inC. This file provides both average and max variants, together with explicit backward rules.

We tried to mimic PyTorch closely:

The common pattern is AdaptiveAvgPool2d((1,1)) / AdaptiveMaxPool2d((1,1)), then flatten to a length-C vector before a classifier.
We usually work with a single image (C,H,W) (no batch dimension) here to keep the API small.

Forward generalizes cleanly (and we intentionally structure the code that way):

Global pooling is "reduce each channel over (H,W)".
The helpers global_pool2d_1x1 and global_pool2d_flat already capture the reusable shape and indexing discipline; the only thing that changes between avg/max/min/etc. is the reduce : Image inH inW α → Tensor α .scalar.

Max-pooling subtlety:

If there are multiple spatial positions achieving the same maximum, the backward pass needs a tie-breaking convention. This file provides both:
- a "mask all max positions" rule (sending the full gradient to every max), and
- a "distributed" rule (split the gradient evenly among max positions). PyTorch's exact tie behavior is an implementation detail; the important thing is to make the choice explicit in the spec.

Why the backward does not unify for free:

Different reductions have genuinely different adjoints. Average pooling sends the upstream gradient uniformly to every spatial position; max/min pooling routes gradients only to the argmax/argmin set and must choose a tie convention.
So while the forward can be abstracted over a reduce, a fully generic backward would need extra structure (basically "a reduce + its VJP"). That is why we keep explicit backward specs for the concrete ops we care about.

Layer tags #

Global pooling has no trainable parameters. We still keep a compact "layer spec" record so call sites can carry a tag (and so the API matches the style of other layer files).

structure Spec.GlobalAvgPool2DSpec :

Tag structure for global average pooling (no trainable parameters).

Instances For

structure Spec.GlobalMaxPool2DSpec :

Tag structure for global max pooling (no trainable parameters).

Instances For

def Spec.globalPool2dOutShape (inC : ℕ) :

Output shape for global pooling that keeps a 1 x 1 spatial grid: (C,H,W) -> (C,1,1).

Instances For

def Spec.globalPool2dFlatOutShape (inC : ℕ) :

Output shape for global pooling that flattens spatial dims away: (C,H,W) -> (C).

Instances For

Helper: reduce a single channel over its spatial grid #

This is the shared "walk the (H,W) grid" loop used by avg/max pooling.

def Spec.reduceSpatial {α : Type} (inH inW : ℕ) (init : α) (f : α → α → α) (channel_data : Image inH inW α) :

α

Reduce a single channel Image inH inW α down to a scalar using a fold over (H,W).

Instances For

def Spec.channelSpatialMax {α : Type} [Max α] {inH inW : ℕ} (hH : inH ≠ 0) (hW : inW ≠ 0) (channel_data : Image inH inW α) :

α

Compute the exact spatial maximum of a non-empty (inH × inW) channel.

Instances For

def Spec.globalChannelReduce {α : Type} (inH inW : ℕ) (channel_data : Image inH inW α) (init : α) (f : α → α → α) :

α

Alias for reduce_spatial (kept to make call sites read like "reduce this channel").

Instances For

Helper: "wrap a scalar result back into an image" #

PyTorch mental picture: after pooling you conceptually have a scalar per channel; these helpers put that scalar back into a scalar tensor shape.

def Spec.broadcastScalar1x1 {α : Type} (v : Tensor α Shape.scalar) :

Image 1 1 α

Broadcast a scalar into a 1 x 1 image.

Instances For

def Spec.globalPool2d1x1 {α : Type} [Zero α] (inC inH inW : ℕ) (reduce : Image inH inW α → Tensor α Shape.scalar) (input : MultiChannelImage inC inH inW α) :

MultiChannelImage inC 1 1 α

Generic global pooling helper producing (C,1,1).

Instances For

def Spec.globalPool2dFlat {α : Type} [Zero α] (inC inH inW : ℕ) (reduce : Image inH inW α → Tensor α Shape.scalar) (input : MultiChannelImage inC inH inW α) :

Tensor α (Shape.dim inC Shape.scalar)

Generic global pooling helper producing (C).

Instances For

Forward specs #

These are the layer-level forward meanings, written in the same style as PyTorch.

def Spec.globalAvgPool2dSpec {α : Type} [Zero α] [Add α] [HDiv α ℕ α] [OfNat α 0] [OfNat α 1] {inC inH inW : ℕ} (_h1 : inH ≠ 0) (_h2 : inW ≠ 0) (_layer : GlobalAvgPool2DSpec) (input : MultiChannelImage inC inH inW α) :

MultiChannelImage inC 1 1 α

Global average pooling: (C,H,W) -> (C,1,1).

Instances For

def Spec.globalMaxPool2dSpec {α : Type} [Numbers α] [Max α] [Zero α] {inC inH inW : ℕ} (h1 : inH ≠ 0) (h2 : inW ≠ 0) (_layer : GlobalMaxPool2DSpec) (input : MultiChannelImage inC inH inW α) :

MultiChannelImage inC 1 1 α

Global max pooling: (C,H,W) -> (C,1,1).

Instances For

def Spec.globalAvgPool2dFlatSpec {α : Type} [Coe ℕ α] [Div α] [Zero α] [Add α] [Zero α] [One α] {inC inH inW : ℕ} (_h1 : inH ≠ 0) (_h2 : inW ≠ 0) (_layer : GlobalAvgPool2DSpec) (input : MultiChannelImage inC inH inW α) :

Tensor α (Shape.dim inC Shape.scalar)

Global average pooling (flattened): (C,H,W) -> (C).

Instances For

def Spec.globalMaxPool2dFlatSpec {α : Type} [Numbers α] [Max α] [Zero α] {inC inH inW : ℕ} (h1 : inH ≠ 0) (h2 : inW ≠ 0) (_layer : GlobalMaxPool2DSpec) (input : MultiChannelImage inC inH inW α) :

Tensor α (Shape.dim inC Shape.scalar)

Global max pooling (flattened): (C,H,W) -> (C).

Instances For

Backward/VJP specs #

These are reverse-mode rules that match the intended math:

avg pooling: distribute the upstream gradient evenly over all (H,W) positions;
max pooling: route the upstream gradient to the max locations (with a tie convention).

def Spec.globalAvgPool2dBackwardSpec {α : Type} [Context α] {inC inH inW : ℕ} (_h1 : inH ≠ 0) (_h2 : inW ≠ 0) (_layer : GlobalAvgPool2DSpec) (grad_output : MultiChannelImage inC 1 1 α) :

MultiChannelImage inC inH inW α

Backward/VJP for global average pooling (C,1,1) output.

Instances For

def Spec.globalAvgPool2dFlatBackwardSpec {α : Type} [Context α] {inC inH inW : ℕ} (_h1 : inH ≠ 0) (_h2 : inW ≠ 0) (_layer : GlobalAvgPool2DSpec) (grad_output : Tensor α (Shape.dim inC Shape.scalar)) :

MultiChannelImage inC inH inW α

Backward/VJP for flattened global average pooling (C) output.

Instances For

def Spec.globalMaxPool2dBackwardSpec {α : Type} [Context α] {inC inH inW : ℕ} (h1 : inH ≠ 0) (h2 : inW ≠ 0) (_layer : GlobalMaxPool2DSpec) (input : MultiChannelImage inC inH inW α) (grad_output : MultiChannelImage inC 1 1 α) :

MultiChannelImage inC inH inW α

Backward/VJP for global max pooling (C,1,1) output.

Tie convention: every spatial position equal to the maximum receives the full upstream gradient.

Instances For

def Spec.globalMaxPool2dFlatBackwardSpec {α : Type} [Context α] {inC inH inW : ℕ} (h1 : inH ≠ 0) (h2 : inW ≠ 0) (_layer : GlobalMaxPool2DSpec) (input : MultiChannelImage inC inH inW α) (grad_output : Tensor α (Shape.dim inC Shape.scalar)) :

MultiChannelImage inC inH inW α

Backward/VJP for flattened global max pooling (C) output.

Tie convention: every spatial position equal to the maximum receives the full upstream gradient.

Instances For

def Spec.globalMaxPool2dBackwardDistributedSpec {α : Type} [Context α] {inC inH inW : ℕ} (h1 : inH ≠ 0) (h2 : inW ≠ 0) (_layer : GlobalMaxPool2DSpec) (input : MultiChannelImage inC inH inW α) (grad_output : MultiChannelImage inC 1 1 α) :

MultiChannelImage inC inH inW α

Alternative max-pooling backward that splits the gradient evenly across max positions.

This is often a nicer mathematical choice when the max is not unique.

Instances For