Pooling layers (spec layer) #
This file defines a small set of pooling operators in TorchLean's spec layer.
PyTorch analogies:
- `max_pool2d_*` corresponds to `torch.nn.functional.max_pool2d`.
- `avg_pool2d_*` corresponds to `torch.nn.functional.avg_pool2d`.
- `adaptive_*` corresponds to `torch.nn.functional.adaptive_{avg,max}_pool2d` (output size fixed, pooling regions vary per output position).
We also include a smooth log-sum-exp surrogate for max pooling. It is useful when you want an everywhere-differentiable approximation for proofs or analysis, without changing the rest of the pooling API.
MaxPool2d configuration.
The spec uses a fixed kernel (kH,kW) and a single stride value (applied to both height and width).
We require kH ≠ 0, kW ≠ 0, and stride ≠ 0 so windows are nonempty and the output-shape
arithmetic is well-defined.
PyTorch analogy: F.max_pool2d(x, kernel_size=(kH,kW), stride=stride).
AvgPool2d configuration.
We treat the pooling window as kH*kW elements and divide by that count.
This corresponds to PyTorch's default behavior when no padding is present.
Output shape for a 2D pooling op (single-channel) with no padding.
This uses the standard "valid" pooling formula:
outH = floor((inH - kH)/stride) + 1, outW = floor((inW - kW)/stride) + 1.
PyTorch analogy: ceil_mode=false with no padding.
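For concreteness, here is that arithmetic as a standalone Lean sketch (`poolOutDim` is an illustrative name, not the spec's actual definition); Lean's truncating `Nat` subtraction keeps it total even for degenerate inputs:

```lean
-- Sketch of the "valid" output-size formula. Nat subtraction truncates at 0,
-- so inDim < k degenerates to an output size of 1 instead of erroring.
def poolOutDim (inDim k stride : Nat) : Nat :=
  (inDim - k) / stride + 1

#eval poolOutDim 7 3 2  -- (7 - 3) / 2 + 1 = 3
```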
Output shape for multi-channel 2D pooling (channels preserved).
Output shape for a 2D pooling op (single-channel) with symmetric padding.
`padding` means we use the usual PyTorch output-size formula for an input extended by `padding`
cells on each side. Hard max-pooling ignores padded cells (the PyTorch -∞ convention), while
average-pooling below explicitly includes padded zeros.
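A padded variant of the same sketch (again with illustrative names) simply extends each side before applying the valid formula:

```lean
-- Symmetric padding adds pad cells on each side before the valid formula.
def poolOutDimPad (inDim k stride pad : Nat) : Nat :=
  (inDim + 2 * pad - k) / stride + 1

#eval poolOutDimPad 7 3 2 1  -- (7 + 2*1 - 3) / 2 + 1 = 4
```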
Output shape for multi-channel 2D pooling with symmetric padding (channels preserved).
Smooth max pooling #
max_pool2d_spec uses max and is non-differentiable (ties and kink points).
For proofs that need everywhere differentiability, we provide a smooth surrogate based on log-sum-exp over each pooling window:
smooth_max(x₁,…,xₙ) = (1 / β) * log (∑ exp (β * xᵢ))
This is the standard log-sum-exp surrogate and is intended for β ≠ 0.
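A minimal executable sketch over one window, using Lean's `Float` (the spec itself is scalar-polymorphic, and `smoothMax` is an illustrative name):

```lean
-- Log-sum-exp surrogate for the max of one pooling window.
def smoothMax (beta : Float) (xs : List Float) : Float :=
  (1.0 / beta) * Float.log (xs.foldl (fun acc x => acc + Float.exp (beta * x)) 0.0)

#eval smoothMax 1.0 [1.0, 2.0, 3.0]   -- ≈ 3.408
#eval smoothMax 10.0 [1.0, 2.0, 3.0]  -- ≈ 3.000005; larger beta approaches the hard max
```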
Smooth max-pooling (single-channel) using a log-sum-exp surrogate.
This is useful in proof settings that want a differentiable alternative to max_pool2d_spec.
For large beta, the output approaches hard max pooling.
Smooth max-pooling (multi-channel): apply smooth_max_pool2d_spec per channel.
Forward-mode JVP for smooth max-pooling (single-channel).
For each pooling window this is the differential of the log-sum-exp surrogate,
Σᵢ softmax(beta*xᵢ) * dxᵢ. This mirrors the VJP weights below but pushes an input tangent
forward instead of pulling an output cotangent backward.
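Per window, this JVP is a softmax-weighted sum of the tangents; a sketch with illustrative names:

```lean
-- JVP of the log-sum-exp surrogate over one window: Σᵢ softmax(beta*xᵢ) * dxᵢ.
def smoothMaxJvp (beta : Float) (xs dxs : List Float) : Float :=
  let es := xs.map fun x => Float.exp (beta * x)
  let s  := es.foldl (· + ·) 0.0
  ((es.zip dxs).map fun (e, dx) => (e / s) * dx).foldl (· + ·) 0.0
```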
Multi-channel JVP for smooth max-pooling (channel-wise application).
MaxPool2d forward pass (single-channel).
This takes the maximum over each kH×kW window sampled with the given stride.
The return type encodes the standard output spatial size formula.
MaxPool2d forward pass (multi-channel): apply max_pool2d_spec per channel.
Forward-mode JVP for hard max-pooling (single-channel).
The tangent is read at the argmax chosen by the primal input. At ties the first row-major
maximizer is used, matching maxPool2dBackwardSpec and PyTorch's index convention.
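The tie rule amounts to a fold that only replaces the incumbent on a strict improvement; a sketch over a flat row-major window (`argmaxFirst` is an illustrative name):

```lean
-- First row-major maximizer: strict `>` keeps the earliest index on ties.
def argmaxFirst (xs : List Float) : Option Nat :=
  (xs.enum.foldl
    (fun best (i, x) =>
      match best with
      | none => some (i, x)
      | some (_, bx) => if x > bx then some (i, x) else best)
    (none : Option (Nat × Float))).map (·.1)

#eval argmaxFirst [1.0, 3.0, 3.0, 2.0]  -- some 1 (first of the tied maxima)
```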
Multi-channel JVP for hard max-pooling (channel-wise application).
AvgPool2d forward pass (single-channel).
We sum all values in the window and divide by kH*kW.
PyTorch analogy: avg_pool2d. The count_include_pad=true setting only matters for padded pooling;
in the unpadded case this matches the usual definition.
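A one-window sketch of this convention (illustrative names; the divisor is always the full kH*kW count):

```lean
-- Average over one window: sum, then divide by the fixed window size kH*kW.
def avgWindow (vals : List Float) (kH kW : Nat) : Float :=
  vals.foldl (· + ·) 0.0 / (kH * kW).toFloat
```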
AvgPool2d forward pass (multi-channel): apply avg_pool2d_spec per channel.
Adaptive pooling #
PyTorch defines adaptive pooling by partitioning the input into out bins.
For output index i, the pooling region is:
start = floor(i * in / out)
end = ceil((i+1) * in / out)
This matters when in is not divisible by out: region sizes vary by at most 1.
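Both boundaries are plain Nat arithmetic; a sketch with illustrative names, plus a worked 5-into-3 binning:

```lean
-- Adaptive bin boundaries: floor for the start, ceiling for the end.
def adaptiveStart (i inDim outDim : Nat) : Nat :=
  i * inDim / outDim
def adaptiveEnd (i inDim outDim : Nat) : Nat :=
  ((i + 1) * inDim + outDim - 1) / outDim  -- ceil((i+1) * in / out)

-- Five input cells into three bins: half-open regions [0,2), [1,4), [3,5).
#eval (List.range 3).map fun i => (adaptiveStart i 5 3, adaptiveEnd i 5 3)
```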
Adaptive-pooling region start index: floor(i * in / out) (PyTorch definition).
Adaptive-pooling region end index: ceil((i+1) * in / out) (PyTorch definition).
AdaptiveAvgPool2d forward pass.
Unlike fixed-kernel pooling, adaptive pooling chooses a window for each output position so that
the whole input is covered by outH×outW bins.
This follows the PyTorch start/end formula (see the section comment above).
AdaptiveMaxPool2d forward pass (same binning as adaptive avg, but with max instead of mean).
We intentionally do not use a numeric sentinel value to seed the max fold; we seed from the first
element of the region via getValueAtPosition. That keeps the spec meaningful across different
scalar backends.
Backward/VJP for max_pool2d_spec.
This propagates each output gradient to the argmax location inside the corresponding window. Tie-breaking: if multiple values in the window are equal to the maximum, we keep the first position in row-major order (same convention as PyTorch's max-pool indices).
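Per window, the VJP is a one-hot scatter of the upstream gradient; a self-contained sketch over a flat row-major window (illustrative names):

```lean
-- Route the upstream gradient g to the first row-major maximizer
-- (strict `>` in the fold keeps the earliest index on ties).
def maxWindowVjp (xs : List Float) (g : Float) : List Float :=
  let best := xs.enum.foldl
    (fun acc (i, x) =>
      match acc with
      | none => some (i, x)
      | some (_, bx) => if x > bx then some (i, x) else acc)
    (none : Option (Nat × Float))
  match best with
  | none => []
  | some (j, _) => xs.enum.map fun (i, _) => if i = j then g else 0.0
```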
Multi-channel max-pooling backward (channel-wise application of max_pool2d_backward_spec).
Backward/VJP for avg_pool2d_spec (single-channel).
Each output gradient is evenly distributed across its corresponding input window.
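Per window this is a constant scatter; a sketch with illustrative names:

```lean
-- Every cell of the kH×kW window receives an equal share of the gradient.
def avgWindowVjp (g : Float) (kH kW : Nat) : List Float :=
  List.replicate (kH * kW) (g / (kH * kW).toFloat)
```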
Padded pooling (symmetric padding) #
For max-pooling, padded locations are not real input elements and are ignored when selecting the
maximum. This is the scalar-polymorphic way to model PyTorch's -∞ max-pool padding without adding
a backend-specific infinity constant to Context α.
For average pooling, this corresponds to including padded zeros in the average (PyTorch's default
count_include_pad = true).
Remove symmetric zero-padding from a multi-channel image (channel-wise unpad_image).
Multi-channel max-pooling forward pass with PyTorch-style padding (-∞ outside bounds).
Forward-mode JVP for padded hard max-pooling.
Padding cells are ignored exactly as in maxPool2dMultiSpecPad, so the tangent is taken from the
primal winner among real input locations only. If a window contains no real input cells, the
forward value and tangent are both 0.
Multi-channel average pooling forward pass with symmetric zero padding.
Multi-channel max-pooling backward pass with PyTorch-style padding (-∞ outside bounds).
Multi-channel average-pooling backward pass with symmetric padding (backprop then unpad).
Backward/VJP for smooth_max_pool2d_spec (log-sum-exp surrogate).
Multi-channel backward for smooth_max_pool2d_multi_spec (apply per channel).
Generic N-D pooling (channels-first, no batch) #
These operators generalize the existing 2D pooling specs to an arbitrary spatial rank d.
Conventions:
- Input is channels-first: shape `[C] ++ spatialDims`.
- Pooling is applied independently per channel (like the existing 2D specs).
- `kernel`, `stride`, and `padding` are per-axis vectors (`Vector Nat d`).
- Padding is symmetric and uses zeros.
PyTorch comparisons (conceptual, without batch axis):
- `max_pool_spec` corresponds to `torch.nn.functional.max_poolNd`.
- `avg_pool_spec` corresponds to `torch.nn.functional.avg_poolNd`.
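Combining the per-axis formulas, the output-shape computation is a single zip-and-map; a sketch using `List` in place of the spec's `Vector Nat d` (names illustrative):

```lean
-- Per-axis padded output sizes for N-D pooling (axes outermost to innermost).
def poolOutShape (spatial kernel stride padding : List Nat) : List Nat :=
  (spatial.zip (kernel.zip (stride.zip padding))).map
    fun (n, k, s, p) => (n + 2 * p - k) / s + 1

#eval poolOutShape [7, 9] [3, 3] [2, 2] [1, 0]  -- [4, 4]
```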
Layer configs + output shapes #
Kernel/stride/padding configuration for N-D max pooling.
Kernel sizes per spatial axis (outermost to innermost).
Strides per spatial axis (outermost to innermost).
Symmetric zero padding per spatial axis (outermost to innermost).
Kernel/stride/padding configuration for N-D average pooling.
Kernel sizes per spatial axis (outermost to innermost).
Strides per spatial axis (outermost to innermost).
Symmetric zero padding per spatial axis (outermost to innermost).
Input lookup for average/smooth pooling.
For average-style pooling, padded cells contribute numeric zero and are still counted by the
denominator chosen by the surrounding pooling spec. We keep this separate from
getPaddedMaxInputVal?, where padded cells must be ignored rather than treated as zero.
Input lookup for hard max-pooling.
Unlike average-pooling, max-pooling should not insert a numeric zero for padded cells: PyTorch's
max-pool semantics treat padding as -∞. TorchLean keeps the spec scalar-polymorphic by returning
none for padded coordinates and letting the max fold ignore them.
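The two lookup conventions differ only in what they return at padded coordinates; a 2-D sketch of the max-pooling variant (hypothetical names, row-major `Array` layout), where an avg-style lookup would instead return `some 0`:

```lean
-- Return none for padded coordinates so the max fold can skip them.
def paddedLookup? (img : Array Float) (h w : Nat) (r c : Int) : Option Float :=
  if 0 ≤ r ∧ r < (h : Int) ∧ 0 ≤ c ∧ c < (w : Int) then
    img[r.toNat * w + c.toNat]?
  else
    none
```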
Directional derivative of hard max-pooling for one N-D window.
The derivative is taken along the same winner selected by maxPoolValue. At ties we keep the first
winner in row-major order, matching the VJP convention below and PyTorch's index convention.
Directional derivative of the smooth log-sum-exp pooling value.
For y = beta⁻¹ log Σ exp(beta*xᵢ), the directional derivative is
Σ softmax(beta*xᵢ) * dxᵢ, using the same zero-padding convention as smoothMaxPoolValue.
Forward (single-channel spatial tensor) #
N-D max pooling on a spatial tensor (no explicit channel axis).
Forward-mode JVP for N-D hard max-pooling on a spatial tensor.
The derivative follows the same primal argmax as maxPoolSpatialSpec; at ties it keeps the first
row-major maximizer. This is the correct directional derivative for TorchLean's chosen subgradient
convention and matches the VJP tie policy.
N-D average pooling on a spatial tensor (no explicit channel axis).
Backward (single-channel spatial tensor) #
These are the VJPs of the forward pooling specs above.
Conventions:
- For max pooling, ties are broken by first occurrence in row-major order (same as the 2D spec).
- For max pooling, padded cells are ignored, modeling PyTorch's `-∞` padding without requiring a scalar-polymorphic infinity constant.
- For average pooling, gradients are evenly distributed across the full kernel window (`count_include_pad=true` behavior when padding is present).
Backward/VJP for max_pool_spatial_spec.
Each output gradient is propagated to the argmax location in the corresponding input window. Ties keep the first position in row-major order.
Backward/VJP for avg_pool_spatial_spec (single-channel).
Each output gradient is evenly distributed across its kernel window.
Forward (channels-first: C × spatial...) #
N-D max pooling on a channels-first tensor: shape [C] ++ spatial.
N-D hard max-pool JVP on a channels-first tensor (channel-wise application).
N-D average pooling on a channels-first tensor: shape [C] ++ spatial.
Backward (channels-first: C × spatial...) #
Multi-channel VJP for max_pool_spec (apply spatial backward per channel).
Multi-channel VJP for avg_pool_spec (apply spatial backward per channel).
Smooth max pooling (log-sum-exp surrogate) #
Smooth log-sum-exp max pooling on a spatial tensor (no explicit channel axis).
Forward-mode JVP for N-D smooth max-pooling on a spatial tensor.
For the log-sum-exp surrogate this is the softmax-weighted sum of the input tangent over each
window. It is the forward-mode counterpart of smoothMaxPoolSpatialBackwardSpec.
Smooth log-sum-exp max pooling on a channels-first tensor (channel-wise application).
N-D smooth max-pool JVP on a channels-first tensor (channel-wise application).
Smooth max pooling backward #
Backward/VJP for smooth_max_pool_spatial_spec (log-sum-exp surrogate).
For a window x₁,…,xₙ, the surrogate is:
y = (1/beta) * log(∑ exp(beta*xᵢ))
and the VJP distributes upstream gradient proportionally to exp(beta*xᵢ).
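Per window, the VJP weights are exactly the softmax of beta*x; a sketch with illustrative names:

```lean
-- Each input receives g * softmax(beta*xᵢ); the weights sum to 1.
def smoothMaxWindowVjp (beta : Float) (xs : List Float) (g : Float) : List Float :=
  let es := xs.map fun x => Float.exp (beta * x)
  let s  := es.foldl (· + ·) 0.0
  es.map fun e => g * (e / s)
```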
Multi-channel VJP for smooth_max_pool_spec (apply spatial backward per channel).
Friendly aliases #
Alias for max_pool_spec.
Alias for avg_pool_spec.
Alias for smooth_max_pool_spec.
Alias for max_pool_backward_spec.
Alias for avg_pool_backward_spec.
Alias for smooth_max_pool_backward_spec.