N-D Pooling #
Dimension-polymorphic pooling specs for spatial tensors and channels-first tensors.
Generic N-D pooling (channels-first, no batch) #
These operators generalize the existing 2D pooling specs to an arbitrary spatial rank d.
Conventions:
- Input is channels-first: shape [C] ++ spatialDims.
- Pooling is applied independently per channel (like the existing 2D specs).
- kernel, stride, and padding are per-axis vectors (Vector Nat d).
- Padding is symmetric. Average and smooth pooling read padded cells as zero; hard max pooling ignores them, modeling PyTorch's -∞ padding.
PyTorch comparisons (conceptual, without batch axis):
- max_pool_spec corresponds to torch.nn.functional.max_poolNd.
- avg_pool_spec corresponds to torch.nn.functional.avg_poolNd.
Layer configs + output shapes #
Kernel/stride/padding configuration for N-D max pooling.
Kernel sizes per spatial axis (outermost to innermost).
Strides per spatial axis (outermost to innermost).
Symmetric zero padding per spatial axis (outermost to innermost).
Kernel/stride/padding configuration for N-D average pooling.
Kernel sizes per spatial axis (outermost to innermost).
Strides per spatial axis (outermost to innermost).
Symmetric zero padding per spatial axis (outermost to innermost).
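The output extent per axis is not spelled out here. As a minimal sketch, assuming the spec follows PyTorch's default floor-mode formula with dilation 1, each axis of size n maps to floor((n + 2p - k) / s) + 1. The name `pool_output_shape` is a hypothetical Python helper, not a TorchLean identifier.

```python
def pool_output_shape(spatial, kernel, stride, padding):
    """Per-axis output extent, assuming floor((n + 2*p - k) / s) + 1."""
    if not (len(spatial) == len(kernel) == len(stride) == len(padding)):
        raise ValueError("all per-axis vectors must have the same length d")
    out = []
    for n, k, s, p in zip(spatial, kernel, stride, padding):
        if k <= 0 or s <= 0:
            raise ValueError("kernel and stride entries must be positive")
        padded = n + 2 * p  # symmetric padding adds p cells on each side
        if padded < k:
            raise ValueError("kernel does not fit in the padded axis")
        out.append((padded - k) // s + 1)
    return out


print(pool_output_shape([5, 7], [3, 3], [2, 2], [1, 1]))  # [3, 4]
```

The same helper applies to both the max and the average configuration, since the two carry the same three per-axis fields.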
Input lookup for average/smooth pooling.
For average-style pooling, padded cells contribute numeric zero and are still counted by the
denominator chosen by the surrounding pooling spec. We keep this separate from
getPaddedMaxInputVal?, where padded cells must be ignored rather than treated as zero.
Input lookup for hard max-pooling.
Unlike average-pooling, max-pooling should not insert a numeric zero for padded cells: PyTorch's
max-pool semantics treat padding as -∞. TorchLean keeps the spec scalar-polymorphic by returning
none for padded coordinates and letting the max fold ignore them.
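The two lookup conventions can be contrasted in a small Python sketch. It assumes a flat row-major tensor and coordinates given in padded space; `lookup_padded`, `avg_input_val`, and `max_input_val` are hypothetical names chosen for illustration, and Python's `None` stands in for the spec's `none`.

```python
from typing import Optional, Sequence


def lookup_padded(data: Sequence[float], shape: Sequence[int],
                  padding: Sequence[int], coord: Sequence[int]) -> Optional[float]:
    """Read a flat row-major tensor at a coordinate in padded space.

    Returns None when the coordinate falls in the padding region.
    """
    flat = 0
    for n, p, c in zip(shape, padding, coord):
        i = c - p  # shift from padded space back to input space
        if i < 0 or i >= n:
            return None
        flat = flat * n + i  # row-major flattening
    return data[flat]


def avg_input_val(data, shape, padding, coord) -> float:
    """Average/smooth lookup: padded cells read as numeric zero."""
    v = lookup_padded(data, shape, padding, coord)
    return 0.0 if v is None else v


def max_input_val(data, shape, padding, coord) -> Optional[float]:
    """Hard-max lookup: padded cells are None so a max fold can skip them."""
    return lookup_padded(data, shape, padding, coord)


data, shape, padding = [1.0, 2.0, 3.0, 4.0], [2, 2], [1, 1]
print(avg_input_val(data, shape, padding, (0, 0)))  # padded corner: 0.0
print(max_input_val(data, shape, padding, (0, 0)))  # padded corner: None
print(max_input_val(data, shape, padding, (2, 1)))  # input cell (1, 0): 3.0
```

Returning an optional value keeps the max lookup independent of the scalar type, because no infinity constant is needed.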
Directional derivative of hard max-pooling for one N-D window.
The derivative is taken along the same winner selected by maxPoolValue. At ties we keep the first
winner in row-major order, matching the VJP convention below and PyTorch's index convention.
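For a single window, this tie rule can be sketched in Python as follows. The window is assumed to be listed in row-major order with `None` marking padded cells; `max_pool_window_jvp` is a hypothetical name, and returning 0.0 for an all-padded window is an assumption of this sketch, not something the spec states.

```python
def max_pool_window_jvp(xs, dxs):
    """Tangent of the hard max over one window.

    xs:  primal values in row-major window order, None for padded cells.
    dxs: tangent values aligned with xs (entries at padded slots are ignored).
    """
    best = None
    for i, x in enumerate(xs):
        if x is None:
            continue  # padded cells never win
        # Strict '>' keeps the first maximizer when values tie.
        if best is None or x > xs[best]:
            best = i
    if best is None:
        return 0.0  # assumption: a window with no real cells has zero tangent
    return dxs[best]


# Cells 1 and 2 tie at 3.0, so the tangent of cell 1 is selected.
print(max_pool_window_jvp([None, 3.0, 3.0, 1.0], [0.0, 10.0, 20.0, 30.0]))  # 10.0
```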
Directional derivative of the smooth log-sum-exp pooling value.
For y = beta⁻¹ log Σ exp(beta*xᵢ), the directional derivative is
Σ softmax(beta*xᵢ) * dxᵢ, using the same zero-padding convention as smoothMaxPoolValue.
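The formula can be checked numerically on one window. The Python sketch below assumes a nonzero `beta` and a window that already contains any zero-valued padded cells; `smooth_max_window` and `smooth_max_window_jvp` are hypothetical names. Subtracting the largest exponent before calling `exp` is a standard stabilization and does not change the result.

```python
import math


def smooth_max_window(xs, beta):
    """y = (1/beta) * log(sum(exp(beta * x_i))), computed stably."""
    m = max(beta * x for x in xs)
    return (m + math.log(sum(math.exp(beta * x - m) for x in xs))) / beta


def smooth_max_window_jvp(xs, dxs, beta):
    """Directional derivative: sum_i softmax(beta * x)_i * dx_i."""
    m = max(beta * x for x in xs)
    ws = [math.exp(beta * x - m) for x in xs]
    z = sum(ws)
    return sum(w / z * dx for w, dx in zip(ws, dxs))


xs, dxs, beta = [1.0, 2.0, 0.0], [0.5, -1.0, 2.0], 2.0
eps = 1e-6
plus = smooth_max_window([x + eps * d for x, d in zip(xs, dxs)], beta)
minus = smooth_max_window([x - eps * d for x, d in zip(xs, dxs)], beta)
# The analytic JVP and the central finite difference should agree closely.
print(smooth_max_window_jvp(xs, dxs, beta), (plus - minus) / (2 * eps))
```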
Forward (single-channel spatial tensor) #
N-D max pooling on a spatial tensor (no explicit channel axis).
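A reference-style Python sketch of the forward pass is given here for orientation. It assumes a flat row-major input, the floor-mode output shape formula with dilation 1, and padded cells that are skipped rather than read as zero; `max_pool_spatial` is a hypothetical name, not the TorchLean definition.

```python
from itertools import product


def max_pool_spatial(data, shape, kernel, stride, padding):
    """Hard max pooling over a flat row-major tensor of the given spatial shape.

    Returns (flat row-major output, output shape).
    """
    out_shape = [(n + 2 * p - k) // s + 1
                 for n, k, s, p in zip(shape, kernel, stride, padding)]
    out = []
    for o in product(*(range(m) for m in out_shape)):       # output coordinates
        best = None
        for w in product(*(range(k) for k in kernel)):      # offsets in the window
            flat, inside = 0, True
            for n, s, p, oi, wi in zip(shape, stride, padding, o, w):
                i = oi * s + wi - p                         # input coordinate
                if not 0 <= i < n:
                    inside = False                          # padded cell: skip it
                    break
                flat = flat * n + i
            if inside and (best is None or data[flat] > best):
                best = data[flat]
        out.append(best)  # stays None only if the whole window is padding
    return out, out_shape


# With all-negative inputs, skipping padding differs from zero padding.
print(max_pool_spatial([-1.0, -2.0, -3.0, -4.0], [4], [2], [2], [1]))
# ([-1.0, -2.0, -4.0], [3])
```

Had padded cells been read as zero, the first and last outputs in this example would be 0.0 instead of -1.0 and -4.0.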
Forward-mode JVP for N-D hard max-pooling on a spatial tensor.
The derivative follows the same primal argmax as maxPoolSpatialSpec; at ties it keeps the first
row-major maximizer. This is the correct directional derivative for TorchLean's chosen subgradient
convention and matches the VJP tie policy.
N-D average pooling on a spatial tensor (no explicit channel axis).
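The average forward pass can be sketched the same way. This Python example assumes a flat row-major input, the floor-mode output shape formula with dilation 1, padded cells read as zero, and a denominator equal to the full kernel size (the `count_include_pad=true` behavior); `avg_pool_spatial` is a hypothetical name.

```python
from itertools import product
from math import prod


def avg_pool_spatial(data, shape, kernel, stride, padding):
    """Average pooling over a flat row-major tensor of the given spatial shape.

    Returns (flat row-major output, output shape).
    """
    out_shape = [(n + 2 * p - k) // s + 1
                 for n, k, s, p in zip(shape, kernel, stride, padding)]
    window_size = prod(kernel)  # padded cells are counted in the denominator
    out = []
    for o in product(*(range(m) for m in out_shape)):
        total = 0.0
        for w in product(*(range(k) for k in kernel)):
            flat, inside = 0, True
            for n, s, p, oi, wi in zip(shape, stride, padding, o, w):
                i = oi * s + wi - p
                if not 0 <= i < n:
                    inside = False  # padded cell contributes zero
                    break
                flat = flat * n + i
            if inside:
                total += data[flat]
        out.append(total / window_size)
    return out, out_shape


print(avg_pool_spatial([2.0, 4.0, 6.0, 8.0], [4], [2], [2], [1]))
# ([1.0, 5.0, 4.0], [3])
```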
Backward (single-channel spatial tensor) #
These are the VJPs of the forward pooling specs above.
Conventions:
- For max pooling, ties are broken by first occurrence in row-major order (same as the 2D spec).
- For max pooling, padded cells are ignored, modeling PyTorch's -∞ padding without requiring a scalar-polymorphic infinity constant.
- For average pooling, gradients are evenly distributed across the full kernel window (count_include_pad=true behavior when padding is present).
Backward/VJP for max_pool_spatial_spec.
Each output gradient is propagated to the argmax location in the corresponding input window. Ties keep the first position in row-major order.
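As an illustration, the routing rule can be written as a short Python sketch. It assumes flat row-major tensors, the floor-mode output shape formula with dilation 1, and gradients that accumulate when windows overlap; `max_pool_spatial_backward` is a hypothetical name.

```python
from itertools import product


def max_pool_spatial_backward(data, shape, kernel, stride, padding, grad_out):
    """Route each output gradient to the first row-major argmax of its window."""
    out_shape = [(n + 2 * p - k) // s + 1
                 for n, k, s, p in zip(shape, kernel, stride, padding)]
    grad_in = [0.0] * len(data)
    coords = product(*(range(m) for m in out_shape))
    for g, o in zip(grad_out, coords):  # grad_out is flat, row-major
        best = None
        for w in product(*(range(k) for k in kernel)):
            flat, inside = 0, True
            for n, s, p, oi, wi in zip(shape, stride, padding, o, w):
                i = oi * s + wi - p
                if not 0 <= i < n:
                    inside = False  # padded cells cannot receive gradient
                    break
                flat = flat * n + i
            # Strict '>' keeps the first maximizer on ties.
            if inside and (best is None or data[flat] > data[best]):
                best = flat
        if best is not None:
            grad_in[best] += g
    return grad_in


# The first window [3, 3] ties, so its gradient goes to index 0.
print(max_pool_spatial_backward([3.0, 3.0, 1.0, 5.0], [4], [2], [1], [0],
                                [1.0, 10.0, 100.0]))
# [1.0, 10.0, 0.0, 100.0]
```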
Backward/VJP for avg_pool_spatial_spec (single-channel).
Each output gradient is evenly distributed across its kernel window.
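A Python sketch of this distribution, under the same assumptions as the forward convention (flat row-major tensors, floor-mode output shape with dilation 1, full kernel size as denominator): shares that land on padded cells are simply dropped, since padding is not part of the input. `avg_pool_spatial_backward` is a hypothetical name.

```python
from itertools import product
from math import prod


def avg_pool_spatial_backward(shape, kernel, stride, padding, grad_out):
    """Spread each output gradient evenly over its full kernel window."""
    out_shape = [(n + 2 * p - k) // s + 1
                 for n, k, s, p in zip(shape, kernel, stride, padding)]
    grad_in = [0.0] * prod(shape)
    window_size = prod(kernel)
    coords = product(*(range(m) for m in out_shape))
    for g, o in zip(grad_out, coords):  # grad_out is flat, row-major
        share = g / window_size         # same share for every cell in the window
        for w in product(*(range(k) for k in kernel)):
            flat, inside = 0, True
            for n, s, p, oi, wi in zip(shape, stride, padding, o, w):
                i = oi * s + wi - p
                if not 0 <= i < n:
                    inside = False      # share for a padded cell is discarded
                    break
                flat = flat * n + i
            if inside:
                grad_in[flat] += share
    return grad_in


print(avg_pool_spatial_backward([4], [2], [2], [1], [2.0, 4.0, 6.0]))
# [1.0, 2.0, 2.0, 3.0]
```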
Forward (channels-first: C × spatial...) #
N-D max pooling on a channels-first tensor: shape [C] ++ spatial.
N-D hard max-pool JVP on a channels-first tensor (channel-wise application).
N-D average pooling on a channels-first tensor: shape [C] ++ spatial.
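Channel-wise application for the channels-first forms amounts to slicing the tensor per channel and reusing a spatial routine. The Python sketch below assumes a flat row-major layout for shape [C] ++ spatial; `per_channel` and the toy `max_pool_1d_k2_s2` are hypothetical helpers used only for illustration.

```python
from math import prod


def per_channel(spatial_fn, data, channels, spatial_shape):
    """Apply spatial_fn independently to each channel of a channels-first tensor."""
    size = prod(spatial_shape)  # number of cells in one channel
    out = []
    for c in range(channels):
        channel = data[c * size:(c + 1) * size]
        out.extend(spatial_fn(channel, spatial_shape))
    return out


def max_pool_1d_k2_s2(xs, shape):
    """Toy spatial routine: 1-D max pooling with kernel 2, stride 2, no padding."""
    return [max(xs[i:i + 2]) for i in range(0, shape[0] - 1, 2)]


# Two channels of length 4 each.
print(per_channel(max_pool_1d_k2_s2, [1, 4, 2, 3, 9, 8, 7, 6], 2, [4]))
# [4, 3, 9, 7]
```

Any spatial forward or backward routine with the same calling shape could be passed in place of the toy function.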
Backward (channels-first: C × spatial...) #
Multi-channel VJP for max_pool_spec (apply spatial backward per channel).
Multi-channel VJP for avg_pool_spec (apply spatial backward per channel).
Smooth max pooling (log-sum-exp surrogate) #
Smooth log-sum-exp max pooling on a spatial tensor (no explicit channel axis).
Forward-mode JVP for N-D smooth max-pooling on a spatial tensor.
For the log-sum-exp surrogate this is the softmax-weighted sum of the input tangent over each
window. It is the forward-mode counterpart of smoothMaxPoolSpatialBackwardSpec.
Smooth log-sum-exp max pooling on a channels-first tensor (channel-wise application).
N-D smooth max-pool JVP on a channels-first tensor (channel-wise application).
Smooth max pooling backward #
Backward/VJP for smooth_max_pool_spatial_spec (log-sum-exp surrogate).
For a window x₁,…,xₙ, the surrogate is:
y = (1/beta) * log(∑ exp(beta*xᵢ))
and the VJP gives each xᵢ the upstream gradient scaled by its softmax weight exp(beta*xᵢ) / ∑ⱼ exp(beta*xⱼ).
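For one window, the VJP can be sketched in Python as below, assuming a nonzero `beta` and a scalar upstream gradient `g`; `smooth_max_window_vjp` is a hypothetical name. The weights are normalized, so the returned gradients sum to `g`.

```python
import math


def smooth_max_window_vjp(xs, beta, g):
    """Gradient of y = (1/beta) * log(sum(exp(beta * x_i))) times upstream g."""
    m = max(beta * x for x in xs)              # shift for numerical stability
    ws = [math.exp(beta * x - m) for x in xs]  # proportional to exp(beta * x_i)
    z = sum(ws)
    return [g * w / z for w in ws]


print(smooth_max_window_vjp([0.0, 0.0], 1.0, 4.0))  # equal inputs: [2.0, 2.0]

# A large beta concentrates the gradient on the largest input,
# approaching the hard max-pooling VJP.
print(smooth_max_window_vjp([0.0, 1.0], 50.0, 1.0))
```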
Multi-channel VJP for smooth_max_pool_spec (apply spatial backward per channel).