TorchLean API

NN.Proofs.RuntimeApprox.NF.ConvBackward

Conv2D Backward Approximation #

NF (rounded) backend: Conv2D backward (VJP) runtime→spec approximation.

This file proves soundness of explicit bounds for the three Conv2D gradients computed by Spec.conv2d_backward_spec: the kernel gradient, the bias gradient, and the input gradient.

The file is intentionally explicit because each gradient has a different nested-indexing pattern. The important public objects are the tensor-level bounds (conv2d*BoundTensor), the approximation theorems (approxT_conv2d_*_deriv_spec), and conv2dRevNode, which packages Conv2D as a RevNode so it composes via RevGraph.backprop_approx.

PyTorch analogue: these are the VJP/gradient computations produced by Autograd for Conv2D. https://pytorch.org/docs/stable/autograd.html https://pytorch.org/docs/stable/generated/torch.nn.functional.conv2d.html
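For orientation, these are the standard Conv2D VJP formulas, written here in conventional notation (the spec's exact index conventions may differ). With zero-padded input $\hat{x}$, kernel $w$, stride $s$, padding $p$, and upstream gradient $\delta$:

```latex
% Bias gradient: sum the upstream gradient over all output positions.
\frac{\partial L}{\partial b_{o}} = \sum_{u,v} \delta_{o,u,v}

% Kernel gradient: correlate the upstream gradient with the padded input.
\frac{\partial L}{\partial w_{o,c,d_i,d_j}}
  = \sum_{u,v} \delta_{o,u,v}\, \hat{x}_{c,\;u s - p + d_i,\;v s - p + d_j}

% Input gradient: each input pixel collects every (o,u,v,d_i,d_j) that maps to it.
\frac{\partial L}{\partial x_{c,i,j}}
  = \sum_{\substack{o,u,v,d_i,d_j \\ u s - p + d_i = i,\; v s - p + d_j = j}}
      \delta_{o,u,v}\, w_{o,c,d_i,d_j}
```

The three different summation shapes above are why each gradient below gets its own pointwise bound and soundness theorem.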


theorem Proofs.RuntimeApprox.NFBackend.entry_eq_scalar_get_at_or_zero4 {α : Type} [Zero α] {n1 n2 n3 n4 : ℕ} (t : Spec.Tensor α (Spec.Shape.dim n1 (Spec.Shape.dim n2 (Spec.Shape.dim n3 (Spec.Shape.dim n4 Spec.Shape.scalar))))) (i1 : Fin n1) (i2 : Fin n2) (i3 : Fin n3) (i4 : Fin n4) :
(match match match match t with | Spec.Tensor.dim f => f i1 with | Spec.Tensor.dim g => g i2 with | Spec.Tensor.dim h => h i3 with | Spec.Tensor.dim k => k i4) = Spec.Tensor.scalar (Spec.getAtOrZero t [i1, i2, i3, i4])
noncomputable def Proofs.RuntimeApprox.NFBackend.conv2dBiasPointBound {β : TorchLean.Floats.NeuralRadix} {fexp : ℤ → ℤ} [TorchLean.Floats.NeuralValidExp fexp] {rnd : ℝ → ℤ} {outC kH kW stride padding inH inW : ℕ} (δR : Spec.MultiChannelImage outC (conv2dOutH inH kH stride padding) (conv2dOutW inW kW stride padding) (TorchLean.Floats.NF β fexp rnd)) (epsδ : ℝ) (out_ch : Fin outC) : ℝ

Pointwise error bound for the Conv2D bias gradient (NF runtime vs spec).

This bound is a replay of the bias-gradient summation with per-term error epsδ coming from the grad_output approximation hypothesis.
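As an illustration of this "replay" style (this is a sketch, not the TorchLean definition), a plain sum with a per-term error of epsδ accumulates at most one epsδ per summand:

```lean
import Mathlib

-- Illustrative sketch only, not the TorchLean definition: the bias gradient
-- is a plain sum of grad_output entries, one per output position, so a
-- per-entry error of epsδ accumulates to (number of summands) * epsδ.
noncomputable def biasPointBoundSketch (outH outW : ℕ) (epsδ : ℝ) : ℝ :=
  (outH * outW : ℝ) * epsδ
```

The real conv2dBiasPointBound replays the summation structurally rather than counting terms, which keeps the bound definitionally aligned with the spec's recursion.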

    theorem Proofs.RuntimeApprox.NFBackend.approx_conv2d_bias_point {β : TorchLean.Floats.NeuralRadix} {fexp : ℤ → ℤ} [TorchLean.Floats.NeuralValidExp fexp] {rnd : ℝ → ℤ} [TorchLean.Floats.NeuralValidRndToNearest rnd] {inC outC kH kW stride padding inH inW : ℕ} {h1 : inC ≠ 0} {h2 : kH ≠ 0} {h3 : kW ≠ 0} {kernelS : Spec.Tensor ℝ (Spec.Shape.dim outC (Spec.Shape.dim inC (Spec.Shape.dim kH (Spec.Shape.dim kW Spec.Shape.scalar))))} {kernelR : Spec.Tensor (TorchLean.Floats.NF β fexp rnd) (Spec.Shape.dim outC (Spec.Shape.dim inC (Spec.Shape.dim kH (Spec.Shape.dim kW Spec.Shape.scalar))))} {biasS : Spec.Tensor ℝ (Spec.Shape.dim outC Spec.Shape.scalar)} {biasR : Spec.Tensor (TorchLean.Floats.NF β fexp rnd) (Spec.Shape.dim outC Spec.Shape.scalar)} {inputS : Spec.MultiChannelImage inC inH inW ℝ} {inputR : Spec.MultiChannelImage inC inH inW (TorchLean.Floats.NF β fexp rnd)} {δS : Spec.MultiChannelImage outC (conv2dOutH inH kH stride padding) (conv2dOutW inW kW stride padding) ℝ} {δR : Spec.MultiChannelImage outC (conv2dOutH inH kH stride padding) (conv2dOutW inW kW stride padding) (TorchLean.Floats.NF β fexp rnd)} {epsδ : ℝ} (hδ : approxT toSpec δS δR epsδ) (out_ch : Fin outC) :
    have layerS := { kernel := kernelS, bias := biasS }; have layerR := { kernel := kernelR, bias := biasR }; |toSpec (Spec.getAtOrZero (Spec.conv2dBiasDerivSpec layerR inputR δR) [out_ch]) - Spec.getAtOrZero (Spec.conv2dBiasDerivSpec layerS inputS δS) [out_ch]| ≤ conv2dBiasPointBound δR epsδ out_ch

    Soundness of the Conv2D bias-gradient pointwise bound.

    Given approxT for grad_output, this shows the spec bias-gradient entry is approximated by the NF runtime entry within conv2dBiasPointBound.
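The underlying estimate is the usual triangle inequality over the summation: if each runtime term $\tilde{\delta}_k$ satisfies $|\tilde{\delta}_k - \delta_k| \le \varepsilon_\delta$, then

```latex
\Bigl|\sum_{k=1}^{n} \tilde{\delta}_k - \sum_{k=1}^{n} \delta_k\Bigr|
  \le \sum_{k=1}^{n} \bigl|\tilde{\delta}_k - \delta_k\bigr|
  \le n\,\varepsilon_\delta
```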

    noncomputable def Proofs.RuntimeApprox.NFBackend.conv2dBiasBoundTensor {β : TorchLean.Floats.NeuralRadix} {fexp : ℤ → ℤ} [TorchLean.Floats.NeuralValidExp fexp] {rnd : ℝ → ℤ} {outC kH kW stride padding inH inW : ℕ} (δR : Spec.MultiChannelImage outC (conv2dOutH inH kH stride padding) (conv2dOutW inW kW stride padding) (TorchLean.Floats.NF β fexp rnd)) (epsδ : ℝ) : Spec.Tensor ℝ (Spec.Shape.dim outC Spec.Shape.scalar)

    Tensor-shaped bias-gradient bound.

    This packages conv2dBiasPointBound into a Tensor so later approxT statements can use linfNorm to obtain a single scalar error bound.

      noncomputable def Proofs.RuntimeApprox.NFBackend.conv2dKernelPointBound {β : TorchLean.Floats.NeuralRadix} {fexp : ℤ → ℤ} [TorchLean.Floats.NeuralValidExp fexp] {rnd : ℝ → ℤ} {inC outC kH kW stride padding inH inW : ℕ} (inputR : Spec.MultiChannelImage inC inH inW (TorchLean.Floats.NF β fexp rnd)) (δR : Spec.MultiChannelImage outC (conv2dOutH inH kH stride padding) (conv2dOutW inW kW stride padding) (TorchLean.Floats.NF β fexp rnd)) (epsX epsδ : ℝ) (out_ch : Fin outC) (in_ch : Fin inC) (di : Fin kH) (dj : Fin kW) : ℝ

      Pointwise error bound for the Conv2D kernel gradient (NF runtime vs spec).

      The kernel-gradient entry accumulates products of padded input values and upstream gradients. The bound is a replay of this accumulation with per-term errors derived from epsX and epsδ.
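Each product term can be controlled by the standard add-and-subtract split (one common form; the file's exact per-term bound may differ). If $|\tilde{x} - x| \le \varepsilon_X$ and $|\tilde{\delta} - \delta| \le \varepsilon_\delta$, then

```latex
\bigl|\tilde{x}\tilde{\delta} - x\delta\bigr|
  \le |\tilde{x}|\,\bigl|\tilde{\delta} - \delta\bigr| + |\delta|\,\bigl|\tilde{x} - x\bigr|
  \le |\tilde{x}|\,\varepsilon_\delta + \bigl(|\tilde{\delta}| + \varepsilon_\delta\bigr)\,\varepsilon_X
```

Note the right-hand side only mentions runtime magnitudes $|\tilde{x}|, |\tilde{\delta}|$, which matches the fact that conv2dKernelPointBound takes the runtime tensors inputR and δR as arguments.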

        theorem Proofs.RuntimeApprox.NFBackend.approx_conv2d_kernel_point {β : TorchLean.Floats.NeuralRadix} {fexp : ℤ → ℤ} [TorchLean.Floats.NeuralValidExp fexp] {rnd : ℝ → ℤ} [TorchLean.Floats.NeuralValidRndToNearest rnd] {inC outC kH kW stride padding inH inW : ℕ} {h1 : inC ≠ 0} {h2 : kH ≠ 0} {h3 : kW ≠ 0} {kernelS : Spec.Tensor ℝ (Spec.Shape.dim outC (Spec.Shape.dim inC (Spec.Shape.dim kH (Spec.Shape.dim kW Spec.Shape.scalar))))} {kernelR : Spec.Tensor (TorchLean.Floats.NF β fexp rnd) (Spec.Shape.dim outC (Spec.Shape.dim inC (Spec.Shape.dim kH (Spec.Shape.dim kW Spec.Shape.scalar))))} {biasS : Spec.Tensor ℝ (Spec.Shape.dim outC Spec.Shape.scalar)} {biasR : Spec.Tensor (TorchLean.Floats.NF β fexp rnd) (Spec.Shape.dim outC Spec.Shape.scalar)} {inputS : Spec.MultiChannelImage inC inH inW ℝ} {inputR : Spec.MultiChannelImage inC inH inW (TorchLean.Floats.NF β fexp rnd)} {δS : Spec.MultiChannelImage outC (conv2dOutH inH kH stride padding) (conv2dOutW inW kW stride padding) ℝ} {δR : Spec.MultiChannelImage outC (conv2dOutH inH kH stride padding) (conv2dOutW inW kW stride padding) (TorchLean.Floats.NF β fexp rnd)} {epsX epsδ : ℝ} (hX : approxT toSpec inputS inputR epsX) (hδ : approxT toSpec δS δR epsδ) (out_ch : Fin outC) (in_ch : Fin inC) (di : Fin kH) (dj : Fin kW) :
        have layerS := { kernel := kernelS, bias := biasS }; have layerR := { kernel := kernelR, bias := biasR }; |toSpec (Spec.getAtOrZero (Spec.conv2dKernelDerivSpec layerR inputR δR) [out_ch, in_ch, di, dj]) - Spec.getAtOrZero (Spec.conv2dKernelDerivSpec layerS inputS δS) [out_ch, in_ch, di, dj]| ≤ conv2dKernelPointBound inputR δR epsX epsδ out_ch in_ch di dj

        Soundness of the Conv2D kernel-gradient pointwise bound.

        Given approxT hypotheses for the input and upstream gradient (grad_output), this shows the spec kernel-gradient entry is approximated by the NF runtime entry within conv2dKernelPointBound.

        noncomputable def Proofs.RuntimeApprox.NFBackend.conv2dKernelBoundTensor {β : TorchLean.Floats.NeuralRadix} {fexp : ℤ → ℤ} [TorchLean.Floats.NeuralValidExp fexp] {rnd : ℝ → ℤ} {inC outC kH kW stride padding inH inW : ℕ} (inputR : Spec.MultiChannelImage inC inH inW (TorchLean.Floats.NF β fexp rnd)) (δR : Spec.MultiChannelImage outC (conv2dOutH inH kH stride padding) (conv2dOutW inW kW stride padding) (TorchLean.Floats.NF β fexp rnd)) (epsX epsδ : ℝ) : Spec.Tensor ℝ (Spec.Shape.dim outC (Spec.Shape.dim inC (Spec.Shape.dim kH (Spec.Shape.dim kW Spec.Shape.scalar))))

        Tensor-shaped kernel-gradient bound.

        This packages conv2dKernelPointBound into the full 4D kernel-tensor shape so later approxT lemmas can use a single global bound via linfNorm.
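A generic analogue of this packaging step (illustrative names, not TorchLean's API): a family of per-index bounds collapses to a single scalar by taking its maximum, which then dominates every entrywise estimate at once.

```lean
import Mathlib

-- Illustrative analogue of the linfNorm packaging, not TorchLean's API:
-- take the maximum over a nonempty family of per-index bounds. Any
-- entrywise bound b i then implies the single uniform bound linfOfBounds b.
noncomputable def linfOfBounds {n : ℕ} (b : Fin (n + 1) → ℝ) : ℝ :=
  Finset.univ.sup' ⟨0, Finset.mem_univ 0⟩ b
```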

          theorem Proofs.RuntimeApprox.NFBackend.approxT_conv2d_bias_deriv_spec {β : TorchLean.Floats.NeuralRadix} {fexp : ℤ → ℤ} [TorchLean.Floats.NeuralValidExp fexp] {rnd : ℝ → ℤ} [TorchLean.Floats.NeuralValidRndToNearest rnd] {inC outC kH kW stride padding inH inW : ℕ} {h1 : inC ≠ 0} {h2 : kH ≠ 0} {h3 : kW ≠ 0} {kernelS : Spec.Tensor ℝ (Spec.Shape.dim outC (Spec.Shape.dim inC (Spec.Shape.dim kH (Spec.Shape.dim kW Spec.Shape.scalar))))} {kernelR : Spec.Tensor (TorchLean.Floats.NF β fexp rnd) (Spec.Shape.dim outC (Spec.Shape.dim inC (Spec.Shape.dim kH (Spec.Shape.dim kW Spec.Shape.scalar))))} {biasS : Spec.Tensor ℝ (Spec.Shape.dim outC Spec.Shape.scalar)} {biasR : Spec.Tensor (TorchLean.Floats.NF β fexp rnd) (Spec.Shape.dim outC Spec.Shape.scalar)} {inputS : Spec.MultiChannelImage inC inH inW ℝ} {inputR : Spec.MultiChannelImage inC inH inW (TorchLean.Floats.NF β fexp rnd)} {δS : Spec.MultiChannelImage outC (conv2dOutH inH kH stride padding) (conv2dOutW inW kW stride padding) ℝ} {δR : Spec.MultiChannelImage outC (conv2dOutH inH kH stride padding) (conv2dOutW inW kW stride padding) (TorchLean.Floats.NF β fexp rnd)} {epsδ : ℝ} (hδ : approxT toSpec δS δR epsδ) :
          have layerS := { kernel := kernelS, bias := biasS }; have layerR := { kernel := kernelR, bias := biasR }; have outS := Spec.conv2dBiasDerivSpec layerS inputS δS; have outR := Spec.conv2dBiasDerivSpec layerR inputR δR; have bT := conv2dBiasBoundTensor δR epsδ; approxT toSpec outS outR (linfNorm bT)

          Tensor-level approxT bound for the Conv2D bias gradient.

          This lifts approx_conv2d_bias_point entrywise and packages the error into linfNorm (conv2dBiasBoundTensor ...).

          theorem Proofs.RuntimeApprox.NFBackend.approxT_conv2d_kernel_deriv_spec {β : TorchLean.Floats.NeuralRadix} {fexp : ℤ → ℤ} [TorchLean.Floats.NeuralValidExp fexp] {rnd : ℝ → ℤ} [TorchLean.Floats.NeuralValidRndToNearest rnd] {inC outC kH kW stride padding inH inW : ℕ} {h1 : inC ≠ 0} {h2 : kH ≠ 0} {h3 : kW ≠ 0} {kernelS : Spec.Tensor ℝ (Spec.Shape.dim outC (Spec.Shape.dim inC (Spec.Shape.dim kH (Spec.Shape.dim kW Spec.Shape.scalar))))} {kernelR : Spec.Tensor (TorchLean.Floats.NF β fexp rnd) (Spec.Shape.dim outC (Spec.Shape.dim inC (Spec.Shape.dim kH (Spec.Shape.dim kW Spec.Shape.scalar))))} {biasS : Spec.Tensor ℝ (Spec.Shape.dim outC Spec.Shape.scalar)} {biasR : Spec.Tensor (TorchLean.Floats.NF β fexp rnd) (Spec.Shape.dim outC Spec.Shape.scalar)} {inputS : Spec.MultiChannelImage inC inH inW ℝ} {inputR : Spec.MultiChannelImage inC inH inW (TorchLean.Floats.NF β fexp rnd)} {δS : Spec.MultiChannelImage outC (conv2dOutH inH kH stride padding) (conv2dOutW inW kW stride padding) ℝ} {δR : Spec.MultiChannelImage outC (conv2dOutH inH kH stride padding) (conv2dOutW inW kW stride padding) (TorchLean.Floats.NF β fexp rnd)} {epsX epsδ : ℝ} (hX : approxT toSpec inputS inputR epsX) (hδ : approxT toSpec δS δR epsδ) :
          have layerS := { kernel := kernelS, bias := biasS }; have layerR := { kernel := kernelR, bias := biasR }; have outS := Spec.conv2dKernelDerivSpec layerS inputS δS; have outR := Spec.conv2dKernelDerivSpec layerR inputR δR; have bT := conv2dKernelBoundTensor inputR δR epsX epsδ; approxT toSpec outS outR (linfNorm bT)

          Tensor-level approxT bound for the Conv2D kernel gradient.

          This lifts approx_conv2d_kernel_point entrywise and packages the error into linfNorm (conv2dKernelBoundTensor ...).

          noncomputable def Proofs.RuntimeApprox.NFBackend.conv2dInputPointBound {β : TorchLean.Floats.NeuralRadix} {fexp : ℤ → ℤ} [TorchLean.Floats.NeuralValidExp fexp] {rnd : ℝ → ℤ} {inC outC kH kW stride padding inH inW : ℕ} (kernelR : Spec.Tensor (TorchLean.Floats.NF β fexp rnd) (Spec.Shape.dim outC (Spec.Shape.dim inC (Spec.Shape.dim kH (Spec.Shape.dim kW Spec.Shape.scalar))))) (δR : Spec.MultiChannelImage outC (conv2dOutH inH kH stride padding) (conv2dOutW inW kW stride padding) (TorchLean.Floats.NF β fexp rnd)) (epsK epsδ : ℝ) (in_ch : Fin inC) (i : Fin inH) (j : Fin inW) : ℝ

          Pointwise error bound for the Conv2D input gradient (NF runtime vs spec).

          The input-gradient entry accumulates contributions from all output channels and spatial positions that “hit” the given input coordinate under the stride/padding relation. The bound is a replay of that accumulation with per-term errors derived from epsK and epsδ.

            noncomputable def Proofs.RuntimeApprox.NFBackend.conv2dInputBoundTensor {β : TorchLean.Floats.NeuralRadix} {fexp : ℤ → ℤ} [TorchLean.Floats.NeuralValidExp fexp] {rnd : ℝ → ℤ} {inC outC kH kW stride padding inH inW : ℕ} (kernelR : Spec.Tensor (TorchLean.Floats.NF β fexp rnd) (Spec.Shape.dim outC (Spec.Shape.dim inC (Spec.Shape.dim kH (Spec.Shape.dim kW Spec.Shape.scalar))))) (δR : Spec.MultiChannelImage outC (conv2dOutH inH kH stride padding) (conv2dOutW inW kW stride padding) (TorchLean.Floats.NF β fexp rnd)) (epsK epsδ : ℝ) : Spec.MultiChannelImage inC inH inW ℝ

            Tensor-shaped input-gradient bound.

            This packages conv2dInputPointBound into the full input image shape so later approxT lemmas can use a single global bound via linfNorm.

              theorem Proofs.RuntimeApprox.NFBackend.approx_conv2d_input_point {β : TorchLean.Floats.NeuralRadix} {fexp : ℤ → ℤ} [TorchLean.Floats.NeuralValidExp fexp] {rnd : ℝ → ℤ} [TorchLean.Floats.NeuralValidRndToNearest rnd] {inC outC kH kW stride padding inH inW : ℕ} {h1 : inC ≠ 0} {h2 : kH ≠ 0} {h3 : kW ≠ 0} {kernelS : Spec.Tensor ℝ (Spec.Shape.dim outC (Spec.Shape.dim inC (Spec.Shape.dim kH (Spec.Shape.dim kW Spec.Shape.scalar))))} {kernelR : Spec.Tensor (TorchLean.Floats.NF β fexp rnd) (Spec.Shape.dim outC (Spec.Shape.dim inC (Spec.Shape.dim kH (Spec.Shape.dim kW Spec.Shape.scalar))))} {biasS : Spec.Tensor ℝ (Spec.Shape.dim outC Spec.Shape.scalar)} {biasR : Spec.Tensor (TorchLean.Floats.NF β fexp rnd) (Spec.Shape.dim outC Spec.Shape.scalar)} {inputS : Spec.MultiChannelImage inC inH inW ℝ} {inputR : Spec.MultiChannelImage inC inH inW (TorchLean.Floats.NF β fexp rnd)} {δS : Spec.MultiChannelImage outC (conv2dOutH inH kH stride padding) (conv2dOutW inW kW stride padding) ℝ} {δR : Spec.MultiChannelImage outC (conv2dOutH inH kH stride padding) (conv2dOutW inW kW stride padding) (TorchLean.Floats.NF β fexp rnd)} {epsK epsδ : ℝ} (hK : approxT toSpec kernelS kernelR epsK) (hδ : approxT toSpec δS δR epsδ) (in_ch : Fin inC) (i : Fin inH) (j : Fin inW) :
              have layerS := { kernel := kernelS, bias := biasS }; have layerR := { kernel := kernelR, bias := biasR }; |toSpec (Spec.getAtOrZero (Spec.conv2dInputDerivSpec layerR inputR δR) [in_ch, i, j]) - Spec.getAtOrZero (Spec.conv2dInputDerivSpec layerS inputS δS) [in_ch, i, j]| ≤ conv2dInputPointBound kernelR δR epsK epsδ in_ch i j

              Soundness of the Conv2D input-gradient pointwise bound.

              Given approxT hypotheses for the kernel and upstream gradient (grad_output), this shows the spec input-gradient entry is approximated by the NF runtime entry within conv2dInputPointBound.

              theorem Proofs.RuntimeApprox.NFBackend.approxT_conv2d_input_deriv_spec {β : TorchLean.Floats.NeuralRadix} {fexp : ℤ → ℤ} [TorchLean.Floats.NeuralValidExp fexp] {rnd : ℝ → ℤ} [TorchLean.Floats.NeuralValidRndToNearest rnd] {inC outC kH kW stride padding inH inW : ℕ} {h1 : inC ≠ 0} {h2 : kH ≠ 0} {h3 : kW ≠ 0} {kernelS : Spec.Tensor ℝ (Spec.Shape.dim outC (Spec.Shape.dim inC (Spec.Shape.dim kH (Spec.Shape.dim kW Spec.Shape.scalar))))} {kernelR : Spec.Tensor (TorchLean.Floats.NF β fexp rnd) (Spec.Shape.dim outC (Spec.Shape.dim inC (Spec.Shape.dim kH (Spec.Shape.dim kW Spec.Shape.scalar))))} {biasS : Spec.Tensor ℝ (Spec.Shape.dim outC Spec.Shape.scalar)} {biasR : Spec.Tensor (TorchLean.Floats.NF β fexp rnd) (Spec.Shape.dim outC Spec.Shape.scalar)} {inputS : Spec.MultiChannelImage inC inH inW ℝ} {inputR : Spec.MultiChannelImage inC inH inW (TorchLean.Floats.NF β fexp rnd)} {δS : Spec.MultiChannelImage outC (conv2dOutH inH kH stride padding) (conv2dOutW inW kW stride padding) ℝ} {δR : Spec.MultiChannelImage outC (conv2dOutH inH kH stride padding) (conv2dOutW inW kW stride padding) (TorchLean.Floats.NF β fexp rnd)} {epsK epsδ : ℝ} (hK : approxT toSpec kernelS kernelR epsK) (hδ : approxT toSpec δS δR epsδ) :
              have layerS := { kernel := kernelS, bias := biasS }; have layerR := { kernel := kernelR, bias := biasR }; have outS := Spec.conv2dInputDerivSpec layerS inputS δS; have outR := Spec.conv2dInputDerivSpec layerR inputR δR; have bT := conv2dInputBoundTensor kernelR δR epsK epsδ; approxT toSpec outS outR (linfNorm bT)

              Tensor-level approxT bound for the Conv2D input gradient.

              This lifts approx_conv2d_input_point entrywise and packages the error into linfNorm (conv2dInputBoundTensor ...).

              theorem Proofs.RuntimeApprox.NFBackend.idx_i_ne_of_shape_ne {Γ : List Spec.Shape} {s₁ s₂ : Spec.Shape} (a : Idx Γ s₁) (b : Idx Γ s₂) (hs : s₁ ≠ s₂) :
              a.i ≠ b.i
              noncomputable def Proofs.RuntimeApprox.NFBackend.conv2dRevNode {β : TorchLean.Floats.NeuralRadix} {fexp : ℤ → ℤ} [TorchLean.Floats.NeuralValidExp fexp] {rnd : ℝ → ℤ} [TorchLean.Floats.NeuralValidRndToNearest rnd] {Γ : List Spec.Shape} {inC outC kH kW stride padding inH inW : ℕ} {h1 : inC ≠ 0} {h2 : kH ≠ 0} {h3 : kW ≠ 0} (kernelIdx : Idx Γ (Spec.Shape.dim outC (Spec.Shape.dim inC (Spec.Shape.dim kH (Spec.Shape.dim kW Spec.Shape.scalar))))) (biasIdx : Idx Γ (Spec.Shape.dim outC Spec.Shape.scalar)) (inputIdx : Idx Γ (Spec.Shape.dim inC (Spec.Shape.dim inH (Spec.Shape.dim inW Spec.Shape.scalar)))) :
              RevNode toSpec Γ (Spec.Shape.dim outC (Spec.Shape.dim (conv2dOutH inH kH stride padding) (Spec.Shape.dim (conv2dOutW inW kW stride padding) Spec.Shape.scalar)))

              Package Conv2D forward + backward (VJP) as a RevNode for RevGraph.backprop_approx.

              The node uses the forward bound from conv2d_forward and the three gradient bounds proved in this file (kernel/bias/input) to provide a compositional reverse-mode approximation theorem.
