TorchLean API

NN.MLTheory.CROWN.Operators.Gelu

GELU Operator Bounds #

Float-specialized GELU transfer rules for CROWN/IBP-style bound propagation.

GELU Activation (Float-specialized) #

GELU is a smooth, non-monotone activation function used in Transformers. We implement IBP and affine bounds using the tanh approximation.

Key properties:

• GELU is smooth but non-monotone: it has a single interior minimum (near x ≈ -0.75), so interval extrema need not occur at the endpoints.
• It is ReLU-shaped away from the origin: GELU(x) ≈ x for large positive x and GELU(x) ≈ 0 for large negative x.

Because GELU requires specific numerical constants (√(2/π) ≈ 0.7978845608), we provide Float-specialized implementations.

Soundness status #

The implementations below use:

• the tanh approximation of GELU, evaluated in Float arithmetic;
• endpoint-plus-critical-point sampling for the interval (IBP) bounds;
• secant-style candidates for the CROWN affine relaxation;
• a ReLU-shaped approximation for the generic Context versions.

These estimates are executable transfer rules, not unconditional enclosure theorems. Use them under an explicit transfer-soundness assumption, or replace them with a proved relaxation when the final claim requires unconditional CROWN soundness.

References #

GELU approximation using tanh: 0.5 * x * (1 + tanh(√(2/π) * (x + 0.044715 * x³)))
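
A minimal executable sketch of this formula, assuming only core Lean Float; the name geluTanhFloat and the inlined constant are illustrative, not the module's actual API:

```lean
/-- Hypothetical sketch of the tanh approximation of GELU. -/
def geluTanhFloat (x : Float) : Float :=
  let c : Float := 0.7978845608028654        -- √(2/π)
  let inner := c * (x + 0.044715 * x * x * x)
  0.5 * x * (1.0 + Float.tanh inner)

#eval geluTanhFloat 1.0     -- ≈ 0.8412
#eval geluTanhFloat (-1.0)  -- ≈ -0.1588
```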

GELU derivative approximation
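
As a sketch only, differentiating the tanh approximation by hand gives the closed form below; geluTanhDerivFloat is a hypothetical name, not necessarily the module's definition:

```lean
/-- Hypothetical sketch: derivative of the tanh approximation,
    d/dx [0.5 · x · (1 + tanh g(x))] with g(x) = √(2/π) · (x + 0.044715 · x³). -/
def geluTanhDerivFloat (x : Float) : Float :=
  let c  : Float := 0.7978845608028654       -- √(2/π)
  let g  := c * (x + 0.044715 * x * x * x)
  let g' := c * (1.0 + 3.0 * 0.044715 * x * x)
  let t  := Float.tanh g
  0.5 * (1.0 + t) + 0.5 * x * (1.0 - t * t) * g'
```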

IBP for GELU #

Since GELU is not monotone, its extrema on [l, u] need not occur at the endpoints. For simplicity, we sample the function at the endpoints and at the interior critical point (near x ≈ -0.75).

Approximate min/max of GELU on [l, u] (Float version).

This helper is executable and useful for certificate experiments; it is not itself a proof that the returned interval encloses the mathematical GELU.
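
A hedged sketch of the sampling idea; the names geluF and geluIntervalFloat, the hard-coded approximate minimiser, and the exact sample set are assumptions of this sketch, not the module's helper:

```lean
def geluF (x : Float) : Float :=
  0.5 * x * (1.0 + Float.tanh (0.7978845608028654 * (x + 0.044715 * x * x * x)))

/-- Hypothetical sketch: sample the endpoints and, when it lies in `[l, u]`,
    the interior minimum (located near x ≈ -0.7518), then take the min/max
    of the samples.  This is not a proof of enclosure. -/
def geluIntervalFloat (l u : Float) : Float × Float :=
  let xmin : Float := -0.7518
  let samples :=
    if l ≤ xmin ∧ xmin ≤ u then [geluF l, geluF u, geluF xmin]
    else [geluF l, geluF u]
  let lo := samples.foldl (fun a b => if a ≤ b then a else b) (geluF l)
  let hi := samples.foldl (fun a b => if a ≤ b then b else a) (geluF l)
  (lo, hi)
```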

IBP for GELU activation (Float specialized)

CROWN Affine Bounds for GELU #

For CROWN, we need affine lower and upper bounds on GELU: candidates slope_lower * x + bias_lower and slope_upper * x + bias_upper intended to satisfy slope_lower * x + bias_lower ≤ GELU(x) ≤ slope_upper * x + bias_upper for all x in [l, u]. We construct the candidates with a simple secant/tangent approach.

Relaxation parameters for GELU

• slope_lower : Float

  Slope of the lower affine relaxation candidate.

• bias_lower : Float

  Bias of the lower affine relaxation candidate.

• slope_upper : Float

  Slope of the upper affine relaxation candidate.

• bias_upper : Float

  Bias of the upper affine relaxation candidate.

Compute an affine relaxation candidate for GELU on [l, u] (Float version).

This secant-style rule is executable. A theorem that uses it as a verifier step should carry the corresponding transfer-soundness assumption.
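
To make the idea concrete, here is a hedged sketch of one possible secant-style construction; the structure, the sample grid, and the bias-shifting strategy are assumptions of this sketch rather than the module's actual rule:

```lean
def geluF (x : Float) : Float :=
  0.5 * x * (1.0 + Float.tanh (0.7978845608028654 * (x + 0.044715 * x * x * x)))

structure GeluRelaxationSketch where
  slope_lower : Float
  bias_lower  : Float
  slope_upper : Float
  bias_upper  : Float

/-- Hypothetical sketch: take the secant through the endpoints and shift it
    down/up by the worst deviation seen on a small sample grid.  The result
    brackets the sampled values only; enclosure of the mathematical GELU
    still needs a proof or a transfer-soundness assumption. -/
def geluSecantRelaxation (l u : Float) : GeluRelaxationSketch :=
  let fl := geluF l
  let slope := if u > l then (geluF u - fl) / (u - l) else 0.0
  let bias  := fl - slope * l                     -- secant: y = slope * x + bias
  let grid  := [l, l + 0.25 * (u - l), l + 0.5 * (u - l), l + 0.75 * (u - l), u]
  let devs  := grid.map (fun x => geluF x - (slope * x + bias))
  let dMin  := devs.foldl (fun a b => if a ≤ b then a else b) 0.0
  let dMax  := devs.foldl (fun a b => if a ≤ b then b else a) 0.0
  { slope_lower := slope, bias_lower := bias + dMin,
    slope_upper := slope, bias_upper := bias + dMax }
```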

CROWN affine propagation through GELU (Float specialized)
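
For orientation, a hedged sketch of the standard CROWN sign-split when pushing a single output coefficient back through an affine relaxation; the flattened argument list and the name propagateUpperCoeff are illustrative only:

```lean
/-- Hypothetical sketch: when back-propagating an *upper* linear bound through
    GELU, a nonnegative coefficient `c` picks the upper relaxation line and a
    negative one picks the lower line; the chosen bias is accumulated into the
    constant term. -/
def propagateUpperCoeff (slopeLo biasLo slopeHi biasHi c : Float) : Float × Float :=
  if c ≥ 0.0 then (c * slopeHi, c * biasHi)
  else (c * slopeLo, c * biasLo)
```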

Generic versions using Context #

For polymorphic code, we provide versions that work with any Context type by computing bounds using simpler approximations.

ReLU-shaped GELU bound used for generic Context scalars:

• for x > 0, GELU(x) ≈ x;
• for x < 0, GELU(x) ≈ 0.

The theorem using this helper must provide the relevant transfer-soundness hypothesis for the chosen scalar semantics and interval restrictions.
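
As an illustration of the shape only (the module's generic code works over a Context-supplied scalar), a Float instantiation of a ReLU-shaped enclosure could look like the sketch below; the 0.17 slack, which slightly over-approximates |min GELU| ≈ 0.16997, is an assumption of this sketch:

```lean
/-- Hypothetical sketch: mathematically ReLU(x) - 0.17 ≤ GELU(x) ≤ ReLU(x),
    and ReLU is monotone, so the interval below encloses GELU on `[l, u]`
    up to Float rounding. -/
def reluShapedGeluBounds (l u : Float) : Float × Float :=
  let relu := fun (x : Float) => if x ≥ 0.0 then x else 0.0
  (relu l - 0.17, relu u)
```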

IBP for GELU activation (generic, conservative)
