GELU Operator Bounds #
Float-specialized GELU transfer rules for CROWN/IBP-style bound propagation.
GELU Activation (Float-specialized) #
GELU is a smooth, non-monotone activation function used in Transformers. We implement IBP and affine bounds using the tanh approximation.
Key properties:
- GELU(0) = 0
- GELU(x) ≈ x for large positive x, and GELU(x) ≈ 0 for large negative x
- GELU dips below zero, with a global minimum of about -0.17 near x ≈ -0.75
Because GELU requires specific numerical constants (√(2/π) ≈ 0.7978845608), we provide Float-specialized implementations.
Soundness status #
The implementations below use:
- the common tanh-based GELU approximation, and
- interval estimates based on endpoint sampling plus a fixed critical-point estimate.
These estimates are executable transfer rules, not unconditional enclosure theorems. Use them under an explicit transfer-soundness assumption or replace them with a proved relaxation when the final claim requires unconditional CROWN soundness.
References #
- Hendrycks and Gimpel, "Gaussian Error Linear Units (GELUs)", 2016: https://arxiv.org/abs/1606.08415
- The tanh approximation used here appears in the GELU paper as a fast approximation to the Gaussian CDF-based definition.
GELU approximation using tanh: 0.5 * x * (1 + tanh(√(2/π) * (x + 0.044715 * x³)))
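As a concrete sketch, the tanh-based approximation can be written as a Lean `Float` function. The name `geluTanh` is illustrative, not necessarily the library's identifier:

```lean
/-- Tanh-based GELU approximation (illustrative sketch). -/
def geluTanh (x : Float) : Float :=
  let c : Float := Float.sqrt (2.0 / 3.14159265358979323846)  -- √(2/π) ≈ 0.7978845608
  0.5 * x * (1.0 + Float.tanh (c * (x + 0.044715 * x * x * x)))
```

Note that `geluTanh 0.0` evaluates to `0`, matching the property GELU(0) = 0 listed above.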
GELU derivative approximation
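Differentiating 0.5 · x · (1 + tanh(g(x))) with g(x) = √(2/π) · (x + 0.044715 x³) by the product and chain rules gives a closed form; a hedged Lean sketch (illustrative name, same constants as above):

```lean
/-- Derivative of the tanh-based GELU approximation (illustrative sketch). -/
def geluTanhDeriv (x : Float) : Float :=
  let c : Float := Float.sqrt (2.0 / 3.14159265358979323846)
  let g  := c * (x + 0.044715 * x * x * x)      -- inner argument of tanh
  let t  := Float.tanh g
  let dg := c * (1.0 + 3.0 * 0.044715 * x * x)  -- derivative of the inner argument
  0.5 * (1.0 + t) + 0.5 * x * (1.0 - t * t) * dg
```

As a sanity check, at x = 0 this reduces to 0.5 · (1 + 0) + 0, matching GELU'(0) = 0.5.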
IBP for GELU #
Because GELU is not monotone, the extrema on an interval need not occur at its endpoints. For simplicity, we sample the endpoints together with the critical point (the minimum near x ≈ -0.75).
Approximate min/max of GELU on [l, u] (Float version).
This helper is executable and useful for certificate experiments; it is not itself a proof that the returned interval encloses the mathematical GELU.
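A self-contained sketch of this sampling rule, under the stated caveat that it is a numerical estimate rather than a verified enclosure (names and the critical-point constant are illustrative):

```lean
/-- Approximate range of GELU on [l, u] by sampling both endpoints and the
    critical point near x ≈ -0.75 when it lies in the interval.
    Illustrative sketch; not a verified enclosure. -/
def geluIntervalApprox (l u : Float) : Float × Float :=
  let gelu := fun x : Float =>
    let c := Float.sqrt (2.0 / 3.14159265358979323846)
    0.5 * x * (1.0 + Float.tanh (c * (x + 0.044715 * x * x * x)))
  let xc : Float := -0.7518   -- approximate location of GELU's global minimum
  let samples := if l ≤ xc ∧ xc ≤ u then [l, xc, u] else [l, u]
  let vals := samples.map gelu
  (vals.foldl min (gelu l), vals.foldl max (gelu l))
```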
IBP for GELU activation (Float specialized)
CROWN Affine Bounds for GELU #
For CROWN, we need linear upper and lower bounds on GELU. We use a simple secant/tangent approach:
- Upper bound: secant line if concave, tangent if convex
- Lower bound: tangent if concave, secant if convex
Relaxation parameters for GELU
- slope_lower : Float
  Slope of the lower affine relaxation candidate.
- bias_lower : Float
  Bias of the lower affine relaxation candidate.
- slope_upper : Float
  Slope of the upper affine relaxation candidate.
- bias_upper : Float
  Bias of the upper affine relaxation candidate.
Compute an affine relaxation candidate for GELU on [l, u] (Float version).
This secant-style rule is executable. A theorem that uses it as a verifier step should carry the corresponding transfer-soundness assumption.
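One way to realize such a secant-style candidate, sketched in Lean: take the secant through the endpoints, sample the gap at the midpoint, and shift the line by that gap to produce the other side. The structure fields follow the text above; the midpoint heuristic and the function names are assumptions of this sketch, not the library's actual rule:

```lean
structure GeluRelaxation where
  slope_lower : Float
  bias_lower  : Float
  slope_upper : Float
  bias_upper  : Float

/-- Secant-style affine relaxation candidate on [l, u] (illustrative sketch). -/
def geluRelaxCandidate (l u : Float) : GeluRelaxation :=
  let gelu := fun x : Float =>
    let c := Float.sqrt (2.0 / 3.14159265358979323846)
    0.5 * x * (1.0 + Float.tanh (c * (x + 0.044715 * x * x * x)))
  let slope := if u > l then (gelu u - gelu l) / (u - l) else 0.0
  let bias  := gelu l - slope * l               -- secant line: slope * x + bias
  let mid   := 0.5 * (l + u)
  let gap   := gelu mid - (slope * mid + bias)  -- signed gap sampled at the midpoint
  if gap ≥ 0.0 then
    -- GELU bulges above the secant: keep the secant below, shift a copy up
    { slope_lower := slope, bias_lower := bias,
      slope_upper := slope, bias_upper := bias + gap }
  else
    -- GELU dips below the secant: keep the secant above, shift a copy down
    { slope_lower := slope, bias_lower := bias + gap,
      slope_upper := slope, bias_upper := bias }
```

A single midpoint sample cannot certify the shift is large enough everywhere, which is exactly why the text attaches a transfer-soundness assumption to this rule.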
Compute GELU relaxation per element
CROWN affine propagation through GELU (Float specialized)
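To see how an affine relaxation yields concrete output bounds, an affine function a·x + b is concretized over [l, u] by evaluating at the endpoint selected by the sign of the slope. A minimal sketch (not the library's propagation code):

```lean
/-- Concretize the affine bound a * x + b over [l, u] (illustrative sketch). -/
def affineRange (a b l u : Float) : Float × Float :=
  if a ≥ 0.0 then
    (a * l + b, a * u + b)  -- nonnegative slope: minimum at l, maximum at u
  else
    (a * u + b, a * l + b)  -- negative slope: minimum at u, maximum at l
```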
Generic versions using Context #
For polymorphic code, we provide versions that work with any Context type by computing bounds using simpler approximations.
ReLU-shaped GELU bound used for generic Context scalars.
- For x > 0: GELU(x) ≈ x
- For x < 0: GELU(x) ≈ 0
The theorem using this helper must provide the relevant transfer-soundness hypothesis for the chosen scalar semantics and interval restrictions.
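For concreteness, here is what a ReLU-shaped interval can look like, shown on `Float` rather than a generic Context. It relies on GELU(x) ≤ ReLU(x) everywhere and GELU(x) ≥ ReLU(x) - 0.17 (GELU's global minimum is about -0.17); the 0.17 margin and the name are assumptions of this sketch:

```lean
/-- ReLU-shaped conservative interval for GELU on [l, u] (illustrative sketch).
    ReLU is monotone, so its endpoints bound it; the 0.17 margin absorbs
    GELU's dip below zero. -/
def geluIntervalReluShaped (l u : Float) : Float × Float :=
  (max l 0.0 - 0.17, max u 0.0)
```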
IBP for GELU activation (generic, conservative)