# Dropout (deterministic spec)
Dropout is traditionally randomized: each element is kept with probability `keep = 1 - p`.
In this repository we often want a deterministic spec that still documents the intended meaning,
so downstream models can choose explicit inference-time or mask-driven dropout semantics.
We therefore expose two simple, deterministic variants:
- `dropout_inference_spec p x = (1 - p) * x`: a deterministic "shrink activations" surrogate. It corresponds to the expected value of non-inverted dropout training (`y = mask * x` with `E[mask] = keep`).
- `dropout_masked_spec p mask x`: a fully deterministic "training-style" dropout that takes the mask explicitly. We use safe scaling by `max(keep, ε)` so it is always defined even if `p ≈ 1`.
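To make the two variants concrete, here is a minimal Lean sketch of both specs over `Array Float`. The names (`dropoutInferenceSketch`, `dropoutMaskedSketch`), the element type, and the choice of `ε = 1e-6` are illustrative assumptions, not the repository's actual definitions.

```lean
/-- Hypothetical sketch of the inference-style surrogate: `y = (1 - p) * x`. -/
def dropoutInferenceSketch (p : Float) (x : Array Float) : Array Float :=
  x.map (fun xi => (1 - p) * xi)

/-- Hypothetical sketch of deterministic training-style dropout with an explicit mask,
    using safe inverted scaling `x / keepSafe` with `keepSafe = max (1 - p) ε`. -/
def dropoutMaskedSketch (p : Float) (mask : Array Bool) (x : Array Float) : Array Float :=
  let keep := 1 - p
  -- ε guard: keep the division well defined even when p ≈ 1 (illustrative ε = 1e-6)
  let keepSafe := if keep < 1e-6 then 1e-6 else keep
  (mask.zip x).map (fun (m, xi) => if m then xi / keepSafe else 0)
```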
How this differs from PyTorch:
- `torch.nn.Dropout(p)` uses inverted dropout during training: `y = mask * x / (1 - p)`, and becomes the identity during evaluation (`y = x`).
- The spec layer here avoids randomness. If you want something close to PyTorch training semantics, use `dropout_masked_spec` and pass the mask explicitly. If you want a cheap deterministic effect, use `dropout_inference_spec`.
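As an illustration of the PyTorch comparison, using the hypothetical sketch above: with `p = 0.5` and an explicit mask, kept entries are scaled by `1 / keep = 2` and dropped entries become `0`, which is exactly the inverted-dropout formula `y = mask * x / (1 - p)` for that particular mask.

```lean
#eval dropoutMaskedSketch 0.5 #[true, false, true, true] #[1.0, 2.0, 3.0, 4.0]
-- expected (approximately): #[2.000000, 0.000000, 6.000000, 8.000000]
```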
Gradients:
- We treat `p` and `mask` as non-differentiable inputs. The backward specs only return the gradient with respect to `x`.
Deterministic "dropout-like" scaling: y = keep * x with keep = 1 - p.
This is not PyTorch's eval() behavior for nn.Dropout (which is the identity under inverted
dropout). We keep this around because it is a simple deterministic knob that many specs use.
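For contrast, a small usage example with the hypothetical sketch above: `p = 0.5` uniformly shrinks every entry by `keep = 0.5`, whereas PyTorch's `nn.Dropout` in `eval()` mode would return `x` unchanged.

```lean
#eval dropoutInferenceSketch 0.5 #[1.0, 2.0, 3.0, 4.0]
-- expected (approximately): #[0.500000, 1.000000, 1.500000, 2.000000]
```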
Deterministic training-style dropout with an explicit mask.
If `mask[i] = true`, keep element `x[i]`; otherwise drop it to `0`.
We use inverted-dropout scaling `x / keepSafe` with `keepSafe = max(1 - p, ε)`.
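The ε guard is what keeps the spec total when `p ≈ 1`. With the hypothetical sketch above (which uses `ε = 1e-6` for illustration), `p = 1.0` still produces a finite result instead of a division by zero:

```lean
#eval dropoutMaskedSketch 1.0 #[true, false] #[1.0, 1.0]
-- the guard replaces keep = 0 with ε = 1e-6, so the kept entry becomes 1.0 / 1e-6 = 1e6
-- expected (approximately): #[1000000.000000, 0.000000]
```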
Backward/VJP for `dropout_masked_spec` with respect to `x`.
This mirrors the forward: gradients are masked and (in the kept positions) scaled by `1 / keepSafe`.
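A minimal Lean sketch of this backward spec, under the same assumptions as the forward sketch above (hypothetical names, `Array Float` elements, illustrative `ε = 1e-6`):

```lean
/-- Hypothetical VJP sketch: the incoming gradient `gy` is masked and, in the kept
    positions, scaled by `1 / keepSafe`, mirroring the forward pass.
    `p` and `mask` receive no gradient. -/
def dropoutMaskedVJPSketch (p : Float) (mask : Array Bool) (gy : Array Float) : Array Float :=
  let keep := 1 - p
  let keepSafe := if keep < 1e-6 then 1e-6 else keep
  (mask.zip gy).map (fun (m, g) => if m then g / keepSafe else 0)
```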