# Dropout analysis properties
TorchLean splits stochastic training-mode dropout into two pieces:
- a mask/seed producer, treated as non-differentiated data in autograd proofs, and
- a deterministic tensor map once the mask or inference probability is fixed.
This file records small spec-level identities for the deterministic inference map. The fixed-mask training-mode derivative infrastructure lives with the autograd tape-node proofs.
Reference: Srivastava et al., 2014, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”.
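A minimal Lean sketch of the deterministic inference map may help fix ideas. Everything in it is an assumption made for illustration: `Tensor` and `dropoutInfer` are hypothetical names rather than TorchLean's actual definitions, tensors are modelled as bare functions into ℝ, and the classic (non-inverted) `1 - p` inference scaling is assumed.

```lean
-- Sketch only: hypothetical names and a simplified tensor model,
-- not TorchLean's real spec.
import Mathlib.Data.Real.Basic

/-- Hypothetical stand-in for a dense tensor indexed by `ι`. -/
def Tensor (ι : Type) : Type := ι → ℝ

/-- Deterministic inference-mode dropout: every activation is scaled by the
    keep probability `1 - p`. The stochastic mask/seed producer has already
    been split off, so this is a plain tensor map. -/
def dropoutInfer {ι : Type} (p : ℝ) (x : Tensor ι) : Tensor ι :=
  fun i => (1 - p) * x i
```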
Deterministic dropout inference scaling is the identity when p = 0.
Inference-mode dropout multiplies every activation by the scaling factor prescribed by the spec. At dropout probability p = 0 that factor is 1, so the tensor is returned unchanged.
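Continuing the hypothetical `dropoutInfer` sketch above, the p = 0 identity can be stated as a small lemma; the name `dropoutInfer_zero` is again an assumption, and the proof only needs `sub_zero` and `one_mul`.

```lean
/-- With `p = 0` the scaling factor `1 - p` is `1`, so deterministic
    inference dropout is the identity on tensors. -/
theorem dropoutInfer_zero {ι : Type} (x : Tensor ι) :
    dropoutInfer 0 x = x := by
  funext i
  -- `(1 - 0) * x i` simplifies to `x i` via `sub_zero` and `one_mul`.
  simp [dropoutInfer]
```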