Neural-network operations for the eager engine.
This file implements runtime nodes such as dropout, normalization, attention, and recurrent/sequence building blocks on top of the core tensor operation layer.
Layer normalization for (seqLen, embedDim) tensors.
It records a single node in the computation graph, whose backward pass returns gradients with respect to x, gamma, and beta.
PyTorch comparison: torch.nn.LayerNorm(embedDim) (applied per token) / torch.nn.functional.layer_norm.
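The math behind such a node can be illustrated with a small pure-Python sketch. It assumes that each row (token) is normalized over embedDim using the population variance, and that eps = 1e-5, the torch.nn.LayerNorm default. The eps value used by the Lean operation is not stated here, so treat it as an assumption. The helper names `layer_norm_forward` and `layer_norm_backward` are hypothetical.

```python
import math

# Hypothetical sketch, not the library's implementation.
# Assumes per-row normalization with population variance and eps = 1e-5.

def layer_norm_forward(x, gamma, beta, eps=1e-5):
    """x: (seqLen, embedDim) nested list. Returns (y, cache), y has x's shape."""
    y, cache = [], []
    for row in x:
        d = len(row)
        mu = sum(row) / d
        var = sum((v - mu) ** 2 for v in row) / d
        inv_std = 1.0 / math.sqrt(var + eps)
        xhat = [(v - mu) * inv_std for v in row]
        y.append([g * h + b for g, h, b in zip(gamma, xhat, beta)])
        cache.append((xhat, inv_std))  # saved for the backward pass
    return y, cache

def layer_norm_backward(dy, gamma, cache):
    """Given upstream gradient dy (seqLen, embedDim), return (dx, dgamma, dbeta)."""
    d = len(gamma)
    dgamma = [0.0] * d
    dbeta = [0.0] * d
    dx = []
    for dy_row, (xhat, inv_std) in zip(dy, cache):
        # gamma and beta are shared across tokens, so their gradients sum over rows
        for j in range(d):
            dgamma[j] += dy_row[j] * xhat[j]
            dbeta[j] += dy_row[j]
        dxhat = [g_ * gam for g_, gam in zip(dy_row, gamma)]
        mean_dxhat = sum(dxhat) / d
        mean_dxhat_xhat = sum(a * h for a, h in zip(dxhat, xhat)) / d
        # Standard per-row layer-norm input gradient
        dx.append([inv_std * (a - mean_dxhat - h * mean_dxhat_xhat)
                   for a, h in zip(dxhat, xhat)])
    return dx, dgamma, dbeta
```

A central finite-difference check against `layer_norm_forward` is a cheap way to confirm that the three gradients agree with the forward pass.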
Batch normalization for channel-first images (C,H,W) (no batch axis).
PyTorch comparison: conceptually torch.nn.BatchNorm2d(C) / torch.nn.functional.batch_norm on NCHW, but
specialized here to a single image.
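The following sketch assumes this op behaves like BatchNorm2d with N = 1. Under that assumption, the statistics for each channel are computed over its H×W plane using the population variance, with eps = 1e-5. Running statistics and eval mode are not modeled. The function name `batch_norm_chw` is hypothetical.

```python
import math

# Hypothetical sketch: per-channel normalization over H*W (BatchNorm2d with N = 1,
# training-mode statistics). Running mean/variance are deliberately omitted.

def batch_norm_chw(x, gamma, beta, eps=1e-5):
    """x: nested list of shape (C, H, W); gamma, beta: length-C lists."""
    out = []
    for c, plane in enumerate(x):
        vals = [v for row in plane for v in row]  # flatten the H x W plane
        n = len(vals)
        mu = sum(vals) / n
        var = sum((v - mu) ** 2 for v in vals) / n
        inv_std = 1.0 / math.sqrt(var + eps)
        out.append([[gamma[c] * (v - mu) * inv_std + beta[c] for v in row]
                    for row in plane])
    return out
```

After normalization, each output channel has mean beta[c] and a standard deviation close to |gamma[c]|.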
Multi-head self-attention.
This is a shape-specialized attention primitive used by transformer-style models. It takes an
optional boolean (n,n) mask and returns the attended output of shape (n,dModel).
PyTorch comparison: similar to torch.nn.MultiheadAttention / scaled dot-product attention.
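The following sketch shows the computation in pure Python. It assumes square (dModel, dModel) projection matrices Wq, Wk, Wv, Wo, that dModel is divisible by the number of heads, and that a mask entry of True means "query i may attend to key j". The mask convention is an assumption because PyTorch itself differs here. In torch.nn.functional.scaled_dot_product_attention, a True boolean mask entry means the position takes part in attention. In torch.nn.MultiheadAttention, a True entry means the position is blocked. The function names are hypothetical, and each row of the mask must allow at least one key.

```python
import math

# Hypothetical sketch of multi-head self-attention on nested lists.
# Assumption: mask[i][j] == True means query i may attend to key j.

def matmul(a, b):
    """(n, k) x (k, m) -> (n, m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]  # exp(-inf) == 0.0 for masked entries
    s = sum(exps)
    return [e / s for e in exps]

def multi_head_self_attention(x, wq, wk, wv, wo, num_heads, mask=None):
    """x: (n, dModel). Returns (n, dModel)."""
    n, d_model = len(x), len(x[0])
    assert d_model % num_heads == 0
    d_head = d_model // num_heads
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    concat = [[0.0] * d_model for _ in range(n)]
    scale = 1.0 / math.sqrt(d_head)
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head  # this head's slice of the features
        for i in range(n):
            scores = []
            for j in range(n):
                if mask is not None and not mask[i][j]:
                    scores.append(float("-inf"))
                else:
                    scores.append(scale * sum(q[i][t] * k[j][t] for t in range(lo, hi)))
            p = softmax(scores)
            for t in range(lo, hi):
                concat[i][t] = sum(p[j] * v[j][t] for j in range(n))
    return matmul(concat, wo)  # output projection over the concatenated heads
```

With identity projections and a causal mask (j <= i), the first token can attend only to itself, so its output row equals its input row. This is a quick sanity check that the masking works as intended.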