NN.Runtime.Autograd.Engine.Core.Neural

Neural-network operations for the eager engine.

This file implements runtime nodes such as dropout, normalization, attention, and recurrent/sequence building blocks on top of the core tensor operation layer.

def Runtime.Autograd.Tape.layerNorm {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] [DecidableEq Spec.Shape] {seqLen embedDim : ℕ} (h_seq_pos : seqLen > 0) (h_embed_pos : embedDim > 0) (t : Tape α) (xId gammaId betaId : ℕ) : Result (Tape α × ℕ)

Layer normalization for (seqLen, embedDim) tensors.

This records a single node whose backward pass returns gradients for x, gamma, and beta.

PyTorch comparison: torch.nn.LayerNorm(embedDim), applied per token, or torch.nn.functional.layer_norm.
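To make the "single node, three gradients" behavior concrete, here is a minimal reference sketch in plain Python. It is not the Lean implementation: the function names, the `eps` value, and the use of the biased variance are assumptions chosen to mirror PyTorch's defaults, and tensors are modeled as nested lists.

```python
import math

def layer_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each row (token) of x over the embedding axis.

    x: seqLen rows of embedDim floats; gamma, beta: embedDim floats.
    eps is an assumed stabilizer; the source does not state its value.
    """
    out, cache = [], []
    for row in x:
        d = len(row)
        mean = sum(row) / d
        var = sum((v - mean) ** 2 for v in row) / d  # biased variance (assumption)
        inv_std = 1.0 / math.sqrt(var + eps)
        xhat = [(v - mean) * inv_std for v in row]
        out.append([g * h + b for h, g, b in zip(xhat, gamma, beta)])
        cache.append((xhat, inv_std))  # saved for the backward pass
    return out, cache

def layer_norm_backward(dy, gamma, cache):
    """Return (dx, dgamma, dbeta) from the upstream gradient dy."""
    d = len(gamma)
    dgamma, dbeta, dx = [0.0] * d, [0.0] * d, []
    for dy_row, (xhat, inv_std) in zip(dy, cache):
        dxhat = [g * u for g, u in zip(gamma, dy_row)]
        s1 = sum(dxhat)
        s2 = sum(a * h for a, h in zip(dxhat, xhat))
        dx.append([inv_std * (a - s1 / d - h * s2 / d)
                   for a, h in zip(dxhat, xhat)])
        for j in range(d):
            dgamma[j] += dy_row[j] * xhat[j]  # summed over tokens
            dbeta[j] += dy_row[j]
    return dx, dgamma, dbeta

x = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]]
gamma = [1.0, 0.5, 2.0, 1.5]
beta = [0.1, 0.0, -0.2, 0.3]
dy = [[0.3, -0.2, 0.5, 0.1], [1.0, 0.0, -0.4, 0.2]]

y, cache = layer_norm_forward(x, gamma, beta)
dx, dgamma, dbeta = layer_norm_backward(dy, gamma, cache)
```

The backward function produces all three gradients from one cached forward pass, which is the reason a fused node can be preferable to composing mean, variance, and scaling as separate tape entries.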

def Runtime.Autograd.Tape.batchnormChannelFirst {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] [DecidableEq Spec.Shape] {channels height width : ℕ} (h_c : channels > 0) (h_h : height > 0) (h_w : width > 0) (t : Tape α) (xId gammaId betaId : ℕ) : Result (Tape α × ℕ)

Batch normalization for channel-first images (C,H,W) (no batch axis).

PyTorch comparison: conceptually torch.nn.BatchNorm2d(C) or torch.nn.functional.batch_norm on NCHW input, but specialized here to a single image.
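With no batch axis, the natural reading is that statistics are computed per channel over the H×W spatial positions. The sketch below illustrates that reading in plain Python. It is an assumption about the semantics rather than a transcription of the Lean code: the function name, the `eps` value, the biased variance, and the per-channel shape of `gamma` and `beta` are all hypothetical.

```python
import math

def batchnorm_chw(x, gamma, beta, eps=1e-5):
    """Normalize a single (C, H, W) image per channel.

    x: C planes, each H rows of W floats.
    gamma, beta: C floats (assumed one scale and shift per channel).
    """
    out = []
    for plane, g, b in zip(x, gamma, beta):
        vals = [v for row in plane for v in row]  # flatten H x W
        n = len(vals)
        mean = sum(vals) / n
        var = sum((v - mean) ** 2 for v in vals) / n  # biased variance (assumption)
        inv_std = 1.0 / math.sqrt(var + eps)
        out.append([[g * (v - mean) * inv_std + b for v in row]
                    for row in plane])
    return out

# Two channels, 2x2 spatial extent; the second channel is constant.
x = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[10.0, 10.0], [10.0, 10.0]],
]
y = batchnorm_chw(x, gamma=[1.0, 2.0], beta=[0.0, 0.5])
```

A constant channel has zero variance, so its normalized values collapse to zero and the output equals `beta` for that channel; `eps` keeps the division well defined in that case.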

def Runtime.Autograd.Tape.multiHeadAttention {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] [DecidableEq Spec.Shape] {n numHeads dModel headDim : ℕ} (h1 : n ≠ 0) (t : Tape α) (wqId wkId wvId woId xId : ℕ) (mask : Option (Spec.Tensor Bool (Spec.Shape.dim n (Spec.Shape.dim n Spec.Shape.scalar))) := none) : Result (Tape α × ℕ)

Multi-head self-attention.

This is a shape-specialized attention primitive used by transformer-style models. It takes an optional boolean (n,n) mask, which defaults to none, and returns the attended output of shape (n,dModel).

PyTorch comparison: similar to torch.nn.MultiheadAttention / scaled dot-product attention.
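For orientation, the following plain-Python sketch shows one common formulation of multi-head self-attention with an optional (n,n) boolean mask. Several details are assumptions not stated on this page: the weight shapes (`wq`, `wk`, `wv` as (dModel, numHeads·headDim) and `wo` as (numHeads·headDim, dModel)), the absence of biases, and the mask convention, where `True` here means "position j may be attended to from position i". All function names are hypothetical.

```python
import math

def matmul(a, b):
    """Plain (rows x inner) @ (inner x cols) product on nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def multi_head_attention(x, wq, wk, wv, wo, num_heads, mask=None):
    """x: (n, dModel). Returns (n, dModel).

    Assumes every mask row allows at least one position.
    """
    n = len(x)
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    head_dim = len(q[0]) // num_heads
    scale = 1.0 / math.sqrt(head_dim)
    concat = [[] for _ in range(n)]
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim  # this head's column slice
        for i in range(n):
            scores = []
            for j in range(n):
                s = scale * sum(q[i][c] * k[j][c] for c in range(lo, hi))
                if mask is not None and not mask[i][j]:
                    s = -math.inf  # masked positions get zero weight
                scores.append(s)
            weights = softmax(scores)
            concat[i].extend(
                sum(weights[j] * v[j][c] for j in range(n))
                for c in range(lo, hi)
            )
    return matmul(concat, wo)

# n = 3 tokens, dModel = 4, two heads of width 2, identity projections.
eye = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
x = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
causal = [[j <= i for j in range(3)] for i in range(3)]
out = multi_head_attention(x, eye, eye, eye, eye, num_heads=2, mask=causal)
```

With a causal mask and identity projections, the first token can attend only to itself, so its output row equals its input row; later rows are convex mixtures of the tokens they are allowed to see.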
