TorchLean API

NN.Runtime.Autograd.Engine.Core.Neural

Neural-network operations for the eager engine.

This file implements runtime nodes such as dropout, normalization, attention, and recurrent/sequence building blocks on top of the core tensor operation layer.

def Runtime.Autograd.Tape.layerNorm {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] [DecidableEq Spec.Shape] {seqLen embedDim : ℕ} (h_seq_pos : seqLen > 0) (h_embed_pos : embedDim > 0) (t : Tape α) (xId gammaId betaId : ℕ) :

Layer normalization for (seqLen, embedDim) tensors.

This records a single node whose backward pass returns gradients for x, gamma, and beta.

PyTorch comparison: torch.nn.LayerNorm(embedDim) (applied per token) / torch.nn.functional.layer_norm.
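As a rough illustration of what such a node computes, the following is a minimal pure-Python sketch of per-token layer normalization with a hand-written backward pass for x, gamma, and beta. It is not TorchLean code: the function names, the nested-list tensor representation, the biased variance, and the `eps` value are all assumptions made for the example.

```python
import math

def layer_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each row (token) of x over its embedding axis.

    x: list of seqLen rows, each a list of embedDim floats.
    Returns the output and a cache reused by the backward pass.
    """
    out, cache = [], []
    for row in x:
        d = len(row)
        mean = sum(row) / d
        var = sum((v - mean) ** 2 for v in row) / d  # biased variance (assumption)
        inv_std = 1.0 / math.sqrt(var + eps)
        x_hat = [(v - mean) * inv_std for v in row]
        out.append([g * h + b for g, h, b in zip(gamma, x_hat, beta)])
        cache.append((x_hat, inv_std))
    return out, cache

def layer_norm_backward(dy, gamma, cache):
    """Return (dx, dgamma, dbeta) for an upstream gradient dy shaped like x."""
    d = len(gamma)
    dgamma = [0.0] * d
    dbeta = [0.0] * d
    dx = []
    for dy_row, (x_hat, inv_std) in zip(dy, cache):
        # gamma and beta are shared across tokens, so their gradients accumulate.
        for j in range(d):
            dgamma[j] += dy_row[j] * x_hat[j]
            dbeta[j] += dy_row[j]
        dx_hat = [dy_row[j] * gamma[j] for j in range(d)]
        m1 = sum(dx_hat) / d
        m2 = sum(a * h for a, h in zip(dx_hat, x_hat)) / d
        dx.append([inv_std * (dx_hat[j] - m1 - x_hat[j] * m2) for j in range(d)])
    return dx, dgamma, dbeta

x = [[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]]
gamma, beta = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
y, cache = layer_norm_forward(x, gamma, beta)
dy = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
dx, dgamma, dbeta = layer_norm_backward(dy, gamma, cache)
```

Bundling the three gradients into one backward function mirrors the "single node" design described above: the normalized values and inverse standard deviation are computed once in the forward pass and shared by all three gradients.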

def Runtime.Autograd.Tape.batchnormChannelFirst {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] [DecidableEq Spec.Shape] {channels height width : ℕ} (h_c : channels > 0) (h_h : height > 0) (h_w : width > 0) (t : Tape α) (xId gammaId betaId : ℕ) :

Batch normalization for channel-first images (C, H, W) (no batch axis).

PyTorch comparison: conceptually torch.nn.BatchNorm2d(C) / torch.nn.functional.batch_norm on NCHW, but specialized here to a single image.
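To make the single-image specialization concrete, here is a small pure-Python sketch of the forward computation. It assumes that, with no batch axis, each channel's mean and variance are taken over that channel's H×W positions; the source does not state how the statistics are computed, and the function name, nested-list layout, and `eps` value are hypothetical.

```python
import math

def batchnorm_channel_first(x, gamma, beta, eps=1e-5):
    """Normalize a (C, H, W) image per channel.

    x: list of C planes, each a list of H rows of W floats.
    gamma, beta: per-channel scale and shift, each of length C.
    """
    out = []
    for c, plane in enumerate(x):
        vals = [v for row in plane for v in row]
        n = len(vals)
        mean = sum(vals) / n
        var = sum((v - mean) ** 2 for v in vals) / n  # biased variance (assumption)
        inv_std = 1.0 / math.sqrt(var + eps)
        out.append([[gamma[c] * (v - mean) * inv_std + beta[c] for v in row]
                    for row in plane])
    return out

image = [
    [[1.0, 2.0], [3.0, 4.0]],      # channel 0
    [[10.0, 10.0], [10.0, 10.0]],  # channel 1 (constant plane)
]
normed = batchnorm_channel_first(image, gamma=[1.0, 2.0], beta=[0.0, 5.0])
```

A constant channel has zero variance, so after normalization every value in it collapses to that channel's beta; the `eps` term keeps the division well defined in that case.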

def Runtime.Autograd.Tape.multiHeadAttention {α : Type} [Context α] [DecidableRel fun (x1 x2 : α) => x1 > x2] [DecidableEq Spec.Shape] {n numHeads dModel headDim : ℕ} (h1 : n ≠ 0) (t : Tape α) (wqId wkId wvId woId xId : ℕ) (mask : Option (Spec.Tensor Bool (Spec.Shape.dim n (Spec.Shape.dim n Spec.Shape.scalar))) := none) :

Multi-head self-attention.

This is a shape-specialized attention primitive used by transformer-style models. It accepts an optional boolean (n, n) mask and returns the attended output of shape (n, dModel).

PyTorch comparison: similar to torch.nn.MultiheadAttention / scaled dot-product attention.
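The shapes involved can be illustrated with a minimal pure-Python sketch of masked multi-head self-attention. Several details here are assumptions rather than documented TorchLean behavior: the projection weights are taken to be (dModel, dModel) matrices, heads are contiguous column slices with headDim = dModel / numHeads, and a mask entry of True means "position i may attend to position j". All function names are hypothetical.

```python
import math

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def multi_head_attention(x, wq, wk, wv, wo, num_heads, mask=None):
    """x: (n, dModel); returns the attended output of shape (n, dModel)."""
    n, d_model = len(x), len(x[0])
    head_dim = d_model // num_heads
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    concat = [[0.0] * d_model for _ in range(n)]
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim  # this head's column slice
        for i in range(n):
            scores = []
            for j in range(n):
                s = sum(q[i][c] * k[j][c] for c in range(lo, hi)) / math.sqrt(head_dim)
                if mask is not None and not mask[i][j]:
                    s = -1e30  # large negative stand-in for -inf
                scores.append(s)
            weights = softmax(scores)
            for c in range(lo, hi):
                concat[i][c] = sum(weights[j] * v[j][c] for j in range(n))
    return matmul(concat, wo)

identity = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0], [0.0, 1.0]]
causal = [[True, False], [True, True]]
out = multi_head_attention(x, identity, identity, identity, identity,
                           num_heads=2, mask=causal)
```

Using a large finite negative value instead of negative infinity means a fully masked row falls back to uniform weights rather than producing NaN; whether the real primitive behaves that way is not stated in the source.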
