Reductions, flatten/unflatten, and shape-changing helpers #
This module is where “shape-aware” operations live:
- flattenSpec / unflattenSpec (convert between a tensor and a flat vector of length Shape.size)
- broadcasting maps that change the output shape
Because shapes are indexed in types, many of these definitions necessarily carry equalities like
Shape.size s = ... under the hood.
Tip: when you need to transport a tensor across a proved shape equality, use:
Tensor.cast_shape (defined in NN/Spec/Core/Tensor/Core.lean)
Prefer abbrevs in user-facing code so common shape equalities remain definitional rather than
requiring transport proofs.
PyTorch mental model:
- flattenSpec / unflattenSpec correspond to torch.flatten and view/reshape on a contiguous tensor.
- broadcasting (broadcastTo / broadcastMapTo) corresponds to expand / broadcast_to plus elementwise ops.
- reductions (reduceSum, reduceMean, reduceVar, reduceMax, and reduceMin) correspond to sum/mean/var/amax/amin along a chosen axis.
The difference is that our shapes live in types, so the spec definitions must be explicit about:
- what the target/output shape is,
- and why the axis is valid / reducible.
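For concreteness, here is that mental model written out on the PyTorch side (a sketch in plain torch only; the spec names above are Lean definitions and do not appear in the snippet, and the shapes are arbitrary example values):

```python
import torch

x = torch.arange(6.0).reshape(2, 3)        # shape (2, 3)

# flattenSpec / unflattenSpec analogue: flatten, then view back.
flat = torch.flatten(x)                    # shape (6,)
back = flat.view(2, 3)                     # shape (2, 3), same values

# broadcastTo / broadcastMapTo analogue: broadcast to an explicit shape, then an elementwise op.
row = torch.tensor([10.0, 20.0, 30.0])     # shape (3,)
y = x + torch.broadcast_to(row, (2, 3))    # explicit target shape (2, 3)

# Reduction analogues, each along a chosen axis (here axis 1):
s = torch.sum(x, dim=1)                    # reduceSum   -> shape (2,)
m = torch.mean(x, dim=1)                   # reduceMean
v = torch.var(x, dim=1, unbiased=False)    # reduceVar (population variance, divides by n)
hi = torch.amax(x, dim=1)                  # reduceMax
lo = torch.amin(x, dim=1)                  # reduceMin
```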
Naming note (sequence concatenation):
- This file defines Spec.Tensor.concatSequenceSpec for concatenating along the time axis (axis 0), producing a longer sequence.
- NN.Spec.Core.Sequence defines Spec.concatSequenceSpec for concatenating along the feature axis (inner axis) for same-length sequences.
- The names are intentionally similar, but they are different operations living in different namespaces (Spec.Tensor vs Spec).
References / analogies (shape intuition, not semantics):
- PyTorch torch.flatten: https://pytorch.org/docs/stable/generated/torch.flatten.html
- PyTorch torch.Tensor.reshape: https://pytorch.org/docs/stable/generated/torch.Tensor.reshape.html
- PyTorch torch.Tensor.view: https://pytorch.org/docs/stable/generated/torch.Tensor.view.html
- PyTorch torch.Tensor.expand: https://pytorch.org/docs/stable/generated/torch.Tensor.expand.html
- PyTorch torch.sum: https://pytorch.org/docs/stable/generated/torch.sum.html
- PyTorch torch.mean: https://pytorch.org/docs/stable/generated/torch.mean.html
Flatten a tensor into a 1‑D vector (length = Shape.size s).
The order is outermost‑dimension major (row‑major w.r.t. the shape tree).
For proofs, the key invariant is that the output length matches Shape.size.
Why this exists: a lot of shape-changing ops are easiest to specify as "flatten, then rebuild", and this is also the bridge we use for some runtime interop where we want a plain sequence of scalars (e.g. importing weights or serializing test vectors).
Instances For
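As a quick PyTorch reference point for the ordering convention (outermost-dimension major, i.e. row-major), this sketch just shows what torch.flatten produces on a small example:

```python
import torch

x = torch.tensor([[0, 1, 2],
                  [3, 4, 5]])
print(torch.flatten(x))   # tensor([0, 1, 2, 3, 4, 5]) -- row-major order
```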
Unflatten a 1‑D vector back into a tensor of a given shape.
PyTorch analogy: flat.view(shape) (assuming the element count matches).
This is the inverse of flattenSpec up to the ordering convention.
Instances For
flattenSpec / unflattenSpec round-trip lemmas #
These are shape-transport facts: they justify treating flattenSpec/unflattenSpec like
reshape/view in PyTorch, provided you keep the element count consistent.
PyTorch references:
- torch.flatten: https://pytorch.org/docs/stable/generated/torch.flatten.html
- Tensor.view / torch.reshape: https://pytorch.org/docs/stable/generated/torch.Tensor.view.html
- torch.reshape: https://pytorch.org/docs/stable/generated/torch.reshape.html
Important nuance:
- PyTorch allows zero-sized dimensions, and its reshape/flatten semantics remain total.
- Our spec definitions are also total (they use Inhabited.default for unreachable branches), which keeps everything executable, but can make “inverse” proofs a bit index-heavy.

The theorems below show that the round-trips do work for the spec definitions as written.
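A small PyTorch sketch of the zero-sized-dimension point above (this only illustrates the PyTorch claim; it says nothing about the Lean definitions themselves):

```python
import torch

x = torch.empty(0, 3)          # a (0, 3) tensor: the element count is 0
flat = torch.flatten(x)        # shape (0,) -- still well-defined
back = flat.reshape(0, 3)      # reshaping back also works (0 elements on both sides)
print(flat.shape, back.shape)  # torch.Size([0]) torch.Size([0, 3])
```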
If a shape has Shape.size s = 0, then it contains no scalar leaves (it has a 0-length
dimension somewhere). In that case, there is essentially only one possible tensor value of shape
s (up to definitional equality), because at the 0-length dimension the indexing function has
domain Fin 0.
We use this as a “vacuity” lemma to avoid needing division/modulo arithmetic when Shape.size s = 0.
Round-trip flatten ∘ unflatten = id.
This is the spec-layer analogue of flattening a reshaped/viewed tensor in PyTorch.
Convenience corollary: the unflatten ∘ flatten round-trip in the common well-formed regime.
Broadcasting #
Broadcast a tensor along a Shape.CanBroadcastTo proof (spec-level analogue of
torch.broadcast_to).
Instances For
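PyTorch sketch of the analogous operation (plain torch.broadcast_to / Tensor.expand, not the spec definition; shapes are example values):

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])     # shape (3,)
y = torch.broadcast_to(x, (4, 3))     # shape (4, 3): rows are copies of x
z = x.expand(4, 3)                    # same result, as a non-copying view
```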
Broadcasted maps #
Broadcast a scalar tensor to match a template tensor's shape.
This is a small convenience wrapper used by specs that want "like" broadcasting without spelling
out the Shape.CanBroadcastTo evidence.
Instances For
Binary element-wise operation with broadcasting to an explicit target shape.
This is the helper you typically want in spec code:
- pick the output shape t,
- broadcast each operand to t,
- then map2_spec the pointwise operation.
PyTorch analogy: f(x, y) where x and/or y are broadcastable to a common shape.
We make the common shape explicit instead of "discovering" it, because at the spec layer we want:
- predictable typing,
- a single source of truth for what the output shape is.
Instances For
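The same recipe written out in PyTorch, where broadcasting is normally implicit (a sketch; the target shape (4, 3) and the operands are arbitrary examples):

```python
import torch

x = torch.randn(4, 1)
y = torch.randn(1, 3)

# Implicit: PyTorch "discovers" the common shape (4, 3).
implicit = x * y

# Explicit: fix the target shape first, broadcast each operand, then apply the pointwise op.
t = (4, 3)
explicit = torch.broadcast_to(x, t) * torch.broadcast_to(y, t)

assert torch.equal(implicit, explicit)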
Reductions #
Left fold over all tensor elements.
Instances For
Right fold over all tensor elements.
Instances For
Output shape after summing along axis (drops that dimension).
Instances For
simp lemma: dropping axis 1 from a 2D (nQ+1)×(nK+1) shape yields (nQ+1).
simp lemma: dropping axis 1 from a 2D nQ×nK shape yields nQ.
simp lemma: dropping axis 3 from a 4D b×h×w×c shape yields b×h×w.
simp lemma: dropping axis 0 from a positive .dim (n+1) s yields s.
simp lemma: dropping axis k+1 recurses into the tail shape.
simp lemma: dropping axis 0 from a 2D (kH+1)×(kW+1) yields (kW+1).
simp lemma: dropping axis 0 from .dim n inner yields inner (even when n=0).
Reflexive broadcast proof (s can broadcast to itself).
Instances For
Build a broadcast proof from the reduced shape back to the original shape.
We use this when a backward pass computes something in the reduced shape (e.g. a mean/variance) and we need to broadcast it back to match the original tensor shape.
Instances For
Reduce a tensor of shape (n, innerShape) by applying f across the first axis.
This is the basic “reduce over axis 0” primitive that we reuse to implement broadcast-adjoints and multi-axis reducers.
Instances For
Reduce a gradient from a broadcast target shape back to the original input shape.
This is the adjoint of broadcastTo for sum-reduction: broadcast duplicates values, so the
backward pass sums contributions across broadcasted dimensions.
PyTorch analogy: this is the logic behind "sum over broadcasted dimensions" that happens in
autograd for expand + elementwise ops.
Adjoint of broadcastTo under sum-reduction: collapse broadcasted dimensions by summing.
Instances For
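A PyTorch/autograd sketch of why the adjoint sums over broadcasted dimensions (illustration only, not the spec definition; shapes are example values):

```python
import torch

x = torch.randn(3, requires_grad=True)   # original shape (3,)
y = x.expand(4, 3)                        # broadcast to (4, 3)
upstream = torch.ones(4, 3)               # pretend dL/dy is all ones
y.backward(upstream)

# The gradient w.r.t. x sums contributions over the broadcasted dimension:
print(x.grad)                              # tensor([4., 4., 4.])
print(upstream.sum(dim=0))                 # same thing, computed by hand
```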
Generic reduction along a (provably reducible) axis.
reduce_dim f axis x applies f to the slices along axis, and returns a tensor whose shape is
shape_after_sum s axis (i.e. that axis is dropped).
Instances For
Sum-reduction along a given axis.
Instances For
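PyTorch analogue of the shape bookkeeping (cf. shape_after_sum): summing along an axis drops that axis. A minimal sketch with an example shape:

```python
import torch

x = torch.randn(2, 5, 7)
print(torch.sum(x, dim=1).shape)   # torch.Size([2, 7]) -- axis 1 dropped
print(torch.sum(x, dim=0).shape)   # torch.Size([5, 7]) -- axis 0 dropped
```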
Sum-reduction along axis, with axis validity inferred via valid_axis_inst.
Instances For
Product-reduction along a given axis.
Instances For
Product-reduction along axis when you already have a valid_axis proof.
Instances For
Mean-reduction along a given axis.
Instances For
Mean-reduction along axis, with axis validity provided as a typeclass argument.
Instances For
Sum of squares reduced along an axis (helper for variance).
Instances For
Variance-reduction along a given axis (population variance, divides by n).
Instances For
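The "divides by n" convention matches torch.var with unbiased=False (a sketch with a concrete row, not the spec definition):

```python
import torch

x = torch.tensor([[1.0, 2.0, 3.0, 4.0]])
pop = torch.var(x, dim=1, unbiased=False)     # divides by n     -> 1.25
sample = torch.var(x, dim=1, unbiased=True)   # divides by n - 1 -> 1.6667
print(pop, sample)
```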
Variance-reduction along axis, with axis validity provided as a typeclass argument.
Instances For
Min-reduction along a given axis.
Instances For
Max-reduction along a given axis.
Instances For
Max-reduction along axis, with axis validity inferred via valid_axis_inst.
Instances For
Reduce along the last axis of s (i.e. axis rank s - 1).
Instances For
Like reduce_last_dim, but infers axis validity via valid_axis_inst.
Instances For
Mean-reduce along the last axis.
Instances For
Sum-reduce along the last axis of a 2D tensor (seqLen, embedDim).
Instances For
Product-reduce along the last axis of a 2D tensor (seqLen, embedDim).
Instances For
Max-reduce along the last axis.
Instances For
Min-reduce along the last axis.
Instances For
Mean-reduce along the last axis (with axis validity as a typeclass argument).
Instances For
Mean-reduce along the last axis, specialized for proofs that assume well-formedness.
Instances For
Sum-reduce along the last axis (with axis validity inferred via valid_axis_inst).
Instances For
Transpose a matrix (m×n) into (n×m).
PyTorch analogy: A.transpose(0, 1) or A.T for 2D tensors.
Instances For
Permute a 3D tensor from (a,b,c) to (b,c,a).
Instances For
Permute a 3D tensor from (a,b,c) to (c,a,b).
Instances For
Swap the last two axes of a 3D tensor: (a,b,c) to (a,c,b).
Instances For
Helper for swapping adjacent dims at a given depth (see Shape.swapAdjacentAtDepth).
Instances For
Backward pass for matrix multiplication: returns (dA, dB) given dC.
PyTorch analogy: if C = A @ B, then:
- dA = dC @ Bᵀ
- dB = Aᵀ @ dC
Instances For
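A PyTorch/autograd sanity check of those two formulas (a sketch; the shapes and the upstream gradient are arbitrary examples):

```python
import torch

A = torch.randn(2, 3, requires_grad=True)
B = torch.randn(3, 4, requires_grad=True)
C = A @ B
dC = torch.randn(2, 4)          # upstream gradient for C

C.backward(dC)
assert torch.allclose(A.grad, dC @ B.detach().T)   # dA = dC @ Bᵀ
assert torch.allclose(B.grad, A.detach().T @ dC)   # dB = Aᵀ @ dC
```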
Batched matrix multiplication: [batch,m,n] × [batch,n,p] → [batch,m,p].
Instances For
Backward pass for batched matrix multiplication.
Instances For
Concatenate a list of (n,d) tensors along the last axis, producing (n, headCount*d).
This is mainly used by attention blocks that split/merge heads.
PyTorch analogy: torch.cat(heads, dim=-1) after splitting heads, followed by a reshape.
Instances For
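PyTorch sketch of merging heads with plain torch.cat (the sequence length, head count, and head width are example values):

```python
import torch

n, head_count, d = 5, 4, 16
heads = [torch.randn(n, d) for _ in range(head_count)]
merged = torch.cat(heads, dim=-1)      # shape (n, head_count * d)
print(merged.shape)                    # torch.Size([5, 64])
```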
Concatenate two vectors by appending v2 after v1.
Instances For
Slicing / concatenation on the leading axis #
concat_dim0_spec is the "append on axis 0" primitive that powers many higher-level utilities
(sequence concatenation, channel skip connections, etc.).
For backprop and for "undoing" concatenations, it is convenient to have an explicit slice operation. We keep the API compact and index-safe:
- slice_range0_spec start len selects len consecutive entries starting at start along axis 0.
- concat_dim0_backward_spec is the adjoint of concat_dim0_spec (splits a gradient tensor).
Slice len entries along axis 0, starting at start.
This is the simplest "range slice" one typically needs to express:
- taking the first n channels/tokens,
- extracting the skip-connection half after a concat,
- implementing take/drop without changing the inner shape.
The proof len + start ≤ n makes the slice total (no out-of-bounds behavior).
Instances For
Backward (adjoint) of concat_dim0_spec.
If y = concat_dim0_spec x1 x2, then in reverse-mode we split the upstream gradient δy into:
- δx1 = the first n entries of δy,
- δx2 = the last m entries of δy.
Instances For
Backward (adjoint) of slice_range0_spec.
If y = slice_range0_spec start len x, then slice_range0_backward_spec start len δy re-inserts
the gradient into the original shape and fills everything outside the slice with zeros.
Instances For
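PyTorch sketch of both adjoints using ordinary indexing (illustration only; n, m, start, and len are arbitrary example values):

```python
import torch

n, m, d = 3, 2, 4
x1, x2 = torch.randn(n, d), torch.randn(m, d)
y = torch.cat([x1, x2], dim=0)            # concat on axis 0, shape (5, 4)

# Adjoint of concat on axis 0: split the upstream gradient back into two pieces.
dy = torch.randn(n + m, d)
dx1, dx2 = dy[:n], dy[n:]                 # first n rows / last m rows

# Adjoint of a range slice on axis 0: re-insert the slice gradient,
# zero-filling everything outside the sliced range.
start, length = 1, 2
dslice = torch.randn(length, d)           # gradient of y[start:start+length]
dx = torch.zeros_like(y)
dx[start:start + length] = dslice
```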
Concatenate two sequences along time (axis 0), producing a longer sequence.
If seq1 : (seqLen1 x hidden) and seq2 : (seqLen2 x hidden), this returns
(seqLen1 + seqLen2) x hidden by appending seq2 after seq1.
Do not confuse this with Spec.concatSequenceSpec (defined in NN.Spec.Core.Sequence), which
concatenates along the feature dimension for same-length sequences.
Instances For
Concatenate two sequences along the feature dimension (inner axis).
Instances For
Same as expand_to_col_spec, specialized to vectors.
Instances For
Same as squeeze_col_spec, specialized to vectors.
Instances For
Unsqueeze (insert a singleton dim). Currently implemented as expand_to_col_spec.
Core uses singleton insertion mainly for column vectors, so this operation is specialized to that use case. General axis insertion can extend this definition.
Instances For
Turn a vector (n) into a batch of size 1: (1,n).
Instances For
Convert channel-first images (b,c,h,w) into channel-last (b,h,w,c).
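PyTorch sketch of the layout change via Tensor.permute (the .contiguous() call just materializes the new memory layout; shape values are examples):

```python
import torch

x = torch.randn(8, 3, 32, 32)           # (b, c, h, w), channels-first
y = x.permute(0, 2, 3, 1).contiguous()  # (b, h, w, c), channels-last
print(y.shape)                           # torch.Size([8, 32, 32, 3])
```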