Shape #
Additional analytic (HasFDerivAt) tape nodes for shape permutations.
These nodes are linear isometries and are useful for models that do explicit reshaping and axis permutations (e.g. Multi-Head Attention head splitting/combining).
reshape is linear: on vectors it is just a type cast along a Shape.size equality.
We implement it as a Node to keep the DAG theorem applicable.
reshape node: reinterpret the same underlying coordinates as a different shape.
This is only definable when Shape.size s₁ = Shape.size s₂; at the vector level it is a cast.
PyTorch analogue: view/reshape operations that do not change the total number of elements.
https://pytorch.org/docs/stable/tensor_view.html
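At the vector level the idea can be sketched as follows. This is a minimal, self-contained Lean model (assuming Mathlib): `Vec`, `reshapeVec`, and the use of `Fin.cast` are illustrative assumptions, not this library's actual definitions.

```lean
import Mathlib

-- Illustrative model: a shape of total size n is, as a vector space, Fin n → ℝ.
abbrev Vec (n : ℕ) := Fin n → ℝ

-- Reinterpreting coordinates along a size equality is a pure cast on indices,
-- so linearity holds definitionally.
def reshapeVec {m n : ℕ} (h : m = n) (v : Vec m) : Vec n :=
  fun i => v (Fin.cast h.symm i)

-- Linearity is `rfl`: the cast does not touch the coordinate values.
example {m n : ℕ} (h : m = n) (u v : Vec m) (a : ℝ) :
    reshapeVec h (a • u + v) = a • reshapeVec h u + reshapeVec h v := rfl
```

Because the map is a coordinate-preserving bijection on indices, it is also an isometry, which is what makes the derivative transport trivial.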
NodeFDerivCorrect for reshape (it is linear/isometric).
flatten node: specialization of reshape to the canonical vector shape (.dim (Shape.size s) .scalar).
PyTorch analogue: flatten when applied to a contiguous tensor.
https://pytorch.org/docs/stable/generated/torch.flatten.html
NodeFDerivCorrect for flatten.
Move reindexVec across the left argument of an inner product.
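The content of this lemma is the usual orthogonality fact for coordinate permutations: a reindexing can be moved from one side of an inner product to the other by inverting it. A hedged Mathlib analogue (with an `Equiv` standing in for this library's `reindexVec`; lemma names assumed from Mathlib) looks like:

```lean
import Mathlib

-- A coordinate permutation is orthogonal: reindexing by `e` on the left of a
-- dot product equals reindexing by `e.symm` on the right.
example {n : ℕ} (e : Fin n ≃ Fin n) (u v : Fin n → ℝ) :
    ∑ i, u (e i) * v i = ∑ i, u i * v (e.symm i) := by
  rw [← Equiv.sum_comp e fun i => u i * v (e.symm i)]
  simp
```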
Swap the first two axes of a 3D tensor shape: .dim m (.dim n rest) ↦ .dim n (.dim m rest).
This is implemented as a coordinate permutation (a linear isometry).
PyTorch analogue: transpose(0, 1) on a 3D tensor.
https://pytorch.org/docs/stable/generated/torch.transpose.html
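In curried form the permutation is just argument swapping. The following is a sketch under that assumption (the hypothetical `swapFirstTwo` below works on curried tensors, whereas the library permutes a flat index space; the linear-isometry content is the same):

```lean
import Mathlib

-- Swap the first two axes of a 3-D tensor, viewed in curried form.
def swapFirstTwo {m n k : ℕ} (t : Fin m → Fin n → Fin k → ℝ) :
    Fin n → Fin m → Fin k → ℝ :=
  fun j i => t i j

-- Involutive, as expected for transpose(0, 1): swapping twice is the identity.
example {m n k : ℕ} (t : Fin m → Fin n → Fin k → ℝ) :
    swapFirstTwo (swapFirstTwo t) = t := rfl
```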
NodeFDerivCorrect for swap_first_two3d (linear coordinate permutation).
Transpose the last two axes of a 3D tensor: .dim a (.dim b (.dim c .scalar)) ↦ .dim a (.dim c (.dim b .scalar)).
This is another coordinate permutation used in attention (switching K to Kᵀ per head while keeping the head/batch axes fixed).
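A curried sketch of this permutation (the name `transposeLastTwo` is illustrative, not the library's; the library permutes a flat index space instead):

```lean
import Mathlib

-- Transpose the last two axes while keeping the leading (head/batch) axis:
-- this is what turns K into Kᵀ per head when forming attention scores.
def transposeLastTwo {a b c : ℕ} (t : Fin a → Fin b → Fin c → ℝ) :
    Fin a → Fin c → Fin b → ℝ :=
  fun i k j => t i j k

-- Involutive, like any axis transposition.
example {a b c : ℕ} (t : Fin a → Fin b → Fin c → ℝ) :
    transposeLastTwo (transposeLastTwo t) = t := rfl
```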
NodeFDerivCorrect for transpose3d_last_two (linear coordinate permutation).