Permutation Equivariance of Self-Attention (No Positional Encoding) #
Self-attention without positional information is equivariant to permutations of the token axis:
if we reorder the input tokens, the output is reordered in the same way.
This file formalizes that statement for TorchLean’s spec-layer Spec.selfAttention over ℝ.
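To fix ideas, the shape of the claim can be sketched over a simplified tensor model. The following is a standalone illustration assuming only Mathlib; the name `TokenEquivariant` and the `Fin n → Fin d → ℝ` encoding are illustrative and not part of TorchLean, whose actual theorem is stated against Spec.selfAttention and its own tensor type.

```lean
import Mathlib

/-- Illustrative only: modelling a spec tensor as `Fin n → Fin d → ℝ`, a map on
token sequences is permutation-equivariant when permuting the input tokens
permutes the output tokens in the same way. -/
def TokenEquivariant {n d : ℕ}
    (f : (Fin n → Fin d → ℝ) → (Fin n → Fin d → ℝ)) : Prop :=
  ∀ (σ : Equiv.Perm (Fin n)) (x : Fin n → Fin d → ℝ),
    f (fun i => x (σ i)) = fun i => f x (σ i)
```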
The helper reindexing operations below are intentionally proof-local. They describe how this proof
permutes tensor axes, but they are not part of the general Spec.Tensor API; reusable tensor
operations should live under NN.Spec, while model theorems and their proof scaffolding live here.
Token reindexing #
Because spec tensors are functions out of Fin n, a token permutation is just reindexing the
outer axis.
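A minimal sketch of such a helper, again over the simplified `Fin n → Fin d → ℝ` model and with the hypothetical name `permuteTokens` (the actual proof-local helpers may be phrased differently): reindexing is precomposition with the permutation, and reindexing by σ.symm undoes it.

```lean
import Mathlib

/-- Hypothetical helper: permuting tokens is precomposition with `σ` on the outer axis. -/
def permuteTokens {n d : ℕ} (σ : Equiv.Perm (Fin n))
    (x : Fin n → Fin d → ℝ) : Fin n → Fin d → ℝ :=
  fun i => x (σ i)

-- Reindexing by `σ` is undone by reindexing by `σ.symm`.
example {n d : ℕ} (σ : Equiv.Perm (Fin n)) (x : Fin n → Fin d → ℝ) :
    permuteTokens σ.symm (permuteTokens σ x) = x := by
  funext i
  simp [permuteTokens, Equiv.apply_symm_apply]
```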
Softmax equivariance #
TorchLean’s spec softmax on vectors is implemented in a numerically stabilized form (x ↦ exp(x - m) / Σ exp(x - m)),
but over ℝ it agrees with the plain exp(x) / Σ exp(x) formula: the shift contributes a common factor exp(-m) to the
numerator and to every term of the denominator, and that factor cancels. This lets us prove permutation
equivariance without reasoning about how the stabilizing shift m is chosen.
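The cancellation can be checked in isolation. The following standalone sketch, stated directly with `Real.exp` and Mathlib rather than TorchLean's Spec softmax, shows that the shift m drops out of the coordinate formula:

```lean
import Mathlib

-- Illustrative only (not TorchLean's actual `Spec` lemma): the stabilizing shift `m`
-- cancels, so the stabilized coordinate formula agrees with the plain softmax over `ℝ`.
example {n : ℕ} (x : Fin n → ℝ) (m : ℝ) (i : Fin n) :
    Real.exp (x i - m) / ∑ j, Real.exp (x j - m)
      = Real.exp (x i) / ∑ j, Real.exp (x j) := by
  -- Both denominators are sums of positive terms over the nonempty index type `Fin n`.
  have h₁ : (0 : ℝ) < ∑ j, Real.exp (x j - m) :=
    Finset.sum_pos (fun j _ => Real.exp_pos _) ⟨i, Finset.mem_univ i⟩
  have h₂ : (0 : ℝ) < ∑ j, Real.exp (x j) :=
    Finset.sum_pos (fun j _ => Real.exp_pos _) ⟨i, Finset.mem_univ i⟩
  -- Cross-multiply and compare the two sums term by term.
  rw [div_eq_div_iff h₁.ne' h₂.ne']
  simp only [Finset.mul_sum]
  refine Finset.sum_congr rfl fun j _ => ?_
  -- `exp (xᵢ - m) * exp xⱼ = exp xᵢ * exp (xⱼ - m)`: both equal `exp (xᵢ + xⱼ - m)`.
  rw [← Real.exp_add, ← Real.exp_add]
  have harg : x i - m + x j = x i + (x j - m) := by ring
  rw [harg]
```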