BugZoo: KV-cache contracts #
LLM inference-engine bug reports include cache-shift bugs, RoPE/position mismatches, shape mistakes, and resource/configuration errors. TorchLean does not verify a full serving engine, a paged-attention allocator, or a multi-GPU scheduler. The useful first step is still a precise one: represent the cache update as a typed tensor operation and prove the append invariant we rely on.
Reference:
- Liu et al., "A First Look at Bugs in LLM Inference Engines", 2025.
This file proves the cache-append boundary. A stronger future theorem should connect cached decode to full-sequence attention:
`decodeWithCache prefix newToken = fullAttention (prefix ++ [newToken])`
under the same mask, the same RoPE/position encoding, and the same numeric semantics.
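One concrete shape that statement could take, sketched with hypothetical `decodeWithCache` and `fullAttention` signatures over plain token lists (the mask, RoPE, and numeric-semantics parameters are elided; `ctx` stands in for `prefix`, which is a Lean keyword):

```lean
/-- A sketch of the future end-to-end property. `Token`, `Out`, and both
function arguments are placeholders, not definitions from this file. -/
def DecodeMatchesFull {Token Out : Type}
    (decodeWithCache : List Token → Token → Out)
    (fullAttention : List Token → Out) : Prop :=
  ∀ (ctx : List Token) (newToken : Token),
    decodeWithCache ctx newToken = fullAttention (ctx ++ [newToken])
```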
A key/value cache with an explicit sequence length and head dimension.
- keys : Spec.Tensor α (Spec.Shape.dim seqLen (Spec.Shape.dim headDim Spec.Shape.scalar))
Cached key vectors, indexed by time.
- values : Spec.Tensor α (Spec.Shape.dim seqLen (Spec.Shape.dim headDim Spec.Shape.scalar))
Cached value vectors, indexed by time.
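Spelled out as a Lean structure, assuming the `Spec` vocabulary used in the field signatures above (the name `KVCache` is illustrative, not necessarily the file's own):

```lean
structure KVCache (α : Type) (seqLen headDim : Nat) where
  /-- Cached key vectors, indexed by time. -/
  keys : Spec.Tensor α (Spec.Shape.dim seqLen (Spec.Shape.dim headDim Spec.Shape.scalar))
  /-- Cached value vectors, indexed by time. -/
  values : Spec.Tensor α (Spec.Shape.dim seqLen (Spec.Shape.dim headDim Spec.Shape.scalar))
```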
View one token vector as a length-one sequence.
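In a self-contained toy model where a length-`n` sequence of `d`-vectors is a function `Fin n → Fin d → α`, the length-one view ignores its single time index (a sketch; `asSeq` is an illustrative name):

```lean
/-- View one token vector as a length-one sequence: the only time index
returns the vector itself. -/
def asSeq {α : Type} {d : Nat} (v : Fin d → α) : Fin 1 → Fin d → α :=
  fun _ => v
```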
Append one token vector to a cached sequence along the time axis.
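In the same toy model, the time-axis append sends indices below `n` to the old sequence and the final index `n` to the new vector (a sketch; `snocSeq` is an illustrative name):

```lean
/-- Append `v` at time `n`: old indices read the old sequence, the new
final index reads the appended vector. -/
def snocSeq {α : Type} {n d : Nat}
    (s : Fin n → Fin d → α) (v : Fin d → α) : Fin (n + 1) → Fin d → α :=
  fun t => if h : t.val < n then s ⟨t.val, h⟩ else v
```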
Append both key and value vectors to the KV cache.
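Combining the two components in the toy model (a sketch; `Cache` and `Cache.append` are illustrative stand-ins for this file's definitions):

```lean
/-- Toy key/value cache: both components share the sequence length `n`. -/
structure Cache (α : Type) (n d : Nat) where
  keys : Fin n → Fin d → α
  values : Fin n → Fin d → α

/-- Append key `k` and value `v` at time `n`, growing both components. -/
def Cache.append {α : Type} {n d : Nat} (c : Cache α n d)
    (k v : Fin d → α) : Cache α (n + 1) d where
  keys := fun t => if h : t.val < n then c.keys ⟨t.val, h⟩ else k
  values := fun t => if h : t.val < n then c.values ⟨t.val, h⟩ else v
```

The dependent `if h : t.val < n` makes the time-index bound check explicit in the term, which is exactly where the cache-shift and off-by-one bugs named in the introduction live.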
The newly appended key is exactly the final key in the updated cache.
The newly appended value is exactly the final value in the updated cache.
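Continuing the toy `Cache` sketch above, both invariants reduce to the guard `n < n` failing at the last index, so the `else` branch returns the freshly appended vector:

```lean
/-- The appended key is the final key: at `Fin.last n` the bound check
`n < n` fails, so the append returns `k`. -/
theorem Cache.append_keys_last {α : Type} {n d : Nat}
    (c : Cache α n d) (k v : Fin d → α) :
    (c.append k v).keys (Fin.last n) = k :=
  dif_neg (Nat.lt_irrefl n)

/-- The symmetric invariant for the appended value. -/
theorem Cache.append_values_last {α : Type} {n d : Nat}
    (c : Cache α n d) (k v : Fin d → α) :
    (c.append k v).values (Fin.last n) = v :=
  dif_neg (Nat.lt_irrefl n)
```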