Embedding #
Spec-layer embedding primitives.
We model embeddings through one-hot tensors over a single scalar type: inputs share the scalar type α
with the embedding matrix, so they compose cleanly with the rest of the tensor language.
If you want index-based embeddings (integer token ids) in runtime graphs, that lives at the TorchLean/session layer via Nat channels; the spec layer stays purely numeric by default.
References / analogies:
- In most ML frameworks, an embedding table is a matrix `W : (vocab × embedDim)` and an index-based lookup returns `W[token_id]`. One-hot embeddings are the equivalent linear map `oneHot @ W` (this file); a concrete sketch follows this list.
- Bengio et al., "A Neural Probabilistic Language Model" (2003) for the classic embedding-table framing in neural language models.
- Mikolov et al., "Efficient Estimation of Word Representations in Vector Space" (2013) for the modern word-embedding perspective.
- PyTorch API docs:
  - `torch.nn.Embedding`: https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html
  - `torch.nn.functional.one_hot`: https://pytorch.org/docs/stable/generated/torch.nn.functional.one_hot.html
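To make the lookup/linear-map equivalence concrete, here is a minimal self-contained sketch in plain Lean. It deliberately avoids this library's `Tensor`/`Shape` API: `Mat`, `oneHotVec`, and `vecMatMul` are names invented for the example, not definitions from this file.

```lean
-- Toy matrices as plain functions, only to illustrate the linear-map view.
abbrev Mat (m n : Nat) : Type := Fin m → Fin n → Float

-- One-hot vector for token `tok`: 1.0 at index `tok`, 0.0 elsewhere.
def oneHotVec {vocab : Nat} (tok : Fin vocab) : Fin vocab → Float :=
  fun i => if i = tok then 1.0 else 0.0

-- Row vector times matrix: `(x @ W) j = Σᵢ x i * W i j`.
def vecMatMul {vocab embedDim : Nat}
    (x : Fin vocab → Float) (W : Mat vocab embedDim) : Fin embedDim → Float :=
  fun j => Fin.foldl vocab (fun acc i => acc + x i * W i j) 0.0

-- Only the `i = tok` summand survives, so `oneHotVec tok @ W = W[tok]`:
#eval
  let W : Mat 3 2 := fun i j => Float.ofNat (10 * i.val + j.val)
  vecMatMul (oneHotVec (1 : Fin 3)) W (0 : Fin 2)  -- 10.0, i.e. W[1][0]
```

Every summand except `i = tok` is multiplied by zero, which is exactly why the one-hot formulation and the index-based lookup agree.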
Standard embedding weight matrix: vocab × embedDim.
- `W : Tensor α (Shape.dim vocab (Shape.dim embedDim Shape.scalar))`
Embed a batch/sequence of one-hot vectors:
multiplying `oneHot : (seqLen × vocab)` by `W : (vocab × embedDim)` gives `(seqLen × embedDim)`.
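Continuing the toy sketch above (same caveat: illustrative names, not this file's API), the batched case is just an ordinary matrix product over stacked one-hot rows:

```lean
-- Matrix product: `(A @ B) i j = Σₜ A i t * B t j` (reuses `Mat` from above).
def matMul {m k n : Nat} (A : Mat m k) (B : Mat k n) : Mat m n :=
  fun i j => Fin.foldl k (fun acc t => acc + A i t * B t j) 0.0

-- Two positions with token ids [1, 2]: each output row is the matching row of W.
#eval
  let oneHots : Mat 2 3 := fun i j => if j.val = i.val + 1 then 1.0 else 0.0
  let W : Mat 3 2 := fun i j => Float.ofNat (10 * i.val + j.val)
  matMul oneHots W (0 : Fin 2) (0 : Fin 2)  -- 10.0 = W[1][0]
```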
Gradients #
`embedding_onehot_spec` is matrix multiplication: `Y = oneHot @ W`.
So the reverse-mode derivatives are the standard ones:
- `dOneHot = dY @ Wᵀ`
- `dW = oneHotᵀ @ dY`
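For completeness, here is the index-level derivation (standard matrix calculus, nothing specific to this file). Since `Y[i,j] = Σₖ oneHot[i,k] * W[k,j]`:
- `dOneHot[i,k] = Σⱼ dY[i,j] * W[k,j] = (dY @ Wᵀ)[i,k]`
- `dW[k,j] = Σᵢ oneHot[i,k] * dY[i,j] = (oneHotᵀ @ dY)[k,j]`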
Even though "true" one-hot tensors are often treated as non-differentiable in practice, having a named VJP is useful for:
- treating embeddings as a pure linear map in proofs,
- debugging equivalences (one-hot vs index-based embeddings),
- and keeping this layer consistent with the rest of the spec library.
Backward/VJP for `embedding_onehot_spec`: returns `(dOneHot, dW)`.
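In the toy `Mat` encoding from the sketches above, the returned pair would look like this; `embeddingOneHotVJP` is an illustrative name, and the actual definition here works over `Tensor α` rather than `Mat`:

```lean
-- Toy VJP: given upstream gradient `dY`, return `(dOneHot, dW)` with
-- `dOneHot = dY @ Wᵀ` and `dW = oneHotᵀ @ dY`.
def embeddingOneHotVJP {s v d : Nat}
    (oneHots : Mat s v) (W : Mat v d) (dY : Mat s d) : Mat s v × Mat v d :=
  ( fun i k => Fin.foldl d (fun acc j => acc + dY i j * W k j) 0.0        -- dY @ Wᵀ
  , fun k j => Fin.foldl s (fun acc i => acc + oneHots i k * dY i j) 0.0 )  -- oneHotᵀ @ dY
```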