NN.Spec.Layers.Embedding

Embedding #

Spec-layer embedding primitives.

We model embeddings through single-scalar one-hot tensors: inputs have the same scalar type α as the embedding matrix, so they compose cleanly with the rest of the tensor language.

If you want index-based embeddings (integer token ids) in runtime graphs, that lives at the TorchLean/session layer via Nat channels; the spec layer stays purely numeric by default.
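To make the one-hot convention concrete, here is the standard index-level identity (stated with the shapes used below, not quoted from this page): if row s of oneHot encodes token id t, so that oneHot[s, v] is 1 when v = t and 0 otherwise, then

$$(\mathrm{oneHot} \cdot W)[s, e] \;=\; \sum_{v} \mathrm{oneHot}[s, v]\, W[v, e] \;=\; W[t, e],$$

i.e. each output row is exactly the embedding row of its token, matching what an index-based lookup would return.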

References / analogies:

structure Spec.EmbeddingSpec (vocab embedDim : ℕ) (α : Type) : Type

Standard embedding weight matrix: vocab × embedDim.
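As a reading aid, here is a minimal Lean sketch of the shape this structure plausibly has; the field name weight is an assumption, not confirmed by this page:

    structure EmbeddingSpec (vocab embedDim : ℕ) (α : Type) where
      -- Assumed field name: the vocab × embedDim weight matrix.
      weight : Tensor α (Shape.dim vocab (Shape.dim embedDim Shape.scalar))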

def Spec.embeddingOnehotSpec {α : Type} [Context α] {vocab embedDim seqLen : ℕ} (emb : EmbeddingSpec vocab embedDim α) (oneHot : Tensor α (Shape.dim seqLen (Shape.dim vocab Shape.scalar))) :
    Tensor α (Shape.dim seqLen (Shape.dim embedDim Shape.scalar))

Embed a batch/sequence of one-hot vectors: multiplying oneHot : (seqLen × vocab) by W : (vocab × embedDim) gives (seqLen × embedDim).
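Since the forward pass is just this matrix product, here is a hedged sketch of a possible implementation, assuming a weight field on EmbeddingSpec and a Tensor.matmul primitive with the usual (m × k) → (k × n) → (m × n) typing (neither name is confirmed by this page):

    def embeddingOnehotSpec {α : Type} [Context α] {vocab embedDim seqLen : ℕ}
        (emb : EmbeddingSpec vocab embedDim α)
        (oneHot : Tensor α (Shape.dim seqLen (Shape.dim vocab Shape.scalar))) :
        Tensor α (Shape.dim seqLen (Shape.dim embedDim Shape.scalar)) :=
      -- (seqLen × vocab) @ (vocab × embedDim) = (seqLen × embedDim).
      -- Tensor.matmul and emb.weight are assumed names, not from this page.
      Tensor.matmul oneHot emb.weight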


Gradients #

embedding_onehot_spec is matrix multiplication:

Y = oneHot @ W.

So the reverse-mode derivatives are the standard matmul ones: dOneHot = dY @ Wᵀ and dW = oneHotᵀ @ dY.

Even though "true" one-hot tensors are often treated as non-differentiable in practice, having a named VJP is still useful.
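For intuition, the dW term specializes nicely when oneHot really is one-hot (a standard derivation, not quoted from this page):

$$dW[v, e] \;=\; \sum_{s} \mathrm{oneHot}[s, v]\, dY[s, e],$$

so each row of dY accumulates into the weight row of the token encoded at that sequence position, which is exactly the familiar scatter-add embedding gradient.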

def Spec.embeddingOnehotBackwardSpec {α : Type} [Context α] {vocab embedDim seqLen : ℕ} (emb : EmbeddingSpec vocab embedDim α) (oneHot : Tensor α (Shape.dim seqLen (Shape.dim vocab Shape.scalar))) (dY : Tensor α (Shape.dim seqLen (Shape.dim embedDim Shape.scalar))) :
    Tensor α (Shape.dim seqLen (Shape.dim vocab Shape.scalar)) × Tensor α (Shape.dim vocab (Shape.dim embedDim Shape.scalar))

Backward/VJP for embedding_onehot_spec: returns (dOneHot, dW).
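Spelled out as code, the standard matmul VJP could look like the following sketch; Tensor.matmul, Tensor.transpose, and the weight field are assumed names, not confirmed by this page:

    def embeddingOnehotBackwardSpec {α : Type} [Context α] {vocab embedDim seqLen : ℕ}
        (emb : EmbeddingSpec vocab embedDim α)
        (oneHot : Tensor α (Shape.dim seqLen (Shape.dim vocab Shape.scalar)))
        (dY : Tensor α (Shape.dim seqLen (Shape.dim embedDim Shape.scalar))) :
        Tensor α (Shape.dim seqLen (Shape.dim vocab Shape.scalar))
          × Tensor α (Shape.dim vocab (Shape.dim embedDim Shape.scalar)) :=
      -- dOneHot = dY @ Wᵀ      : (seqLen × embedDim) @ (embedDim × vocab).
      -- dW      = oneHotᵀ @ dY : (vocab × seqLen) @ (seqLen × embedDim).
      (Tensor.matmul dY (Tensor.transpose emb.weight),
       Tensor.matmul (Tensor.transpose oneHot) dY)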
