k‑Nearest Neighbors (kNN) (spec model) #
This file provides a small kNN classifier/regressor baseline:
- a dataset is a list of `(featureVector, label)` pairs,
- prediction is based on the `k` closest points under a chosen distance.
We include deterministic tie‑breaking so results are stable (useful for regression tests and formal reasoning).
References:
- Cover and Hart (1967), "Nearest Neighbor Pattern Classification": https://ieeexplore.ieee.org/document/1053964
PyTorch / sklearn analogies:
- In the Python ecosystem this is closest to `sklearn.neighbors.KNeighborsClassifier` / `sklearn.neighbors.KNeighborsRegressor`.
- kNN is not typically an `nn.Module` in PyTorch, but it is a common baseline for "classic ML" comparisons and for sanity checks.
Model container #
A small kNN model container (parameters + stored dataset).
This is a lazy model: inference consults the stored dataset at query time, rather than learning
weights.
- `k : ℕ`
  Number of neighbors to consult.
- `dataset : List (Tensor α (Shape.dim n Shape.scalar) × β)`
  Training data: feature vectors paired with labels/targets.
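As a minimal sketch of this container (substituting plain `List Float` for the file's `Tensor` feature vectors; the structure and names below are illustrative, not the file's exact declarations):

```lean
/-- Sketch of a kNN model container: parameters plus the stored dataset.
    The real file stores `Tensor α (Shape.dim n Shape.scalar)` features;
    `List Float` keeps this example self-contained. -/
structure KNNModel (β : Type) where
  k : Nat                          -- number of neighbors to consult
  dataset : List (List Float × β)  -- (featureVector, label) pairs

/-- "Training" is just storing the data; all work happens at query time. -/
def fit {β : Type} (k : Nat) (data : List (List Float × β)) : KNNModel β :=
  { k := k, dataset := data }
```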
Neighbor selection #
The key technical detail here is deterministic tie-breaking: when two points are at exactly the same distance, we prefer the earlier point in the dataset. This makes evaluation stable and keeps formal reasoning about the classifier simpler.
Find the k nearest neighbors under Euclidean distance.
PyTorch/sklearn analogy: Euclidean L2 distance is the default for many baseline kNN examples.
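A sketch of the selection step over the simplified `KNNModel` above (`euclid` and `kNearest` are hypothetical names, and this assumes a recent Lean toolchain where `List.mergeSort` takes the list first and performs a stable sort):

```lean
/-- Plain Euclidean (L2) distance on `List Float` feature vectors. -/
def euclid (x y : List Float) : Float :=
  Float.sqrt ((List.zipWith (fun a b => (a - b) * (a - b)) x y).foldl (· + ·) 0.0)

/-- Pair every stored point with its distance to `query`, stable-sort by
    distance, keep the first `k`. Because the sort is stable, equal
    distances keep dataset order: exactly the deterministic tie-breaking
    described above. -/
def kNearest {β : Type} (model : KNNModel β) (query : List Float) :
    List ((List Float × β) × Float) :=
  let withDist := model.dataset.map fun p => (p, euclid query p.1)
  (withDist.mergeSort (fun a b => decide (a.2 ≤ b.2))).take model.k
```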
Find the k nearest neighbors under a user-provided distance function.
Notes:
- The distance value is only used for ranking neighbors. It does not need to satisfy metric axioms, but it should be consistent with "smaller means closer".
- We keep deterministic tie-breaking via dataset order.
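The user-distance variant is the same sketch with the metric abstracted out (`kNearestBy` and `manhattan` are again hypothetical names):

```lean
/-- Same selection step under a caller-supplied distance. `dist` is only
    used for ranking, so it just needs "smaller means closer"; no metric
    axioms are required. -/
def kNearestBy {β : Type} (dist : List Float → List Float → Float)
    (model : KNNModel β) (query : List Float) :
    List ((List Float × β) × Float) :=
  let withDist := model.dataset.map fun p => (p, dist query p.1)
  (withDist.mergeSort (fun a b => decide (a.2 ≤ b.2))).take model.k

/-- Example: Manhattan (L1) distance, still with deterministic ties. -/
def manhattan (x y : List Float) : Float :=
  (List.zipWith (fun a b => Float.abs (a - b)) x y).foldl (· + ·) 0.0
```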
Classification #
Majority vote among the neighbors.
Tie-breaking: if multiple labels have the same maximum count, we prefer the label of the closest neighbor. This is deterministic and matches common "stable mode" implementations.
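A sketch of the vote, assuming labels arrive closest-first as `kNearest` returns them: the fold only replaces the current leader on a strictly greater count, so among tied labels the one seen first, i.e. the closest neighbor's label, wins.

```lean
/-- Majority vote over labels ordered closest-first. Ties go to the closer
    neighbor because we only replace on a *strictly* larger count. -/
def majorityVote {β : Type} [BEq β] (labels : List β) : Option β :=
  let best := labels.foldl (init := (none : Option (β × Nat))) fun acc l =>
    let c := labels.countP (· == l)
    match acc with
    | none => some (l, c)
    | some (_, bc) => if c > bc then some (l, c) else acc
  best.map (·.1)
```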
Classification using an `RBMap` for label counts.
This variant avoids hashing, at the cost of requiring an `Ord β` instance.
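A sketch of the counting step using `Lean.RBMap` from core (the file's exact map type is an assumption here); the `Ord β` instance supplies the `compare` used for the tree order, so no `Hashable β` is needed:

```lean
import Lean.Data.RBMap
open Lean (RBMap)

/-- Tally neighbor labels in an order-based map instead of a hash map.
    An argmax scan over the result mirrors the majority vote above. -/
def countLabels {β : Type} [Ord β] (labels : List β) : RBMap β Nat compare :=
  labels.foldl (init := RBMap.empty) fun m l => m.insert l (m.findD l 0 + 1)
```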
Regression #
Unweighted kNN regression: average of the neighbor targets.
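A sketch of the unweighted predictor for `Float` targets, reusing the hypothetical `kNearest` above:

```lean
/-- Plain average of the neighbor targets; 0.0 for an empty neighbor set. -/
def predictAverage (model : KNNModel Float) (query : List Float) : Float :=
  let ys := (kNearest model query).map fun ((_, y), _) => y
  if ys.isEmpty then 0.0
  else ys.foldl (· + ·) 0.0 / ys.length.toFloat
```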
Weighted kNN regression using inverse-distance weights.
We use weights w_i = 1 / d(x, x_i) and predict the weighted average ŷ = (Σ_i w_i · y_i) / (Σ_i w_i). If a neighbor is exactly at distance 0 we give it a large weight so it dominates the average.
This is a common heuristic and matches what many "classic ML" baselines do in practice.
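A sketch with the zero-distance cap written out (the `1e-12` threshold and `1e12` cap are illustrative constants, not the file's):

```lean
/-- Inverse-distance weighted average: w_i = 1 / d_i, with near-exact hits
    capped at a huge weight so they dominate the average. -/
def predictWeighted (model : KNNModel Float) (query : List Float) : Float :=
  let wys := (kNearest model query).map fun ((_, y), d) =>
    ((if d < 1e-12 then 1e12 else 1.0 / d), y)
  let wsum := wys.foldl (fun acc p => acc + p.1) 0.0
  let ysum := wys.foldl (fun acc p => acc + p.1 * p.2) 0.0
  if wsum == 0.0 then 0.0 else ysum / wsum
```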
Helpers #
Constructor helper (explicit arguments keep elaboration simple in demos).
Batch regression: map predict over a list of inputs.
kNN classification using an explicit distance function (for non-Euclidean metrics).
kNN classification plus a simple confidence score (`maxVotes / k`).
This returns the winning label together with a scalar in [0,1] that measures what fraction of the
neighbors agreed with the winner. This is a heuristic, but it is useful for demos and baselines.
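A sketch of the confidence computation, reusing the hypothetical pieces above (dividing by the requested `k`, which yields NaN for `k = 0`; dividing by the number of neighbors actually found is an equally defensible reading of "fraction of the neighbors"):

```lean
/-- Winning label plus the fraction of the k neighbors that voted for it. -/
def classifyWithConfidence {β : Type} [BEq β]
    (model : KNNModel β) (query : List Float) : Option (β × Float) :=
  let labels := (kNearest model query).map fun ((_, l), _) => l
  (majorityVote labels).map fun winner =>
    (winner, (labels.countP (· == winner)).toFloat / model.k.toFloat)
```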