k‑Nearest Neighbors (kNN) (spec model) #
This file provides a small kNN classifier/regressor baseline:
- a dataset is a list of `(featureVector, label)` pairs,
- prediction is based on the `k` closest points under a chosen distance.
We include deterministic tie‑breaking so results are stable (useful for regression tests and formal reasoning).
References:
- Cover and Hart (1967), "Nearest Neighbor Pattern Classification": https://ieeexplore.ieee.org/document/1053964
PyTorch / sklearn analogies:
- In the Python ecosystem this is closest to `sklearn.neighbors.KNeighborsClassifier` / `sklearn.neighbors.KNeighborsRegressor`.
- kNN is not typically an `nn.Module` in PyTorch, but it is a common baseline for "classic ML" comparisons and for sanity checks.
Model container #
A small kNN model container (parameters + stored dataset).
This is a lazy model: inference consults the stored dataset at query time, rather than learning
weights.
- `k : ℕ`
  Number of neighbors to consult.
- `dataset : List (Tensor α (Shape.dim n Shape.scalar) × β)`
  Training data: feature vectors paired with labels/targets.
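As a minimal sketch of this container (substituting plain `List Float` for the file's `Tensor` feature vectors; the structure and names below are illustrative, not the file's exact declarations):

```lean
/-- Sketch of a kNN model container: parameters plus the stored dataset.
    The real file stores `Tensor α (Shape.dim n Shape.scalar)` features;
    `List Float` keeps this example self-contained. -/
structure KNNModel (β : Type) where
  k : Nat                          -- number of neighbors to consult
  dataset : List (List Float × β)  -- (featureVector, label) pairs

/-- "Training" is just storing the data; all work happens at query time. -/
def fit {β : Type} (k : Nat) (data : List (List Float × β)) : KNNModel β :=
  { k := k, dataset := data }
```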
Neighbor selection #
The key technical detail here is deterministic tie-breaking: when two points are at exactly the same distance, we prefer the earlier point in the dataset. This makes evaluation stable and keeps formal reasoning about the classifier simpler.
Find the k nearest neighbors under Euclidean distance.
PyTorch/sklearn analogy: Euclidean L2 distance is the default for many baseline kNN examples.
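A sketch of the selection step over the simplified `KNNModel` above (`euclid` and `kNearest` are hypothetical names, and this assumes a recent Lean toolchain where `List.mergeSort` takes the list first and performs a stable sort):

```lean
/-- Plain Euclidean (L2) distance on `List Float` feature vectors. -/
def euclid (x y : List Float) : Float :=
  Float.sqrt ((List.zipWith (fun a b => (a - b) * (a - b)) x y).foldl (· + ·) 0.0)

/-- Pair every stored point with its distance to `query`, stable-sort by
    distance, keep the first `k`. Because the sort is stable, equal
    distances keep dataset order: exactly the deterministic tie-breaking
    described above. -/
def kNearest {β : Type} (model : KNNModel β) (query : List Float) :
    List ((List Float × β) × Float) :=
  let withDist := model.dataset.map fun p => (p, euclid query p.1)
  (withDist.mergeSort (fun a b => decide (a.2 ≤ b.2))).take model.k
```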
Find the k nearest neighbors under a user-provided distance function.
Notes:
- The distance value is only used for ranking neighbors. It does not need to satisfy metric axioms, but it should be consistent with "smaller means closer".
- We keep deterministic tie-breaking via dataset order.
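The user-distance variant is the same sketch with the metric abstracted out (`kNearestBy` and `manhattan` are again hypothetical names):

```lean
/-- Same selection step under a caller-supplied distance. `dist` is only
    used for ranking, so it just needs "smaller means closer"; no metric
    axioms are required. -/
def kNearestBy {β : Type} (dist : List Float → List Float → Float)
    (model : KNNModel β) (query : List Float) :
    List ((List Float × β) × Float) :=
  let withDist := model.dataset.map fun p => (p, dist query p.1)
  (withDist.mergeSort (fun a b => decide (a.2 ≤ b.2))).take model.k

/-- Example: Manhattan (L1) distance, still with deterministic ties. -/
def manhattan (x y : List Float) : Float :=
  (List.zipWith (fun a b => Float.abs (a - b)) x y).foldl (· + ·) 0.0
```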
Classification #
Majority vote among the neighbors.
Tie-breaking: if multiple labels have the same maximum count, we prefer the label of the closest neighbor. This is deterministic and matches common "stable mode" implementations.
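A sketch of the vote, assuming labels arrive closest-first as `kNearest` returns them: the fold only replaces the current leader on a strictly greater count, so among tied labels the one seen first, i.e. the closest neighbor's label, wins.

```lean
/-- Majority vote over labels ordered closest-first. Ties go to the closer
    neighbor because we only replace on a *strictly* larger count. -/
def majorityVote {β : Type} [BEq β] (labels : List β) : Option β :=
  let best := labels.foldl (init := (none : Option (β × Nat))) fun acc l =>
    let c := labels.countP (· == l)
    match acc with
    | none => some (l, c)
    | some (_, bc) => if c > bc then some (l, c) else acc
  best.map (·.1)
```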
Classification using an `RBMap` for label counts.
This variant avoids hashing, at the cost of requiring an `Ord β` instance.
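A sketch of the counting step using `Lean.RBMap` from core (the file's exact map type is an assumption here); the `Ord β` instance supplies the `compare` used for the tree order, so no `Hashable β` is needed:

```lean
import Lean.Data.RBMap
open Lean (RBMap)

/-- Tally neighbor labels in an order-based map instead of a hash map.
    An argmax scan over the result mirrors the majority vote above. -/
def countLabels {β : Type} [Ord β] (labels : List β) : RBMap β Nat compare :=
  labels.foldl (init := RBMap.empty) fun m l => m.insert l (m.findD l 0 + 1)
```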
Regression #
Unweighted kNN regression: average of the neighbor targets.
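A sketch of the unweighted predictor for `Float` targets, reusing the hypothetical `kNearest` above:

```lean
/-- Plain average of the neighbor targets; 0.0 for an empty neighbor set. -/
def predictAverage (model : KNNModel Float) (query : List Float) : Float :=
  let ys := (kNearest model query).map fun ((_, y), _) => y
  if ys.isEmpty then 0.0
  else ys.foldl (· + ·) 0.0 / ys.length.toFloat
```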
Weighted kNN regression using inverse-distance weights.
We use weights w_i = 1 / d(x, x_i) and predict the weighted average ŷ = (Σ_i w_i · y_i) / (Σ_i w_i). If a neighbor is exactly at distance 0 we give it a large weight so it dominates the average.
This is a common heuristic and matches what many "classic ML" baselines do in practice.
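A sketch with the zero-distance cap written out (the `1e-12` threshold and `1e12` cap are illustrative constants, not the file's):

```lean
/-- Inverse-distance weighted average: w_i = 1 / d_i, with near-exact hits
    capped at a huge weight so they dominate the average. -/
def predictWeighted (model : KNNModel Float) (query : List Float) : Float :=
  let wys := (kNearest model query).map fun ((_, y), d) =>
    ((if d < 1e-12 then 1e12 else 1.0 / d), y)
  let wsum := wys.foldl (fun acc p => acc + p.1) 0.0
  let ysum := wys.foldl (fun acc p => acc + p.1 * p.2) 0.0
  if wsum == 0.0 then 0.0 else ysum / wsum
```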
Helpers #
Constructor helper (explicit arguments keep elaboration simple in demos).
Batch regression: map predict over a list of inputs.
kNN classification using an explicit distance function (for non-Euclidean metrics).
kNN classification plus a simple confidence score (`maxVotes / k`).
This returns the winning label together with a scalar in [0,1] that measures what fraction of the
neighbors agreed with the winner. This is a heuristic, but it is useful for demos and baselines.
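A sketch of the confidence computation, reusing the hypothetical pieces above (dividing by the requested `k`, which yields NaN for `k = 0`; dividing by the number of neighbors actually found is an equally defensible reading of "fraction of the neighbors"):

```lean
/-- Winning label plus the fraction of the k neighbors that voted for it. -/
def classifyWithConfidence {β : Type} [BEq β]
    (model : KNNModel β) (query : List Float) : Option (β × Float) :=
  let labels := (kNearest model query).map fun ((_, l), _) => l
  (majorityVote labels).map fun winner =>
    (winner, (labels.countP (· == winner)).toFloat / model.k.toFloat)
```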