Multinomial Naive Bayes (small baseline) #
This file is intentionally "small and executable": it represents features and labels as Strings and uses Lean's
HashMap for counting. It is not a tensor-indexed spec model like most of NN/Spec/Models/*;
it lives here as a simple baseline / example.
Probabilities are computed in log space (via MathFunctions.log) to avoid underflow.
PyTorch note:
PyTorch does not provide a Naive Bayes classifier in torch.nn; the closest ecosystem analogue is
scikit-learn’s MultinomialNB. TorchLean keeps this file mainly as a readable reference and a
quick baseline for demos/tests.
What "training" means here #
Naive Bayes is a counting model: training is just collecting label and feature counts from the dataset. The API keeps fitting and inference separate, so examples can show exactly where counts are learned and where predictions are made.
The API exposes an explicit fit step that produces a Model, plus:
- predictModel for inference using the fitted counts
- negLogLikelihood as a standard training objective (useful for evaluation/comparison)
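A hedged usage sketch (the dataset shape, argument order, and result types below are assumptions; only the names fit, predictModel, and negLogLikelihood come from the API above):

```lean
-- Hypothetical dataset shape: bags of String features paired with String labels.
def trainData : List (List String × String) :=
  [ (["win", "money", "now"], "spam"),
    (["meeting", "tomorrow"], "ham") ]

def model := fit trainData                   -- counts are learned here
#eval predictModel model ["money", "now"]    -- inference from the fitted counts
#eval negLogLikelihood model trainData       -- evaluation objective (lower is better)
```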
Fitted model #
Model stores the counts and some precomputed bookkeeping derived from the dataset.
Nothing here depends on the scalar type α; we only need α when we turn counts into smoothed
probabilities (log-space scores).
Fitted multinomial Naive Bayes model.
This stores raw counts plus a little derived bookkeeping (labels, vocab, totalExamples).
Scoring functions turn these counts into Laplace-smoothed log probabilities on demand.
- labelCounts : Std.HashMap String ℕ
  Number of training examples observed per label.
- featureCounts : Std.HashMap String (Std.HashMap String ℕ)
  Per-label feature counts (label → feature → count).
- totalCounts : Std.HashMap String ℕ
  Total feature occurrences per label.
- labels
  The distinct labels observed during fitting.
- vocab
  The distinct features observed during fitting.
- totalExamples : ℕ
  Number of training examples in the dataset.
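For orientation, a minimal Lean sketch of such a structure. The field names and count types mirror the list above; the `List String` types for labels and vocab are assumptions (their declared types are not shown here):

```lean
import Std.Data.HashMap

/-- Hypothetical sketch mirroring the fitted model's fields (`Nat` = ℕ). -/
structure Model where
  labelCounts   : Std.HashMap String Nat                       -- examples seen per label
  featureCounts : Std.HashMap String (Std.HashMap String Nat)  -- label → feature → count
  totalCounts   : Std.HashMap String Nat                       -- feature total per label
  labels        : List String                                  -- distinct labels (type assumed)
  vocab         : List String                                  -- distinct features (type assumed)
  totalExamples : Nat
```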
Fit a naive Bayes model by collecting counts from the dataset.
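A minimal counting sketch in that spirit (not the library's fit; the bump helper and dataset shape are hypothetical):

```lean
-- Hypothetical helper: increment a key's count, treating missing keys as 0.
def bump (m : Std.HashMap String Nat) (k : String) : Std.HashMap String Nat :=
  m.insert k (m.getD k 0 + 1)

-- "Training" is a single fold over the dataset; shown here for label counts only.
def countLabels (data : List (List String × String)) : Std.HashMap String Nat :=
  data.foldl (fun acc (_, y) => bump acc y) ∅
```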
Scoring and prediction #
We use the standard multinomial NB scoring rule (with Laplace smoothing):
- prior: (count(label) + 1) / (N + nLabels), where N is the total number of training examples
- conditional: (count(feature, label) + 1) / (totalFeatures(label) + vocabSize)
Scores are in log space. For prediction we only need relative ordering.
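Putting the rule into code, a hedged sketch of the per-label log score. It builds on the hypothetical Model sketch above and uses Float.log / Float arithmetic in place of the library's MathFunctions.log and scalar type:

```lean
-- Hypothetical log-space score: log(smoothed prior) + Σ log(smoothed conditional).
def score (m : Model) (features : List String) (label : String) : Float :=
  let n      := Float.ofNat m.totalExamples
  let nLab   := Float.ofNat m.labels.length
  let vSize  := Float.ofNat m.vocab.length
  let total  := Float.ofNat (m.totalCounts.getD label 0)
  let counts := m.featureCounts.getD label ∅
  let prior  := (Float.ofNat (m.labelCounts.getD label 0) + 1) / (n + nLab)
  features.foldl
    (fun acc f =>
      let c := Float.ofNat (counts.getD f 0)
      acc + Float.log ((c + 1) / (total + vSize)))
    (Float.log prior)
```

Prediction is then an argmax of this score over m.labels; no normalization is needed for that.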
Training objective (negative log-likelihood) #
This is the standard objective used to evaluate NB models:
NLL = -Σ_i log P(y_i | x_i)
Even though we don't optimize it with gradients (NB training is closed-form counting), having this objective is useful for:
- checking improvements (smoothing choices, feature engineering)
- comparing NB against other baselines
- unit tests / smoke tests
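A hedged sketch of such an objective, reusing the hypothetical score above. Per-example log-probabilities come from normalizing the scores over all labels; the normalizer below is a naive log-sum-exp (a robust version would subtract the maximum score first):

```lean
-- Hypothetical NLL = -Σ_i log P(y_i | x_i), with P(y | x) ∝ exp (score m x y).
def negLogLikelihoodSketch (m : Model) (data : List (List String × String)) : Float :=
  data.foldl
    (fun acc (x, y) =>
      -- naive normalizer: log Σ_label exp (score m x label)
      let logZ := Float.log (m.labels.foldl (fun s l => s + Float.exp (score m x l)) 0)
      acc - (score m x y - logZ))
    0
```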