Multinomial Naive Bayes (small baseline) #
This file is intentionally "small and executable": it represents features and labels as Strings and uses Lean's
HashMap for counting. It is not a tensor-indexed spec model like most of NN/Spec/Models/*;
it lives here as a simple baseline / example.
Probabilities are computed in log space (via MathFunctions.log) to avoid underflow.
PyTorch note:
PyTorch does not provide a Naive Bayes classifier in torch.nn; the closest ecosystem analogue is
scikit-learn’s MultinomialNB. TorchLean keeps this file mainly as a readable reference and a
quick baseline for demos/tests.
What "training" means here #
Naive Bayes is a counting model: training is just collecting label and feature counts from the dataset. The API keeps fitting and inference separate, so examples can show exactly where counts are learned and where predictions are made.
The API exposes an explicit fit step that produces a Model, plus:
- predictModel for inference using the fitted counts
- negLogLikelihood as a standard training objective (useful for evaluation/comparison)
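A hedged usage sketch (the dataset shape, argument order, and result types below are assumptions; only the names fit, predictModel, and negLogLikelihood come from the API above):

```lean
-- Hypothetical dataset shape: bags of String features paired with String labels.
def trainData : List (List String × String) :=
  [ (["win", "money", "now"], "spam"),
    (["meeting", "tomorrow"], "ham") ]

def model := fit trainData                   -- counts are learned here
#eval predictModel model ["money", "now"]    -- inference from the fitted counts
#eval negLogLikelihood model trainData       -- evaluation objective (lower is better)
```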
Fitted model #
Model stores the counts and some precomputed bookkeeping derived from the dataset.
Nothing here depends on the scalar type α; we only need α when we turn counts into smoothed
probabilities (log-space scores).
Fitted multinomial Naive Bayes model.
This stores raw counts plus a little derived bookkeeping (labels, vocab, totalExamples).
Scoring functions turn these counts into Laplace-smoothed log probabilities on demand.
- labelCounts : Std.HashMap String ℕ
  Number of training examples observed per label.
- featureCounts : Std.HashMap String (Std.HashMap String ℕ)
  Per-label feature counts (label → feature → count).
- totalCounts : Std.HashMap String ℕ
  Total feature occurrences per label.
- labels
  The distinct labels observed during fitting.
- vocab
  The distinct features observed during fitting.
- totalExamples : ℕ
  Number of training examples in the dataset.
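For orientation, a minimal Lean sketch of such a structure. The field names and count types mirror the list above; the `List String` types for labels and vocab are assumptions (their declared types are not shown here):

```lean
import Std.Data.HashMap

/-- Hypothetical sketch mirroring the fitted model's fields (`Nat` = ℕ). -/
structure Model where
  labelCounts   : Std.HashMap String Nat                       -- examples seen per label
  featureCounts : Std.HashMap String (Std.HashMap String Nat)  -- label → feature → count
  totalCounts   : Std.HashMap String Nat                       -- feature total per label
  labels        : List String                                  -- distinct labels (type assumed)
  vocab         : List String                                  -- distinct features (type assumed)
  totalExamples : Nat
```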
Fit a naive Bayes model by collecting counts from the dataset.
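A minimal counting sketch in that spirit (not the library's fit; the bump helper and dataset shape are hypothetical):

```lean
-- Hypothetical helper: increment a key's count, treating missing keys as 0.
def bump (m : Std.HashMap String Nat) (k : String) : Std.HashMap String Nat :=
  m.insert k (m.getD k 0 + 1)

-- "Training" is a single fold over the dataset; shown here for label counts only.
def countLabels (data : List (List String × String)) : Std.HashMap String Nat :=
  data.foldl (fun acc (_, y) => bump acc y) ∅
```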
Scoring and prediction #
We use the standard multinomial NB scoring rule (with Laplace smoothing):
- prior: (count(label) + 1) / (N + nLabels), where N is the total number of training examples
- conditional: (count(feature, label) + 1) / (totalFeatures(label) + vocabSize)
Scores are in log space. For prediction we only need relative ordering.
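Putting the rule into code, a hedged sketch of the per-label log score. It builds on the hypothetical Model sketch above and uses Float.log / Float arithmetic in place of the library's MathFunctions.log and scalar type:

```lean
-- Hypothetical log-space score: log(smoothed prior) + Σ log(smoothed conditional).
def score (m : Model) (features : List String) (label : String) : Float :=
  let n      := Float.ofNat m.totalExamples
  let nLab   := Float.ofNat m.labels.length
  let vSize  := Float.ofNat m.vocab.length
  let total  := Float.ofNat (m.totalCounts.getD label 0)
  let counts := m.featureCounts.getD label ∅
  let prior  := (Float.ofNat (m.labelCounts.getD label 0) + 1) / (n + nLab)
  features.foldl
    (fun acc f =>
      let c := Float.ofNat (counts.getD f 0)
      acc + Float.log ((c + 1) / (total + vSize)))
    (Float.log prior)
```

Prediction is then an argmax of this score over m.labels; no normalization is needed for that.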
Training objective (negative log-likelihood) #
This is the standard objective used to evaluate NB models:
NLL = -Σ_i log P(y_i | x_i)
Even though we don't optimize it with gradients (NB training is closed-form counting), having this objective is useful for:
- checking improvements (smoothing choices, feature engineering)
- comparing NB against other baselines
- unit tests / smoke tests
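A hedged sketch of such an objective, reusing the hypothetical score above. Per-example log-probabilities come from normalizing the scores over all labels; the normalizer below is a naive log-sum-exp (a robust version would subtract the maximum score first):

```lean
-- Hypothetical NLL = -Σ_i log P(y_i | x_i), with P(y | x) ∝ exp (score m x y).
def negLogLikelihoodSketch (m : Model) (data : List (List String × String)) : Float :=
  data.foldl
    (fun acc (x, y) =>
      -- naive normalizer: log Σ_label exp (score m x label)
      let logZ := Float.log (m.labels.foldl (fun s l => s + Float.exp (score m x l)) 0)
      acc - (score m x y - logZ))
    0
```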