TorchLean API

NN.Spec.Models.NaiveBayes

Multinomial Naive Bayes (small baseline) #

This file is intentionally "small and executable": it uses String features/labels and Lean's HashMap for counting. It is not a tensor‑indexed spec model like most of NN/Spec/Models/*, but it lives here as a simple baseline / example.

Probabilities are computed in log space (via MathFunctions.log) to avoid underflow.

PyTorch note: PyTorch does not provide a Naive Bayes classifier in torch.nn; the closest ecosystem analogue is scikit-learn’s MultinomialNB. TorchLean keeps this file mainly as a readable reference and a quick baseline for demos/tests.

What "training" means here #

Naive Bayes is a counting model: training is just collecting label and feature counts from the dataset. The API keeps fitting and inference separate, so examples can show exactly where counts are learned and where predictions are made.

The API exposes an explicit fit step that produces a Model, plus scoring and prediction functions that consume it.

One training example: a bag-of-words feature multiset and a class label.

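A hypothetical sketch of what the example type described above could look like; the field names here are illustrative, not necessarily TorchLean's actual ones:

```lean
-- Hypothetical sketch (field names are illustrative, not TorchLean's).
structure Example where
  features : List String  -- bag of words; repeated features count multiply
  label    : String
```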

Fitted model #

Model stores the counts and some precomputed bookkeeping derived from the dataset. Nothing here depends on the scalar type α; we only need α when we turn counts into smoothed probabilities (log-space scores).

Fitted multinomial Naive Bayes model.

This stores raw counts plus a little derived bookkeeping (labels, vocab, totalExamples). Scoring functions turn these counts into Laplace-smoothed log probabilities on demand.

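Under the HashMap-based design described above, the count tables might be laid out like this (a sketch with illustrative field names, not the library's actual definition):

```lean
import Std.Data.HashMap

-- Hypothetical sketch of the fitted model's count tables.
structure Model where
  labelCounts   : Std.HashMap String Nat            -- #examples per label
  featureCounts : Std.HashMap (String × String) Nat -- (label, feature) → count
  labelTotals   : Std.HashMap String Nat            -- total feature count per label
  labels        : List String                       -- distinct labels seen
  vocab         : List String                       -- distinct features seen
  totalExamples : Nat
```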

Fit a naive Bayes model by collecting counts from the dataset.

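Since training is just counting, fitting can be a single fold over the dataset. A minimal sketch, assuming the hypothetical `Example`/`Model` fields above (`empty` stands for an all-zero `Model`; maintenance of `labels`/`vocab` is omitted for brevity):

```lean
-- `bump` increments a counter in a HashMap, defaulting missing keys to 0.
def bump [BEq κ] [Hashable κ] (t : Std.HashMap κ Nat) (k : κ) : Std.HashMap κ Nat :=
  t.insert k (t.getD k 0 + 1)

-- Minimal counting sketch: one pass, no gradients, no optimization.
def fit (data : List Example) : Model :=
  data.foldl (init := empty) fun m ex =>
    let m := { m with
      labelCounts   := bump m.labelCounts ex.label
      totalExamples := m.totalExamples + 1 }
    ex.features.foldl (init := m) fun m f =>
      { m with
        featureCounts := bump m.featureCounts (ex.label, f)
        labelTotals   := bump m.labelTotals ex.label }
```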

Scoring and prediction #

We use the standard multinomial NB scoring rule (with Laplace smoothing):

score(lbl, x) = log P(lbl) + Σ_{f ∈ x} log P(f | lbl)

where P(lbl) = count(lbl) / totalExamples and, with Laplace smoothing, P(f | lbl) = (count(lbl, f) + 1) / (total feature count of lbl + |vocab|).

Scores are in log space. For prediction we only need relative ordering.

def NaiveBayes.score {α : Type} [Context α] (m : Model) (input : List String) (lbl : String) : α

Unnormalized log score log P(lbl) + Σ log P(f|lbl) for a bag of features.
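A sketch of the smoothed log score, specialized to Float for readability (the library abstracts the scalar as α with a Context typeclass and uses MathFunctions.log; the Model field names are the hypothetical ones from the sketch above):

```lean
-- Laplace smoothing adds 1 to every (label, feature) count and the
-- vocabulary size to the denominator, so unseen features never give log 0.
def score (m : Model) (input : List String) (lbl : String) : Float :=
  let V     := m.vocab.length
  let prior := Float.log ((m.labelCounts.getD lbl 0).toFloat / m.totalExamples.toFloat)
  input.foldl (init := prior) fun acc f =>
    let num := ((m.featureCounts.getD (lbl, f) 0) + 1).toFloat
    let den := ((m.labelTotals.getD lbl 0) + V).toFloat
    acc + Float.log (num / den)
```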

def NaiveBayes.predictModel (m : Model) (input : List String) (α : Type) [Context α] :

Predict a label using a fitted model.

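Because scores only need relative ordering, prediction is an argmax over the labels' scores. A sketch over the hypothetical Model above (tie-breaking keeps the earlier label; an empty label set falls back to ""):

```lean
-- Argmax over per-label log scores; no normalization needed.
def predictModel (m : Model) (input : List String) : String :=
  match m.labels with
  | []      => ""
  | l :: ls =>
    ls.foldl (init := l) fun best lbl =>
      if score m input lbl > score m input best then lbl else best
```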

Training objective (negative log-likelihood) #

This is the standard objective used to evaluate NB models:

NLL(data) = − Σ_i log P(y_i | x_i)

Even though we don't optimize it with gradients (NB training is closed-form counting), having this objective is useful for evaluating and comparing fitted models.

def NaiveBayes.negLogLikelihood {α : Type} [Context α] (m : Model) (data : List Example) : α

Negative log-likelihood of the dataset under the fitted model.
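Since score is unnormalized, log P(y | x) must subtract the log-sum-exp over all labels. A Float-specialized sketch continuing the hypothetical definitions above:

```lean
-- log P(y | x) = score y − log Σ_lbl exp (score lbl); the NLL negates and
-- sums these per-example terms. (A numerically robust version would
-- subtract the max score before exponentiating.)
def negLogLikelihood (m : Model) (data : List Example) : Float :=
  data.foldl (init := 0.0) fun acc ex =>
    let scores := m.labels.map (score m ex.features)
    let logZ   := Float.log (scores.foldl (fun s z => s + Float.exp z) 0.0)
    acc - (score m ex.features ex.label - logZ)
```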

def NaiveBayes.predict (data : List Example) (input : List String) (α : Type) [Context α] :

Backwards-compatible "fit + predict" entrypoint.

This re-fits the model on every call. For performance and clarity, prefer `let m := fit data; predictModel m input α`.
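The preferred pattern can be sketched end to end with the hypothetical definitions above (the tokens and labels here are made-up demo data):

```lean
-- Fit once, then reuse the model for many predictions.
def spamOrHam : String :=
  let data : List Example :=
    [⟨["free", "offer", "offer"], "spam"⟩,
     ⟨["meeting", "tomorrow"], "ham"⟩]
  let m := fit data
  predictModel m ["free", "offer"]
```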
