TorchLean API

NN.Spec.Models.NaiveBayes

Multinomial Naive Bayes #

This module gives a pure multinomial Naive Bayes classifier over String features and labels, using Lean's HashMap for the fitted count tables. It is not a tensor-indexed neural model like most of NN/Spec/Models/*; it is a non-neural baseline that keeps the training and prediction semantics explicit.

Probabilities are computed in log space (via MathFunctions.log) to avoid underflow.

Ecosystem note: PyTorch does not provide a Naive Bayes classifier in torch.nn; the closest ecosystem analogue is scikit-learn’s MultinomialNB.

What "training" means here #

Naive Bayes is a counting model: training is just collecting label and feature counts from the dataset. The API keeps fitting and inference separate, so examples can show exactly where counts are learned and where predictions are made.

The API exposes an explicit fit step that produces a Model, plus:

One training example: a bag-of-words feature multiset and a class label.

Instances For
    @[implicit_reducible]

    Fitted model #

    Model stores the counts and some precomputed bookkeeping derived from the dataset. Nothing here depends on the scalar type α; we only need α when we turn counts into smoothed probabilities (log-space scores).

    Fitted multinomial Naive Bayes model.

    This stores raw counts plus a little derived bookkeeping (labels, vocab, totalExamples). Scoring functions turn these counts into Laplace-smoothed log probabilities on demand.

    Instances For

      Fit a naive Bayes model by collecting counts from the dataset.

      Instances For

        Scoring and prediction #

        We use the standard multinomial NB scoring rule (with Laplace smoothing):

        Scores are in log space. For prediction we only need relative ordering.

        def NaiveBayes.score {α : Type} [Context α] (m : Model) (input : List String) (lbl : String) :
        α

        Unnormalized log score log P(lbl) + Σ log P(f|lbl) for a bag of features.

        Instances For
          def NaiveBayes.predictModel (m : Model) (input : List String) (α : Type) [Context α] :

          Predict a label using a trained Naive Bayes model.

          Instances For

            Training objective (negative log-likelihood) #

            This is the standard objective used to evaluate NB models:

            - Σ log P(y_i | x_i)

            Even though we don't optimize it with gradients (NB training is closed-form counting), having this objective is useful for:

            def NaiveBayes.negLogLikelihood {α : Type} [Context α] (m : Model) (data : List Example) :
            α

            Negative log-likelihood of the dataset under the trained Naive Bayes model.

            Instances For