Random Forest #
This file defines a lightweight random-forest wrapper around a list of decision trees from
NN.Spec.Module.DecisionTree, plus a couple of standard aggregation strategies:
- classification: majority vote
- regression: average
This is a small reference implementation intended to be:
- deterministic (no RNG hidden inside the definitions),
- easy to read and audit,
- suitable for examples and baseline comparisons.
Ecosystem analogies:
- scikit-learn: `RandomForestClassifier` / `RandomForestRegressor`
- PyTorch: there is no built-in random forest in `torch.nn`; forests are typically handled via scikit-learn or dedicated libraries.
Random-forest container: a list of trees.
Aggregation (vote/average/etc.) is handled by `predict` so the container stays generic.
- trees : List (DecisionTree α)
  Forest members. We keep a `List` for simplicity.
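The container described above can be pictured with a minimal sketch (`DecisionTree` comes from `NN.Spec.Module.DecisionTree`; the exact namespace and universe details in the actual file may differ):

```lean
-- Minimal sketch of the forest container: nothing but a list of trees.
-- Aggregation is deliberately kept out of the structure itself.
structure RandomForest (α : Type) where
  /-- Forest members, kept as a plain `List` for simplicity. -/
  trees : List (DecisionTree α)
```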
Instances For
Predict using every tree in the forest and combine the results with a given aggregation function. For classification this is typically a majority vote; for regression, an average.
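A hedged sketch of that prediction function, assuming a per-tree evaluator `DecisionTree.predict : DecisionTree α → (String → Bool) → α` (this signature is an assumption, not taken from the spec file):

```lean
-- Evaluate every tree against the external `decisionFn`, then fold the
-- per-tree outputs into one answer with the caller-supplied aggregator.
def forestPredict {α : Type} (trees : List (DecisionTree α))
    (decisionFn : String → Bool) (agg : List α → α) : α :=
  agg (trees.map fun t => DecisionTree.predict t decisionFn)
```

Passing a majority-vote aggregator recovers classification; passing a mean recovers regression, which is why the container itself stays agnostic.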
Numeric random forest (spec baseline) #
The RandomForest above wraps the symbolic DecisionTree from NN.Spec.Module.DecisionTree, where
splits are keyed by String feature names and an external decisionFn : String → Bool decides the
branch. That is handy for demos, but it is not something we can “train” without providing
feature-value semantics.
For a more classical baseline, we also provide a numeric random forest built on the
Spec.DecisionTreeSpec representation used by gradient_boosted_trees.lean:
- features are indexed by `Nat`
- splits compare a feature value to a threshold
- training uses a deterministic greedy CART-style routine (MSE) and a deterministic “bootstrap” resampling scheme (rotation), so it stays pure and reproducible.
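The numeric-split semantics in the list above can be sketched as follows (all names here are illustrative, not the spec's actual API):

```lean
-- A point is viewed as a feature vector indexed by `Nat`. A split routes
-- the point to the left subtree when the chosen feature's value does not
-- exceed the threshold, mirroring classical CART splits.
def goesLeft (x : Nat → Float) (feature : Nat) (threshold : Float) : Bool :=
  if x feature ≤ threshold then true else false
```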
This is still a spec/reference implementation: correctness and readability come first.
A regression random forest: an ensemble of regression trees averaged at inference time.
- trees : Spec.Tensor (Spec.DecisionTreeSpec α maxDepth) (Spec.Shape.dim nTrees Spec.Shape.scalar)
  The ensemble's trees.
Forward pass: average tree predictions.
This corresponds to RandomForestRegressor.predict (mean over tree outputs).
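The averaging step can be sketched as a mean over the list of per-tree outputs (a hypothetical helper, not the spec's actual definition):

```lean
-- Mean of the per-tree predictions; an empty ensemble yields 0.0 by
-- convention. This is the same reduction `RandomForestRegressor.predict`
-- performs over its trees' outputs.
def meanPrediction (treeOutputs : List Float) : Float :=
  if treeOutputs.isEmpty then 0.0
  else treeOutputs.foldl (· + ·) 0.0 / treeOutputs.length.toFloat
```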
Fit a regression random forest by training each tree on a deterministic “bootstrap” resample.
Why rotate instead of RNG-based bootstrapping?
- it keeps the whole spec deterministic and pure (important for proofs/verification)
- it still exercises the same API shape as bagging: each tree sees a different multiset
If you want true randomness, treat this as the spec and implement an executable wrapper that supplies a randomized index mapping.
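One plausible shape for such a deterministic resample is a rotation of the dataset (purely illustrative; the spec's actual index mapping may differ, e.g. by combining rotation with repetition so each tree sees a genuinely different multiset):

```lean
-- "Rotation bootstrap" sketch: tree `t` trains on the dataset rotated by
-- `t` positions. Pure and reproducible, so proofs about fitting never
-- have to thread an RNG state through the definitions.
def rotateResample {α : Type} (data : List α) (t : Nat) : List α :=
  let k := t % data.length
  data.drop k ++ data.take k
```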
Classification forest (Gini) #
This mirrors the regression forest, but uses the classifier-tree type from
NN/Spec/Models/GradientBoostedTrees.lean so leaf values can be arbitrary labels (β).
A classification random forest: an ensemble of classifier trees (majority vote).
- trees : Spec.Tensor (Spec.DecisionTreeClassifierSpec α β maxDepth) (Spec.Shape.dim nTrees Spec.Shape.scalar)
  The ensemble's trees.
Predict by majority vote across trees.
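Majority voting over arbitrary labels `β` can be sketched like this (a hypothetical helper; it assumes decidable equality on labels, and resolves ties toward the earliest label so the result stays deterministic):

```lean
-- Pick the label with the highest count. `List.count` needs `BEq β`;
-- starting the fold from the first element keeps ties deterministic.
def majorityVote {β : Type} [BEq β] [Inhabited β] (labels : List β) : β :=
  match labels with
  | [] => default
  | l :: _ =>
    labels.foldl (fun best x =>
      if labels.count x > labels.count best then x else best) l
```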
Fit a classification forest using deterministic “rotation bootstraps” and Gini-based CART trees.