Random Forest #
This file defines a lightweight random-forest wrapper around a list of decision trees from
NN.Spec.Module.DecisionTree, plus a couple of standard aggregation strategies:
- classification: majority vote
- regression: average
This is a small reference implementation intended to be:
- deterministic (no RNG hidden inside the definitions),
- easy to read and audit,
- suitable for examples and baseline comparisons.
Ecosystem analogies:
- scikit-learn: `RandomForestClassifier` / `RandomForestRegressor`
- PyTorch: there is no built-in random forest in `torch.nn`; forests are typically handled via scikit-learn or dedicated libraries.
Random-forest container: a list of trees.
Aggregation (vote/average/etc.) is handled by `predict` so the container stays generic.
- trees : List (DecisionTree α)
  Forest members. We keep a `List` for simplicity.
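The container described above can be pictured with a minimal sketch (`DecisionTree` comes from `NN.Spec.Module.DecisionTree`; the exact namespace and universe details in the actual file may differ):

```lean
-- Minimal sketch of the forest container: nothing but a list of trees.
-- Aggregation is deliberately kept out of the structure itself.
structure RandomForest (α : Type) where
  /-- Forest members, kept as a plain `List` for simplicity. -/
  trees : List (DecisionTree α)
```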
Instances For
Predict using every tree in the forest and combine the results with a given aggregation function. For classification this is typically a majority vote; for regression, an average.
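A hedged sketch of that prediction function, assuming a per-tree evaluator `DecisionTree.predict : DecisionTree α → (String → Bool) → α` (this signature is an assumption, not taken from the spec file):

```lean
-- Evaluate every tree against the external `decisionFn`, then fold the
-- per-tree outputs into one answer with the caller-supplied aggregator.
def forestPredict {α : Type} (trees : List (DecisionTree α))
    (decisionFn : String → Bool) (agg : List α → α) : α :=
  agg (trees.map fun t => DecisionTree.predict t decisionFn)
```

Passing a majority-vote aggregator recovers classification; passing a mean recovers regression, which is why the container itself stays agnostic.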
Numeric random forest (spec baseline) #
The RandomForest above wraps the symbolic DecisionTree from NN.Spec.Module.DecisionTree, where
splits are keyed by String feature names and an external decisionFn : String → Bool decides the
branch. That is handy for demos, but it is not something we can “train” without providing
feature-value semantics.
For a more classical baseline, we also provide a numeric random forest built on the
Spec.DecisionTreeSpec representation used by gradient_boosted_trees.lean:
- features are indexed by `Nat`
- splits compare a feature value to a threshold
- training uses a deterministic greedy CART-style routine (MSE) and a deterministic “bootstrap” resampling scheme (rotation), so it stays pure and reproducible.
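The numeric-split semantics in the list above can be sketched as follows (all names here are illustrative, not the spec's actual API):

```lean
-- A point is viewed as a feature vector indexed by `Nat`. A split routes
-- the point to the left subtree when the chosen feature's value does not
-- exceed the threshold, mirroring classical CART splits.
def goesLeft (x : Nat → Float) (feature : Nat) (threshold : Float) : Bool :=
  if x feature ≤ threshold then true else false
```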
This is still a spec/reference implementation: correctness and readability come first.
A regression random forest: an ensemble of regression trees averaged at inference time.
- trees : Spec.Tensor (Spec.DecisionTreeSpec α maxDepth) (Spec.Shape.dim nTrees Spec.Shape.scalar)
  The ensemble's trees.
Forward pass: average tree predictions.
This corresponds to RandomForestRegressor.predict (mean over tree outputs).
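The averaging step can be sketched as a mean over the list of per-tree outputs (a hypothetical helper, not the spec's actual definition):

```lean
-- Mean of the per-tree predictions; an empty ensemble yields 0.0 by
-- convention. This is the same reduction `RandomForestRegressor.predict`
-- performs over its trees' outputs.
def meanPrediction (treeOutputs : List Float) : Float :=
  if treeOutputs.isEmpty then 0.0
  else treeOutputs.foldl (· + ·) 0.0 / treeOutputs.length.toFloat
```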
Fit a regression random forest by training each tree on a deterministic “bootstrap” resample.
Why rotate instead of RNG-based bootstrapping?
- it keeps the whole spec deterministic and pure (important for proofs/verification)
- it still exercises the same API shape as bagging: each tree sees a different multiset
If you want true randomness, treat this as the spec and implement an executable wrapper that supplies a randomized index mapping.
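One plausible shape for such a deterministic resample is a rotation of the dataset (purely illustrative; the spec's actual index mapping may differ, e.g. by combining rotation with repetition so each tree sees a genuinely different multiset):

```lean
-- "Rotation bootstrap" sketch: tree `t` trains on the dataset rotated by
-- `t` positions. Pure and reproducible, so proofs about fitting never
-- have to thread an RNG state through the definitions.
def rotateResample {α : Type} (data : List α) (t : Nat) : List α :=
  let k := t % data.length
  data.drop k ++ data.take k
```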
Classification forest (Gini) #
This mirrors the regression forest, but uses the classifier-tree type from
NN/Spec/Models/GradientBoostedTrees.lean so leaf values can be arbitrary labels (β).
A classification random forest: an ensemble of classifier trees (majority vote).
- trees : Spec.Tensor (Spec.DecisionTreeClassifierSpec α β maxDepth) (Spec.Shape.dim nTrees Spec.Shape.scalar)
  The ensemble's trees.
Predict by majority vote across trees.
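Majority voting over arbitrary labels `β` can be sketched like this (a hypothetical helper; it assumes decidable equality on labels, and resolves ties toward the earliest label so the result stays deterministic):

```lean
-- Pick the label with the highest count. `List.count` needs `BEq β`;
-- starting the fold from the first element keeps ties deterministic.
def majorityVote {β : Type} [BEq β] [Inhabited β] (labels : List β) : β :=
  match labels with
  | [] => default
  | l :: _ =>
    labels.foldl (fun best x =>
      if labels.count x > labels.count best then x else best) l
```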
Fit a classification forest using deterministic “rotation bootstraps” and Gini-based CART trees.