Evaluation helpers #
These utilities aggregate per-sample or per-batch StepReports into a single
mean report. Metrics are matched by name and position.
Metric aggregation #
Report sums (for weighted aggregation) #
An accumulator for averaging StepReports.
Instead of keeping a list of all reports and reducing at the end, we maintain:
- `count`: how many samples contributed,
- `lossSum`: the sum of losses (optionally weighted by batch size),
- `metricsSum`: a pointwise sum of named metrics.
This is the same idea as computing streaming averages in a typical PyTorch evaluation loop; a sketch follows the field list below.
- count : ℕ
Number of samples represented by this accumulator.
- lossSum : a
Sum of losses, already weighted by sample count for batch reports.
- metricsSum
Pointwise sum of metrics; names must stay aligned across additions.
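A minimal sketch of the structure, assuming a `StepReport` that carries a loss and a list of named metrics. The library is polymorphic in the scalar type (`lossSum : a` above); `Float` is used here for simplicity, and the `add` combinator is a hypothetical illustration of the pointwise summation.

```lean
structure StepReport where
  loss    : Float
  metrics : List (String × Float)

structure ReportSums where
  count      : Nat
  lossSum    : Float
  metricsSum : List (String × Float)

/-- Combine two accumulators. Metric lists must stay aligned by name and
position; `zipWith` silently truncates if they are not. -/
def ReportSums.add (x y : ReportSums) : ReportSums :=
  { count      := x.count + y.count,
    lossSum    := x.lossSum + y.lossSum,
    metricsSum := x.metricsSum.zipWith (fun (n, v) (_, w) => (n, v + w)) y.metricsSum }
```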
Start an accumulator from a single-sample report.
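A sketch of this constructor under the `Float` assumptions above; the name `ofSample` is hypothetical.

```lean
/-- One sample contributes with weight 1: its loss and metrics are already the sums. -/
def ReportSums.ofSample (r : StepReport) : ReportSums :=
  { count := 1, lossSum := r.loss, metricsSum := r.metrics }
```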
Start an accumulator from a batch report, weighted by the number of samples in the batch.
This is the appropriate constructor when evalBatch returns means over the batch, but we want
the final mean to weight by the number of items in each batch.
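A sketch continuing the `Float` setup; the name `ofBatch` is hypothetical. The key step is scaling each component of the per-batch means back up by the batch size, so that later division by the total count yields a correctly weighted mean.

```lean
/-- `r` is assumed to hold means over `n` samples, so each component is
scaled by `n` to recover sums before accumulation. -/
def ReportSums.ofBatch (n : Nat) (r : StepReport) : ReportSums :=
  let w := n.toFloat
  { count      := n,
    lossSum    := w * r.loss,
    metricsSum := r.metrics.map (fun (name, v) => (name, w * v)) }
```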
Dataset evaluation #
Evaluate a list of samples and average their reports.
This is the “for sample in dataset: compute report; take mean” pattern.
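A sketch of the loop under the assumptions above. Returning `Option` for an empty list, and the `mean` helper that divides the sums by the count, are illustrative choices rather than the library's actual signatures.

```lean
/-- Turn accumulated sums back into a mean report. -/
def ReportSums.mean (s : ReportSums) : StepReport :=
  let c := s.count.toFloat
  { loss    := s.lossSum / c,
    metrics := s.metricsSum.map (fun (name, v) => (name, v / c)) }

/-- Fold per-sample reports into one accumulator, then take the mean. -/
def evalList {Sample : Type} (evalSample : Sample → StepReport) :
    List Sample → Option StepReport
  | [] => none
  | s :: ss =>
    let sums := ss.foldl
      (fun (acc : ReportSums) s' => acc.add (.ofSample (evalSample s')))
      (.ofSample (evalSample s))
    some sums.mean
```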
Evaluate a Dataset by converting to a list and calling evalList.
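A sketch of the delegation; this `Dataset` is a stand-in, since only the conversion to a list matters here.

```lean
/-- Stand-in for the library's `Dataset`; only `toList` is needed here. -/
structure Dataset (Sample : Type) where
  toList : List Sample

def evalDataset {Sample : Type} (evalSample : Sample → StepReport)
    (ds : Dataset Sample) : Option StepReport :=
  evalList evalSample ds.toList
```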
Evaluate a list of non-empty batches and compute a weighted mean report.
Each batch contributes proportionally to its length (so a small final batch does not distort the average).
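A sketch under the same assumptions, with batches represented as plain lists that are assumed non-empty. `evalBatch` is assumed to return per-batch means, which `ofBatch` re-weights by the batch length.

```lean
def evalBatches {Sample : Type} (evalBatch : List Sample → StepReport) :
    List (List Sample) → Option StepReport
  | [] => none
  | b :: bs =>
    let sums := bs.foldl
      (fun (acc : ReportSums) b' => acc.add (.ofBatch b'.length (evalBatch b')))
      (.ofBatch b.length (evalBatch b))
    some sums.mean
```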
Batch a dataset and then call evalBatches.
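A sketch of the composition; the `chunks` helper and the name `evalBatched` are hypothetical, and `batchSize` is assumed positive so every chunk is non-empty.

```lean
/-- Hypothetical helper: split a list into chunks of at most `n` elements. -/
partial def chunks {α : Type} (n : Nat) (xs : List α) : List (List α) :=
  if xs.isEmpty || n == 0 then []
  else xs.take n :: chunks n (xs.drop n)

def evalBatched {Sample : Type} (batchSize : Nat)
    (evalBatch : List Sample → StepReport)
    (ds : Dataset Sample) : Option StepReport :=
  evalBatches evalBatch (chunks batchSize ds.toList)
```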