# CSV loader tutorial (transforms + minibatches + scheduler)
This tutorial mirrors the "data first" workflow people expect from PyTorch:
- Load a dataset from disk (CSV).
- Build a transform pipeline (`Data.Transforms.Compose`).
- Wrap the per-sample dataset in a minibatch loader (`Data.batchLoader`).
- Train with a learning-rate scheduler.
Generate a small deterministic regression dataset with `python3 NN/Examples/Data/generate_toy_data.py`. The script writes `NN/Examples/Data/toy_regression.csv` with columns `x1,x2,y` (25 samples).
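The generator script itself is not reproduced in this tutorial. As a rough illustration only, a deterministic toy-regression writer could look like the following Python sketch; the coefficients, value ranges, and seed are invented here and are not necessarily what `generate_toy_data.py` actually does:

```python
# Sketch of a deterministic toy-regression CSV generator.
# Hypothetical stand-in for generate_toy_data.py; details are assumptions.
import csv
import random

def write_toy_regression(path: str, n: int = 25, seed: int = 0) -> None:
    rng = random.Random(seed)  # fixed seed => identical file on every run
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["x1", "x2", "y"])  # header matches the tutorial's columns
        for _ in range(n):
            x1 = rng.uniform(-1.0, 1.0)
            x2 = rng.uniform(-1.0, 1.0)
            y = 3.0 * x1 - 2.0 * x2 + 0.5  # noiseless linear target
            w.writerow([f"{x1:.6f}", f"{x2:.6f}", f"{y:.6f}"])

write_toy_regression("toy_regression.csv")
```

Seeding a private `random.Random` instance (rather than the module-level RNG) keeps the output reproducible even if other code touches `random` elsewhere.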
Build:

```
lake build NN.Examples.Data.Loaders.Csv
```
The tutorial code is compiled with the rest of TorchLean. For command-line model training, use the `torchlean` executable examples in `NN/Examples/Models`.
Optional flags (tutorial-specific):
- `--data-dir PATH` (default: `NN/Examples/Data`)
- `--csv PATH` (override the CSV file)
- `--seed S` (controls shuffling and model initialization)
- `--batch N`
- `--epochs E`
Public API used here:
- `Data.fromCsvSupervised`
- `Data.Transforms.Compose`
- `Data.batchLoader`
- `train.fitLoaderWith`
- `train.stepEpochLR`
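Pieced together, these calls suggest a driver like the following Lean sketch. Everything beyond the five names above is an assumption for illustration (argument names, orders, and the scalar instantiation); consult the actual TorchLean signatures before copying:

```lean
-- Hedged sketch only: argument names/orders are guesses, not the real API.
def runTutorial : IO Unit := do
  -- 1. Load supervised (inputs, target) rows from the CSV as Float samples.
  let ds ← Data.fromCsvSupervised (α := Float) "NN/Examples/Data/toy_regression.csv"
  -- 2. Wrap the per-sample dataset in a shuffled minibatch loader.
  let loader := Data.batchLoader ds (batchSize := 8) (seed := 42)
  -- 3. Fit the model, stepping the learning rate once per epoch.
  train.fitLoaderWith loader (epochs := 20) (schedule := train.stepEpochLR)
```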
```lean
def NN.Examples.Data.Loaders.Csv.loadDataset
    (csvPath : System.FilePath)
    {α : Type}
    [API.Semantics.Scalar α]
    [API.Runtime.Scalar α] :
    -- (return type not shown in this excerpt)
```
Load the CSV dataset, then apply a small input transform pipeline. The transform pipeline is written once for the chosen scalar type `α`:

- normalize (here: mean=0, std=1, so it is an easy-to-read "template"), then
- scale inputs by `0.5`.
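As a sketch, that pipeline might be assembled with `Data.Transforms.Compose` roughly as follows; the `normalize` and `scale` constructor names are hypothetical illustrations, since only `Compose` is named by this tutorial:

```lean
-- Hypothetical constructors; only Data.Transforms.Compose appears in the tutorial.
def inputPipeline : Data.Transforms.Compose Float :=
  .mk
    [ Data.Transforms.normalize (mean := 0.0) (std := 1.0)  -- identity: a template to edit
    , Data.Transforms.scale 0.5 ]                           -- then halve every input
```

With mean 0 and std 1 the normalize stage changes nothing, which is why the tutorial calls it a template: swap in your dataset's real statistics and the rest of the pipeline is unchanged.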