Datasets, Loaders, and File Sources

This module is TorchLean's public data layer. The intended workflow is:

1. Convert outside-world datasets to canonical .npy tensors or small numeric CSV files.
2. Describe those files with TensorSource, SupervisedSource, or LabeledSource.
3. Load them into shape-typed TorchLean tensors and datasets.
4. Train with batchLoader / BatchLoader.epoch, the public trainer.train path, or TorchLean.Trainer.trainDataset when an advanced runner loop is still the right tool.
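
The describe, load, and train steps above might be wired together as in the following sketch. It is an illustration under assumptions, not the library's reference usage: the import path is the module name shown for this page, the file paths and sizes are hypothetical, a Float instance of Runtime.Scalar is assumed to exist, and the loaded TensorPack samples are assumed to be accepted where batchLoader expects SupervisedSample.

```lean
import NN.API.Data.Core  -- assumption: module name as listed for this page

-- Hypothetical files: X.npy with shape (100, 8), Y.npy with shape (100, 1).
def runOneEpoch : IO Unit := do
  -- Describe the paired files.
  let src := NN.API.Data.supervisedNpySource "data/X.npy" "data/Y.npy" 100 [8] [1]
  -- Load them into a shape-typed dataset (Float backend assumed available).
  match ← NN.API.Data.SupervisedSource.load (α := Float) src with
  | .error msg => throw (IO.userError msg)
  | .ok ds =>
    -- Batch with a seeded shuffle and run a single epoch.
    let dl := NN.API.Data.batchLoader ds 16 (shuffle := true) (seed := 0)
    match NN.API.Data.BatchLoader.epoch "train" dl with
    | .error msg => throw (IO.userError msg)
    | .ok (_dl', batches) =>
      IO.println s!"epoch produced {batches.length} full batches of 16"
```

Every failure point (missing file, shape mismatch, batching error) surfaces as an Except String value, so the caller decides how to report it.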

We keep the implementation small and predictable:

- datasets are in-memory and pure (often backed by List)
- loader shuffling is seed-driven and reproducible
- .npy is the canonical numeric interchange format
- CSV is supported for small tabular data
- MATLAB .mat, PyTorch .pt/.pth, NumPy .npz, and image folders should be converted to .npy with scripts/datasets/torchlean_data_convert.py
- there is no support for multiprocessing workers, memory maps, or pinned memory

PyTorch Mapping

This is inspired by torch.utils.data:

- Dataset, DataLoader: https://pytorch.org/docs/stable/data.html
- TensorDataset: https://pytorch.org/docs/stable/data.html#torch.utils.data.TensorDataset
- DataLoader: https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader

TorchLean’s key difference is that samples typically carry type-level shapes (via TensorPack), so many helpers here are shape-aware by construction.

Main Entry Points

- TensorSource: one file plus expected dimensions.
- SupervisedSource: two batched tensors, X : (N, xDims...) and Y : (N, yDims...).
- LabeledSource: batched inputs plus integer class labels, one-hot encoded on load.
- TabularSupervisedSource: one CSV table with input columns followed by target columns.
- batchLoader: deterministic, typed minibatching.

For examples and conversion commands, see NN/Examples/Data/README.md.

@[reducible, inline]

abbrev NN.API.Data.TensorDataset (α : Type) (shapes : List Spec.Shape) :

Type

Typed analogue of PyTorch's TensorDataset.

In TorchLean, a sample is usually a TensorPack α shapes, i.e. a shape-tracked tuple of tensors.

def NN.API.Data.fromList {a : Type} (xs : List a) :

Dataset a

Build a dataset from an explicit list of samples.

def NN.API.Data.requireFiles (exeName : String) (paths : List System.FilePath) (hint : String := "") :

IO Unit

Require that all paths exist, otherwise raise a user-facing error with a shared hint.

def NN.API.Data.requireFile (exeName label : String) (path : System.FilePath) (hint : String := "") :

IO Unit

Require one named data file to exist.

def NN.API.Data.requirePairedFiles (exeName xLabel : String) (xPath : System.FilePath) (yLabel : String) (yPath : System.FilePath) (hint : String := "") :

IO Unit

Require paired supervised input/target files to exist.
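
A minimal preflight sketch, assuming the module name shown for this page as the import path. The executable name, labels, paths, and hint text are hypothetical.

```lean
import NN.API.Data.Core  -- assumption: module name as listed for this page

-- Fail early, with one shared hint, if either file is missing.
def checkInputs : IO Unit :=
  NN.API.Data.requirePairedFiles "my_example" "inputs" "data/X.npy" "targets" "data/Y.npy"
    (hint := "Convert the raw dataset to .npy before running this example.")
```

Running the check before any loading keeps the missing-file message separate from later shape or parsing errors.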

def NN.API.Data.writeCsv (path : System.FilePath) (header : List String) (rows : List (List String)) :

IO Unit

Write a small CSV file, creating the parent directory if needed.
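
As a sketch of how this might be called, the helper below turns a list of per-epoch losses into string rows. The function name, output path, and column names are hypothetical, and the import path is an assumption taken from this page's module name.

```lean
import NN.API.Data.Core  -- assumption: module name as listed for this page

-- Write one row per epoch: the epoch index and the loss recorded for it.
def writeLossCurve (losses : List Float) : IO Unit :=
  let rows := (List.range losses.length).zip losses
    |>.map fun (i, l) => [toString i, toString l]
  NN.API.Data.writeCsv "out/loss.csv" ["epoch", "loss"] rows
```

All cells are passed as strings, so number formatting is the caller's choice.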

def NN.API.Data.writePredictionCsv1D {n : ℕ} (path : System.FilePath) (input target prediction : Spec.Tensor Float (Tensor.Shape.Vec n)) :

IO Unit

Write a one-dimensional prediction probe CSV.

Rows are i,x,input,target,prediction, where x = i/(n-1) for n > 1. This writes the compact prediction table used by plotting examples such as 1D operator learning.

def NN.API.Data.toList {a : Type} (ds : Dataset a) :

List a

Materialize a dataset as a list.

@[simp]

theorem NN.API.Data.toList_fromList {a : Type} (xs : List a) :

toList (fromList xs) = xs

Converting a list to a dataset and back yields the original list.

def NN.API.Data.size {a : Type} (ds : Dataset a) :

ℕ

Number of elements in the dataset.

@[simp]

theorem NN.API.Data.size_fromList {a : Type} (xs : List a) :

size (fromList xs) = xs.length

The size of a dataset built from a list is the list length.
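
Because toList_fromList and size_fromList are simp lemmas, round-trip facts like the following should close with simp alone. This is a sketch that assumes the import path matches the module name shown for this page.

```lean
import NN.API.Data.Core  -- assumption: module name as listed for this page

-- Round trip through a dataset leaves the list unchanged.
example : NN.API.Data.toList (NN.API.Data.fromList [10, 20, 30]) = [10, 20, 30] := by
  simp

-- The dataset size is the length of the backing list.
example (xs : List String) :
    NN.API.Data.size (NN.API.Data.fromList xs) = xs.length := by
  simp
```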

def NN.API.Data.isEmpty {a : Type} (ds : Dataset a) :

Bool

Whether the dataset is empty.

def NN.API.Data.cycleList {a : Type} (xs : List a) (h : xs ≠ []) :

ℕ → a

Build a cycling index function for a nonempty list.

cycleList xs h i returns xs[i % xs.length].

This is useful for in-memory datasets where a fixed-step “PyTorch-like” loop should avoid repeated Option handling.
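
A small usage sketch, assuming the import path matches this page's module name. The sample list and definition names are hypothetical.

```lean
import NN.API.Data.Core  -- assumption: module name as listed for this page

def samples : List String := ["a", "b", "c"]

-- The proof argument rules out the empty list once, up front.
def nextSample : Nat → String :=
  NN.API.Data.cycleList samples (by simp [samples])

-- By the documented rule this reads samples[4 % 3], i.e. index 1.
#eval nextSample 4
```

After the nonemptiness proof is supplied, every step of a loop is a plain function call with no Option to unwrap.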

def NN.API.Data.cycleListOrError {a : Type} (xs : List a) (err : String := "empty list") :

Except String (ℕ → a)

Like cycleList, but fail with a message if the list is empty.

Fixed-step dataset code can check emptiness once and then index without Option.

def NN.API.Data.cycleDataset {a : Type} (ds : Dataset a) (h : ds.data.size ≠ 0) :

ℕ → a

Build a cycling index function for a nonempty dataset.

cycleDataset ds h i returns ds[i % ds.size].

This is the dataset analogue of cycleList. It avoids per-step Option handling in fixed-step training loops.

def NN.API.Data.cycleDatasetOrError {a : Type} (ds : Dataset a) (err : String := "empty dataset") :

Except String (ℕ → a)

Like cycleDataset, but fail with a message if the dataset is empty.

This is the preferred helper for “PyTorch-style” fixed-step loops over in-memory datasets.
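
A fixed-step loop could use it roughly as follows. The sketch assumes the import path matches this page's module name; the dataset contents and the function name are hypothetical.

```lean
import NN.API.Data.Core  -- assumption: module name as listed for this page

-- Check emptiness once, then draw `steps` samples by cycling through the dataset.
def runSteps (steps : Nat) : Except String (List Nat) := do
  let ds := NN.API.Data.fromList ([3, 1, 4] : List Nat)
  let sample ← NN.API.Data.cycleDatasetOrError ds "training set is empty"
  pure ((List.range steps).map sample)
```

The error path is taken at most once, before the loop starts, which is the point of the helper.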

def NN.API.Data.get? {a : Type} (ds : Dataset a) (i : ℕ) :

Option a

Safe indexing into a dataset.

def NN.API.Data.firstArrayOrError {a : Type} (xs : Array a) (err : String := "empty array") :

Except String a

Return the first array element, or a caller-provided error when the array is empty.

def NN.API.Data.map {a b : Type} (f : a → b) (ds : Dataset a) :

Dataset b

Map a dataset elementwise (pure, deterministic).

def NN.API.Data.append {a : Type} (x y : Dataset a) :

Dataset a

Append two datasets, preserving order: all samples from x followed by all samples from y.

def NN.API.Data.splitAt {a : Type} (n : ℕ) (ds : Dataset a) :

Dataset a × Dataset a

Split a dataset at position n (prefix, suffix).

def NN.API.Data.shuffle {a : Type} (seed : ℕ) (ds : Dataset a) :

ℕ × Dataset a

Shuffle a dataset deterministically, returning the updated RNG seed and the shuffled dataset.

This is used to implement DataLoader.shuffle behavior in a purely functional way.

def NN.API.Data.shuffled {a : Type} (seed : ℕ) (ds : Dataset a) :

Dataset a

Deterministically shuffle a dataset when the caller does not need the updated seed.

def NN.API.Data.randomSplitAt {a : Type} (seed n : ℕ) (ds : Dataset a) :

ℕ × Dataset a × Dataset a

Shuffle once and then split at n.

This is a small building block for train/val splits.
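
For instance, an 8/2 train/validation split of ten samples might look like the sketch below. The import path is assumed from this page's module name, and the seed and sizes are arbitrary.

```lean
import NN.API.Data.Core  -- assumption: module name as listed for this page

-- Returns the updated seed plus the two halves as plain lists for inspection.
def splitDemo : Nat × List Nat × List Nat :=
  let ds := NN.API.Data.fromList (List.range 10)
  let (seed', train, val) := NN.API.Data.randomSplitAt 42 8 ds
  (seed', NN.API.Data.toList train, NN.API.Data.toList val)
```

Passing the returned seed into the next shuffle keeps a whole run reproducible from one initial seed.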

def NN.API.Data.batches {a : Type} (tag : String) (batchSize : ℕ) (ds : Dataset a) :

Except String (List (List a))

Split a dataset into equal-sized minibatches (as lists), dropping the final partial batch.

This is a low-level helper; ordinary loader code should use DataLoader.epoch or Data.batchedSupervised.
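
A sketch of the low-level call, assuming the import path matches this page's module name; the tag string is hypothetical.

```lean
import NN.API.Data.Core  -- assumption: module name as listed for this page

-- Ten samples with batch size 3: under the documented drop-partial rule,
-- this yields three full batches and leaves the tenth sample out.
def batchDemo : Except String (List (List Nat)) :=
  NN.API.Data.batches "demo" 3 (NN.API.Data.fromList (List.range 10))
```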

def NN.API.Data.batchesArray {a : Type} (tag : String) (batchSize : ℕ) (ds : Dataset a) :

Except String (List (Array a))

Like batches, but return each minibatch as an Array instead of a List.

@[reducible, inline]

abbrev NN.API.Data.RawDataLoader (a : Type) :

Type

Untyped analogue of PyTorch's torch.utils.data.DataLoader.

This is the deterministic, purely-functional loader provided by the TorchLean runtime.

def NN.API.Data.loader {a : Type} (ds : Dataset a) (batchSize : ℕ) (shuffle : Bool := false) (seed : ℕ := 0) (dropLast : Bool := false) :

RawDataLoader a

Construct a RawDataLoader from a dataset.

If shuffle := true, shuffling is deterministic w.r.t. seed. If dropLast := true, incomplete final batches are discarded.

def NN.API.Data.epoch {a : Type} (name : String) (dl : RawDataLoader a) :

Except String (RawDataLoader a × List (List a))

Run one epoch's worth of minibatching and return:

- an updated loader (with the new seed), and
- the list of minibatches.
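
Threading the returned loader into the next call is what carries the shuffle seed forward. A two-epoch sketch, assuming the import path matches this page's module name (the name string, seed, and sizes are arbitrary):

```lean
import NN.API.Data.Core  -- assumption: module name as listed for this page

-- Run two consecutive epochs, reusing the loader returned by the first.
def twoEpochs : Except String (List (List Nat) × List (List Nat)) := do
  let dl := NN.API.Data.loader (NN.API.Data.fromList (List.range 10)) 4
    (shuffle := true) (seed := 7)
  let (dl1, e1) ← NN.API.Data.epoch "demo" dl
  let (_, e2) ← NN.API.Data.epoch "demo" dl1
  pure (e1, e2)
```

Calling epoch twice on the original dl instead would start both epochs from the same seed.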

def NN.API.Data.epochCollate {a b : Type} (name : String) (dl : RawDataLoader a) (collate : List a → Except String b) :

Except String (RawDataLoader a × List b)

Like epoch, but apply a user-provided collate function to each minibatch.

This is the TorchLean analogue of PyTorch's collate_fn= option.
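
A collate function is any List a → Except String b. The sketch below sums each batch; sumBatch is a hypothetical helper and the import path is assumed from this page's module name.

```lean
import NN.API.Data.Core  -- assumption: module name as listed for this page

-- Hypothetical collate function: reduce a batch to its sum, rejecting empty batches.
def sumBatch (batch : List Nat) : Except String Nat :=
  if batch.isEmpty then throw "empty batch" else pure (batch.foldl (· + ·) 0)

-- One epoch over six samples in batches of two, collated to one number per batch.
def batchSums : Except String (List Nat) := do
  let dl := NN.API.Data.loader (NN.API.Data.fromList ([1, 2, 3, 4, 5, 6] : List Nat)) 2
  let (_, sums) ← NN.API.Data.epochCollate "sums" dl sumBatch
  pure sums
```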

structure NN.API.Data.BatchLoader (α : Type) (n : ℕ) (σ τ : Spec.Shape) :

Type

Typed wrapper around RawDataLoader for supervised samples.

The batch size n is reflected in the type, and BatchLoader.epoch returns fully-collated dim n minibatches (so dropLast=true is required).

raw : RawDataLoader (SupervisedSample α σ τ)
Raw underlying data.

@[reducible, inline]

abbrev NN.API.Data.AnyBatchLoader (α : Type) (σ τ : Spec.Shape) :

Type

Existential wrapper for loaders when the batch size is chosen at runtime.

Note on default arguments:

The underlying CSV loaders take an opts : CsvOptions := {} argument. If we write abbrev fromCsvRows := readCsvFloatRows, Lean will apply the default argument and fromCsvRows will no longer accept opts.

So we eta-expand here to keep the public surface configurable.
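
The pattern can be reproduced in plain Lean without TorchLean. In this self-contained sketch, DemoOptions, readRowsImpl, and readRows are hypothetical stand-ins for CsvOptions, the low-level reader, and the public wrapper.

```lean
structure DemoOptions where
  delimiter : Char := ','

-- Stand-in for a low-level reader with a defaulted options argument.
def readRowsImpl (path : String) (opts : DemoOptions := {}) : String :=
  s!"{path} with delimiter {opts.delimiter}"

-- Eta-expanded re-export: `opts` remains a parameter of the public name.
abbrev readRows (path : String) (opts : DemoOptions := {}) : String :=
  readRowsImpl path opts

#eval readRows "table.csv"                      -- uses the default options
#eval readRows "table.csv" { delimiter := ';' } -- overrides them
```

Restating the binder with its default on the wrapper is what keeps the second call legal.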

@[reducible, inline]

abbrev NN.API.Data.fromCsvRows (path : System.FilePath) (opts : CsvOptions := { }) :

IO (Runtime.Autograd.Result (List (List Float)))

Read a CSV file as a list of rows of floats.

@[reducible, inline]

abbrev NN.API.Data.fromCsvPairs (path : System.FilePath) (opts : CsvOptions := { }) :

IO (Runtime.Autograd.Result (Dataset (Float × Float)))

Read a CSV file as (x, y) float pairs.

@[reducible, inline]

abbrev NN.API.Data.fromCsvVectors (path : System.FilePath) (n : ℕ) (opts : CsvOptions := { }) :

IO (Runtime.Autograd.Result (Dataset (Spec.Tensor Float (Spec.Shape.dim n Spec.Shape.scalar))))

Read a CSV file as length-n float vectors.

@[reducible, inline]

abbrev NN.API.Data.fromNpy (path : System.FilePath) :

IO (Runtime.Autograd.Result Runtime.Autograd.Train.NpyData)

Read a .npy file as an untyped NpyData value. The shape-checked readers that follow return typed tensors instead.

@[reducible, inline]

abbrev NN.API.Data.fromNpyVector (path : System.FilePath) (n : ℕ) :

IO (Runtime.Autograd.Result (Spec.Tensor Float (Spec.Shape.dim n Spec.Shape.scalar)))

Read a .npy file as a length-n vector tensor.

@[reducible, inline]

abbrev NN.API.Data.fromNpyMatrix (path : System.FilePath) (m n : ℕ) :

IO (Runtime.Autograd.Result (Spec.Tensor Float (Spec.Shape.dim m (Spec.Shape.dim n Spec.Shape.scalar))))

Read a .npy file as an m×n matrix tensor.

def NN.API.Data.availableNpyRows (path : System.FilePath) (tailShape : List ℕ) (expectedDesc : String) :

IO (Except String ℕ)

Read the row count from an .npy file and check its trailing shape.

For a batched tensor with shape (N, d₁, ..., dₖ), this returns N when the trailing dimensions match tailShape.
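
A sketch of a row-count probe for an image batch, assuming the import path matches this page's module name. The function name and the description string are hypothetical.

```lean
import NN.API.Data.Core  -- assumption: module name as listed for this page

-- Report how many (3, 32, 32) rows a file holds, or fail with the loader's message.
def reportRows (path : System.FilePath) : IO Unit := do
  match ← NN.API.Data.availableNpyRows path [3, 32, 32] "(N, 3, 32, 32) image batch" with
  | .ok n => IO.println s!"{path} holds {n} rows"
  | .error msg => throw (IO.userError msg)
```

Knowing N up front lets a caller clamp the requested sample count before loading.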

def NN.API.Data.supervised {α : Type} [Semantics.Scalar α] [Runtime.Scalar α] {σ τ : Spec.Shape} (xs : List (Spec.Tensor Float σ × Spec.Tensor Float τ)) :

Dataset (TorchLean.TensorPack α [σ, τ])

Convert a list of (x, y) float tensors into a dataset of TorchLean supervised samples.

This casts float data into the selected scalar backend α and packs it into a TensorPack α [σ, τ].

def NN.API.Data.labeled {α : Type} [Semantics.Scalar α] [Runtime.Scalar α] {σ : Spec.Shape} (classes : ℕ) (xs : List (Spec.Tensor Float σ × ℕ)) :

Dataset (TorchLean.TensorPack α [σ, Tensor.Shape.Vec classes])

Convert a list of (x, label) pairs into a dataset of one-hot classification samples.

Labels are given as Nat and converted to one-hot targets of shape Vec classes.

TensorDataset (dim0 batching)

PyTorch's TensorDataset concept is: given one or more tensors that share the same size(0), build a dataset of samples by slicing each tensor along dimension 0.

In TorchLean we do the same thing, but with shapes tracked in the type:

- a batched tensor has shape .dim n σ,
- slicing at i : Fin n yields a sample of shape σ,
- and a batch of multiple tensors is represented as a TensorPack.

def NN.API.Data.unbatchTListDim0 {β : Type} {n : ℕ} {ss : List Spec.Shape} :

TorchLean.TensorPack β (List.map (fun (s : Spec.Shape) => Spec.Shape.dim n s) ss) → Fin n → TorchLean.TensorPack β ss

Slice a batched TensorPack along dimension 0.

If a sample is represented as a shape-indexed tuple TensorPack β ss, then a minibatch of size n is TensorPack β (ss.map (fun s => .dim n s)). This function picks a batch index i : Fin n and returns the corresponding single sample.

def NN.API.Data.castTListOfFloat {α : Type} [Runtime.Scalar α] {ss : List Spec.Shape} :

TorchLean.TensorPack Float ss → TorchLean.TensorPack α ss

Convert a shape-indexed TensorPack of Float tensors to the runtime scalar type α.

def NN.API.Data.tensorDatasetDim0 {β : Type} {n : ℕ} {ss : List Spec.Shape} (xs : TorchLean.TensorPack β (List.map (fun (s : Spec.Shape) => Spec.Shape.dim n s) ss)) :

Dataset (TorchLean.TensorPack β ss)

Build a dataset by slicing a batched TensorPack along dim0.

This is the TorchLean analogue of PyTorch's TensorDataset(t1, t2, ...).

def NN.API.Data.tensorDatasetDim0F {α : Type} [Runtime.Scalar α] {n : ℕ} {ss : List Spec.Shape} (xs : TorchLean.TensorPack Float (List.map (fun (s : Spec.Shape) => Spec.Shape.dim n s) ss)) :

Dataset (TorchLean.TensorPack α ss)

Float-to-α variant of tensorDatasetDim0, for data loaded from disk.

def NN.API.Data.supervisedDim0 {α : Type} {n : ℕ} {σ τ : Spec.Shape} (X : Spec.Tensor α (Spec.Shape.dim n σ)) (Y : Spec.Tensor α (Spec.Shape.dim n τ)) :

Dataset (TorchLean.TensorPack α [σ, τ])

Supervised dataset from two batched tensors X : (n, σ) and Y : (n, τ) by slicing dim0.

This is the common regression/supervised-learning case: the TorchLean analogue of TensorDataset(X, Y) in PyTorch.

def NN.API.Data.supervisedDim0F {α : Type} [Runtime.Scalar α] {n : ℕ} {σ τ : Spec.Shape} (X : Spec.Tensor Float (Spec.Shape.dim n σ)) (Y : Spec.Tensor Float (Spec.Shape.dim n τ)) :

Dataset (TorchLean.TensorPack α [σ, τ])

Float-to-α variant of supervisedDim0, for data loaded from disk.

Higher-level loaders (PyTorch-style ergonomics)

These are convenience helpers on top of the low-level CSV/NPY readers so example code can stay "data first" without re-implementing row splitting and casting at every call site.

def NN.API.Data.fromNpyTensorND (path : System.FilePath) (dims : List ℕ) :

IO (Except String (Spec.Tensor Float (Tensor.shapeOfDims dims)))

Load an N-D tensor from a .npy file, checking the on-disk shape matches dims.

def NN.API.Data.fromNpyTensorNDPrefixDim0 (path : System.FilePath) (dims : List ℕ) :

IO (Except String (Spec.Tensor Float (Tensor.shapeOfDims dims)))

Load an N-D tensor from a .npy file, allowing the file to contain more rows on dim 0.

This is the dataset-loader analogue of taking tensor[:n] in PyTorch. The rank and trailing dimensions must still match exactly; only the leading dimension may be larger than requested.

We use this for dataset sources rather than the stricter fromNpyTensorND because an exported dataset usually has a fixed full size, while local runs often request a bounded prefix. For example, a CIFAR file may have shape (50000, 3, 32, 32) while an example command asks for n = 80; the resulting TorchLean tensor has type-level shape (80, 3, 32, 32).

This is still a checked loader, not an implicit reshape:

- rank must agree;
- all trailing dimensions must agree;
- the file must contain at least the requested number of rows;
- only C-order NPY files can be prefix-loaded efficiently by the low-level parser.
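
The CIFAR case described above could be requested as in this sketch. The file path is hypothetical and the import path is assumed from this page's module name.

```lean
import NN.API.Data.Core  -- assumption: module name as listed for this page

-- Ask for the first 80 rows of a (50000, 3, 32, 32) file.
def loadCifarPrefix : IO Unit := do
  match ← NN.API.Data.fromNpyTensorNDPrefixDim0 "data/cifar_x.npy" [80, 3, 32, 32] with
  | .ok _x => IO.println "loaded a tensor with type-level shape (80, 3, 32, 32)"
  | .error msg => throw (IO.userError msg)
```

Requesting [80, 3, 32, 32] from the stricter fromNpyTensorND would instead be rejected, because the on-disk shape differs in its leading dimension.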

def NN.API.Data.fromNpyImage (path : System.FilePath) (c h w : ℕ) :

IO (Except String (Spec.Tensor Float (Tensor.Shape.Image c h w)))

Load an image tensor from a .npy file, checking it has shape (C, H, W).

def NN.API.Data.fromNpyImages (path : System.FilePath) (n c h w : ℕ) :

IO (Except String (Spec.Tensor Float (Tensor.Shape.Images n c h w)))

Load a batch of images from a .npy file, checking it has shape (N, C, H, W).

def NN.API.Data.natLabelOfFloat (tag : String) (classes : ℕ) (x : Float) :

Except String ℕ

Parse a float-encoded class label as a Nat in [0, classes).
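
Validating a whole label column is then a single mapM in the Except monad. In this sketch parseLabels and the tag string are hypothetical, and the import path is assumed from this page's module name.

```lean
import NN.API.Data.Core  -- assumption: module name as listed for this page

-- Stops at the first label that is not an integer in [0, classes).
def parseLabels (classes : Nat) (ys : List Float) : Except String (List Nat) :=
  ys.mapM (NN.API.Data.natLabelOfFloat "labels" classes)
```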

def NN.API.Data.labeledDim0 {α : Type} [Semantics.Scalar α] [Runtime.Scalar α] (tag : String) (classes : ℕ) {n : ℕ} {σ : Spec.Shape} (X : Spec.Tensor Float (Spec.Shape.dim n σ)) (y : Spec.Tensor Float (Tensor.Shape.Vec n)) :

Except String (Dataset (TorchLean.TensorPack α [σ, Tensor.Shape.Vec classes]))

Labeled dataset from a batched tensor X : (n, σ) and a label vector y : (n,).

Labels are stored as floats (common when exporting from NumPy); we validate each label is an integer in [0, classes), then one-hot encode it.

def NN.API.Data.fromNpySupervised {α : Type} [Runtime.Scalar α] (xPath yPath : System.FilePath) (n : ℕ) (xDims yDims : List ℕ) :

IO (Except String (Dataset (TorchLean.TensorPack α [Tensor.shapeOfDims xDims, Tensor.shapeOfDims yDims])))

Load a supervised dataset from two .npy files containing batched arrays:

X.npy has shape (n, xDims...)
Y.npy has shape (n, yDims...)

and we build a dataset by slicing along dim0.

def NN.API.Data.fromNpyLabeled {α : Type} [Semantics.Scalar α] [Runtime.Scalar α] (xPath yPath : System.FilePath) (n : ℕ) (xDims : List ℕ) (classes : ℕ) :

IO (Except String (Dataset (TorchLean.TensorPack α [Tensor.shapeOfDims xDims, Tensor.Shape.Vec classes])))

Load a labeled classification dataset from two .npy files:

X.npy has shape (n, xDims...)
y.npy has shape (n,) with float-encoded integer labels in [0, classes)

and we build a dataset by slicing along dim0 and one-hot encoding the labels.
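
A usage sketch for an MNIST-like layout. The paths, sample count, and dimensions are hypothetical; the import path is assumed from this page's module name; and Float is assumed to have the required Semantics.Scalar and Runtime.Scalar instances.

```lean
import NN.API.Data.Core  -- assumption: module name as listed for this page

-- X.npy: (1000, 1, 28, 28); y.npy: (1000,) with labels in [0, 10).
def loadDigits : IO Unit := do
  match ← NN.API.Data.fromNpyLabeled (α := Float) "data/X.npy" "data/y.npy" 1000 [1, 28, 28] 10 with
  | .ok ds => IO.println s!"loaded {NN.API.Data.size ds} labeled samples"
  | .error msg => throw (IO.userError msg)
```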

def NN.API.Data.fromCsvSupervised {α : Type} [Semantics.Scalar α] [Runtime.Scalar α] (path : System.FilePath) (inDim outDim : ℕ) (opts : CsvOptions := { }) :

IO (Except String (Dataset (TorchLean.TensorPack α [Tensor.Shape.Vec inDim, Tensor.Shape.Vec outDim])))

Load a supervised dataset from a CSV with inDim + outDim columns per row:

x1, ..., x_inDim, y1, ..., y_outDim.

def NN.API.Data.fromCsvLabeled {α : Type} [Semantics.Scalar α] [Runtime.Scalar α] (path : System.FilePath) (inDim classes : ℕ) (opts : CsvOptions := { }) :

IO (Except String (Dataset (TorchLean.TensorPack α [Tensor.Shape.Vec inDim, Tensor.Shape.Vec classes])))

Load a labeled dataset from a CSV with inDim + 1 columns per row:

x1, ..., x_inDim, label where label is in {0, ..., classes-1}.

Unified file-source layer

The lower-level helpers above stay close to file formats (fromNpyTensorND, fromCsvRows, fromNpySupervised, ...). The definitions below give examples and applications a single scheme:

- describe each tensor as a TensorSource;
- load it as a typed TorchLean tensor;
- build supervised/labeled datasets by slicing dim0, just like PyTorch TensorDataset.

Policy for external ecosystems:

- NumPy .npy is the canonical interchange format for numeric tensors.
- CSV is supported for small tabular data.
- MATLAB .mat, PyTorch checkpoints, HDF5, Parquet, and image archives should be converted by a small preparation script into .npy tensors plus metadata. The Lean runtime loader intentionally handles a small deterministic interchange format rather than every external binary format.

inductive NN.API.Data.TensorFormat :

Type

File formats supported directly by the Lean-side unified data-source loader.

npy : TensorFormat
NumPy .npy, supporting the subset decoded by fromNpyTensorND.
csv : TensorFormat
Numeric CSV table. CSV sources are interpreted as 2D tensors [rows, cols].

@[implicit_reducible]

instance NN.API.Data.instBEqTensorFormat :

BEq TensorFormat

def NN.API.Data.instBEqTensorFormat.beq :

TensorFormat → TensorFormat → Bool

@[implicit_reducible]

instance NN.API.Data.instReprTensorFormat :

Repr TensorFormat

def NN.API.Data.instReprTensorFormat.repr :

TensorFormat → ℕ → Std.Format

def NN.API.Data.TensorFormat.extension :

TensorFormat → String

Human-facing extension used by messages and examples.

structure NN.API.Data.TensorSource :

Type

Description of one tensor stored on disk.

dims is the expected tensor shape. NPY can load any rank supported by tensorND; CSV is treated as a numeric table and therefore expects dims = [rows, cols].

path : System.FilePath
Path to the file.
dims : List ℕ
Expected dimensions.
format : TensorFormat
Direct Lean-side format. External formats should be preconverted to .npy.
csvOptions : CsvOptions
CSV parsing options, used only when format = .csv.
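
A source value can be written as an ordinary structure literal. The sketch below spells out all four fields, since their defaults (if any) are not shown on this page. The path and dimensions are hypothetical and the import path is assumed from this page's module name.

```lean
import NN.API.Data.Core  -- assumption: module name as listed for this page

-- Hypothetical image batch stored as a single .npy file.
def imagesSource : NN.API.Data.TensorSource :=
  { path := "data/images.npy", dims := [80, 3, 32, 32], format := .npy, csvOptions := {} }

-- Load it; the result type mentions `imagesSource.dims`, so the shape follows the source.
def loadImages : IO Unit := do
  match ← NN.API.Data.TensorSource.loadFloat imagesSource with
  | .ok _t => IO.println "loaded"
  | .error msg => throw (IO.userError msg)
```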

def NN.API.Data.TensorSource.loadCsvTensorND (path : System.FilePath) (dims : List ℕ) (opts : CsvOptions := { }) :

IO (Except String (Spec.Tensor Float (Tensor.shapeOfDims dims)))

Load a numeric CSV table as a tensor.

Supported shapes:

[rows, cols]: ordinary numeric table,
[n]: either one column with n rows or one row with n columns.

def NN.API.Data.TensorSource.loadFloatAs (format : TensorFormat) (path : System.FilePath) (dims : List ℕ) (opts : CsvOptions := { }) :

IO (Except String (Spec.Tensor Float (Tensor.shapeOfDims dims)))

Load a Float tensor from a path/format/dimension tuple.

def NN.API.Data.TensorSource.loadFloatPrefixDim0As (format : TensorFormat) (path : System.FilePath) (dims : List ℕ) (opts : CsvOptions := { }) :

IO (Except String (Spec.Tensor Float (Tensor.shapeOfDims dims)))

Load a Float tensor, allowing NPY files to contain more rows than requested on dim 0.

TensorSource.loadFloatAs is exact: the file shape must equal dims. This prefix variant is for dataset-style sources where dims starts with the number of rows requested by the current run. CSV sources remain exact because CSV has no binary prefix contract; NPY sources use fromNpyTensorNDPrefixDim0.

def NN.API.Data.TensorSource.loadFloat (src : TensorSource) :

IO (Except String (Spec.Tensor Float (Tensor.shapeOfDims src.dims)))

Load a TensorSource as a Float tensor with the statically reflected shapeOfDims src.dims.

structure NN.API.Data.SupervisedSource :

Type

Two tensor sources representing supervised data:

x must have shape (n, xDims...),
y must have shape (n, yDims...).

n : ℕ
Number of samples along dim0.
xDims : List ℕ
Per-sample input dimensions.
yDims : List ℕ
Per-sample target dimensions.
x : TensorSource
Source for the batched input tensor.
y : TensorSource
Source for the batched target tensor.

def NN.API.Data.SupervisedSource.ofPaths (format : TensorFormat) (xPath yPath : System.FilePath) (n : ℕ) (xDims yDims : List ℕ) (csvOptions : CsvOptions := { }) :

SupervisedSource

Construct a supervised source from paths using the same file format for x and y.

def NN.API.Data.SupervisedSource.load {α : Type} [Runtime.Scalar α] (src : SupervisedSource) :

IO (Except String (Dataset (TorchLean.TensorPack α [Tensor.shapeOfDims src.xDims, Tensor.shapeOfDims src.yDims])))

Load a supervised dataset by slicing dim0 from the two tensors.

This is the preferred public loader for regression/operator-learning examples, regardless of whether the backing files are .npy or small numeric CSV tables.

def NN.API.Data.supervisedNpySource (xPath yPath : System.FilePath) (n : ℕ) (xDims yDims : List ℕ) :

SupervisedSource

Paired .npy source for supervised regression or operator-learning datasets.

def NN.API.Data.loadSupervisedNpyFloatSamples (xPath yPath : System.FilePath) (n : ℕ) (xDims yDims : List ℕ) :

IO (Except String (Array (SupervisedSample Float (Tensor.shapeOfDims xDims) (Tensor.shapeOfDims yDims))))

Load paired .npy files as concrete Float supervised samples.

This is useful for reporting, custom evaluation loops, and native kernels that need concrete Float tensors outside the public trainer facade.

structure NN.API.Data.LabeledSource :

Type

Two tensor sources representing labeled classification data:

x must have shape (n, xDims...),
y must have shape (n,) and contain integer-valued labels.

n : ℕ
Number of samples along dim0.
xDims : List ℕ
Per-sample input dimensions.
classes : ℕ
Number of classes for one-hot targets.
x : TensorSource
Source for the batched input tensor.
y : TensorSource
Source for the label vector.

def NN.API.Data.LabeledSource.ofPaths (format : TensorFormat) (xPath yPath : System.FilePath) (n : ℕ) (xDims : List ℕ) (classes : ℕ) (csvOptions : CsvOptions := { }) :

LabeledSource

Construct a labeled source from paths using the same file format for x and y.

def NN.API.Data.LabeledSource.load {α : Type} [Semantics.Scalar α] [Runtime.Scalar α] (src : LabeledSource) :

IO (Except String (Dataset (TorchLean.TensorPack α [Tensor.shapeOfDims src.xDims, Tensor.Shape.Vec src.classes])))

Load a labeled classification dataset by slicing dim0 and one-hot encoding labels.

LabeledSource.ofPaths is aimed at .npy label vectors. For CSV label vectors, store the labels as a single-column table with dims = [n, 1] and, if needed, construct the TensorSource for y by hand.

structure NN.API.Data.TabularSupervisedSource :

Type

Single-table supervised CSV source.

Use this when one CSV row contains both input and target columns: x1, ..., x_inDim, y1, ..., y_outDim.

path : System.FilePath
CSV file path.
inDim : ℕ
Number of input feature columns.
outDim : ℕ
Number of target columns.
csvOptions : CsvOptions
CSV parsing options.

def NN.API.Data.TabularSupervisedSource.load {α : Type} [Semantics.Scalar α] [Runtime.Scalar α] (src : TabularSupervisedSource) :

IO (Except String (Dataset (TorchLean.TensorPack α [Tensor.Shape.Vec src.inDim, Tensor.Shape.Vec src.outDim])))

Load a single-table supervised CSV source.

def NN.API.Data.supervisedRows {α : Type} [Semantics.Scalar α] [Runtime.Scalar α] {n inDim outDim : ℕ} (X : Spec.Tensor Float (Tensor.Shape.Mat n inDim)) (Y : Spec.Tensor Float (Tensor.Shape.Mat n outDim)) :

Dataset (TorchLean.TensorPack α [Tensor.Shape.Vec inDim, Tensor.Shape.Vec outDim])

Build a supervised dataset from two matrices X : n×inDim and Y : n×outDim by pairing rows.

This is the TorchLean analogue of PyTorch's TensorDataset(X, Y) for simple regression.

def NN.API.Data.collateSupervised {α : Type} {σ τ : Spec.Shape} (n : ℕ) (batch : List (TorchLean.TensorPack α [σ, τ])) :

Except String (TorchLean.TensorPack α [Spec.Shape.dim n σ, Spec.Shape.dim n τ])

Collate a length-n supervised batch into a single sample with a leading batch axis.

If your samples are (x : σ, y : τ), the collated sample is:

xBatch : (n × σ) and
yBatch : (n × τ)

In shapes: TensorPack α [dim n σ, dim n τ].

def NN.API.Data.chunkN {a : Type} (n : ℕ) (xs : List a) :

List (List a)

Split a list into consecutive length-n chunks, dropping any final short chunk.
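
A quick check of the chunking rule, assuming the import path matches this page's module name:

```lean
import NN.API.Data.Core  -- assumption: module name as listed for this page

-- Documented behavior: consecutive chunks of length 2; the trailing `5`
-- does not fill a chunk and is dropped.
#eval NN.API.Data.chunkN 2 [1, 2, 3, 4, 5]
```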

def NN.API.Data.chunkN.go {a : Type} (n : ℕ) (xs : List a) (fuel : ℕ) :

List (List a)

def NN.API.Data.batchedSupervised {α : Type} {σ τ : Spec.Shape} (n : ℕ) (ds : Dataset (TorchLean.TensorPack α [σ, τ])) :

Except String (Dataset (TorchLean.TensorPack α [Spec.Shape.dim n σ, Spec.Shape.dim n τ]))

Turn a per-sample supervised dataset into a dataset of fixed-size minibatches.

This is useful for metrics (meanLossDataset, accuracy, etc.) when your model expects a leading batch axis.

Notes:

- This drops the final partial batch (PyTorch drop_last=True behavior).
- Batches are formed in dataset order (shuffling is the loader's job).

def NN.API.Data.BatchLoader.dataset {α : Type} {n : ℕ} {σ τ : Spec.Shape} (dl : BatchLoader α n σ τ) :

Dataset (SupervisedSample α σ τ)

Extract the underlying per-sample dataset from a typed BatchLoader.

def NN.API.Data.BatchLoader.batchSize {α : Type} {n : ℕ} {σ τ : Spec.Shape} (_dl : BatchLoader α n σ τ) :

ℕ

The batch size n carried in the type of a BatchLoader.

def NN.API.Data.BatchLoader.shuffled {α : Type} {n : ℕ} {σ τ : Spec.Shape} (dl : BatchLoader α n σ τ) :

Bool

Whether the loader is configured to shuffle samples each epoch.

def NN.API.Data.BatchLoader.seed {α : Type} {n : ℕ} {σ τ : Spec.Shape} (dl : BatchLoader α n σ τ) :

ℕ

RNG seed used for shuffling (if enabled).

def NN.API.Data.BatchLoader.batchDataset {α : Type} {n : ℕ} {σ τ : Spec.Shape} (dl : BatchLoader α n σ τ) :

Except String (Dataset (sample.Batch α n σ τ))

Materialize the dataset as a dataset of full minibatches (dropping any final partial batch).

def NN.API.Data.BatchLoader.epoch {α : Type} {n : ℕ} {σ τ : Spec.Shape} (name : String) (dl : BatchLoader α n σ τ) :

Except String (BatchLoader α n σ τ × List (sample.Batch α n σ τ))

Run one epoch: return the updated loader state and a list of typed minibatches.

def NN.API.Data.BatchLoader.epochCollate {α β : Type} {n : ℕ} {σ τ : Spec.Shape} (name : String) (dl : BatchLoader α n σ τ) (f : sample.Batch α n σ τ → Except String β) :

Except String (BatchLoader α n σ τ × List β)

Like epoch, but post-process each minibatch with a user-supplied collate/transform f.

def NN.API.Data.BatchLoader.nonemptyEpoch {α : Type} {n : ℕ} {σ τ : Spec.Shape} (name : String) (dl : BatchLoader α n σ τ) :

Except String (BatchLoader α n σ τ × List (sample.Batch α n σ τ))

Run one epoch and require at least one full typed minibatch.

This is the shared checked boundary for examples that need a nonempty list of full batches. It keeps the "drop partial batches, but fail if nothing remains" policy with the loader API rather than repeating it in each dataset-specific helper.

def NN.API.Data.BatchLoader.firstFullBatch {α : Type} {n : ℕ} {σ τ : Spec.Shape} (name : String) (dl : BatchLoader α n σ τ) :

Except String (sample.Batch α n σ τ)

Run one epoch and return its first full typed minibatch.

def NN.API.Data.batchLoader {α : Type} {σ τ : Spec.Shape} (ds : Dataset (SupervisedSample α σ τ)) (batchSize : ℕ) (shuffle : Bool := false) (seed : ℕ := 0) (dropLast : Bool := true) :

BatchLoader α batchSize σ τ

Public loader API: supervised datasets become fixed-size minibatch loaders by default.

The underlying dataset still stores individual samples; the loader batches them and epoch returns tensors with a leading dim0 batch axis. Because the batch size is reflected in the type, the public batched path requires full batches, so dropLast defaults to true.

def NN.API.Data.tabularCsvLoader {α : Type} [Semantics.Scalar α] [Runtime.Scalar α] (path : System.FilePath) (batchSize inDim outDim : ℕ) (csvOptions : CsvOptions := { }) (shuffle : Bool := true) (seed : ℕ := 0) (dropLast : Bool := true) :

IO (Except String (BatchLoader α batchSize (Tensor.Shape.Vec inDim) (Tensor.Shape.Vec outDim)))

Load a numeric supervised CSV and immediately wrap it as a typed minibatch loader.

The CSV convention is the same as TabularSupervisedSource: each row contains inDim feature columns followed by outDim target columns. This belongs in the data API rather than in an individual model file because tabular examples, benchmarks, and downstream users all need the same operation: CSV -> typed dataset -> shuffled minibatch loader.
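
The CSV-to-loader path might be exercised as follows. This is a sketch under assumptions: the CSV path and column counts are hypothetical, the import path is taken from this page's module name, and Float is assumed to have the required scalar instances.

```lean
import NN.API.Data.Core  -- assumption: module name as listed for this page

-- Each row: 4 feature columns followed by 1 target column; batches of 32.
def firstEpoch : IO Unit := do
  match ← NN.API.Data.tabularCsvLoader (α := Float) "data/train.csv" 32 4 1 with
  | .error msg => throw (IO.userError msg)
  | .ok dl =>
    -- Fail if the table is too small to fill even one batch.
    match NN.API.Data.BatchLoader.nonemptyEpoch "train" dl with
    | .error msg => throw (IO.userError msg)
    | .ok (_dl', batches) =>
      IO.println s!"{batches.length} full batches of size {dl.batchSize}"
```

Using nonemptyEpoch here turns an undersized table into an explicit error rather than a silent zero-batch epoch.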

def NN.API.Data.loaderAny {α : Type} {σ τ : Spec.Shape} (ds : Dataset (SupervisedSample α σ τ)) (batchSize : ℕ) (shuffle : Bool := false) (seed : ℕ := 0) (dropLast : Bool := true) :

AnyBatchLoader α σ τ

Build a batch loader when the batch size is only known at runtime.

TorchLean API

NN.API.Data.Core
