Datasets, Loaders, and File Sources #
This module is TorchLean's public data layer. The intended workflow is:
- Convert outside-world datasets to canonical .npy tensors or small numeric CSV files.
- Describe those files with TensorSource, SupervisedSource, or LabeledSource.
- Load them into shape-typed TorchLean tensors and datasets.
- Train with batchLoader / BatchLoader.epoch or the higher-level train.fit* helpers.
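As a concrete sketch of this workflow (purely illustrative: the record fields and loader signatures below are assumptions, not the authoritative API):

```lean
-- Illustrative sketch only. `SupervisedSource`, `TensorSource`, `batchLoader`,
-- and `epoch` are the names described on this page; the exact fields and
-- signatures here are assumptions.
def workflow : IO Unit := do
  -- Step 1 happens outside Lean: export X/Y as .npy arrays.
  -- Step 2: describe the files.
  let src : SupervisedSource :=
    { n := 1000, xDims := [8], yDims := [1]
      x := { path := "data/x.npy", dims := [1000, 8], format := .npy }
      y := { path := "data/y.npy", dims := [1000, 1], format := .npy } }
  -- Step 3: load them as a typed in-memory dataset of (x, y) samples.
  let ds ← src.load
  -- Step 4: minibatch deterministically and train.
  let loader := batchLoader (n := 32) ds (shuffle := true) (seed := 0)
  let (_, batches) := loader.epoch
  IO.println s!"minibatches: {batches.length}"
```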
We keep the implementation small and predictable:
- datasets are in-memory and pure (often backed by List)
- loader shuffling is seed-driven and reproducible
- .npy is the canonical numeric interchange format
- CSV is supported for small tabular data
- MATLAB .mat, PyTorch .pt/.pth, NumPy .npz, and image folders should be converted to .npy with scripts/datasets/torchlean_data_convert.py
- there are no multiprocessing workers, memory maps, or pinned-memory support
PyTorch Mapping #
This is inspired by torch.utils.data:
- Dataset, DataLoader: https://pytorch.org/docs/stable/data.html
- TensorDataset: https://pytorch.org/docs/stable/data.html#torch.utils.data.TensorDataset
- DataLoader: https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader
TorchLean’s key difference is that samples typically carry type-level shapes (via TList),
so many helpers here are shape-aware by construction.
Main Entry Points #
- TensorSource: one file plus expected dimensions.
- SupervisedSource: two batched tensors, X : (N, xDims...) and Y : (N, yDims...).
- LabeledSource: batched inputs plus integer class labels, one-hot encoded on load.
- TabularSupervisedSource: one CSV table with input columns followed by target columns.
- batchLoader: deterministic, typed minibatching.
For examples and conversion commands, see NN/Examples/Data/README.md.
Typed analogue of PyTorch's TensorDataset.
In TorchLean, a "sample" is usually a TList α shapes, i.e. a tuple of tensors whose shapes are
tracked by the type-level list shapes.
Build a dataset from an explicit list of samples.
Require that all paths exist, otherwise raise a user-facing error with a shared hint.
Write a small CSV file, creating the parent directory if needed.
Write a one-dimensional prediction probe CSV.
Rows are i,x,input,target,prediction, where x = i/(n-1) for n > 1.
This is intentionally simple and meant for plotting examples such as 1D operator learning.
Materialize a dataset as a list.
Number of elements in the dataset.
Whether the dataset is empty.
Like cycleDataset, but fail with a message if the dataset is empty.
This is the preferred helper for “PyTorch-style” fixed-step loops over in-memory datasets.
Map a dataset elementwise (pure, deterministic).
Concatenate two datasets.
Split a dataset into equal-sized minibatches (as lists), dropping the final partial batch.
This is a low-level helper; most users should use DataLoader.epoch or Data.batchedSupervised.
Untyped analogue of PyTorch's torch.utils.data.DataLoader.
This is the deterministic, purely-functional loader provided by the TorchLean runtime.
Construct a RawDataLoader from a dataset.
If shuffle := true, shuffling is deterministic w.r.t. seed.
If dropLast := true, incomplete final batches are discarded.
Run one epoch worth of minibatching and return:
- an updated loader (with the new seed), and
- the list of minibatches.
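A sketch of threading the returned loader through successive epochs (assuming epoch is pure, as the loader description above states, and that a minibatch is a list of samples):

```lean
-- Sketch: run k epochs, threading the reseeded loader so that each
-- epoch reshuffles deterministically with a fresh derived seed.
def runEpochs {s : Type} : RawDataLoader s → Nat → List (List s)
  | _, 0 => []
  | loader, k + 1 =>
    let (loader', batches) := loader.epoch
    batches ++ runEpochs loader' k
```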
Like epoch, but apply a user-provided collate function to each minibatch.
This is the TorchLean analogue of PyTorch's collate_fn= option.
Typed wrapper around RawDataLoader for supervised samples.
The batch size n is reflected in the type, and BatchLoader.epoch returns fully-collated
dim n minibatches (so dropLast=true is required).
- raw : RawDataLoader (sample.Supervised α σ τ)
Raw underlying data.
Existential wrapper for loaders when the batch size is chosen at runtime.
Note on default arguments:
The underlying CSV loaders take an opts : CsvOptions := {} argument.
If we write abbrev fromCsvRows := readCsvFloatRows, Lean will apply the default argument
and fromCsvRows will no longer accept opts.
So we eta-expand here to keep the public surface configurable.
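The pitfall can be reproduced with a generic self-contained example (toy names, not the real CSV loader):

```lean
-- Toy reproduction of the default-argument pitfall described above.
def readRows (path : String) (opts : Nat := 0) : Nat :=
  path.length + opts

-- Point-free: Lean eagerly inserts the default for `opts`, so `bad`
-- only accepts `path`.
abbrev bad := readRows

-- Eta-expanded: `opts` remains part of the public surface.
abbrev good (path : String) (opts : Nat := 0) : Nat :=
  readRows path opts
```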
Read a CSV file as a list of rows of floats.
Read a CSV file as (x, y) float pairs.
Read a CSV file as length-n float vectors.
Read a .npy file into a TorchLean dataset.
Read a .npy file as a vector dataset.
Read a .npy file as a matrix dataset.
Convert a list of (x, y) float tensors into a dataset of TorchLean supervised samples.
This casts float data into the selected scalar backend α and packs it into a TList α [σ, τ].
Convert a list of (x, label) pairs into a dataset of one-hot classification samples.
Labels are given as Nat and converted to one-hot targets of shape Vec classes.
TensorDataset (dim0 batching) #
PyTorch's TensorDataset concept is: given one or more tensors that share the same size(0),
build a dataset of samples by slicing each tensor along dimension 0.
In TorchLean we do the same thing, but with shapes tracked in the type:
Slice a batched TList along dimension 0.
If a sample is represented as a shape-indexed tuple TList β ss, then a minibatch of size n is
TList β (ss.map (fun s => .dim n s)). This function picks a batch index i : Fin n and returns
the corresponding single sample.
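The shape contract described above can be written out as a signature sketch (the function's actual name is not shown here, so sliceDim0 is a placeholder):

```lean
-- Placeholder signature: extract sample i from a dim0-batched TList.
-- A batch of n samples with per-sample shapes ss has shapes
-- ss.map (fun s => .dim n s); slicing undoes the batching for one index.
def sliceDim0 {n : Nat} (i : Fin n) :
    TList β (ss.map (fun s => .dim n s)) → TList β ss :=
  fun batch => sorry -- implementation elided; only the types matter here
```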
Convert a shape-indexed TList of Float tensors to the runtime scalar type α.
Build a dataset by slicing a batched TList along dim0.
This is the TorchLean analogue of PyTorch's TensorDataset(t1, t2, ...).
Float-to-α variant of tensorDatasetDim0, for data loaded from disk.
Supervised dataset from two batched tensors X : (n, σ) and Y : (n, τ) by slicing dim0.
This is the common regression/supervised-learning case: the TorchLean analogue of
TensorDataset(X, Y) in PyTorch.
Float-to-α variant of supervisedDim0, for data loaded from disk.
Higher-level loaders (PyTorch-style ergonomics) #
These are convenience helpers on top of the low-level CSV/NPY readers so example code can stay "data first" without re-implementing row splitting and casting at every call site.
Load an N-D tensor from a .npy file, checking the on-disk shape matches dims.
Load an N-D tensor from a .npy file, allowing the file to contain more rows on dim 0.
This is the dataset-loader analogue of taking tensor[:n] in PyTorch. The rank and trailing
dimensions must still match exactly; only the leading dimension may be larger than requested.
We use this for dataset sources rather than the stricter fromNpyTensorND because a real exported
dataset usually has a fixed full size, while tutorials often ask for a small prefix during smoke
tests or quick CUDA checks. For example, a CIFAR file may have shape (50000, 3, 32, 32) while a
demo command asks for n = 80; the resulting TorchLean tensor has type-level shape
(80, 3, 32, 32).
This is intentionally still a checked loader, not an implicit reshape:
- rank must agree;
- all trailing dimensions must agree;
- the file must contain at least the requested number of rows;
- only C-order NPY files can be prefix-loaded efficiently by the low-level parser.
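Following the CIFAR example above, a hedged usage sketch (the loader name is from this page; the result-type spelling is an assumption):

```lean
-- Sketch: prefix-load the first 80 rows of a (50000, 3, 32, 32) export.
-- Rank and trailing dims must match the file; only dim0 may be larger on disk.
def cifarSmoke : IO (TensorND Float [80, 3, 32, 32]) :=
  fromNpyTensorNDPrefixDim0 "data/cifar_train_x.npy" [80, 3, 32, 32]
```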
Load an image tensor from a .npy file, checking it has shape (C, H, W).
Load a batch of images from a .npy file, checking it has shape (N, C, H, W).
Labeled dataset from a batched tensor X : (n, σ) and a label vector y : (n,).
Labels are stored as floats (common when exporting from NumPy); we validate each label is an
integer in [0, classes), then one-hot encode it.
Load a supervised dataset from two .npy files containing batched arrays (inputs X : (n, σ...) and targets Y : (n, τ...)); the dataset is built by slicing along dim0.
Load a labeled classification dataset from two .npy files (batched inputs and a label vector); the dataset is built by slicing along dim0 and one-hot encoding the labels.
Load a supervised dataset from a CSV with inDim + outDim columns per row:
x1, ..., x_inDim, y1, ..., y_outDim.
Load a labeled dataset from a CSV with inDim + 1 columns per row:
x1, ..., x_inDim, label where label is in {0, ..., classes-1}.
Unified file-source layer #
The lower-level helpers above intentionally stay close to file formats (fromNpyTensorND,
fromCsvRows, fromNpySupervised, ...). The definitions below give examples and applications a
single scheme:
- describe each tensor as a TensorSource;
- load it as a typed TorchLean tensor;
- build supervised/labeled datasets by slicing dim0, just like PyTorch TensorDataset.
Policy for external ecosystems:
- NumPy .npy is the canonical interchange format for numeric tensors.
- CSV is supported for small tabular data.
- MATLAB .mat, PyTorch checkpoints, HDF5, Parquet, and image archives should be converted by a small preparation script into .npy tensors plus metadata. This keeps the Lean runtime loader small, deterministic, and auditable instead of embedding every external binary format parser.
File formats supported directly by the Lean-side unified data-source loader.
- npy : TensorFormat
NumPy .npy, supporting the subset decoded by fromNpyTensorND.
- csv : TensorFormat
Numeric CSV table. CSV sources are interpreted as 2D tensors [rows, cols].
Human-facing extension used by messages and examples.
Description of one tensor stored on disk.
dims is the expected tensor shape. NPY can load any rank supported by tensorND; CSV is treated
as a numeric table and therefore expects dims = [rows, cols].
- path : System.FilePath
Path to the file.
Expected dimensions.
- format : TensorFormat
Direct Lean-side format. External formats should be preconverted to .npy.
- csvOptions : CsvOptions
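For example, a small CSV table could be described like this (field names are from the listing above; csvOptions := {} assumes CsvOptions has a usable default):

```lean
-- CSV sources are numeric 2D tables, so dims must be [rows, cols].
def irisX : TensorSource :=
  { path := "data/iris_x.csv"
    dims := [150, 4]
    format := .csv
    csvOptions := {} }
```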
Load a numeric CSV table as a tensor.
Supported shapes: 2D numeric tables only, i.e. dims = [rows, cols].
Load a Float tensor from a path/format/dimension tuple.
Load a Float tensor, allowing NPY files to contain more rows than requested on dim 0.
TensorSource.loadFloatAs is exact: the file shape must equal dims. This prefix variant is for
dataset-style sources where dims starts with the number of rows requested by the current run. CSV
sources remain exact because CSV has no cheap binary prefix contract; NPY sources use
fromNpyTensorNDPrefixDim0.
Load a TensorSource as a Float tensor with the statically reflected shapeOfDims src.dims.
Two tensor sources representing supervised data:
- n : ℕ
Number of samples along dim0.
Per-sample input dimensions.
Per-sample target dimensions.
- x : TensorSource
Source for the batched input tensor.
- y : TensorSource
Source for the batched target tensor.
Construct a supervised source from paths using the same file format for x and y.
Load a supervised dataset by slicing dim0 from the two tensors.
This is the preferred public loader for regression/operator-learning examples, regardless of
whether the backing files are .npy or small numeric CSV tables.
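A hedged sketch showing the same source shape over small CSV tables (xDims/yDims are assumed field spellings; the unnamed fields above do not show them):

```lean
-- Sketch: supervised source backed by two small CSV tables instead of .npy.
-- CSV tensors are 2D tables, so dims = [rows, cols] on both sources.
def csvRegression : SupervisedSource :=
  { n := 100, xDims := [2], yDims := [1]
    x := { path := "data/x.csv", dims := [100, 2], format := .csv }
    y := { path := "data/y.csv", dims := [100, 1], format := .csv } }
```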
Two tensor sources representing labeled classification data:
- n : ℕ
Number of samples along dim0.
Per-sample input dimensions.
- classes : ℕ
Number of classes for one-hot targets.
- x : TensorSource
Source for the batched input tensor.
- y : TensorSource
Source for the label vector.
Construct a labeled source from paths using the same file format for x and y.
Load a labeled classification dataset by slicing dim0 and one-hot encoding labels.
For CSV label vectors, store labels as a single-column table with dims = [n, 1] and use a custom
TensorSource if needed; the path constructor above is aimed at .npy label vectors.
Single-table supervised CSV source.
Use this when one CSV row contains both input and target columns:
x1, ..., x_inDim, y1, ..., y_outDim.
- path : System.FilePath
CSV file path.
- inDim : ℕ
Number of input feature columns.
- outDim : ℕ
Number of target columns.
- csvOptions : CsvOptions
CSV parsing options.
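For instance, a table whose rows are x1, x2, x3, y could be described as (a sketch using the fields listed above):

```lean
-- One CSV row = 3 input feature columns followed by 1 target column.
def housing : TabularSupervisedSource :=
  { path := "data/housing.csv"
    inDim := 3
    outDim := 1
    csvOptions := {} }
```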
Load a single-table supervised CSV source.
Build a supervised dataset from two matrices X : n×inDim and Y : n×outDim by pairing rows.
This is the TorchLean analogue of PyTorch's TensorDataset(X, Y) for simple regression.
Collate a length-n supervised batch into a single sample with a leading batch axis.
If your samples are (x : σ, y : τ), the collated sample is:
xBatch : (n × σ) and yBatch : (n × τ)
In shapes: TList α [dim n σ, dim n τ].
Turn a per-sample supervised dataset into a dataset of fixed-size minibatches.
This is useful for metrics (meanLossDataset, accuracy, etc.) when your model expects a leading
batch axis.
Notes:
- This drops the final partial batch (PyTorch drop_last=True behavior).
- Batches are formed in dataset order (shuffling is the loader's job).
Extract the underlying per-sample dataset from a typed BatchLoader.
The batch size n carried in the type of a BatchLoader.
Whether the loader is configured to shuffle samples each epoch.
RNG seed used for shuffling (if enabled).
Materialize the dataset as a dataset of full minibatches (dropping any final partial batch).
Run one epoch: return the updated loader state and a list of typed minibatches.
Like epoch, but post-process each minibatch with a user-supplied collate/transform f.
Public loader API: supervised datasets become fixed-size minibatch loaders by default.
The underlying dataset still stores individual samples; the loader batches them and epoch
returns tensors with a leading dim0 batch axis. Because the batch size is reflected in the type,
the public batched path requires full batches, so dropLast defaults to true.
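A minimal usage sketch (parameter names for batchLoader are assumptions; epoch and dropLast behave as described above):

```lean
-- Sketch: build the public typed loader and run one epoch. Because the
-- batch size n = 64 is in the type and dropLast defaults to true, every
-- returned minibatch carries a full leading dim0 of 64.
def oneEpoch (ds : Dataset (sample.Supervised Float σ τ)) :=
  let loader := batchLoader (n := 64) ds (shuffle := true) (seed := 42)
  loader.epoch
```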
Build a batch loader when the batch size is only known at runtime.