TorchLean Public Datasets

Use this when Lean code generates samples directly rather than loading them from batched tensors, CSV, or NPY files. Sequence windows, synthetic PDE batches, and task-specific examples can keep their own sample logic while still returning a standard Trainer.Dataset.

def TorchLean.Data.singleton {σ τ : Shape} (mk : {α : Type} → [Runtime.SemanticScalar α] → [Runtime.Scalar α] → SupervisedSample α σ τ) : Trainer.Dataset σ τ

Build a singleton dataset from one runtime-polymorphic supervised sample.

Small examples can use the Trainer.Dataset surface without fixing the runtime scalar or backend in the dataset definition itself.

def TorchLean.Data.singletonFrom {ρ : Type} {σ τ : Shape} (arg : ρ) (mk : {α : Type} → [Runtime.SemanticScalar α] → [Runtime.Scalar α] → ρ → SupervisedSample α σ τ) : Trainer.Dataset σ τ

Build a singleton dataset by feeding one explicit argument into a runtime-polymorphic sample constructor.

Use this when the sample construction depends on one user-facing payload, such as a prompt string or one file-backed record.
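A minimal sketch of that pattern, assuming the payload is a prompt string. The import line and the opened namespaces are assumptions based on the module path and names shown on this page, and `encode` is a hypothetical, caller-supplied constructor rather than part of the library.

```lean
-- Assumption: module path as listed on this page; adjust to your build.
import NN.API.Public.Facade.Data.Datasets
-- Assumption: these namespaces make the unqualified names below resolve.
open TorchLean TorchLean.Data

/-- Wrap one prompt as a singleton dataset.
`encode` is hypothetical: it turns the prompt into a supervised sample
for whichever scalar type the trainer later selects. -/
def promptDataset {σ τ : Shape} (prompt : String)
    (encode : {α : Type} → [Runtime.SemanticScalar α] → [Runtime.Scalar α] →
      String → SupervisedSample α σ τ) : Trainer.Dataset σ τ :=
  singletonFrom prompt encode
```

Because `encode` stays polymorphic in `α`, the dataset definition does not commit to a scalar type or backend.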

def TorchLean.Data.ioSingletonFloat {σ τ : Shape} (mk : IO (SupervisedSample Float σ τ)) : Trainer.Dataset σ τ

Build a singleton dataset from one Float sample produced inside IO.

Use this when the sample comes from a file-backed or runtime-loaded Float boundary. The public trainer still owns the scalar/backend choice through Trainer.RunConfig and Trainer.TrainOptions.
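As an illustration only, the sketch below wraps a file-reading action. `readSample` is a hypothetical helper supplied by the caller; the import line and opened namespaces are likewise assumptions drawn from the names on this page.

```lean
-- Assumption: module path as listed on this page; adjust to your build.
import NN.API.Public.Facade.Data.Datasets
-- Assumption: these namespaces make the unqualified names below resolve.
open TorchLean TorchLean.Data

/-- Build a singleton dataset from one record on disk.
`readSample` is hypothetical: any `IO` action that yields a `Float` sample. -/
def recordDataset {σ τ : Shape}
    (readSample : System.FilePath → IO (SupervisedSample Float σ τ))
    (path : System.FilePath) : Trainer.Dataset σ τ :=
  ioSingletonFloat (readSample path)
```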

def TorchLean.Data.floatSamples {σ τ : Shape} (samples : List (SupervisedSample Float σ τ)) : Trainer.Dataset σ τ

Runtime-polymorphic dataset from an in-memory list of Float supervised samples.

Several examples build their training windows in ordinary Float first because the source is text, CSV, NPY, or another external boundary. This constructor keeps those examples on the public trainer surface: the samples are still cast to the runtime-selected scalar at training time, so callers do not have to write their own scalar-polymorphic dataset adapter.
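To make "build the windows in ordinary Float first" concrete, here is a self-contained sketch in plain Lean, independent of TorchLean. The function `windows` is hypothetical; turning each pair into a `SupervisedSample Float σ τ` is a separate step that is not shown, since the sample constructor is not documented on this page.

```lean
/-- Pair each length-`k` context with the value that follows it. -/
def windows (k : Nat) (xs : List Float) : List (List Float × Float) :=
  -- One window per start index `i` for which `xs[i + k]` exists.
  (List.range (xs.length - k)).filterMap fun i =>
    (xs[i + k]?).map fun y => ((xs.drop i).take k, y)

-- Yields two windows: context [1, 2] with target 3,
-- and context [2, 3] with target 4.
#eval windows 2 [1.0, 2.0, 3.0, 4.0]
```

Once such windows are converted to samples, the resulting list can be handed to floatSamples unchanged.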

def TorchLean.Data.floatSampleArray {σ τ : Shape} (samples : Array (SupervisedSample Float σ τ)) : Trainer.Dataset σ τ

Array form of floatSamples: the same dataset, built from an Array of Float supervised samples instead of a List.

def TorchLean.Data.batchDataset {σ τ : Shape} (batch : ℕ) (data : Trainer.Dataset σ τ) (shuffle : Bool := true) (seed : ℕ := 0) (dropLast : Bool := true) : Trainer.Dataset (Spec.Shape.dim batch σ) (Spec.Shape.dim batch τ)

Convert an unbatched supervised dataset into a fixed-size batched dataset.

This is the public "minibatch the dataset first" adapter for examples that want the model itself to own the batch axis. The returned dataset stores samples of shape dim batch σ and dim batch τ, so it can be passed directly to Trainer.new with a batched model.
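The effect of dropLast on the number of batches can be sketched with plain arithmetic. This assumes dropLast follows the PyTorch DataLoader convention (discard a short final batch); how TorchLean fills a short final batch when dropLast is false is not stated here. `numBatches` is a hypothetical helper, not a library function.

```lean
/-- Number of fixed-size batches obtained from `n` samples.
Assumes the PyTorch-style meaning of `dropLast`. -/
def numBatches (n batch : Nat) (dropLast : Bool) : Nat :=
  if batch = 0 then 0
  else if dropLast then n / batch        -- discard the incomplete tail
  else (n + batch - 1) / batch           -- count the tail as one more batch

#eval numBatches 10 4 true   -- 2
#eval numBatches 10 4 false  -- 3
```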

def TorchLean.Data.randomSplitDataset {σ τ : Shape} (trainSize : ℕ) (data : Trainer.Dataset σ τ) (seed : ℕ := 0) : Trainer.Dataset σ τ × Trainer.Dataset σ τ

Split a public dataset into deterministic train/test views.

This is the dataset-level analogue of torch.utils.data.random_split: the split happens after the trainer materializes the runtime scalar, but callers stay on ordinary Trainer.Dataset values.
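A sketch of how the split might be combined with floatSamples and batchDataset, using only the signatures listed on this page. The import line, the opened namespaces, the batch size 32, the seed 42, and the reading of the returned pair as (train, test) are all assumptions.

```lean
-- Assumption: module path as listed on this page; adjust to your build.
import NN.API.Public.Facade.Data.Datasets
-- Assumption: these namespaces make the unqualified names below resolve.
open TorchLean TorchLean.Data

/-- Split in-memory Float samples, then batch each part.
The result type is left for Lean to infer; per the `batchDataset`
signature, both components carry a leading batch dimension of 32. -/
def splitThenBatch {σ τ : Shape} (trainSize : Nat)
    (samples : List (SupervisedSample Float σ τ)) :=
  let all := floatSamples samples
  -- Assumption: the first component is the training view.
  let (train, test) := randomSplitDataset trainSize all (seed := 42)
  (batchDataset 32 train (seed := 42),
   batchDataset 32 test (shuffle := false))
```

Splitting before batching keeps individual samples, rather than whole batches, as the unit assigned to train or test.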

def TorchLean.Data.tabularCsvDataset (path : System.FilePath) (batch inDim outDim : ℕ) (csvOptions : CsvOptions := { }) (shuffle : Bool := true) (seed : ℕ := 0) (dropLast : Bool := true) : Trainer.Dataset (Shape.mat batch inDim) (Shape.mat batch outDim)

Load a numeric CSV table as a dataset of fixed-size tabular regression batches.

Each CSV row is interpreted as inDim feature columns followed by outDim target columns. The returned dataset already has the leading batch dimension expected by a model with input shape Shape.mat batch inDim and output shape Shape.mat batch outDim.
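For example, a table with 8 feature columns followed by 1 target column could be loaded in batches of 16 roughly as follows. The file path, the dimensions, and the seed are hypothetical, and the import line and opened namespaces are assumptions based on the names on this page.

```lean
-- Assumption: module path as listed on this page; adjust to your build.
import NN.API.Public.Facade.Data.Datasets
-- Assumption: these namespaces make the unqualified names below resolve.
open TorchLean TorchLean.Data

/-- Hypothetical table: each row has 8 features, then 1 target. -/
def tableData : Trainer.Dataset (Shape.mat 16 8) (Shape.mat 16 1) :=
  tabularCsvDataset (System.FilePath.mk "data/table.csv") 16 8 1 (seed := 7)
```

The shapes in the type mirror the arguments: batch = 16, inDim = 8, outDim = 1.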

def TorchLean.Data.supervisedDataset (src : SupervisedSource) : Trainer.Dataset (Shape.ofDims src.xDims) (Shape.ofDims src.yDims)

Runtime-polymorphic supervised regression dataset from a tensor source.

This is the public file-data analogue of torch.utils.data.TensorDataset(X, Y) for examples whose targets are tensors rather than class labels. The source records where batched features and targets live; the trainer materializes them at the selected scalar type.

def TorchLean.Data.supervisedNpyDataset (xPath yPath : System.FilePath) (n : ℕ) (xDims yDims : List ℕ) : Trainer.Dataset (Shape.ofDims xDims) (Shape.ofDims yDims)

Runtime-polymorphic supervised dataset backed by paired .npy files.

This is the common file-backed regression/operator-learning path: xPath stores samples with shape xDims, and yPath stores matching targets with shape yDims.
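A usage sketch under stated assumptions: the file names and dimensions are hypothetical, and `n` is taken to be the number of samples (the leading dimension of each file), which this page does not spell out. The import line and opened namespaces are also assumptions.

```lean
-- Assumption: module path as listed on this page; adjust to your build.
import NN.API.Public.Facade.Data.Datasets
-- Assumption: these namespaces make the unqualified names below resolve.
open TorchLean TorchLean.Data

/-- Hypothetical operator-learning data: 1000 pairs of length-64 vectors. -/
def operatorData : Trainer.Dataset (Shape.ofDims [64]) (Shape.ofDims [64]) :=
  supervisedNpyDataset
    (System.FilePath.mk "data/x.npy")   -- inputs, one sample per leading index
    (System.FilePath.mk "data/y.npy")   -- matching targets
    1000 [64] [64]
```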

def TorchLean.Data.labeledDataset (src : LabeledSource) : Trainer.Dataset (Shape.ofDims src.xDims) (Shape.vec src.classes)

Runtime-polymorphic one-hot classification dataset from a tensor source.

This is the public file-data analogue of torch.utils.data.TensorDataset: the source records where features and integer labels live, and the trainer materializes them at the selected scalar type.

@[reducible, inline]
abbrev TorchLean.Data.SupervisedSource.ofPaths (format : TensorFormat) (xPath yPath : System.FilePath) (n : ℕ) (xDims yDims : List ℕ) : SupervisedSource

Construct a supervised source from paths using the same file format for x and y.
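A sketch of feeding such a source to supervisedDataset. The format is passed in as a parameter because the constructors of TensorFormat are not listed on this page; the sample count and dimensions are hypothetical, and the import line and opened namespaces are assumptions.

```lean
-- Assumption: module path as listed on this page; adjust to your build.
import NN.API.Public.Facade.Data.Datasets
-- Assumption: these namespaces make the unqualified names below resolve.
open TorchLean TorchLean.Data

/-- Regression dataset from two files that share one format.
The result type is left for Lean to infer from the source's dims. -/
def regressionData (fmt : TensorFormat) (xPath yPath : System.FilePath) :=
  supervisedDataset (SupervisedSource.ofPaths fmt xPath yPath 1000 [64] [64])
```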

def TorchLean.Data.SupervisedSource.load {α : Type} [Runtime.Scalar α] (src : SupervisedSource) : IO (Except String (Runtime.Autograd.Train.Dataset (NN.API.TorchLean.TensorPack α [NN.Tensor.shapeOfDims src.xDims, NN.Tensor.shapeOfDims src.yDims])))

Load a supervised dataset by slicing dim0 from the two tensors.

This is the preferred public loader for regression/operator-learning examples, regardless of whether the backing files are .npy or small numeric CSV tables.
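Since the loader returns `IO (Except String …)`, a caller has to handle both the IO effect and the error string. One possible shape, assuming a `Runtime.Scalar Float` instance is available (not confirmed on this page) and with the import line and opened namespaces again being assumptions:

```lean
-- Assumption: module path as listed on this page; adjust to your build.
import NN.API.Public.Facade.Data.Datasets
-- Assumption: these namespaces make the unqualified names below resolve.
open TorchLean TorchLean.Data

/-- Try to load a source at `Float` and report the outcome. -/
def checkSource (src : SupervisedSource) : IO Unit := do
  -- Assumption: `Float` has a `Runtime.Scalar` instance.
  match ← SupervisedSource.load (α := Float) src with
  | .ok _dataset => IO.println "loaded"
  | .error msg   => throw (IO.userError msg)   -- surface the loader's message
```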

@[reducible, inline]
abbrev TorchLean.Data.LabeledSource.ofPaths (format : TensorFormat) (xPath yPath : System.FilePath) (n : ℕ) (xDims : List ℕ) (classes : ℕ) : LabeledSource

Construct a labeled source from paths using the same file format for x and y.
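A sketch of pairing this constructor with labeledDataset. The format is taken as a parameter because TensorFormat's constructors are not listed on this page; the sample count, image dimensions, and class count are hypothetical, and the import line and opened namespaces are assumptions.

```lean
-- Assumption: module path as listed on this page; adjust to your build.
import NN.API.Public.Facade.Data.Datasets
-- Assumption: these namespaces make the unqualified names below resolve.
open TorchLean TorchLean.Data

/-- Classification dataset: 28×28 inputs, 10 classes.
The result type is left for Lean to infer; per the `labeledDataset`
signature, targets are vectors of length `classes`. -/
def digitData (fmt : TensorFormat) (xPath yPath : System.FilePath) :=
  labeledDataset (LabeledSource.ofPaths fmt xPath yPath 60000 [28, 28] 10)
```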

def TorchLean.Data.LabeledSource.load {α : Type} [Runtime.SemanticScalar α] [Runtime.Scalar α] (src : LabeledSource) : IO (Except String (Runtime.Autograd.Train.Dataset (NN.API.TorchLean.TensorPack α [NN.Tensor.shapeOfDims src.xDims, NN.Tensor.Shape.Vec src.classes])))

Load a labeled classification dataset by slicing dim0 and one-hot encoding labels.

For CSV label vectors, store labels as a single-column table with dims = [n, 1] and use a custom TensorSource if needed; the path constructor LabeledSource.ofPaths is aimed at .npy label vectors.
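One-hot encoding itself can be illustrated in plain Lean, independent of TorchLean. `oneHot` below is a hypothetical helper, not the library's implementation; in particular, how the loader treats a label outside the class range is not stated on this page.

```lean
/-- One-hot vector of length `classes` with a 1.0 at index `label`.
In this sketch an out-of-range label yields all zeros. -/
def oneHot (classes label : Nat) : List Float :=
  (List.range classes).map fun i => if i = label then 1.0 else 0.0

-- Four entries: 1.0 at index 2, 0.0 elsewhere.
#eval oneHot 4 2
```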

NN.API.Public.Facade.Data.Datasets
