TorchLean API

NN.API.Public.Facade.Data.Datasets

TorchLean Public Datasets #

Core public dataset constructors and file-backed dataset loaders.

@[reducible, inline]
Instances For
    @[reducible, inline]

    Stateful minibatch loader used by public dataset APIs.

    Instances For

      Runtime-polymorphic supervised dataset for Float tensors.

      Use this for most tutorials and file-loader paths: Float data is cast into whichever runtime scalar the command selected with --dtype.

      Instances For
        def TorchLean.Data.samples {σ τ : Shape} (mk : {α : Type} → [Runtime.SemanticScalar α] → [Runtime.Scalar α] → List (SupervisedSample α σ τ)) :

        Runtime-polymorphic supervised dataset from an explicit sample builder.

        Use this when Lean code generates samples directly rather than loading them from batched tensors, CSV, or NPY files. Sequence windows, synthetic PDE batches, and task-specific examples can keep their own sample logic while still returning a standard Trainer.Dataset.

        Instances For
          def TorchLean.Data.singleton {σ τ : Shape} (mk : {α : Type} → [Runtime.SemanticScalar α] → [Runtime.Scalar α] → SupervisedSample α σ τ) :

          Build a singleton dataset from one runtime-polymorphic supervised sample.

          Small examples can use the Trainer.Dataset surface without fixing the runtime scalar or backend in the dataset definition itself.

          Instances For
            def TorchLean.Data.singletonFrom {ρ : Type} {σ τ : Shape} (arg : ρ) (mk : {α : Type} → [Runtime.SemanticScalar α] → [Runtime.Scalar α] → ρSupervisedSample α σ τ) :

            Build a singleton dataset by feeding one explicit argument into a runtime-polymorphic sample constructor.

            Use this when the sample construction depends on one user-facing payload, such as a prompt string or one file-backed record.

            Instances For

              Build a singleton dataset from one Float sample produced inside IO.

              Use this when the sample comes from a file-backed or runtime-loaded Float boundary. The public trainer still owns the scalar/backend choice through Trainer.RunConfig and Trainer.TrainOptions.

              Instances For

                Runtime-polymorphic dataset from an in-memory list of Float supervised samples.

                Several examples build their training windows in ordinary Float first because the source is text, CSV, NPY, or another external boundary. This constructor keeps those examples on the public trainer surface: the samples are still cast to the runtime-selected scalar at training time, so callers do not have to write their own scalar-polymorphic dataset adapter.

                Instances For

                  Array form of floatSamples.

                  Instances For
                    def TorchLean.Data.batchDataset {σ τ : Shape} (batch : ) (data : Trainer.Dataset σ τ) (shuffle : Bool := true) (seed : := 0) (dropLast : Bool := true) :

                    Convert an unbatched supervised dataset into a fixed-size batched dataset.

                    This is the public "minibatch the dataset first" adapter for examples that want the model itself to own the batch axis. The returned dataset stores samples of shape dim batch σ and dim batch τ, so it can be passed directly to Trainer.new with a batched model.

                    Instances For
                      def TorchLean.Data.randomSplitDataset {σ τ : Shape} (trainSize : ) (data : Trainer.Dataset σ τ) (seed : := 0) :

                      Split a public dataset into deterministic train/test views.

                      This is the dataset-level analogue of torch.utils.data.random_split: the split happens after the trainer materializes the runtime scalar, but callers stay on ordinary Trainer.Dataset values.

                      Instances For
                        def TorchLean.Data.tabularCsvDataset (path : System.FilePath) (batch inDim outDim : ) (csvOptions : CsvOptions := { }) (shuffle : Bool := true) (seed : := 0) (dropLast : Bool := true) :
                        Trainer.Dataset (Shape.mat batch inDim) (Shape.mat batch outDim)

                        Load a numeric CSV table as a dataset of fixed-size tabular regression batches.

                        Each CSV row is interpreted as inDim feature columns followed by outDim target columns. The returned dataset already has the leading batch dimension expected by a model with input shape Shape.mat batch inDim and output shape Shape.mat batch outDim.

                        Instances For

                          Runtime-polymorphic supervised regression dataset from a tensor source.

                          This is the public file-data analogue of torch.utils.data.TensorDataset(X, Y) for examples whose targets are tensors rather than class labels. The source records where batched features and targets live; the trainer materializes them at the selected scalar type.

                          Instances For
                            def TorchLean.Data.supervisedNpyDataset (xPath yPath : System.FilePath) (n : ) (xDims yDims : List ) :

                            Runtime-polymorphic supervised dataset backed by paired .npy files.

                            This is the common file-backed regression/operator-learning path: xPath stores samples with shape xDims, and yPath stores matching targets with shape yDims.

                            Instances For

                              Runtime-polymorphic one-hot classification dataset from a tensor source.

                              This is the public file-data analogue of torch.utils.data.TensorDataset: the source records where features and integer labels live, and the trainer materializes them at the selected scalar type.

                              Instances For
                                @[reducible, inline]
                                abbrev TorchLean.Data.SupervisedSource.ofPaths (format : TensorFormat) (xPath yPath : System.FilePath) (n : ) (xDims yDims : List ) :

                                Construct a supervised source from paths using the same file format for x and y.

                                Instances For

                                  Load a supervised dataset by slicing dim0 from the two tensors.

                                  This is the preferred public loader for regression/operator-learning examples, regardless of whether the backing files are .npy or small numeric CSV tables.

                                  Instances For
                                    @[reducible, inline]
                                    abbrev TorchLean.Data.LabeledSource.ofPaths (format : TensorFormat) (xPath yPath : System.FilePath) (n : ) (xDims : List ) (classes : ) :

                                    Construct a labeled source from paths using the same file format for x and y.

                                    Instances For

                                      Load a labeled classification dataset by slicing dim0 and one-hot encoding labels.

                                      For CSV label vectors, store labels as a single-column table with dims = [n, 1] and use a custom TensorSource if needed; the path constructor above is aimed at .npy label vectors.

                                      Instances For