NPY loaders for typed training tensors
This module implements the small, explicit .npy subset that TorchLean's native training
examples need:
- NumPy format versions 1 and 2;
- little-endian float32 and float64 payloads (<f4, <f8);
- C-order arrays directly, and Fortran-order arrays converted to C-order at load time;
- typed 1D and 2D tensor views for vectors and matrices.
The loader stays narrow. It is a runtime bridge for trusted experiment artifacts, not
a general NumPy parser and not part of the formal tensor semantics. Keeping it here, under
Runtime.Autograd.Train, makes that boundary visible while still giving examples a convenient path
from Python-produced arrays into TorchLean tensors.
Reference:
- NumPy .npy format documentation: https://numpy.org/doc/stable/reference/generated/numpy.lib.format.html
In-memory representation of a loaded .npy file in TorchLean's supported subset.
values is always flattened in C-order. If the source file declares fortran_order = True, we
reorder the payload during parsing and store fortran := false in the returned value so downstream
tensor loaders never have to reason about storage order.
- dtype : String
Dtype string as stored in the header, for example "<f4" or "<f8".
Logical array shape as stored in the header.
- fortran : Bool
Whether the returned flat payload is still Fortran-ordered. This loader returns false.
Flattened numeric payload, converted to Lean Float values.
Prefix products of a shape list.
For a shape [d₀, d₁, d₂], this returns [1, d₀, d₀*d₁], which are exactly the
Fortran-order strides. We use these strides to convert Fortran storage into TorchLean's ordinary
C-order flattening convention.
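A minimal sketch of the prefix-product computation described above. The name `prefixProductsSketch` is hypothetical and the module's actual definition may be written differently; the sketch only illustrates the `[1, d₀, d₀*d₁]` pattern.

```lean
/-- Hypothetical name. Prefix products of a shape: `[d₀, d₁, d₂] ↦ [1, d₀, d₀ * d₁]`. -/
def prefixProductsSketch (shape : List Nat) : List Nat :=
  go 1 shape
where
  -- `acc` is the product of the dimensions consumed so far.
  go (acc : Nat) : List Nat → List Nat
    | [] => []
    | d :: ds => acc :: go (acc * d) ds

#eval prefixProductsSketch [2, 3, 4]  -- [1, 2, 6]
```

The result has the same length as the shape, and entry `k` is the Fortran-order stride of axis `k`.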
Convert a linear C-order index to the corresponding linear Fortran-order index.
Both indices describe the same multi-dimensional coordinate. The difference is only how the coordinate is flattened into a one-dimensional payload.
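The conversion can be sketched as follows, assuming the usual convention that the last axis varies fastest in C-order and the first axis varies fastest in Fortran-order. Both function names are hypothetical, not the module's API.

```lean
/-- Pair each dimension with its Fortran stride (the product of all earlier dimensions). -/
def dimsWithFortranStride (stride : Nat) : List Nat → List (Nat × Nat)
  | [] => []
  | d :: ds => (d, stride) :: dimsWithFortranStride (stride * d) ds

/-- Hypothetical name. Map a linear C-order index to the linear Fortran-order index
of the same multi-dimensional coordinate. -/
def cIndexToFortranIndexSketch (shape : List Nat) (cIdx : Nat) : Nat :=
  -- State: (remaining C index, accumulated Fortran index).
  -- Each step peels one coordinate off the C index, starting from the last axis,
  -- and adds that coordinate times the axis's Fortran stride.
  let step := fun (st : Nat × Nat) (ds : Nat × Nat) =>
    (st.1 / ds.1, st.2 + (st.1 % ds.1) * ds.2)
  ((dimsWithFortranStride 1 shape).reverse.foldl step (cIdx, 0)).2

-- For shape [2, 3], coordinate (i, j) has C index 3*i + j and Fortran index i + 2*j.
#eval (List.range 6).map (cIndexToFortranIndexSketch [2, 3])  -- [0, 2, 4, 1, 3, 5]
```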
Reorder a Fortran-ordered flat array into C-order.
The function is total and defensive: if the file payload is malformed and an index is missing, the
missing element is filled with 0.0. The parser checks payload length before calling this function,
so that fallback should not happen for accepted files.
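A sketch of the defensive reordering, written so that the index map is passed in as a parameter. The name `reorderWithSketch` and the example payload are illustrative assumptions; the `0.0` fallback mirrors the behavior described above.

```lean
/-- Hypothetical name. Build a C-ordered array of `n` elements by looking up, for each
C-order position `c`, the source element at `srcIndex c`. Missing elements become `0.0`. -/
def reorderWithSketch (n : Nat) (srcIndex : Nat → Nat) (src : Array Float) : Array Float :=
  (Array.range n).map fun c => src.getD (srcIndex c) 0.0

-- The 2x3 matrix [[1, 2, 3], [4, 5, 6]] stored column by column (Fortran-order).
def fortranPayloadExample : Array Float := #[1.0, 4.0, 2.0, 5.0, 3.0, 6.0]

-- For shape [2, 3], C index c maps to Fortran index (c / 3) + 2 * (c % 3).
-- The result lists the values 1 through 6 in row-major order.
#eval reorderWithSketch 6 (fun c => c / 3 + 2 * (c % 3)) fortranPayloadExample
```

Using `getD` keeps the function total without requiring a bounds proof, at the cost of hiding malformed input, which is why the length check has to happen before the call.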
Parse the NumPy header dictionary.
We only need three standard fields: descr, fortran_order, and shape. The header format is a
Python-literal dictionary padded to an alignment boundary, so this parser stays field-oriented
rather than trying to become a full Python parser.
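To illustrate what "field-oriented" can mean here, the following sketch extracts `fortran_order` and `shape` by searching for the field keys instead of parsing the dictionary as a whole. All names are hypothetical, the parsing is deliberately lenient about whitespace, and the real parser may validate more strictly.

```lean
/-- Text following the first occurrence of `key` in `s`, if the key is present. -/
def afterKey (s key : String) : Option String :=
  match s.splitOn key with
  | _ :: rest :: _ => some rest
  | _ => none

/-- Parse a decimal natural number from characters, ignoring spaces. -/
def parseNat? (cs : List Char) : Option Nat :=
  let ds := cs.filter (· != ' ')
  if ds.isEmpty || !(ds.all Char.isDigit) then none
  else some (ds.foldl (fun n c => 10 * n + (c.toNat - '0'.toNat)) 0)

/-- Hypothetical name. Read the `shape` tuple, e.g. `(3, 4)`, `(3,)`, or `()`. -/
def parseShapeSketch? (header : String) : Option (List Nat) := do
  let rest ← afterKey header "'shape':"
  let inner ← afterKey rest "("
  let body := (inner.splitOn ")").headD ""
  -- A one-element tuple such as "(3,)" leaves an empty trailing piece; skip it.
  let parts := (body.splitOn ",").filter fun p => p.toList.any (· != ' ')
  parts.mapM fun p => parseNat? p.toList

/-- Hypothetical name. Read the `fortran_order` flag as a Python boolean literal. -/
def parseFortranOrderSketch? (header : String) : Option Bool := do
  let rest ← afterKey header "'fortran_order':"
  let cs := rest.toList.dropWhile (· == ' ')
  if "True".toList.isPrefixOf cs then some true
  else if "False".toList.isPrefixOf cs then some false
  else none

def headerExample : String :=
  "{'descr': '<f4', 'fortran_order': False, 'shape': (3, 4), }"

#eval parseShapeSketch? headerExample         -- some [3, 4]
#eval parseFortranOrderSketch? headerExample  -- some false
```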
Parsed .npy metadata needed before reading the numeric payload.
- descr : String
Dtype descriptor from the header, for example "<f4" or "<f8".
- fortran : Bool
Whether the on-disk payload is Fortran-ordered.
Logical array shape from the header.
- dataStart : ℕ
Byte offset where the numeric payload begins.
Read and validate the NumPy magic/version/header block shared by all NPY loaders.
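For orientation, the preamble layout from the NumPy format documentation is: a 6-byte magic string (`0x93` followed by ASCII `NUMPY`), one byte each for the major and minor version, then a little-endian header length that is 2 bytes wide in version 1 and 4 bytes wide in version 2. A minimal sketch of that check follows; the function name and error messages are hypothetical.

```lean
/-- Hypothetical helper: validate the preamble and return `(headerStart, headerLen)`.
The numeric payload then begins at byte `headerStart + headerLen`. -/
def readPreambleSketch (bytes : ByteArray) : Except String (Nat × Nat) := do
  -- The magic string is the byte 0x93 followed by ASCII "NUMPY".
  let magic : List UInt8 := [0x93, 0x4E, 0x55, 0x4D, 0x50, 0x59]
  if bytes.size < 10 then
    throw "file too short for an NPY preamble"
  if (List.range 6).map (fun i => bytes.get! i) != magic then
    throw "bad magic string"
  let byteAt := fun (i : Nat) => (bytes.get! i).toNat
  let major := bytes.get! 6
  if major == 1 then
    -- Version 1: 2-byte little-endian header length at offsets 8..9.
    return (10, byteAt 8 + 256 * byteAt 9)
  else if major == 2 then
    -- Version 2: 4-byte little-endian header length at offsets 8..11.
    if bytes.size < 12 then
      throw "file too short for a version 2 preamble"
    return (12, byteAt 8 + 256 * byteAt 9 + 65536 * byteAt 10 + 16777216 * byteAt 11)
  else
    throw s!"unsupported NPY major version {major}"

-- A version 1.0 preamble declaring a 0x76 = 118 byte header: header starts at 10,
-- so the payload would begin at byte 128.
#eval readPreambleSketch
  (ByteArray.mk #[0x93, 0x4E, 0x55, 0x4D, 0x50, 0x59, 1, 0, 0x76, 0x00])
```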
Byte width for the dtypes supported by TorchLean's NPY loader.
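Since only two dtypes are supported, the lookup can be sketched as a two-case match. The name `dtypeByteWidthSketch?` and the `Option` return type are assumptions; the actual function may report failure differently.

```lean
/-- Hypothetical name. Bytes per element for the supported dtype descriptors. -/
def dtypeByteWidthSketch? : String → Option Nat
  | "<f4" => some 4   -- little-endian float32
  | "<f8" => some 8   -- little-endian float64
  | _ => none         -- anything else is outside the supported subset

#eval dtypeByteWidthSketch? "<f4"  -- some 4
#eval dtypeByteWidthSketch? ">f8"  -- none
```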
Parse the bytes of a .npy file into NpyData.
The parser rejects unsupported dtypes, malformed headers, and truncated payloads. That makes loader failures explicit at the trust boundary instead of silently producing tensors with the wrong shape or partial data.
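The truncation check and the little-endian decoding step might look like the sketch below for `<f8` payloads. It assumes a Lean toolchain that provides `Float.ofBits`; the function names are hypothetical, and the real parser also handles `<f4` and Fortran-order reordering.

```lean
/-- Assemble 8 little-endian bytes starting at `off` into a `UInt64`. -/
def readU64LE (bytes : ByteArray) (off : Nat) : UInt64 :=
  -- `foldr` visits byte 7 first, so it ends up in the most significant position.
  (List.range 8).foldr
    (fun i (acc : UInt64) => (acc <<< 8) ||| (bytes.get! (off + i)).toUInt64) 0

/-- Hypothetical name. Decode `count` float64 values starting at `dataStart`,
rejecting payloads that are too short instead of returning partial data. -/
def decodeF64Sketch (bytes : ByteArray) (dataStart count : Nat) :
    Except String (Array Float) :=
  if bytes.size < dataStart + 8 * count then
    .error s!"truncated payload: need {8 * count} bytes after offset {dataStart}"
  else
    .ok ((Array.range count).map fun k =>
      Float.ofBits (readU64LE bytes (dataStart + 8 * k)))

-- IEEE 754 bit patterns for 1.0 (0x3FF0000000000000) and 2.0 (0x4000000000000000),
-- written least significant byte first.
def payloadExample : ByteArray :=
  ByteArray.mk #[0, 0, 0, 0, 0, 0, 0xF0, 0x3F,  0, 0, 0, 0, 0, 0, 0, 0x40]

#eval decodeF64Sketch payloadExample 0 2  -- decodes the values 1.0 and 2.0
#eval decodeF64Sketch payloadExample 0 3  -- reports a truncated payload
```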
Parse only the requested leading rows of a C-order .npy array.
This supports large exported tensors kept on disk while a run uses only the first n rows. The rank
and trailing dimensions must match exactly; only dim 0 may be larger than requested.
The implementation shares header and dtype parsing with parseNpy, then decodes only the requested
prefix. This avoids building a full Array Float when a command asks for a small leading slice of a
real image or sequence dataset.
Why C-order only? In row-major NPY files, the first n rows are physically contiguous, so the
prefix is exactly the first n * trailingSize elements. In Fortran-order files the same logical
prefix is interleaved across the payload, so prefix decoding would be unsound. Rather than
silently returning bad rows, we reject Fortran-order prefix loading and ask callers to convert the
array to C-order first.
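The shape rules above can be sketched as a single validation function that returns how many elements to decode. The name and error messages are hypothetical; only the rules themselves (C-order only, matching trailing dimensions, dim 0 at least as large as the request) come from the description.

```lean
/-- Hypothetical name. Number of leading elements to decode for the first `n` rows. -/
def prefixElemCountSketch (fileShape : List Nat) (fortran : Bool)
    (n : Nat) (trailing : List Nat) : Except String Nat := do
  if fortran then
    throw "prefix loading requires a C-order file"
  match fileShape with
  | [] => throw "a rank-0 array has no leading dimension"
  | d0 :: rest =>
    if rest != trailing then
      throw s!"trailing dimensions {rest} do not match expected {trailing}"
    if n > d0 then
      throw s!"requested {n} rows but the file holds only {d0}"
    -- In C-order the first n rows are exactly the first n * trailingSize elements.
    return n * rest.foldl (· * ·) 1

-- First 100 images of a 60000 x 28 x 28 tensor: 100 * 784 = 78400 elements.
#eval prefixElemCountSketch [60000, 28, 28] false 100 [28, 28]
```

For contrast, in a Fortran-order file of shape `[4, 3]` the first two rows sit at payload positions 0, 1, 4, 5, 8, 9, which is why a contiguous prefix read cannot recover them.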
Read a .npy file from disk and parse it as NpyData.
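A file-system wrapper of this kind can be sketched generically over the byte-level parser. The name `loadWithSketch` and the `Except String` error type are assumptions; `IO.FS.readBinFile` is the standard Lean call for reading a whole file as bytes.

```lean
/-- Hypothetical name. Read a file and run a byte-level parser on it, turning
parse errors into `IO` errors that mention the offending path. -/
def loadWithSketch {α : Type} (parse : ByteArray → Except String α)
    (path : System.FilePath) : IO α := do
  let bytes ← IO.FS.readBinFile path
  match parse bytes with
  | .ok v => pure v
  | .error e => throw (IO.userError s!"{path}: {e}")
```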
Read a .npy file but decode only the requested leading rows.
This is the file-system wrapper around parseNpyPrefixDim0. It still reads the file bytes into
memory, but it avoids building a full Array Float for rows the run did not ask to use. The
public API.Data layer uses this when a dataset source says "load the first n examples" from a
larger exported NPY tensor.
Read a 1D .npy file as a typed TorchLean vector tensor.
The shape check is part of the loader contract: files with the wrong logical size are rejected instead of being reshaped implicitly.
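One way to picture the contract is a check that returns the payload together with a proof of its length. This sketch uses a plain subtype over `Array Float` as a stand-in; TorchLean's actual typed tensor representation is not shown in this section, so the return type and the name `asVectorSketch` are assumptions.

```lean
/-- Hypothetical name. Accept a payload as a length-`n` vector only if the declared
shape is exactly `[n]` and the payload really holds `n` values. -/
def asVectorSketch (n : Nat) (shape : List Nat) (values : Array Float) :
    Except String { a : Array Float // a.size = n } :=
  if shape != [n] then
    -- No implicit reshape: a [2, 3] file is not accepted as a length-6 vector.
    .error s!"expected shape [{n}], got {shape}"
  else if h : values.size = n then
    .ok ⟨values, h⟩
  else
    .error "payload length does not match the declared shape"
```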
Read a 2D .npy file as a typed TorchLean matrix tensor.
The returned matrix uses the same row-major indexing convention as the rest of the runtime tensor helpers.
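Row-major indexing means element `(i, j)` of a matrix with `cols` columns sits at flat position `i * cols + j`. A small sketch over a flat `Array Float`, with hypothetical names and an out-of-range fallback of `0.0` chosen only to keep the example total:

```lean
/-- Hypothetical name. Row-major lookup into a flat payload. -/
def matGetSketch (cols : Nat) (values : Array Float) (i j : Nat) : Float :=
  values.getD (i * cols + j) 0.0

-- The 2x3 matrix [[1, 2, 3], [4, 5, 6]] in row-major order.
def matrixExample : Array Float := #[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

-- Row 1, column 0 is flat position 1 * 3 + 0 = 3, which holds 4.0.
#eval matGetSketch 3 matrixExample 1 0
```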