NPY loaders for typed training tensors
This module implements the small, explicit .npy subset that TorchLean's native training
examples need:
- NumPy format versions 1 and 2;
- little-endian float32 and float64 payloads (<f4, <f8);
- C-order arrays directly, and Fortran-order arrays converted to C-order at load time;
- typed 1D and 2D tensor views for vectors and matrices.
The loader deliberately stays narrow. It is a runtime bridge for trusted experiment artifacts, not
a general NumPy parser and not part of the formal tensor semantics. Keeping it here, under
Runtime.Autograd.Train, makes that boundary visible while still giving examples a convenient path
from Python-produced arrays into TorchLean tensors.
Reference:
- NumPy .npy format documentation: https://numpy.org/doc/stable/reference/generated/numpy.lib.format.html
In-memory representation of a loaded .npy file in TorchLean's supported subset.
values is always flattened in C-order. If the source file declares fortran_order = True, we
reorder the payload during parsing and store fortran := false in the returned value so downstream
tensor loaders never have to reason about storage order.
- dtype : String
Dtype string as stored in the header, for example "<f4" or "<f8".
Logical array shape as stored in the header.
- fortran : Bool
Whether the returned flat payload is still Fortran-ordered. This loader returns false.
Flattened numeric payload, converted to Lean Float values.
Prefix products of a shape list.
For a shape [d₀, d₁, d₂], this returns [1, d₀, d₀*d₁], which are exactly the
Fortran-order strides. We use these strides to convert Fortran storage into TorchLean's ordinary
C-order flattening convention.
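The prefix-product rule can be illustrated with a short Python sketch. This is not the Lean definition from this module; the name fortran_strides is hypothetical and chosen only for illustration.

```python
def fortran_strides(shape):
    """Return the prefix products [1, d0, d0*d1, ...], one entry per dimension."""
    strides = []
    running = 1
    for dim in shape:
        strides.append(running)  # stride of this axis in Fortran order
        running *= dim
    return strides


print(fortran_strides([2, 3, 4]))
```

For the shape [2, 3, 4] the strides are [1, 2, 6]: moving one step along the first axis advances the flat Fortran index by 1, along the second by 2, and along the third by 6.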
Convert a linear C-order index to the corresponding linear Fortran-order index.
Both indices describe the same multi-dimensional coordinate. The difference is only how the coordinate is flattened into a one-dimensional payload.
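As a rough illustration, the index conversion can be written in Python as follows. The function name c_index_to_fortran_index is an assumption made for this sketch, and the code mirrors the description above rather than the Lean source.

```python
def c_index_to_fortran_index(shape, c_index):
    """Map a flat C-order index to the flat Fortran-order index of the same coordinate."""
    # Step 1: recover the coordinate. In C-order the last axis varies fastest,
    # so peel coordinates off starting from the last axis.
    coords = [0] * len(shape)
    rem = c_index
    for axis in range(len(shape) - 1, -1, -1):
        coords[axis] = rem % shape[axis]
        rem //= shape[axis]
    # Step 2: flatten the coordinate again with Fortran strides (prefix products).
    f_index = 0
    stride = 1
    for axis, dim in enumerate(shape):
        f_index += coords[axis] * stride
        stride *= dim
    return f_index


# For a 2x3 array, C-order index 1 is coordinate (0, 1), stored at Fortran index 2.
print([c_index_to_fortran_index([2, 3], i) for i in range(6)])
```

For a 2x3 array the six C-order indices map to Fortran indices 0, 2, 4, 1, 3, 5.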
Reorder a Fortran-ordered flat array into C-order.
The function is total and defensive: if the file payload is malformed and an index is missing, the
missing element is filled with 0.0. The parser checks payload length before calling this function,
so that fallback should not happen for accepted files.
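A minimal Python sketch of the reordering, including the defensive 0.0 fallback, might look like this. The names fortran_to_c and c_index_to_fortran_index are hypothetical; only the behavior follows the description above.

```python
def c_index_to_fortran_index(shape, c_index):
    """Flat C-order index -> flat Fortran-order index for the same coordinate."""
    f_index, stride, rem = 0, 1, c_index
    coords = []
    for dim in reversed(shape):      # last axis varies fastest in C-order
        coords.append(rem % dim)
        rem //= dim
    coords.reverse()
    for coord, dim in zip(coords, shape):
        f_index += coord * stride    # Fortran strides are prefix products
        stride *= dim
    return f_index


def fortran_to_c(shape, payload):
    """Reorder a Fortran-ordered flat list into C-order; missing entries become 0.0."""
    total = 1
    for dim in shape:
        total *= dim
    out = []
    for c_index in range(total):
        f_index = c_index_to_fortran_index(shape, c_index)
        out.append(payload[f_index] if f_index < len(payload) else 0.0)
    return out


# The matrix [[1, 2, 3], [4, 5, 6]] stored column by column.
print(fortran_to_c([2, 3], [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]))
```

The fallback keeps the function total: with a short payload such as [1.0, 4.0, 2.0, 5.0], the two positions that would read past the end are filled with 0.0 instead of raising.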
Parse the NumPy header dictionary.
We only need three standard fields: descr, fortran_order, and shape. The header format is a
Python-literal dictionary padded to an alignment boundary; this parser is intentionally
field-oriented rather than a full Python parser.
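To make "field-oriented" concrete, here is a small Python sketch that pulls out the three fields with regular expressions instead of parsing the whole dictionary literal. The function name parse_header_fields and the exact patterns are assumptions for illustration and are not taken from the Lean parser.

```python
import re


def parse_header_fields(header):
    """Extract (descr, fortran_order, shape) from a .npy header dictionary string."""
    descr = re.search(r"'descr'\s*:\s*'([^']*)'", header)
    fortran = re.search(r"'fortran_order'\s*:\s*(True|False)", header)
    shape = re.search(r"'shape'\s*:\s*\(([^)]*)\)", header)
    if not (descr and fortran and shape):
        raise ValueError("malformed .npy header")
    # "(3, 4)" -> [3, 4]; "(3,)" -> [3]; "()" -> []
    dims = [int(tok) for tok in shape.group(1).split(",") if tok.strip()]
    return descr.group(1), fortran.group(1) == "True", dims


header = "{'descr': '<f4', 'fortran_order': False, 'shape': (3, 4), }"
print(parse_header_fields(header))
```

A field-oriented parser like this ignores anything else in the dictionary, which is acceptable here because the loader only targets trusted artifacts.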
Parse the bytes of a .npy file into NpyData.
The parser rejects unsupported dtypes, malformed headers, and truncated payloads. That makes loader failures explicit at the trust boundary instead of silently producing tensors with the wrong shape or partial data.
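The overall flow (magic string, version, header length, header fields, payload length check, decode, optional reorder) can be sketched in Python using only the standard library. All names here (parse_npy, to_c_order, make_npy_v1) are hypothetical, and the sketch follows the public .npy layout rather than the Lean code.

```python
import re
import struct

MAGIC = b"\x93NUMPY"
DTYPES = {"<f4": ("f", 4), "<f8": ("d", 8)}  # descr -> (struct code, byte width)


def to_c_order(shape, values):
    """Reorder a Fortran-ordered flat list into C-order."""
    strides, running = [], 1
    for dim in shape:                      # Fortran strides are prefix products
        strides.append(running)
        running *= dim
    out = []
    for c_index in range(len(values)):
        rem, f_index = c_index, 0
        for axis in range(len(shape) - 1, -1, -1):
            f_index += (rem % shape[axis]) * strides[axis]
            rem //= shape[axis]
        out.append(values[f_index])
    return out


def parse_npy(data):
    """Parse .npy bytes into (descr, shape, C-ordered values) or raise ValueError."""
    if len(data) < 10 or data[:6] != MAGIC:
        raise ValueError("missing .npy magic string")
    major = data[6]
    if major == 1:                         # version 1: 2-byte little-endian header length
        header_len, start = struct.unpack_from("<H", data, 8)[0], 10
    elif major == 2:                       # version 2: 4-byte little-endian header length
        if len(data) < 12:
            raise ValueError("truncated header length")
        header_len, start = struct.unpack_from("<I", data, 8)[0], 12
    else:
        raise ValueError(f"unsupported .npy version {major}")
    end = start + header_len
    if len(data) < end:
        raise ValueError("truncated header")
    header = data[start:end].decode("latin-1")

    descr = re.search(r"'descr'\s*:\s*'([^']*)'", header)
    fortran = re.search(r"'fortran_order'\s*:\s*(True|False)", header)
    shape = re.search(r"'shape'\s*:\s*\(([^)]*)\)", header)
    if not (descr and fortran and shape):
        raise ValueError("malformed header")
    if descr.group(1) not in DTYPES:
        raise ValueError(f"unsupported dtype {descr.group(1)}")
    dims = [int(tok) for tok in shape.group(1).split(",") if tok.strip()]

    count = 1
    for dim in dims:
        count *= dim
    code, width = DTYPES[descr.group(1)]
    if len(data) - end < count * width:    # check length before decoding
        raise ValueError("truncated payload")
    values = list(struct.unpack_from(f"<{count}{code}", data, end))
    if fortran.group(1) == "True":
        values = to_c_order(dims, values)
    return descr.group(1), dims, values


def make_npy_v1(descr, fortran_order, shape, values):
    """Build version-1.0 .npy bytes; used only to exercise the parser above."""
    shape_text = "(" + "".join(f"{d}, " for d in shape) + ")"
    header = f"{{'descr': '{descr}', 'fortran_order': {fortran_order}, 'shape': {shape_text}, }}"
    pad = -(10 + len(header) + 1) % 64     # pad so the payload starts on a 64-byte boundary
    header_bytes = (header + " " * pad + "\n").encode("latin-1")
    code = DTYPES[descr][0]
    return (MAGIC + bytes([1, 0]) + struct.pack("<H", len(header_bytes))
            + header_bytes + struct.pack(f"<{len(values)}{code}", *values))


sample = make_npy_v1("<f4", False, [2, 3], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
print(parse_npy(sample))
```

Dropping even one byte from the end of sample makes the payload check fail with "truncated payload", which is the kind of explicit failure described above.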
Parse only the requested leading rows of a C-order .npy array.
This supports the common tutorial workflow where a large exported tensor is kept on disk but a run
uses only the first n rows. The rank and trailing dimensions must match exactly; only dim 0 may be
larger than requested.
The implementation intentionally repeats the small NPY header checks instead of calling parseNpy
and slicing afterwards. parseNpy decodes the entire data payload; that is fine for small examples
but wasteful when a command asks for a quick prefix of a real image or sequence dataset. Here we
read the header, validate that the file layout is compatible with the requested type-level shape,
and then decode exactly expectedShape.product elements.
Why C-order only? In row-major NPY files, the first n rows are physically contiguous, so the
prefix is exactly the first n * trailingSize elements. In Fortran-order files the same logical
prefix is interleaved across the payload, so a cheap prefix decode would be wrong. Rather than
silently returning bad rows, we reject Fortran-order prefix loading and ask callers to convert the
array to C-order first.
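The prefix rule can be sketched in Python as follows. To keep the example short it assumes the header has already been parsed into descr, fortran_order, and file_shape; the function name decode_prefix_dim0 and its parameters are hypothetical and do not reproduce the signature of parseNpyPrefixDim0.

```python
import struct

DTYPES = {"<f4": ("f", 4), "<f8": ("d", 8)}  # descr -> (struct code, byte width)


def decode_prefix_dim0(descr, fortran_order, file_shape, payload, expected_shape):
    """Decode only the leading expected_shape[0] rows of a C-order payload."""
    if descr not in DTYPES:
        raise ValueError(f"unsupported dtype {descr}")
    if fortran_order:
        raise ValueError("prefix loading requires a C-order file")
    if not file_shape or len(file_shape) != len(expected_shape):
        raise ValueError("rank mismatch")
    if file_shape[1:] != expected_shape[1:]:
        raise ValueError("trailing dimensions must match exactly")
    if expected_shape[0] > file_shape[0]:
        raise ValueError("file has fewer rows than requested")
    # The first n rows are the first n * trailingSize elements of the payload.
    count = 1
    for dim in expected_shape:
        count *= dim
    code, width = DTYPES[descr]
    if len(payload) < count * width:
        raise ValueError("truncated payload")
    return list(struct.unpack_from(f"<{count}{code}", payload, 0))


# A 4x2 float64 array on disk; the run asks for the first 2 rows only.
payload = struct.pack("<8d", 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0)
print(decode_prefix_dim0("<f8", False, [4, 2], payload, [2, 2]))
```

Only the first four values are decoded here; the remaining rows are never converted to floats.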
Read a .npy file from disk and parse it as NpyData.
Read a .npy file but decode only the requested leading rows.
This is the file-system wrapper around parseNpyPrefixDim0. It still reads the file bytes into
memory, but it avoids building a full Array Float for rows the run did not ask to use. The
public API.Data layer uses this when a dataset source says "load the first n examples" from a
larger exported NPY tensor.
Read a 1D .npy file as a typed TorchLean vector tensor.
The shape check is part of the loader contract: files with the wrong logical size are rejected instead of being reshaped implicitly.
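As an illustration of that contract, a Python sketch of the 1D check could look like this. The name as_vector is hypothetical, and the sketch takes an already parsed shape and value list rather than a file path.

```python
def as_vector(shape, values, n):
    """Accept the data only if it is exactly a 1D array of length n."""
    if shape != [n]:
        # No implicit reshape: a (2, 3) file is not accepted as a length-6 vector.
        raise ValueError(f"expected shape [{n}], got {shape}")
    if len(values) != n:
        raise ValueError("payload length does not match the declared shape")
    return list(values)


print(as_vector([3], [1.0, 2.0, 3.0], 3))
```

Calling as_vector([2, 3], six_values, 6) raises even though the element count matches, because the logical shape differs from the requested one.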
Read a 2D .npy file as a typed TorchLean matrix tensor.
The returned matrix uses the same row-major indexing convention as the rest of the runtime tensor helpers.
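Row-major indexing means that element (i, j) of a rows x cols matrix lives at flat position i * cols + j. A small Python sketch, with the hypothetical name as_matrix and an already parsed shape and value list as inputs:

```python
def as_matrix(shape, values, rows, cols):
    """View a C-ordered flat list as a rows x cols matrix, rejecting shape mismatches."""
    if shape != [rows, cols] or len(values) != rows * cols:
        raise ValueError(f"expected shape [{rows}, {cols}], got {shape}")
    # Row r occupies the contiguous slice [r * cols, (r + 1) * cols).
    return [list(values[r * cols:(r + 1) * cols]) for r in range(rows)]


flat = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
matrix = as_matrix([2, 3], flat, 2, 3)
print(matrix)
print(matrix[1][2] == flat[1 * 3 + 2])
```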