PCA (spec model) #
Principal Component Analysis is represented as a linear projection onto learned components, plus an explicit mean for centering.
This file primarily models the transform (and inverse transform); a spec-level fit procedure (via the covariance eigendecomposition) is also included for reference.
PyTorch / ecosystem analogies:
- scikit-learn: `sklearn.decomposition.PCA` (fit + transform)
- PyTorch: `torch.pca_lowrank` or `torch.linalg.svd` (common building blocks)
References (background, not required to read the code):
- Pearson (1901), "On Lines and Planes of Closest Fit to Systems of Points in Space". https://doi.org/10.1080/14786440109462720
- Hotelling (1933), "Analysis of a complex of statistical variables into principal components". https://doi.org/10.2307/2333955
Parameters for PCA as a linear map plus centering.
We store:
- components : outDim × inDim (rows are principal directions),
- mean : inDim (for centering),
- explained_variance : outDim (eigenvalues for the selected components).
This matches the typical PCA API: you can transform to outDim coordinates and invert back to inDim.
- components : Tensor α (Shape.dim outDim (Shape.dim inDim Shape.scalar))
  The principal components; each row is a principal direction.
- mean : Tensor α (Shape.dim inDim Shape.scalar)
  The per-feature mean used to center inputs.
- explained_variance : Tensor α (Shape.dim outDim Shape.scalar)
  The explained variance (eigenvalue) of each selected component.
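For intuition, here is a minimal NumPy analogue of this parameter record (the `PCAParams` name and field layout are illustrative, mirroring the Lean fields above, not part of the spec):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class PCAParams:
    components: np.ndarray          # (out_dim, in_dim): rows are principal directions
    mean: np.ndarray                # (in_dim,): per-feature mean used for centering
    explained_variance: np.ndarray  # (out_dim,): eigenvalues of the selected components
```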
Forward pass (center and project): y = components · (x - mean).
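A NumPy sketch of this forward pass, reusing the illustrative `PCAParams` record from above (`pca_forward` is a hypothetical name, not the Lean declaration):

```python
def pca_forward(params: PCAParams, x: np.ndarray) -> np.ndarray:
    """Center and project a single sample: y = components @ (x - mean)."""
    return params.components @ (x - params.mean)
```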
Batched forward pass: apply pca_forward_spec to each row.
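In NumPy terms, mapping pca_forward_spec over rows collapses to a single matrix product (a sketch under the same illustrative names):

```python
def pca_forward_batch(params: PCAParams, X: np.ndarray) -> np.ndarray:
    """Apply the forward pass to each row of X: (n, in_dim) -> (n, out_dim)."""
    return (X - params.mean) @ params.components.T
```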
Inverse transform: reconstruct x ≈ componentsᵀ · y + mean.
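The corresponding NumPy sketch (again with illustrative names):

```python
def pca_inverse(params: PCAParams, y: np.ndarray) -> np.ndarray:
    """Reconstruct a sample: x_hat = components.T @ y + mean."""
    return params.components.T @ y + params.mean
```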
VJP contribution for components: outer product dL/dy ⊗ (x - mean).
VJP contribution for mean: dL/dmean = -componentsᵀ · dL/dy.
VJP contribution for input: dL/dx = componentsᵀ · dL/dy.
Full backward pass returning (dComponents, dMean, dInput).
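A NumPy sketch assembling the three VJP contributions above into one backward pass (`pca_backward` and `g` are illustrative names; `g` stands for the upstream gradient dL/dy):

```python
def pca_backward(params: PCAParams, x: np.ndarray, g: np.ndarray):
    """Return (dComponents, dMean, dInput) for upstream gradient g = dL/dy."""
    centered = x - params.mean
    d_components = np.outer(g, centered)   # dL/dy ⊗ (x - mean), shape (out_dim, in_dim)
    d_mean = -(params.components.T @ g)    # -componentsᵀ · dL/dy
    d_input = params.components.T @ g      #  componentsᵀ · dL/dy
    return d_components, d_mean, d_input
```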
Fit PCA using the (scaled) covariance matrix and eigendecomposition.
Algorithm:
- compute the mean and center the data,
- form the covariance matrix C = (1/(n-1)) Xᵀ X,
- compute eigenpairs of C,
- take the top nComponents eigenvectors,
- orient eigenvectors deterministically (sign convention) so results are reproducible.
Note: this is a spec/reference implementation. In numerical libraries, PCA is often implemented via SVD for stability and performance.
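A NumPy sketch of this fit procedure (one possible sign convention is shown; the spec's exact convention may differ):

```python
def pca_fit(X: np.ndarray, n_components: int) -> PCAParams:
    """Fit PCA via the covariance eigendecomposition (spec-style, not SVD)."""
    mean = X.mean(axis=0)
    Xc = X - mean                                   # center the data
    cov = Xc.T @ Xc / (X.shape[0] - 1)              # C = (1/(n-1)) Xᵀ X
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:n_components]  # indices of the top eigenpairs
    components = eigvecs[:, top].T                  # rows are principal directions
    # One common deterministic sign convention (an assumption, not the spec's):
    # flip each row so its largest-magnitude entry is positive.
    for i, row in enumerate(components):
        if row[np.argmax(np.abs(row))] < 0:
            components[i] = -row
    return PCAParams(components, mean, eigvals[top])
```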
Apply a fitted PCA transform to a batch of samples.
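An end-to-end usage sketch, chaining the hypothetical helpers above:

```python
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))        # 100 samples, 5 features
params = pca_fit(X, n_components=2)
Y = pca_forward_batch(params, X)     # (100, 2) projected coordinates
```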
Reconstruction error: ||x - inverse(transform(x))||_2^2 (sum of squared coordinates).
PyTorch analogy: `torch.sum((x - x_hat) ** 2)`.
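A NumPy sketch of the same quantity, composing the illustrative forward and inverse helpers:

```python
def reconstruction_error(params: PCAParams, x: np.ndarray) -> float:
    """||x - inverse(transform(x))||_2^2, summed over coordinates."""
    x_hat = pca_inverse(params, pca_forward(params, x))
    return float(np.sum((x - x_hat) ** 2))
```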
Explained variance (eigenvalues of the selected components).
If you want the explained-variance ratio, divide each eigenvalue by the total variance of the
original data (the ratios sum to 1 only when all components are kept); this file keeps just the raw eigenvalues.
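If you do want the ratio, a sketch of the extra step (`explained_variance_ratio` is a hypothetical helper; the total variance equals the sum of all eigenvalues of the covariance matrix):

```python
def explained_variance_ratio(params: PCAParams, X: np.ndarray) -> np.ndarray:
    """Each stored eigenvalue divided by the total variance of the data."""
    Xc = X - X.mean(axis=0)
    total_var = Xc.var(axis=0, ddof=1).sum()  # trace of the covariance matrix
    return params.explained_variance / total_var
```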
Cumulative explained variance (prefix sums of explained_variance).
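In NumPy terms this is just a prefix sum:

```python
cumulative = np.cumsum(params.explained_variance)  # prefix sums of the eigenvalues
```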