Graph neural network layers (spec layer) #
We provide a couple of small, standard GNN building blocks that show up in lots of papers and PyTorch GNN libraries:
- a basic "message passing / neighbor aggregation" primitive, and
- a GCN-style graph convolution layer.
Message passing (the common core idea) #
Most GNN layers have the same shape of computation:
- aggregate neighbor features using the graph structure, then
- optionally apply a learnable transformation and a nonlinearity.
In this file the aggregation step is written with a matrix A : (n×n):
Agg(A, H) = A · H.
This captures many common conventions (a small sketch follows the list):
- if A is the raw adjacency, you are summing neighbors,
- if A is normalized (e.g. D^{-1/2} (A + I) D^{-1/2}), you are doing the "GCN normalization" flavor,
- if A includes edge weights, you are doing a weighted sum.
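As a concrete illustration, here is a minimal NumPy sketch (not the Lean spec itself; the helper name `agg` and the toy graph are purely illustrative) showing that the same Agg(A, H) = A · H call gives sum, GCN-normalized, or weighted aggregation depending only on which A you pass in:

```python
import numpy as np

def agg(A, H):
    # Agg(A, H) = A · H: each node's new feature is an A-weighted sum of node features.
    return A @ H

# Toy graph: 3 nodes, undirected edges 0-1 and 1-2, 2 features per node.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.random.default_rng(0).normal(size=(3, 2))

# 1) Raw adjacency: plain neighbor sum.
sum_agg = agg(A, H)

# 2) GCN normalization: D^{-1/2} (A + I) D^{-1/2}.
A_hat = A + np.eye(3)
d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
gcn_agg = agg(A_norm, H)

# 3) Edge weights: nonnegative entries give a weighted sum of neighbors.
A_weighted = A * np.array([[0., 0.5, 0.],
                           [0.5, 0., 2.0],
                           [0., 2.0, 0.]])
weighted_agg = agg(A_weighted, H)
```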
GCN layer (one very common choice) #
We model a GCN-style layer as:

Y = A · H · W + b

where:
- A : (n×n) is an adjacency-like matrix (often normalized, and often with self-loops),
- H : (n×inDim) are node features,
- W : (inDim×outDim) and b : outDim are trainable parameters.
PyTorch mental picture:
- This is the algebraic core of what libraries like PyTorch Geometric call GCNConv, once you pick a concrete choice of A (raw adjacency, D^{-1/2} (A + I) D^{-1/2}, etc.) and batch conventions. A rough sketch follows.
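To make that mental picture concrete, here is a minimal PyTorch-style sketch of the algebraic core. The module name and structure are illustrative only; this is not PyTorch Geometric's GCNConv (in particular it takes a dense, pre-normalized A rather than an edge index):

```python
import torch
import torch.nn as nn

class DenseGCNLayer(nn.Module):
    """Illustrative sketch only: Y = A · H · W + b with a dense, pre-normalized A."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W = nn.Parameter(torch.randn(in_dim, out_dim) * in_dim ** -0.5)
        self.b = nn.Parameter(torch.zeros(out_dim))

    def forward(self, A: torch.Tensor, H: torch.Tensor) -> torch.Tensor:
        # Aggregate neighbor features with A, then apply the learnable transform and bias.
        return A @ H @ self.W + self.b

layer = DenseGCNLayer(in_dim=2, out_dim=4)
Y = layer(torch.eye(3), torch.randn(3, 2))   # Y has shape (3, 4)
```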
Why only these two right now:
- GCN + plain aggregation are enough to cover a lot of examples and give us something we can reason about cleanly.
- We do plan to add other families (GraphSAGE, GAT, generic MPNNs). Those require more choices (per-edge features, masking/batching conventions, and tie-ins to attention-style ops), so we want to introduce them carefully instead of piling on half-finished variants.
Neighbor aggregation / message passing via a graph matrix: Agg(A, X) = A · X.
This is the reusable "mix neighbors" step. The semantics are entirely determined by A
(raw adjacency, normalized adjacency, weighted adjacency, etc.).
Backward/VJP for message_passing_spec: returns (dA, dX).
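A NumPy sketch of that VJP (the function name is illustrative): for Y = A · X and upstream cotangent dY, the pullbacks are dA = dY · Xᵀ and dX = Aᵀ · dY. The assertion checks one entry of dA against a finite difference of the scalar L = Σ (Y ⊙ dY):

```python
import numpy as np

def message_passing_vjp(A, X, dY):
    # Y = A · X, so the pullbacks are:
    dA = dY @ X.T   # (n, n)
    dX = A.T @ dY   # (n, d)
    return dA, dX

rng = np.random.default_rng(0)
A, X, dY = rng.normal(size=(3, 3)), rng.normal(size=(3, 2)), rng.normal(size=(3, 2))
dA, dX = message_passing_vjp(A, X, dY)

eps = 1e-6
A_pert = A.copy(); A_pert[0, 1] += eps
fd = (np.sum((A_pert @ X) * dY) - np.sum((A @ X) * dY)) / eps
assert abs(fd - dA[0, 1]) < 1e-4
```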
Parameters/data for a single GCN-style layer.
We bundle A with the layer because many code paths treat A as a fixed input per graph, while
others treat it as a parameter (e.g. learned normalization). Keeping it in the record makes both
uses explicit.
- A : Tensor α (Shape.dim n (Shape.dim n Shape.scalar))
  The adjacency-like graph matrix A.
- W : Tensor α (Shape.dim inDim (Shape.dim outDim Shape.scalar))
  The trainable weight matrix W.
- b : Tensor α (Shape.dim outDim Shape.scalar)
  The trainable bias vector b.
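In Python terms, the record is roughly the following bundle (a hypothetical mirror of the Lean structure, for intuition only):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class GCNLayerParams:
    """Mirrors the record: the graph matrix A plus trainable W and b."""
    A: np.ndarray  # (n, n) adjacency-like matrix (fixed input or learned)
    W: np.ndarray  # (inDim, outDim) weight matrix
    b: np.ndarray  # (outDim,) bias vector
```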
Forward spec for a GCN-style layer: Y = A · X · W + b.
Notes:
- The bias b is broadcast across the n nodes (row-wise add).
- Any normalization/self-loop convention belongs in the choice of A supplied to the layer.
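In NumPy terms the forward, including the row-wise bias broadcast, is simply (sketch only; shape comments assume the dimensions above):

```python
import numpy as np

def gcn_layer_forward(A, X, W, b):
    # A: (n, n), X: (n, inDim), W: (inDim, outDim), b: (outDim,)
    # b broadcasts across the n rows, i.e. the same bias is added to every node.
    return A @ X @ W + b   # (n, outDim)
```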
Gradients #
For the simple GCN-style layer

Y = A · X · W + b

the reverse-mode derivatives are the standard matrix calculus ones (a numerical sanity check is sketched below):
- dW = (A·X)ᵀ · dY
- db = Σᵢ dYᵢ (sum across the node axis)
- dX = Aᵀ · (dY · Wᵀ)
- dA = (dY · Wᵀ) · Xᵀ
We include dA because in some setups the adjacency/normalization is also:
- treated as an input you want sensitivities for, or
- treated as a parameter (e.g. learned edge weights / learned normalization).
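As a sanity check, here is a NumPy sketch (illustrative names, not the Lean spec) that implements the four formulas, returned in the spec's (dA, dW, db, dX) order, and compares one entry of each against a finite difference of the scalar L = Σ (Y ⊙ dY):

```python
import numpy as np

def gcn_layer_vjp(A, X, W, b, dY):
    # Gradient formulas from above, returned in the spec's (dA, dW, db, dX) order.
    AX = A @ X
    dW = AX.T @ dY            # (inDim, outDim)
    db = dY.sum(axis=0)       # sum across the node axis
    dX = A.T @ (dY @ W.T)     # (n, inDim)
    dA = (dY @ W.T) @ X.T     # (n, n)
    return dA, dW, db, dX

rng = np.random.default_rng(0)
n, in_dim, out_dim = 4, 3, 2
base = {"A": rng.normal(size=(n, n)), "X": rng.normal(size=(n, in_dim)),
        "W": rng.normal(size=(in_dim, out_dim)), "b": rng.normal(size=(out_dim,))}
dY = rng.normal(size=(n, out_dim))

def loss(A, X, W, b):
    # Scalar proxy L = Σ (Y ⊙ dY); its gradients are exactly the VJP above.
    return float(np.sum((A @ X @ W + b) * dY))

dA, dW, db, dX = gcn_layer_vjp(base["A"], base["X"], base["W"], base["b"], dY)
eps = 1e-6
for key, grad, idx in [("A", dA, (0, 1)), ("W", dW, (1, 0)), ("b", db, (0,)), ("X", dX, (2, 1))]:
    bumped = dict(base)
    bumped[key] = base[key].copy()
    bumped[key][idx] += eps
    fd = (loss(**bumped) - loss(**base)) / eps
    assert abs(fd - grad[idx]) < 1e-4, key
```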
Backward/VJP spec for gcn_layer_spec.
Returns (dA, dW, db, dX) in that order.