# Reinforcement Learning Runtime Entrypoint
This is the runtime umbrella for typed reinforcement-learning utilities. The modules here are executable infrastructure: they define small MDP/bandit interfaces, value-learning and policy-gradient helpers, replay buffers, checked rollout boundaries, Gymnasium subprocess sessions, and PPO rollout infrastructure.
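As a hedged illustration of what "small MDP/bandit interfaces" can look like at this layer, here is a minimal Lean sketch. The structure and field names (`MDP`, `init`, `step`, `isDone`) are illustrative assumptions, not the actual `Runtime.RL` API:

```lean
/-- Hypothetical sketch (not the actual `Runtime.RL` API): a minimal
    typed MDP interface of the kind described above. `S` is the state
    type, `A` the action type; `step` returns the successor state and
    a scalar reward. -/
structure MDP (S A : Type) where
  init   : S
  step   : S → A → S × Float
  isDone : S → Bool

/-- Usage example: a tiny two-state chain. Taking action `true` from
    state `true` yields reward 1.0 and terminates. -/
def chain : MDP Bool Bool where
  init := false
  step s a := (a, if s && a then 1.0 else 0.0)
  isDone s := s
```

Keeping the interface this small is what lets value-learning and policy-gradient helpers stay generic over concrete environments.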
The split is intentional:
- `Runtime.RL.Core` contains tensor-shaped rollout/loss helpers shared by algorithms;
- `Runtime.RL.Replay` contains typed bounded replay buffers for off-policy algorithms;
- `Runtime.RL.Algorithms.*` contains pure/mostly-pure bandit, tabular, value-learning, and policy-gradient equations;
- `Runtime.RL.Boundary` records host-side rollout contracts before converting observations and rewards into TorchLean tensors;
- `Runtime.RL.Gymnasium` is an external-process bridge and therefore a trust boundary;
- `Runtime.RL.PolicyGradient.Autograd` contains differentiable actor/critic losses over TorchLean refs; model architectures remain in the GraphSpec/API model layers;
- `Runtime.RL.PPO` contains rollout/sample construction;
- `Runtime.RL.Numerics.*` contains optional checked float32 and interval diagnostics for RL recursions.
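To make the "typed bounded replay buffer" role of `Runtime.RL.Replay` concrete, the following is a hedged Lean sketch. All names (`ReplayBuffer`, `push`, the ring-buffer layout) are assumptions for illustration and may not match the real module:

```lean
/-- Hypothetical sketch (not the actual `Runtime.RL.Replay` API):
    a bounded buffer of transitions `τ` that overwrites its oldest
    entry once `capacity` is reached, ring-buffer style. -/
structure ReplayBuffer (τ : Type) where
  capacity : Nat
  data     : Array τ
  next     : Nat   -- write cursor into `data`

def ReplayBuffer.empty (τ : Type) (cap : Nat) : ReplayBuffer τ :=
  ⟨cap, #[], 0⟩

def ReplayBuffer.push (buf : ReplayBuffer τ) (x : τ) : ReplayBuffer τ :=
  if buf.data.size < buf.capacity then
    -- still filling: append, advance the cursor
    { buf with data := buf.data.push x,
               next := (buf.next + 1) % buf.capacity }
  else
    -- full: overwrite the oldest slot
    { buf with data := buf.data.set! buf.next x,
               next := (buf.next + 1) % buf.capacity }
```

Making the bound part of the type's invariant, rather than a runtime check, is the usual reason to keep replay buffers in their own typed module rather than inside each off-policy algorithm.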