TorchLean API

Docs Home Guide Examples Graphs

NN.Runtime.RL.Boundary.Core

RL Trust Boundary (External Rollouts) #

TorchLean’s RL update rules (Bellman backups, returns/GAE, PPO objectives, etc.) are defined in Lean and can therefore be reasoned about and proved correct inside Lean. When you collect experience with an external environment (for example Python Gymnasium), you cross a trust boundary:

we do not (and generally cannot) prove the external environment satisfies Markov / measurability / boundedness assumptions, and
we do not prove the external runtime returns numerically well-behaved floating-point values.

This module implements a practical middle ground: a strict, explicit contract for externally provided rollouts and a checker that turns “assumptions” into “checked preconditions”.

The boundary emits Spec.RL.ObservedTransition as the validated output type. This lets downstream training code share one common input type for both:

Lean-native environments via Spec.RL.rolloutFrom, and
external rollouts after passing this contract check.

References #

Gymnasium API (reset/step, terminated vs truncated): https://gymnasium.farama.org/
The original Gym API paper (background on the env interface): https://arxiv.org/abs/1606.01540
Schulman et al., "Proximal Policy Optimization Algorithms" (2017): https://arxiv.org/abs/1707.06347
Trust-boundary pattern used elsewhere in TorchLean (e.g. the Arb oracle): NN.Floats.Arb.

Basic numeric checks #

We keep checks in Bool form and then use Except String wrappers for nice error messages. This makes it easy to integrate with IO loaders (JSON, sockets, shared-memory, ...).

def Runtime.RL.Boundary.isFiniteFloat (x : Float) :

true iff a Float is neither NaN nor ±Inf.

Instances For

def Runtime.RL.Boundary.tensorAll {α : Type} {s : Spec.Shape} (p : α → Bool) :

Spec.Tensor α s → Bool

Apply a scalar boolean predicate to every entry of a tensor.

Instances For

def Runtime.RL.Boundary.tensorFinite {s : Spec.Shape} (t : Spec.Tensor Float s) :

true iff every entry of the tensor is finite.

Instances For

def Runtime.RL.Boundary.tensorInClosedInterval {s : Spec.Shape} (lo hi : Float) (t : Spec.Tensor Float s) :

true iff every entry of the tensor lies in [lo, hi].

Instances For

Contract for external discrete-action rollouts #

This contract is intentionally narrow:

it does not try to verify the environment dynamics,
it checks only syntactic and numeric sanity properties that can be validated locally,
it is designed to fail fast with a human-readable error.

structure Runtime.RL.Boundary.Contract (obsShape : Spec.Shape) (nActions : ℕ) :

Contract for validating externally provided Gym-style transitions.

checkObsFinite : Bool
Check that observations contain no NaNs/Infs.
checkRewardFinite : Bool
Check that rewards contain no NaNs/Infs.
obsRange? : Option (Float × Float)
Optional observation range check [lo, hi].
rewardRange? : Option (Float × Float)
Optional reward range check [lo, hi].
requireExclusiveDoneFlags : Bool
Optional strictness: reject steps with both terminated=true and truncated=true.

Instances For

def Runtime.RL.Boundary.instReprContract.repr {obsShape✝ : Spec.Shape} {nActions✝ : ℕ} :

Contract obsShape✝ nActions✝ → ℕ → Std.Format

Instances For

@[implicit_reducible]

instance Runtime.RL.Boundary.instReprContract {obsShape✝ : Spec.Shape} {nActions✝ : ℕ} :

Repr (Contract obsShape✝ nActions✝)

@[reducible, inline]

abbrev Runtime.RL.Boundary.Transition (obsShape : Spec.Shape) (nActions : ℕ) :

Validated transition type for discrete-action rollouts.

Instances For

def Runtime.RL.Boundary.Transition.done {obsShape : Spec.Shape} {nActions : ℕ} (t : Transition obsShape nActions) :

Gymnasium-style done flag: terminated || truncated.

Instances For

Checkers #

def Runtime.RL.Boundary.checkAction (nActions action : ℕ) :

Except String (Fin nActions)

Convert a raw action index into Fin nActions with a range check.

Instances For

Internal check helpers #

These small routines are factored out so the exported checkers can stay readable and can assemble good error messages.

def Runtime.RL.Boundary.Internal.checkFloatFinite (field : String) (x : Float) :

Except String Unit

Check that a Float is finite (not NaN/Inf), producing an error tagged with field.

Instances For

def Runtime.RL.Boundary.Internal.checkTensorFinite {s : Spec.Shape} (field : String) (t : Spec.Tensor Float s) :

Except String Unit

Check that every tensor entry is finite, producing an error tagged with field.

Instances For

def Runtime.RL.Boundary.Internal.checkTensorRange {s : Spec.Shape} (field : String) (lo hi : Float) (t : Spec.Tensor Float s) :

Except String Unit

Check that every tensor entry lies in [lo, hi], producing an error tagged with field.

Instances For

def Runtime.RL.Boundary.Internal.checkFloatRange (field : String) (lo hi x : Float) :

Except String Unit

Check that a scalar lies in [lo, hi], producing an error tagged with field.

Instances For

def Runtime.RL.Boundary.checkObservation {obsShape : Spec.Shape} {nActions : ℕ} (c : Contract obsShape nActions) (field : String := "observation") (obs : Spec.Tensor Float obsShape) :

Except String Unit

Validate one observation tensor against the observation-related part of a Contract.

This is mainly used to validate reset observations coming from external Gymnasium-like bridges.

Instances For

def Runtime.RL.Boundary.checkReward {obsShape : Spec.Shape} {nActions : ℕ} (c : Contract obsShape nActions) (reward : Float) :

Except String Unit

Validate one reward against the reward-related part of a Contract.

Instances For

def Runtime.RL.Boundary.checkDoneFlags {obsShape : Spec.Shape} {nActions : ℕ} (c : Contract obsShape nActions) (terminated truncated : Bool) :

Except String Unit

Validate the Gymnasium-style done flags against a Contract.

Instances For

def Runtime.RL.Boundary.checkTransitionFin {obsShape : Spec.Shape} {nActions : ℕ} (c : Contract obsShape nActions) (observation nextObservation : Spec.Tensor Float obsShape) (action : Fin nActions) (reward : Float) (terminated truncated : Bool) :

Except String (Transition obsShape nActions)

Validate a transition when the action is already range-checked (Fin nActions).

This avoids re-checking the action range at call sites that already work with Fin actions.

Instances For

def Runtime.RL.Boundary.checkTransition {obsShape : Spec.Shape} {nActions : ℕ} (c : Contract obsShape nActions) (observation nextObservation : Spec.Tensor Float obsShape) (action : ℕ) (reward : Float) (terminated truncated : Bool) :

Except String (Transition obsShape nActions)

Validate one observed transition against a Contract.

Instances For

Proposition-level contract #

The checkers above are executable and return Except String ... for good runtime error messages. For formal reasoning we also want a Prop-level view that states exactly what was checked.

The main bridge lemma is in NN/Proofs/RL/Boundary.lean:

Proofs.RL.Boundary.contractHolds_of_checkTransitionFin_eq_ok turns a successful runtime check into ContractHolds.

def Runtime.RL.Boundary.ObservationHolds {obsShape : Spec.Shape} {nActions : ℕ} (c : Contract obsShape nActions) (obs : Spec.Tensor Float obsShape) :

Proposition-level version of the observation checks performed by checkObservation.

This is intentionally “syntactic”: it only states the locally checkable numeric/shape sanity conditions, not any semantic assumptions about the environment dynamics.

Instances For

def Runtime.RL.Boundary.RewardHolds {obsShape : Spec.Shape} {nActions : ℕ} (c : Contract obsShape nActions) (reward : Float) :

Proposition-level version of the reward checks performed by checkReward.

Instances For

def Runtime.RL.Boundary.DoneFlagsHolds {obsShape : Spec.Shape} {nActions : ℕ} (c : Contract obsShape nActions) (terminated truncated : Bool) :

Proposition-level version of the done-flag check performed by checkDoneFlags.

terminated and truncated follow Gymnasium’s API: terminated means the environment reached a terminal state, while truncated indicates an external truncation such as a time-limit. See the Gymnasium Env.step documentation:

https://gymnasium.farama.org/api/env/#gymnasium.Env.step

Instances For

structure Runtime.RL.Boundary.ContractHolds {obsShape : Spec.Shape} {nActions : ℕ} (c : Contract obsShape nActions) (t : Transition obsShape nActions) :

ContractHolds c t means the external transition t satisfies the Prop-level trust-boundary contract induced by c.

doneFlags : DoneFlagsHolds c t.terminated t.truncated
reward : RewardHolds c t.reward
observation : ObservationHolds c t.observation
nextObservation : ObservationHolds c t.nextObservation

Instances For

@[reducible, inline]

abbrev Runtime.RL.Boundary.ValidTransition {obsShape : Spec.Shape} {nActions : ℕ} (c : Contract obsShape nActions) (t : Transition obsShape nActions) :

Alias: Prop-level validity of a transition under c.

Instances For