RL Trust-Boundary Proofs

NN.Runtime.RL.Boundary.Core provides executable “trust-boundary” checkers for externally supplied RL data. These checkers return Except String ..., so runtime code can fail fast with a clear error message. JSON rollout parsing lives one layer higher, in NN.Runtime.RL.Boundary.Json; the proof bridge here needs only the contract and the checker semantics.

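For formal reasoning, we also want a Prop-level view of exactly what was checked: the predicates DoneFlagsHolds, ObservationHolds, RewardHolds, and ContractHolds that appear as conclusions of the lemmas in this module.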

This module provides the bridge lemmas that turn a successful executable check (= .ok ...) into the corresponding Prop-level statement.
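The shape of such a bridge lemma can be sketched on a toy checker. The block below is a minimal, self-contained Lean 4 illustration and not the TorchLean definitions: the namespace ToyBoundary and the names WithinLimit, checkWithinLimit, and withinLimit_of_checkWithinLimit_eq_ok are hypothetical, and the contract (a natural number staying within a limit) is assumed purely for illustration.

```lean
-- Hypothetical toy model; none of these names come from the TorchLean API.
namespace ToyBoundary

/-- Prop-level contract: the value stays within the configured limit. -/
def WithinLimit (limit x : Nat) : Prop := x ≤ limit

/-- Executable checker: fails fast with a message, otherwise returns `()`. -/
def checkWithinLimit (limit x : Nat) : Except String Unit :=
  if x ≤ limit then
    Except.ok ()
  else
    Except.error s!"value {x} exceeds limit {limit}"

/-- Bridge lemma: a successful executable check yields the Prop-level statement. -/
theorem withinLimit_of_checkWithinLimit_eq_ok (limit x : Nat)
    (h : checkWithinLimit limit x = Except.ok ()) : WithinLimit limit x := by
  unfold checkWithinLimit at h
  unfold WithinLimit
  by_cases hx : x ≤ limit
  · -- The check took the success branch, so the bound holds directly.
    exact hx
  · -- The check took the error branch, contradicting `h`.
    rw [if_neg hx] at h
    cases h

end ToyBoundary
```

The proof only inspects which branch the checker took: the success branch hands back the needed fact, and the error branch contradicts the hypothesis that the result was Except.ok ().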

References:

Meyer, Object-Oriented Software Construction (2nd ed., 1997): “Design by Contract” as a general specification pattern at software boundaries.
Gymnasium API docs (reset/step, terminated vs truncated): https://gymnasium.farama.org/
Sutton and Barto, Reinforcement Learning: An Introduction (2nd ed., 2018), discussion of episodic termination semantics: http://incompleteideas.net/book/the-book-2nd.html

Internal helper lemmas

These facts are purely about the small executable helpers in Runtime.RL.Boundary.Internal.

Bridge lemmas: executable checks -> Prop contract

theorem Proofs.RL.Boundary.doneFlagsHolds_of_checkDoneFlags_eq_ok
    {obsShape : Spec.Shape} {nActions : ℕ}
    (c : Runtime.RL.Boundary.Contract obsShape nActions)
    (terminated truncated : Bool)
    (h : Runtime.RL.Boundary.checkDoneFlags c terminated truncated = Except.ok ()) :
    Runtime.RL.Boundary.DoneFlagsHolds c terminated truncated

If the executable checker checkDoneFlags succeeds, then the Prop-level done-flag contract holds.

theorem Proofs.RL.Boundary.observationHolds_of_checkObservation_eq_ok
    {obsShape : Spec.Shape} {nActions : ℕ}
    (c : Runtime.RL.Boundary.Contract obsShape nActions)
    (field : String)
    (obs : Spec.Tensor Float obsShape)
    (h : Runtime.RL.Boundary.checkObservation c field obs = Except.ok ()) :
    Runtime.RL.Boundary.ObservationHolds c obs

If the executable checker checkObservation succeeds, then the Prop-level observation contract holds.

theorem Proofs.RL.Boundary.rewardHolds_of_checkReward_eq_ok
    {obsShape : Spec.Shape} {nActions : ℕ}
    (c : Runtime.RL.Boundary.Contract obsShape nActions)
    (reward : Float)
    (h : Runtime.RL.Boundary.checkReward c reward = Except.ok ()) :
    Runtime.RL.Boundary.RewardHolds c reward

If the executable checker checkReward succeeds, then the Prop-level reward contract holds.

theorem Proofs.RL.Boundary.contractHolds_of_checkTransitionFin_eq_ok
    {obsShape : Spec.Shape} {nActions : ℕ}
    (c : Runtime.RL.Boundary.Contract obsShape nActions)
    (observation nextObservation : Spec.Tensor Float obsShape)
    (action : Fin nActions)
    (reward : Float)
    (terminated truncated : Bool)
    (t : Runtime.RL.Boundary.Transition obsShape nActions)
    (h : Runtime.RL.Boundary.checkTransitionFin c observation nextObservation action reward terminated truncated = Except.ok t) :
    Runtime.RL.Boundary.ContractHolds c t

If the executable checker checkTransitionFin succeeds with result t, then the Prop-level contract ContractHolds holds for the returned transition t.

This is the main “checked preconditions” bridge used by downstream RL theorems.

def Proofs.RL.Boundary.checkTransitionFinWithProof
    {obsShape : Spec.Shape} {nActions : ℕ}
    (c : Runtime.RL.Boundary.Contract obsShape nActions)
    (observation nextObservation : Spec.Tensor Float obsShape)
    (action : Fin nActions)
    (reward : Float)
    (terminated truncated : Bool) :
    Except String { t : Runtime.RL.Boundary.Transition obsShape nActions // Runtime.RL.Boundary.ContractHolds c t }

Convenience wrapper: run the executable checker and, on success, return the transition bundled with the Prop-level contract proof.

This is the “checked preconditions” interface for downstream proofs/programs: instead of assuming a contract, you explicitly check it and obtain a usable hypothesis.
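One plausible way to build such a proof-carrying wrapper is to match on the checker result while keeping the equation, then feed that equation to the bridge lemma. The sketch below is a self-contained Lean 4 toy under stated assumptions, not the TorchLean implementation: the namespace ToyTransition, the Step record, the StepHolds contract (an integer reward within a symmetric bound), and the functions checkStep and checkStepWithProof are all hypothetical stand-ins.

```lean
-- Hypothetical toy model; these names are illustrative and not part of the TorchLean API.
namespace ToyTransition

/-- A toy transition record with an integer reward and a done flag. -/
structure Step where
  reward : Int
  done : Bool

/-- Prop-level contract (assumed for illustration): the reward lies in `[-bound, bound]`. -/
def StepHolds (bound : Int) (s : Step) : Prop :=
  -bound ≤ s.reward ∧ s.reward ≤ bound

/-- Executable checker: validates the raw fields and builds the record on success. -/
def checkStep (bound reward : Int) (done : Bool) : Except String Step :=
  if -bound ≤ reward ∧ reward ≤ bound then
    Except.ok { reward := reward, done := done }
  else
    Except.error s!"reward {reward} is outside the allowed bound {bound}"

/-- Bridge lemma: if the checker returns `s`, the contract holds for `s`. -/
theorem stepHolds_of_checkStep_eq_ok (bound reward : Int) (done : Bool) (s : Step)
    (h : checkStep bound reward done = Except.ok s) : StepHolds bound s := by
  unfold checkStep at h
  by_cases hb : -bound ≤ reward ∧ reward ≤ bound
  · -- Success branch: `s` is exactly the record that was built.
    rw [if_pos hb] at h
    cases h
    exact hb
  · -- Error branch: contradicts `h`.
    rw [if_neg hb] at h
    cases h

/-- Wrapper: run the checker and, on success, bundle the result with its proof. -/
def checkStepWithProof (bound reward : Int) (done : Bool) :
    Except String { s : Step // StepHolds bound s } :=
  match h : checkStep bound reward done with
  | .ok s => .ok ⟨s, stepHolds_of_checkStep_eq_ok bound reward done s h⟩
  | .error msg => .error msg

end ToyTransition
```

A caller can then destructure a successful result as ⟨s, hs⟩ and use hs : StepHolds bound s directly as a hypothesis, rather than assuming the contract. The `match h : ... with` form is what keeps the equation between the checker call and its result available to the bridge lemma.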

TorchLean API

NN.Proofs.RL.Boundary
