16/04/2026 11:58 AM - Lecture
⬅️ [14/04/2026 12:03 PM - Lecture](<./14_04_2026 12_03 PM - Lecture.md>) | ⬆️ [ECE 567](<./README.md>)
Safe RL (Constrained MDPs)
The agent receives both a reward and a utility signal. Utilities capture different types of cost signal, and there can be multiple utility signals.
We think of a utility as a negative cost. The early CMDP literature called it a utility, so we use that term as well.
In our proofs, the reward and the utilities are deterministic and bounded in [0, 1].
We now have a value function and a Q-function for the reward and for each utility (negative cost).
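The idea of one value function and one Q-function per signal can be illustrated with exact policy evaluation in a small tabular CMDP. This is a minimal sketch under assumptions that are not from the lecture: a discounted infinite-horizon setting with discount factor `gamma`, a known transition tensor `P`, and a stochastic policy `pi`. The helper names `evaluate_policy` and `evaluate_cmdp` are hypothetical. The same linear Bellman equation is solved once for the reward and once for each utility; only the per-step signal changes.

```python
import numpy as np


def evaluate_policy(P, signal, pi, gamma):
    """Exact tabular policy evaluation for one per-step signal (hypothetical helper).

    P:      transition tensor, shape (S, A, S), P[s, a, s2] = Pr(s2 | s, a)
    signal: deterministic reward or utility table, shape (S, A), values in [0, 1]
    pi:     stochastic policy, shape (S, A), each row sums to 1
    gamma:  discount factor in [0, 1) (assumed discounted setting)
    Returns V with shape (S,) and Q with shape (S, A).
    """
    num_states = P.shape[0]
    # Expected one-step signal and state-to-state transitions under pi.
    signal_pi = np.sum(pi * signal, axis=1)           # (S,)
    P_pi = np.einsum("sa,sat->st", pi, P)             # (S, S)
    # Solve the linear Bellman equation V = signal_pi + gamma * P_pi @ V.
    V = np.linalg.solve(np.eye(num_states) - gamma * P_pi, signal_pi)
    # Q(s, a) = signal(s, a) + gamma * sum_s2 P(s2 | s, a) * V(s2).
    Q = signal + gamma * (P @ V)                      # (S, A)
    return V, Q


def evaluate_cmdp(P, reward, utilities, pi, gamma):
    """Value and Q-functions for the reward and for every utility signal.

    utilities: list of (S, A) tables, one per utility (negative cost).
    Returns a dict mapping "reward", "utility_0", ... to (V, Q) pairs.
    """
    results = {"reward": evaluate_policy(P, reward, pi, gamma)}
    for i, g in enumerate(utilities):
        results[f"utility_{i}"] = evaluate_policy(P, g, pi, gamma)
    return results


if __name__ == "__main__":
    # Toy CMDP: 2 states, 2 actions. In state 0, action 0 stays and
    # action 1 moves to state 1; state 1 is absorbing.
    P = np.zeros((2, 2, 2))
    P[0, 0, 0] = 1.0
    P[0, 1, 1] = 1.0
    P[1, :, 1] = 1.0
    reward = np.array([[1.0, 0.0], [0.2, 0.2]])
    utilities = [np.array([[0.0, 1.0], [1.0, 1.0]])]
    pi = np.full((2, 2), 0.5)  # uniform random policy
    for name, (V, Q) in evaluate_cmdp(P, reward, utilities, pi, 0.9).items():
        print(name, "V =", np.round(V, 3))
```

Because every signal is bounded in [0, 1], each resulting value function lies in [0, 1/(1 - gamma)] under this discounted assumption, and V(s) always equals the pi-weighted average of Q(s, ·).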