16/04/2026 11:58 AM - Lecture

⬅️ [14/04/2026 12:03 PM - Lecture](<./14_04_2026 12_03 PM - Lecture.md>) | ⬆️ [ECE 567](<./README.md>)

lecture-25-constrained-rl.pdf

Safe RL (Constrained MDPs)

At each step the agent receives both a reward and a utility. The utility can encode different types of cost signal, and there can be multiple utility signals.

We think of utility as a negative cost. The older CMDP literature called it utility, so we do as well.

In our proofs, reward and utility are deterministic and bounded in [0, 1].

We now have a value function and a Q-function for the reward and for each of the utilities (negative costs).
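A minimal sketch of what "one value function and one Q-function per signal" can look like in a tabular setting. Everything here is an assumption made for illustration and is not taken from the lecture: the 2-state, 2-action CMDP, the numbers in `P`, `SIGNALS`, and `PI`, the discount factor `GAMMA` (the notes do not fix a horizon or discounting), and the helper names `q_from_v` and `evaluate`. The same policy evaluation routine is simply run once per signal.

```python
# Hypothetical 2-state, 2-action CMDP; all numbers are made up for illustration.
GAMMA = 0.9  # assumed discount factor (not specified in the notes)

# P[s][a] is the list of next-state probabilities P(s' | s, a).
P = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]

# Deterministic signals bounded in [0, 1], indexed as signal[s][a].
# More utilities could be added as extra entries in this dict.
SIGNALS = {
    "reward": [[0.0, 1.0], [0.0, 1.0]],
    "utility": [[1.0, 0.0], [1.0, 0.0]],
}

# Stochastic policy PI[s][a]; here uniform over the two actions.
PI = [[0.5, 0.5], [0.5, 0.5]]


def q_from_v(signal, v):
    """Q(s, a) = signal(s, a) + GAMMA * sum over s' of P(s' | s, a) * V(s')."""
    return [
        [
            signal[s][a] + GAMMA * sum(p * v[s2] for s2, p in enumerate(P[s][a]))
            for a in range(len(P[s]))
        ]
        for s in range(len(P))
    ]


def evaluate(signal, sweeps=2000):
    """Iterative policy evaluation of PI for a single signal; returns (V, Q)."""
    v = [0.0] * len(P)
    for _ in range(sweeps):
        q = q_from_v(signal, v)
        # V(s) is the policy-weighted average of Q(s, a).
        v = [sum(PI[s][a] * q[s][a] for a in range(len(q[s]))) for s in range(len(P))]
    return v, q_from_v(signal, v)


# One (V, Q) pair per signal, all under the same policy.
values = {name: evaluate(sig) for name, sig in SIGNALS.items()}
for name, (v, q) in values.items():
    print(name, "V =", v, "Q =", q)
```

With signals in [0, 1] and discount `GAMMA`, every value is at most 1 / (1 - `GAMMA`), which is where the boundedness assumption would pay off in this discounted sketch. The two signals also disagree about which action is better, which is the tension a constraint on the utility value is meant to capture.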

