
19/02/2026 12:02 PM - Lecture

⬅️ [17/02/2026 12:00 PM - Lecture](<./17_02_2026 12_00 PM - Lecture.md>) | ⬆️ [ECE 567](<./README.md>) | [24/02/2026 12:02 PM - Lecture](<./24_02_2026 12_02 PM - Lecture.md>) ➡️

Starting with the lecture 11 slides.

For the discounted state distribution, the transition kernel P is conditioned on some initial state x_0. If x_0 is itself drawn from a distribution, take the expectation over x_0.
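As a sketch, the discounted state distribution of a small Markov chain (hypothetical transition matrix, under a fixed policy) has the closed form (1 - gamma) * x0_dist @ (I - gamma P)^{-1}; a fixed x_0 is just the point-mass special case:

```python
import numpy as np

# Hypothetical 3-state chain under a fixed policy; P[i, j] = Pr(x' = j | x = i).
P = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 0.8, 0.2],
    [0.3, 0.0, 0.7],
])
gamma = 0.9

def discounted_state_dist(P, x0_dist, gamma):
    """d_gamma(x) = (1 - gamma) * sum_t gamma^t Pr(x_t = x | x_0 ~ x0_dist)."""
    n = P.shape[0]
    # Geometric series of P, summed in closed form via (I - gamma P)^{-1}.
    return (1 - gamma) * x0_dist @ np.linalg.inv(np.eye(n) - gamma * P)

# Fixed x_0 = state 0 is a point-mass x0_dist; a random x_0 just means
# replacing the point mass with its distribution (the expectation over x_0).
d = discounted_state_dist(P, np.array([1.0, 0.0, 0.0]), gamma)
```

The result is itself a probability distribution over states, which is what lets it play the role of a weighting in the policy gradient formula.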

The score function simplifies nicely for the exponential family. LLMs use a Gibbs (softmax) policy, so the score function is a de-meaned feature: phi(x, u) minus the expectation of phi(x, u') under the policy.
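A minimal check of the de-meaned-feature claim, assuming a hypothetical linear-softmax (Gibbs) policy pi_theta(u | x) proportional to exp(theta . phi(x, u)) at a fixed state:

```python
import numpy as np

rng = np.random.default_rng(0)

n_actions, dim = 4, 5
phi = rng.normal(size=(n_actions, dim))  # phi[u] = feature vector phi(x, u), x fixed
theta = rng.normal(size=dim)

def log_pi(th, u):
    """log pi_theta(u | x) for the softmax policy, computed stably."""
    logits = phi @ th
    return logits[u] - (logits.max() + np.log(np.exp(logits - logits.max()).sum()))

logits = phi @ theta
pi = np.exp(logits - logits.max())
pi /= pi.sum()

u = 2  # some chosen action
# De-meaned feature: grad_theta log pi(u | x) = phi(x, u) - E_{u' ~ pi}[phi(x, u')]
score = phi[u] - pi @ phi

# Sanity check against a finite-difference gradient of log pi(u | x).
eps = 1e-6
num_grad = np.zeros(dim)
for i in range(dim):
    th = theta.copy()
    th[i] += eps
    num_grad[i] = (log_pi(th, u) - log_pi(theta, u)) / eps
```

The finite-difference gradient should match `score` to within the discretization error, which is the exponential-family simplification in action.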

The proof of the policy gradient theorem on the slides contains mistakes; see the annotated slides for the corrections.

In a finite-horizon game, the value at the last state does not depend on a next state (there is none), so the corresponding gradient term is 0 and the formula remains valid.
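The boundary condition can be written out explicitly: at the horizon T the value is just the terminal reward, which involves no next state and no policy, so its policy gradient vanishes.

```latex
V_T(x) = r_T(x)
\quad\Rightarrow\quad
\nabla_\theta V_T(x) = 0,
```

so the backward recursion

```latex
V_t(x) = \sum_u \pi_\theta(u \mid x)
\Bigl[ r(x, u) + \sum_{x'} P(x' \mid x, u)\, V_{t+1}(x') \Bigr]
```

terminates cleanly: the term at t = T contributes nothing to the gradient, and the finite-horizon policy gradient formula still holds.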


⬅️ [17/02/2026 12:00 PM - Lecture](<./17_02_2026 12_00 PM - Lecture.md>) | ⬆️ [ECE 567](<./README.md>) | [24/02/2026 12:02 PM - Lecture](<./24_02_2026 12_02 PM - Lecture.md>) ➡️