19/02/2026 12:02 PM - Lecture
⬅️ [17/02/2026 12:00 PM - Lecture](<./17_02_2026 12_00 PM - Lecture.md>) | ⬆️ [ECE 567](<./README.md>) | [24/02/2026 12:02 PM - Lecture](<./24_02_2026 12_02 PM - Lecture.md>) ➡️
Starting with lecture 11
For the discounted state distribution, the probability P is conditioned on some initial state x_0. If x_0 is itself a distribution, take the expectation over it.
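As a sketch of that definition: for a stochastic transition matrix P (hypothetical numbers below), the discounted state distribution d_γ = (1 − γ) Σ_t γ^t · μ_0 P^t has the closed form (1 − γ) · μ_0 (I − γP)^{-1}; starting from a distribution μ_0 instead of a point mass is exactly the expectation over x_0.

```python
import numpy as np

# Hypothetical 3-state Markov chain under a fixed policy.
P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.8, 0.2],
              [0.3, 0.0, 0.7]])
gamma = 0.9

# Point mass at x_0 = state 0; replace mu0 with any initial distribution
# and the expectation over x_0 is built in automatically.
mu0 = np.array([1.0, 0.0, 0.0])

# d_gamma = (1 - gamma) * sum_t gamma^t * mu0 P^t
#         = (1 - gamma) * mu0 (I - gamma P)^{-1}
d = (1 - gamma) * mu0 @ np.linalg.inv(np.eye(3) - gamma * P)
print(d, d.sum())
```

Since P is row-stochastic, (I − γP)^{-1} maps the all-ones vector to 1/(1 − γ) times itself, so d sums to 1: it really is a probability distribution over states.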
The score function simplifies nicely for the exponential family. LLMs use a Gibbs (softmax) policy, so the score function comes out as a de-meaned feature.
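A minimal sketch of that de-meaned-feature fact, assuming a softmax policy with linear features π(a|x) ∝ exp(θ·φ(x,a)) (the features below are made up): the score is ∇_θ log π(a|x) = φ(x,a) − E_{a'∼π}[φ(x,a')], which the code verifies against a central-difference numerical gradient.

```python
import numpy as np

# Gibbs / softmax policy with linear features: pi(a|x) ∝ exp(theta · phi(x, a)).
# Score: grad_theta log pi(a|x) = phi(x, a) - E_{a'~pi}[phi(x, a')].
rng = np.random.default_rng(0)
num_actions, dim = 4, 3
phi = rng.normal(size=(num_actions, dim))   # hypothetical phi(x, a) for a fixed x
theta = rng.normal(size=dim)

def log_pi(a, th):
    logits = phi @ th
    m = logits.max()
    return logits[a] - m - np.log(np.exp(logits - m).sum())

logits = phi @ theta
pi = np.exp(logits - logits.max())
pi /= pi.sum()

a = 2
score = phi[a] - pi @ phi   # de-meaned feature

# Central-difference check of the analytic score:
eps = 1e-5
num_grad = np.array([
    (log_pi(a, theta + eps * e) - log_pi(a, theta - eps * e)) / (2 * eps)
    for e in np.eye(dim)
])
print(np.max(np.abs(score - num_grad)))
```

The second term, `pi @ phi`, is exactly the policy-averaged feature, so the score is the raw feature minus its mean under the current policy.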
The proof of the policy gradient theorem on the slides has mistakes; refer to the annotated slides.
In a finite-horizon game, the value at the last stage does not depend on a next state (there is none), so that gradient term is 0 and the formula remains valid.
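The finite-horizon argument can be written as a short recursion (notation assumed here, not from the slides): stage-wise values V_t, Q_t satisfy

```latex
% Stage-wise policy gradient recursion (notation assumed):
\nabla_\theta V_t(x) = \sum_a \pi_\theta(a \mid x)
  \Big[\, Q_t(x,a)\, \nabla_\theta \log \pi_\theta(a \mid x)
      + \sum_{x'} P(x' \mid x, a)\, \nabla_\theta V_{t+1}(x') \Big].
% At the final stage t = T there is no next state, so \nabla_\theta V_{T+1} \equiv 0:
% the recursion terminates, and unrolling it recovers the policy gradient
% formula with no extra correction terms.
```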
⬅️ [17/02/2026 12:00 PM - Lecture](<./17_02_2026 12_00 PM - Lecture.md>) | ⬆️ [ECE 567](<./README.md>) | [24/02/2026 12:02 PM - Lecture](<./24_02_2026 12_02 PM - Lecture.md>) ➡️