Past Exam
⬆️ [Studying](<./README.md>) | [Logistics](<./Logistics.md>) ➡️
Q1
Know the formula for a SARSA update: Q(s, a) ← Q(s, a) + α [r + γ Q(s', a') − Q(s, a)], where a' is the action actually taken in s'.
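A minimal sketch of one tabular SARSA update, to practice plugging in numbers. The function name `sarsa_update`, the dictionary-based Q-table, and the example states and values are assumptions for illustration, not taken from the exam.

```python
from collections import defaultdict


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """Apply one SARSA update in place and return the new Q(state, action)."""
    # TD target uses the action actually taken in the next state (on-policy).
    td_target = reward + gamma * q[(next_state, next_action)]
    td_error = td_target - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


# Hypothetical example: Q(s0, right) starts at 0, Q(s1, right) is 1.
q = defaultdict(float)
q[("s1", "right")] = 1.0
new_value = sarsa_update(q, "s0", "right", reward=1.0,
                         next_state="s1", next_action="right",
                         alpha=0.5, gamma=0.9)
print(new_value)
```

Here the TD target is 1 + 0.9 × 1 = 1.9, so with α = 0.5 the entry moves halfway from 0 to 1.9.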
Q2
Much more complicated. Requires knowing the UCB algorithm.
Know how the UCB algorithm chooses an action to take.
Know how the UCB algorithm updates the empirical mean of the chosen action.
Note that the update equation looks complicated, but it is just the incremental form of the empirical mean: it folds in one new sample instead of recomputing the average from the full list of rewards, which we do not store.
Know how to compute empirical regret.
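The three UCB pieces above can be sketched as follows. This assumes the UCB1-style bonus sqrt(2 ln t / n) and defines empirical regret as T times the best true mean minus the total reward collected; the exam may use a different constant or regret definition. The function names and the two-arm deterministic bandit are hypothetical.

```python
import math


def ucb_select(means, counts, t, c=2.0):
    """Pick the arm with the highest upper confidence bound at step t."""
    # Pull every untried arm once so the bonus term is defined.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [m + math.sqrt(c * math.log(t) / n)
              for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def update_mean(means, counts, arm, reward):
    """Incremental mean: new_mean = old_mean + (reward - old_mean) / n."""
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]


def empirical_regret(true_means, rewards):
    """Reward of always playing the best arm minus reward actually collected."""
    return len(rewards) * max(true_means) - sum(rewards)


# Hypothetical bandit whose rewards equal the true means (no noise).
true_means = [0.2, 0.8]
means, counts, rewards = [0.0, 0.0], [0, 0], []
for t in range(1, 11):
    arm = ucb_select(means, counts, t)
    reward = true_means[arm]
    update_mean(means, counts, arm, reward)
    rewards.append(reward)
regret = empirical_regret(true_means, rewards)
print(counts, means, regret)
```

Because the rewards are noiseless here, each pull of the worse arm adds exactly 0.6 to the regret, which makes the bookkeeping easy to check by hand.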
Q3
DPG (Deterministic Policy Gradient) and DDPG (Deep DPG).
Know how to compute the deterministic policy gradient.
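A small worked sketch of the deterministic policy gradient, which chains the critic's action gradient through the policy: grad_theta J = E[ grad_theta mu_theta(s) * grad_a Q(s, a) at a = mu_theta(s) ]. The scalar linear policy `mu`, the quadratic critic `q`, and the sample states are hypothetical choices made so the result can be checked by hand and against a finite difference; they are not from the exam.

```python
def mu(theta, s):
    """Hypothetical deterministic policy: a = theta * s."""
    return theta * s


def q(s, a):
    """Hypothetical critic, maximized at a = 2 * s."""
    return -(a - 2.0 * s) ** 2


def dq_da(s, a):
    """Gradient of the critic with respect to the action."""
    return -2.0 * (a - 2.0 * s)


def dmu_dtheta(theta, s):
    """Gradient of the policy output with respect to theta."""
    return s


def dpg_gradient(theta, states):
    """Average of dmu/dtheta * dQ/da evaluated at a = mu(theta, s)."""
    grads = [dmu_dtheta(theta, s) * dq_da(s, mu(theta, s)) for s in states]
    return sum(grads) / len(grads)


def finite_difference(theta, states, eps=1e-6):
    """Numerical check: differentiate J(theta) = mean Q(s, mu(theta, s))."""
    def j(th):
        return sum(q(s, mu(th, s)) for s in states) / len(states)
    return (j(theta + eps) - j(theta - eps)) / (2 * eps)


states = [0.5, 1.0, 2.0]
analytic = dpg_gradient(1.0, states)
numeric = finite_difference(1.0, states)
print(analytic, numeric)
```

For this toy setup each state contributes -2 s² (θ − 2), so at θ = 1 the per-state terms are 0.5, 2 and 8, and their average is 3.5. DDPG uses the same gradient but with neural networks for the policy and the critic.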
Q4
Understand how scoring works in AlphaZero.
Understand what defines the policy in AlphaZero.
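A rough sketch of both AlphaZero points, assuming "scoring" refers to the PUCT score used to select child nodes during tree search, and that the policy is the one read off the root's visit counts. The function names, the default `c_puct`, and the example numbers are illustrative assumptions.

```python
import math


def puct_score(q_value, prior, parent_visits, child_visits, c_puct=1.5):
    """Selection score: value estimate plus a prior-weighted exploration bonus."""
    bonus = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q_value + bonus


def visit_count_policy(visit_counts, temperature=1.0):
    """Policy proportional to N(s, a) ** (1 / temperature) at the root."""
    scaled = [n ** (1.0 / temperature) for n in visit_counts]
    total = sum(scaled)
    return [x / total for x in scaled]


# Hypothetical node: Q = 0.5, prior = 0.4, parent visited 100 times, child 9.
score = puct_score(0.5, 0.4, parent_visits=100, child_visits=9, c_puct=1.0)
policy = visit_count_policy([10, 30, 60])
sharper = visit_count_policy([1, 3], temperature=0.5)
print(score, policy, sharper)
```

Lowering the temperature sharpens the policy toward the most-visited move, while the bonus term in the score shrinks as a child accumulates visits.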
⬆️ [Studying](<./README.md>) | [Logistics](<./Logistics.md>) ➡️