Logistics

⬅️ [Past Exam](<./Past Exam.md>) | ⬆️ [Studying](<./README.md>) | [General](<./General.md>) ➡️

Logistics

The exam is closed-book. You may bring a one-page cheat sheet and may use both sides.
You may bring a calculator without Wi-Fi capability. Do not use your cell phone as a calculator during the exam. You will be asked to put your cell phone and any other Internet-capable devices in your backpack and leave your backpack at the front.
Scratch paper will be distributed at the beginning of the exam. Do not use your own scratch paper during the exam.
Please bring your U-M ID card to the exam.
You will be seated according to a seating chart that will be available at the exam.

Markov chains and stationary distributions
Value iteration and policy iteration algorithms
Q-learning and SARSA updates
Definition of a contraction mapping and its use in proving convergence
Q-learning update with linear function approximation
Double estimator and the DDQN update
Score function definition and policy gradient theorem
REINFORCE and variance reduction
Natural policy gradient and its connection to soft policy iteration
Performance difference lemma
UCB algorithm for multi-armed bandits
MCTS for AlphaZero
Potential-based reward shaping
DPO and DPO loss
Zeroth-order optimization
Contrastive learning
Relative value function and Blackwell optimality
Duality and KKT conditions
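
As a quick refresher for the "Q-learning and SARSA updates" item in the list above, here is a minimal tabular sketch of the two update rules. The function names, the list-of-lists Q-table, and the default `alpha` and `gamma` values are illustrative assumptions, not course-provided code.

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the greedy value max_a' Q(s', a').

    Q is assumed to be a list of lists indexed as Q[state][action].
    """
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])
    return Q


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from Q(s', a') for the action actually taken next."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])
    return Q
```

The only difference is the bootstrap term: Q-learning uses the maximum over next actions, while SARSA uses the value of the next action chosen by the behavior policy.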
