Logistics
⬅️ [Past Exam](<./Past Exam.md>) | ⬆️ [Studying](<./README.md>) | [General](<./General.md>) ➡️
Logistics
- It is a closed-book exam. You may bring one cheat sheet (a single page; both sides may be used).
- You may bring a calculator without Wi-Fi, but you may not use your cell phone as a calculator. You will be asked to put your cell phone and any other Internet-connected devices in your backpack and leave the backpack at the front of the room.
- Scratch paper will be distributed at the beginning of the exam; do not use your own.
- Please bring your U-M ID card to the exam.
- You will be seated according to a seating chart that will be available at the exam.
Topics:
- Markov chains and stationary distributions
- Value iteration and policy iteration algorithms
- Q-learning and SARSA updates
- Definition of a contraction mapping and its use in proving convergence
- Q-learning update with linear function approximation
- Double estimator and the DDQN update
- Definition of the score function and the policy gradient theorem
- REINFORCE and variance reduction
- Natural policy gradient and its connection to soft policy iteration
- Performance difference lemma
- UCB algorithm for multi-armed bandits
- MCTS in AlphaZero
- Potential-based reward shaping
- DPO and the DPO loss
- Zeroth-order optimization
- Contrastive learning
- Relative value function and Blackwell optimality
- Duality and KKT conditions
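As a quick refresher on Markov chains and stationary distributions, here is a minimal Python sketch (not taken from the course materials) that finds π with π = πP by power iteration. It assumes a finite, irreducible, aperiodic chain given as a row-stochastic list of lists; the function name is illustrative.

```python
def stationary_distribution(P, tol=1e-12, max_iters=100_000):
    """Approximate the stationary distribution pi (pi = pi P) by power iteration.

    P[i][j] = Pr(X_{t+1} = j | X_t = i). Convergence assumes the chain is
    irreducible and aperiodic.
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iters):
        # One step of the chain: new[j] = sum_i pi[i] * P[i][j]
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


# Two-state chain: state 0 stays with prob 0.9, state 1 stays with prob 0.5.
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary_distribution(P)
print([round(p, 4) for p in pi])  # → [0.8333, 0.1667]
```

Balancing the flow between the two states gives the same answer by hand: 0.1·π₀ = 0.5·π₁, so π = (5/6, 1/6).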
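For value iteration and the contraction-mapping argument, the sketch below (an illustrative implementation, not the course's reference code) repeatedly applies the Bellman optimality operator to a tabular MDP. It assumes `P[s][a]` is a list of `(prob, next_state)` pairs and `R[s][a]` is the expected reward; both names are hypothetical.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-10):
    """Tabular value iteration for a finite MDP.

    Each sweep applies the Bellman optimality operator
        (T V)(s) = max_a [ R(s, a) + gamma * sum_{s'} P(s' | s, a) V(s') ],
    which is a gamma-contraction in the sup norm, so V converges to V*.
    Returns the final value estimate and a greedy policy.
    """
    V = [0.0] * len(P)
    while True:
        Q = [[R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
              for a in range(len(P[s]))]
             for s in range(len(P))]
        V_new = [max(q_s) for q_s in Q]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            greedy = [max(range(len(q_s)), key=q_s.__getitem__) for q_s in Q]
            return V_new, greedy
        V = V_new


# State 0: action 0 stays (reward 0), action 1 moves to state 1 (reward 0).
# State 1: a single action that stays and earns reward 1 forever.
P = [[[(1.0, 0)], [(1.0, 1)]],
     [[(1.0, 1)]]]
R = [[0.0, 0.0],
     [1.0]]
V, policy = value_iteration(P, R, gamma=0.9)
```

By hand, V*(1) = 1/(1 - 0.9) = 10 and V*(0) = 0.9 · 10 = 9, so the greedy policy moves from state 0 to state 1.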
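To contrast the Q-learning and SARSA updates, here is a minimal tabular sketch (function names are illustrative). Both use the TD rule Q(s,a) ← Q(s,a) + α[target − Q(s,a)]; they differ only in how the target bootstraps from the next state.

```python
def q_learning_update(Q, s, a, r, s_next, alpha, gamma, done=False):
    """Q-learning (off-policy): bootstrap from the greedy action in s_next."""
    target = r if done else r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma, done=False):
    """SARSA (on-policy): bootstrap from the action a_next actually taken."""
    target = r if done else r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


# Same transition (s=0, a=0, r=1, s'=1); SARSA's next action is a'=0.
Q_ql = [[0.0, 0.0], [2.0, 4.0]]
Q_sarsa = [row[:] for row in Q_ql]
q_learning_update(Q_ql, 0, 0, 1.0, 1, alpha=0.5, gamma=0.5)
sarsa_update(Q_sarsa, 0, 0, 1.0, 1, 0, alpha=0.5, gamma=0.5)
print(Q_ql[0][0], Q_sarsa[0][0])  # → 1.5 1.0
```

Q-learning uses max Q(s', ·) = 4, while SARSA uses Q(s', a') = 2 for the action it actually took, so the two estimates diverge from the same transition.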
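For the double estimator and DDQN, the key change is in the bootstrap target. In this sketch, plain lists stand in for the online and target networks' Q-values at s' (an assumption for illustration; the function names are hypothetical).

```python
def dqn_target(r, q_target_next, gamma, done=False):
    """Standard DQN target: the target network both selects and evaluates."""
    return r if done else r + gamma * max(q_target_next)


def ddqn_target(r, q_online_next, q_target_next, gamma, done=False):
    """Double DQN target: the online network selects the action and the
    target network evaluates it, decoupling selection from evaluation."""
    if done:
        return r
    a_star = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return r + gamma * q_target_next[a_star]


q_online_next = [1.0, 3.0]  # online net prefers action 1
q_target_next = [2.0, 0.5]  # target net rates action 1 poorly
print(dqn_target(1.0, q_target_next, 0.5))                  # → 2.0
print(ddqn_target(1.0, q_online_next, q_target_next, 0.5))  # → 1.25
```

Taking a max over the same noisy estimates that pick the action biases the DQN target upward; evaluating the online network's choice with a second estimator is what reduces that overestimation.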
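For the score function, the policy gradient theorem with a baseline, and the performance difference lemma, the standard statements are summarized below. They use the usual discounted setting. The notation (J, G_t, b, d^{π'}, A^π) is assumed here and may differ from the lecture notes.

```latex
% Score function
\nabla_\theta \log \pi_\theta(a \mid s)

% Policy gradient theorem (episodic, REINFORCE form with a state-dependent baseline b)
\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\Big[\sum_{t=0}^{T-1} \gamma^{t}\,
  \nabla_\theta \log \pi_\theta(a_t \mid s_t)\,\big(G_t - b(s_t)\big)\Big],
\qquad G_t = \sum_{k=t}^{T-1} \gamma^{k-t} r_k

% Performance difference lemma (infinite-horizon discounted)
V^{\pi'}(s_0) - V^{\pi}(s_0) = \frac{1}{1-\gamma}\,
  \mathbb{E}_{s \sim d^{\pi'}_{s_0}}\,\mathbb{E}_{a \sim \pi'(\cdot \mid s)}\big[A^{\pi}(s,a)\big],
\qquad d^{\pi'}_{s_0}(s) = (1-\gamma)\sum_{t=0}^{\infty} \gamma^{t} \Pr(s_t = s \mid s_0, \pi')
```

The baseline term has zero expectation because the score function satisfies E_{a∼π_θ}[∇_θ log π_θ(a|s)] = 0, so subtracting b(s_t) reduces variance without biasing the gradient.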
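For UCB on multi-armed bandits, here is a minimal UCB1 sketch (the classic sqrt(2 ln t / n_i) bonus; the course may use a different constant). The Bernoulli arms and fixed seed are made up for illustration.

```python
import math
import random


def ucb1(pull, n_arms, horizon):
    """UCB1: play each arm once, then pick the arm maximizing
    empirical mean + sqrt(2 ln t / n_i). Returns pull counts and mean estimates."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: try every arm once
        else:
            arm = max(range(n_arms),
                      key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts, means


# Hypothetical Bernoulli arms with success probabilities 0.2, 0.5, 0.8.
rng = random.Random(0)
probs = [0.2, 0.5, 0.8]
counts, means = ucb1(lambda i: 1.0 if rng.random() < probs[i] else 0.0,
                     n_arms=3, horizon=5000)
print(counts)
```

The exploration bonus shrinks as an arm is pulled more often, so suboptimal arms are pulled only O(log T) times and most pulls go to the best arm.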
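For potential-based reward shaping, the shaped reward is r + γΦ(s') − Φ(s). The sketch below checks the telescoping property on one short episode. The potential `phi` is hypothetical, and setting the potential of a terminal state to 0 is an assumed convention.

```python
def shaped_reward(r, s, s_next, phi, gamma, done=False):
    """Potential-based shaping: r' = r + gamma * Phi(s') - Phi(s).
    The potential of a terminal next state is taken to be 0 (assumed convention)."""
    phi_next = 0.0 if done else phi(s_next)
    return r + gamma * phi_next - phi(s)


def discounted_return(rewards, gamma):
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def phi(s):
    return float(s + 1)  # hypothetical potential function


# Episode s0=0 -> 1 -> 2 -> 3 (terminal) with rewards 0, 0, 1.
gamma = 0.5
states, rewards = [0, 1, 2, 3], [0.0, 0.0, 1.0]
shaped = [shaped_reward(rewards[t], states[t], states[t + 1], phi, gamma,
                        done=(t == len(rewards) - 1))
          for t in range(len(rewards))]
G, G_shaped = discounted_return(rewards, gamma), discounted_return(shaped, gamma)
print(G, G_shaped, G - phi(states[0]))  # → 0.25 -0.75 -0.75
```

The shaping terms telescope, so the shaped return equals the original return minus Φ(s₀). That offset depends only on the start state, which is why the set of optimal policies is unchanged.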
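For DPO, here is a minimal sketch of the per-pair loss. It assumes the inputs are sequence log-probabilities of the preferred response y_w and the dispreferred response y_l under the trained policy and the frozen reference policy. The function name and β = 0.1 default are illustrative.

```python
import math


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one preference pair (y_w preferred over y_l given x):
        -log sigmoid(beta * [(log pi(y_w|x) - log pi_ref(y_w|x))
                            - (log pi(y_l|x) - log pi_ref(y_l|x))])
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # Numerically stable -log(sigmoid(m)) = log(1 + exp(-m)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# When the policy equals the reference, the margin is 0 and the loss is log 2.
print(dpo_loss(-1.0, -2.0, -1.0, -2.0))
```

Raising the policy's log-probability of y_w relative to the reference (or lowering it for y_l) increases the margin and drives the loss below log 2.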