⬅️ [TreeRPO: TREE RELATIVE POLICY OPTIMIZATION](<./TreeRPO_ TREE RELATIVE POLICY OPTIMIZATION.md>) | ⬆️ [Reading List](<./README.md>) | [SimpleStrat: Diversifying Language Model Generation with Stratification](<./SimpleStrat_ Diversifying Language Model Generation with Stratification.md>) ➡️
TPO: Aligning Large Language Models with Multi-branch & Multi-step Preference Trees
https://arxiv.org/abs/2410.12854
tpo.pdf
- [ ] @TODO Read this paper