Diversity is all you need
⬅️ [Evolution Strategies at the Hyperscale](<./Evolution Strategies at the Hyperscale.md>) | ⬆️ [Reading List](<./README.md>) | [Decision Tree Policy Optimization](<./Decision Tree Policy Optimization.md>) ➡️
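The point of the method is to maximize the mutual information between states and skills, I(S; Z).
What does this mean in practice? The states an agent visits should reveal which skill it is executing, so different skills have to reach distinguishable states.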
It is already known that a discriminability objective is equivalent to maximizing the mutual information between a skill and some aspect of the trajectory.
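
A short derivation may help make the link between discrimination and mutual information concrete. This is a sketch under assumed notation that does not appear in the notes: z is the skill drawn from a prior p(z), s is a state visited while running skill z (standing in for "some aspect of the trajectory"), and q_φ(z | s) is the learned discriminator. Replacing the true posterior p(z | s) with q_φ can only lower the value, because the gap is a KL divergence, so training the discriminator to classify skills pushes up a lower bound on I(S; Z):

$$
\begin{aligned}
I(S; Z) &= H(Z) - H(Z \mid S) \\
        &= \mathbb{E}_{z \sim p(z),\, s \sim \pi(z)}\left[\log p(z \mid s) - \log p(z)\right] \\
        &\ge \mathbb{E}_{z \sim p(z),\, s \sim \pi(z)}\left[\log q_\phi(z \mid s) - \log p(z)\right]
\end{aligned}
$$
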
Use maximum entropy policies to force skill diversity.
At the start of each episode, sample a skill and hold it fixed for the whole episode. The skill is used in two places: it conditions the policy, which executes the skill, and it is the target label in the discriminator's loss, where the discriminator predicts the skill from the trajectory.
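
The per-episode loop described above can be sketched roughly as follows. This is a minimal toy illustration, not the paper's implementation: the 2-D state, the hand-written `toy_policy`, the linear softmax discriminator, the additive dynamics, and all constants are assumptions made for the example. A real setup would train the policy with a maximum entropy RL algorithm on the pseudo-reward, whereas here the policy is fixed and only the discriminator learns.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_SKILLS = 4    # hypothetical number of skills
STATE_DIM = 2     # hypothetical toy state size
EPISODE_LEN = 20  # hypothetical episode length
LR = 0.1          # discriminator learning rate (assumed)

# Linear softmax discriminator q(z | s); parameters start at zero.
params = {"W": np.zeros((STATE_DIM, NUM_SKILLS)), "b": np.zeros(NUM_SKILLS)}


def discriminator_log_probs(state):
    """Return log q(z | s) for every skill z."""
    logits = state @ params["W"] + params["b"]
    logits = logits - logits.max()  # shift for numerical stability
    return logits - np.log(np.exp(logits).sum())


def pseudo_reward(state, skill):
    """log q(z | s) - log p(z), with a uniform prior p(z) over skills."""
    return discriminator_log_probs(state)[skill] + np.log(NUM_SKILLS)


def toy_policy(state, skill):
    """Stand-in for a learned skill-conditioned policy (assumption):
    each skill pushes in its own fixed direction, plus a little noise."""
    angle = 2 * np.pi * skill / NUM_SKILLS
    direction = np.array([np.cos(angle), np.sin(angle)])
    return 0.1 * direction + 0.01 * rng.standard_normal(STATE_DIM)


def run_episode():
    """Sample one skill, roll out the policy, and update the discriminator."""
    skill = int(rng.integers(NUM_SKILLS))  # sampled once per episode
    state = np.zeros(STATE_DIM)
    rewards = []
    for _ in range(EPISODE_LEN):
        state = state + toy_policy(state, skill)  # toy additive dynamics
        rewards.append(pseudo_reward(state, skill))
        # Cross-entropy gradient step with the sampled skill as the label.
        grad_logits = np.exp(discriminator_log_probs(state))
        grad_logits[skill] -= 1.0
        params["W"] -= LR * np.outer(state, grad_logits)
        params["b"] -= LR * grad_logits
    return skill, float(np.mean(rewards))
```

Calling `run_episode()` repeatedly trains the discriminator online. Because each toy skill drifts toward a different region of state space, the average pseudo-reward should rise above zero as the discriminator learns to tell the skills apart.
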