Learn to create ambiguity using an adversarial cycle system

⬅️ [LLM Assistant](<./LLM Assistant.md>) | ⬆️ [Ideas](<./README.md>) | [Intentionally answering questions wrong](<./Intentionally answering questions wrong.md>) ➡️

Learn to create ambiguity using an adversarial cycle system.
One model's job is to write a question that cannot easily be answered without follow-ups, and the other is tasked with answering it as quickly as possible. However, if a question is never answerable, the question writer is also punished, so both improve simultaneously: one gets better at asking increasingly difficult and confusing questions, while the other learns to find the correct answer more quickly.
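A minimal sketch of one episode and the two opposing rewards, assuming a fixed budget of clarifying questions and case-insensitive exact-match grading. `run_episode`, `answerer_reward`, `questioner_reward`, the budget, the per-question cost, and the penalty value are all hypothetical; the stubs at the bottom stand in for the real models. Failing to reach the correct answer within the budget is used as a proxy for "never answerable".

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical interfaces (not from any library):
#   answerer(question, history) -> ("ask", text) or ("answer", text)
#   oracle(clarifying_question) -> reply from the large model that sees the full context
Answerer = Callable[[str, List[Tuple[str, str]]], Tuple[str, str]]
Oracle = Callable[[str], str]


@dataclass
class EpisodeResult:
    answered_correctly: bool
    num_clarifications: int


def run_episode(question: str, target_answer: str, answerer: Answerer,
                oracle: Oracle, max_clarifications: int = 5) -> EpisodeResult:
    """Let the answerer ask clarifying questions until it commits to an answer."""
    history: List[Tuple[str, str]] = []
    while True:
        action, text = answerer(question, history)
        if action == "answer":
            # Assumed grading: case-insensitive exact match with the target answer.
            correct = text.strip().lower() == target_answer.strip().lower()
            return EpisodeResult(correct, len(history))
        if len(history) >= max_clarifications:
            # Budget exhausted without committing to an answer.
            return EpisodeResult(False, len(history))
        history.append((text, oracle(text)))


def answerer_reward(r: EpisodeResult, cost_per_question: float = 0.1) -> float:
    # Correct answers earn 1, minus a cost for every clarifying question asked.
    return (1.0 - cost_per_question * r.num_clarifications) if r.answered_correctly else 0.0


def questioner_reward(r: EpisodeResult, unanswerable_penalty: float = -1.0) -> float:
    # Adversarial: needing more follow-ups is better for the questioner, but a question
    # that never leads to the correct answer (wrong answer or exhausted budget) is
    # treated as "never answerable" and punished.
    return float(r.num_clarifications) if r.answered_correctly else unanswerable_penalty


# Usage with stub models standing in for the real policies.
def stub_answerer(question, history):
    return ("ask", "Which object do you mean?") if not history else ("answer", "Red")


def stub_oracle(clarifying_question):
    return "The car on the left."


result = run_episode("What color is it?", "red", stub_answerer, stub_oracle)
print(result, answerer_reward(result), questioner_reward(result))
```

Because the questioner's reward is exactly the answerer's cost (the number of follow-ups), the two objectives pull against each other, while the penalty stops the questioner from simply producing nonsense.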

We use the same setup I currently use to train the policy that asks clarifying questions: a large pretrained model that sees the full context answers the questions asked by a smaller clarifying-question policy, which is trained with GRPO. We just add a policy that generates the original question. Its ranking function places questions that are never answerable lowest, then ranks the rest by how many clarifying questions they required, with more being better.
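The ranking described above can be turned into GRPO-style advantages by scoring each question in a sampled group by its rank and normalizing within the group. This is a sketch under assumptions: `rank_scores` uses dense ranks rather than raw clarification counts, and the group is normalized with the population standard deviation plus a small epsilon; neither choice comes from the original setup, and all function names are hypothetical.

```python
import statistics
from typing import List, Tuple

# Each outcome is (answerable, num_clarifications) for one sampled question.
Outcome = Tuple[bool, int]


def rank_key(outcome: Outcome) -> Tuple[int, int]:
    answerable, num_clarifications = outcome
    # Never-answerable questions get the lowest key; among answerable ones,
    # more required clarifying questions sorts higher.
    return (1, num_clarifications) if answerable else (0, 0)


def rank_scores(outcomes: List[Outcome]) -> List[float]:
    # Dense rank within the group (0 = worst); tied outcomes share a score.
    distinct = sorted({rank_key(o) for o in outcomes})
    return [float(distinct.index(rank_key(o))) for o in outcomes]


def group_relative_advantages(scores: List[float], eps: float = 1e-6) -> List[float]:
    # GRPO-style: subtract the group mean and divide by the group standard deviation.
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores)
    return [(s - mean) / (std + eps) for s in scores]


# Usage: four questions sampled from the question-generation policy for one prompt.
outcomes = [(False, 5), (True, 0), (True, 3), (True, 1)]
scores = rank_scores(outcomes)
advantages = group_relative_advantages(scores)
print(scores)  # [0.0, 1.0, 3.0, 2.0]
print([round(a, 3) for a in advantages])
```

Using ranks instead of raw counts keeps the advantage scale stable when one question happens to need many more follow-ups than the others; the trade-off is that the size of the gap between questions is discarded.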

It might be easiest to use this to generate ambiguity from an existing VQA dataset: condition the question-generation model on the image along with the original unambiguous question and its answer. The final inference model is then scored on how well it recovers the original answer.
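A sketch of the VQA variant, assuming a multimodal question generator that receives the image separately from a text prompt, and scoring answer recovery with normalized exact match. `VQAExample`, the prompt wording, the example file path, and `recovery_score` are hypothetical; a softer metric could replace exact match.

```python
import re
import string
from dataclasses import dataclass


@dataclass
class VQAExample:
    image_path: str  # the image itself would be passed to a multimodal model separately
    question: str    # original, unambiguous question
    answer: str      # ground-truth answer


def question_generator_prompt(ex: VQAExample) -> str:
    # Hypothetical prompt wording: the generator sees the original question and answer
    # (plus the image) and must produce an ambiguous version with the same answer.
    return (
        "Rewrite the question about this image so it cannot be answered without "
        "follow-up questions, but keep the same answer once it is clarified.\n"
        f"Original question: {ex.question}\n"
        f"Answer: {ex.answer}\n"
        "Ambiguous question:"
    )


def normalize_answer(text: str) -> str:
    # Lowercase, drop punctuation and English articles, collapse whitespace.
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def recovery_score(predicted: str, ex: VQAExample) -> float:
    # Reward for the final inference model: did it recover the original answer?
    return 1.0 if normalize_answer(predicted) == normalize_answer(ex.answer) else 0.0


ex = VQAExample("images/0001.jpg", "What color is the car on the left?", "red")
print(question_generator_prompt(ex))
print(recovery_score("Red.", ex), recovery_score("blue", ex))
```

Since the original answer is known, the recovery score gives a reward signal that needs no extra labeling.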
