Infer/defer token

⬅️ [Intentionally answering questions wrong](<./Intentionally answering questions wrong.md>) | ⬆️ [Ideas](<./README.md>) | [How to compare types of clarifying questions?](<./How to compare types of clarifying questions_.md>) ➡️

Infer/defer token

Perhaps we train the model to first output an infer or defer token, so that we can take the logits over those two tokens to see how confident the model is. This is strictly worse than a value function method, but it might be easier to implement and use for a first paper.
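A minimal sketch of reading confidence off the first-token logits, assuming the vocabulary contains two dedicated decision tokens. The token ids, the example logits, and the function names `defer_probability` and `decide` are all hypothetical; the 0.5 threshold is a placeholder, not a tuned value.

```python
import math


def defer_probability(logits, infer_id, defer_id):
    """Return P(defer) after renormalising over the two decision tokens only.

    `logits` is the model's first-position logit vector (a plain sequence of
    floats here); `infer_id` and `defer_id` are assumed vocabulary indices.
    """
    diff = logits[defer_id] - logits[infer_id]
    # A two-way softmax is a sigmoid of the logit difference.
    # Branch on the sign so math.exp never sees a large positive argument.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def decide(logits, infer_id, defer_id, threshold=0.5):
    """Pick 'defer' when P(defer) meets the threshold, else 'infer'."""
    p_defer = defer_probability(logits, infer_id, defer_id)
    action = "defer" if p_defer >= threshold else "infer"
    return action, p_defer


# Toy example: a 4-token vocabulary where index 1 is infer and index 2 is defer.
example_logits = [0.1, 2.0, 0.5, -1.0]
action, p_defer = decide(example_logits, infer_id=1, defer_id=2)
```

Renormalising over just the two decision tokens ignores any probability mass the model puts on other tokens, which seems reasonable if training makes the first token almost always one of the two.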

So paper 1 is the clarification model. Paper 2 is optimal deferral, perhaps by learning a distribution over future AE: the input to the algorithm is a model, and the output is a function that takes the model and an input and returns the AE. Paper 3 is a dataset that uses what I have learned to construct a better CQ setup, including both the questions and strategies for constructing human stand-ins for this specific task. The stand-ins would be verified by their distributional similarity to humans, which gets explored in the first or second paper.
