Infer/defer token
Perhaps we train the model to first output a special token, either infer or defer, and then take the logits over those two tokens to see how confident the model is. This is strictly worse than a value function method, but it might be easier to implement and use for a first paper.
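A minimal sketch of reading confidence off the decision token. It assumes the infer/defer token is emitted at the first output position and that we already have that position's logit vector over the full vocabulary. The token ids INFER_TOKEN_ID and DEFER_TOKEN_ID and the 0.5 threshold are hypothetical placeholders, not part of any existing setup.

```python
import math
from typing import Sequence, Tuple

# Hypothetical vocabulary ids for the two added decision tokens; in practice
# these would be whatever ids the tokenizer assigns when the tokens are added.
INFER_TOKEN_ID = 0
DEFER_TOKEN_ID = 1


def defer_probability(logits: Sequence[float],
                      infer_id: int = INFER_TOKEN_ID,
                      defer_id: int = DEFER_TOKEN_ID) -> float:
    """Return P(defer) renormalized over only the {infer, defer} tokens.

    All other vocabulary entries are ignored, so the result is a two-way
    softmax over the two decision-token logits.
    """
    a = logits[infer_id]
    b = logits[defer_id]
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(a, b)
    e_infer = math.exp(a - m)
    e_defer = math.exp(b - m)
    return e_defer / (e_infer + e_defer)


def decide(logits: Sequence[float], threshold: float = 0.5) -> Tuple[str, float]:
    """Pick 'defer' when P(defer) reaches the (hypothetical) threshold."""
    p = defer_probability(logits)
    return ("defer" if p >= threshold else "infer"), p


if __name__ == "__main__":
    # Toy 3-token vocabulary: the large logit at id 2 is ignored by design.
    print(decide([2.0, 0.0, 5.0]))
```

Renormalizing over just the two tokens keeps the confidence score from being diluted by probability mass the model places on unrelated tokens; the raw full-vocabulary softmax probability of the defer token could be used instead if that mass is itself informative.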
So paper 1 is the clarification model. Paper 2 is optimal deferral, perhaps by learning a distribution over future AE: the input to the algorithm is a model, and the output is a function that takes the model and an input and returns the AE. Paper 3 is a dataset that uses what I have learned to construct a better CQ setup, including both the questions and strategies for constructing human stand-ins for this specific task, verified using the measure of distributional similarity to humans that is explored in the first or second paper.