22/01/2026 8:00 AM - SK Team
Acknowledging Focus Ambiguity in Visual Questions
CondAmbigQA: A Benchmark and Dataset for Conditional Ambiguous Question Answering
AQuA: Toward Strategic Response Generation for Ambiguous Visual Questions
Hayeon
Active vision stuff.
How does one train such a model?
Using under-specified instructions.
Look at ContextVLA. They do compression of the vision tokens.
So what is the idea for training?
Is there a dataset for evaluating task ambiguity? Is there a measurement you are thinking about?
What is a "memory" in this case?
Past frame compression?
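Not necessarily ContextVLA's actual mechanism (check the paper for the details), but a minimal sketch of the general idea of compressing past-frame vision tokens: mean-pool each older frame down to one token while keeping the current frame at full resolution.

```python
import numpy as np

def compress_past_frames(frames, keep_last=1):
    """Mean-pool each past frame's vision tokens to a single token,
    keeping the most recent frame(s) at full resolution.

    frames: list of (num_tokens, dim) arrays, oldest first.
    Returns a single (n, dim) token array for the policy.
    """
    past, recent = frames[:-keep_last], frames[-keep_last:]
    pooled = [f.mean(axis=0, keepdims=True) for f in past]  # 1 token per past frame
    return np.concatenate(pooled + recent, axis=0)

# e.g. 4 frames of 196 tokens each -> 3 pooled tokens + 196 recent = 199 tokens
```

The names and pooling choice here are illustrative assumptions; the point is just that history can be kept cheaply as a handful of compressed tokens.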
How do we know if a task is ambiguous or just difficult?
"Perhaps separate into subtasks before checking if each one of those subtasks is ambiguous"
I am curious: we have some ways to measure ambiguity in LLMs, like the entropy of the output distribution. Is there research on how to tell whether a VLA is uncertain?
EvoVLA long-horizon memory is something interesting to look at.
Yayuan
When did they do the Rubik's cube work?
Does what people need (visual versus text) change when people are more familiar with the domain?
Uuuh, in the qualitative results... that's crocheting, not knitting.