
22/01/2026 8:00 AM - SK Team

⬅️ [20/01/2026 10:11 AM - SK Presentation Prep](<./20_01_2026 10_11 AM - SK Presentation Prep.md>) | ⬆️ [Lab Meetings](<./README.md>) | [22/01/2026 2:09 PM - Lab Meeting](<./22_01_2026 2_09 PM - Lab Meeting.md>) ➡️

Acknowledging Focus Ambiguity in Visual Questions

CondAmbigQA: A Benchmark and Dataset for Conditional Ambiguous Question Answering

AQuA: Toward Strategic Response Generation for Ambiguous Visual Questions

Hayeon

Active vision stuff.
How does one train such a model?
Using under-specified instructions.

Look at ContextVLA; they compress the vision tokens.
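
Not necessarily how ContextVLA does it, but a minimal sketch of the general idea of compressing per-frame vision tokens into a smaller memory (here simple pooling over groups of tokens; the shapes and pooling ratio are hypothetical):

```python
import torch

def compress_vision_tokens(tokens: torch.Tensor, ratio: int = 4) -> torch.Tensor:
    """Pool groups of `ratio` vision tokens into one.

    tokens: (frames, num_tokens, dim) patch tokens from past frames.
    Returns: (frames, num_tokens // ratio, dim) compressed tokens.
    """
    f, n, d = tokens.shape
    return tokens.view(f, n // ratio, ratio, d).mean(dim=2)

# Hypothetical example: 8 past frames, 256 tokens each, 768-dim features.
past_frames = torch.randn(8, 256, 768)
memory = compress_vision_tokens(past_frames)  # -> (8, 64, 768)
```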

So what is the idea for training?

Is there a dataset for evaluating task ambiguity? Is there a measurement you are thinking about?

What is a "memory" in this case?

Past frame compression?

How do we know if a task is ambiguous or just difficult?
"Perhaps separate into subtasks before checking if each one of those subtasks is ambiguous"

I am curious: for LLMs we have ways to measure ambiguity, like the entropy of the output distribution. Is there research on how to tell whether a VLA is uncertain?
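
As a reference point, a minimal sketch of the LLM-side measure mentioned here: per-step entropy of the output distribution computed from raw logits. The shapes and threshold are illustrative, not taken from any of the papers above:

```python
import torch
import torch.nn.functional as F

def predictive_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Entropy of the output distribution at each step.

    logits: (steps, num_classes) raw scores (tokens or action bins).
    Returns: (steps,) tensor; higher values mean more uncertainty.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    return -(probs * log_probs).sum(dim=-1)

# Hypothetical example: flag steps whose entropy is unusually high.
logits = torch.randn(10, 256)  # 10 steps, 256 output classes
entropy = predictive_entropy(logits)
uncertain_steps = entropy > entropy.mean() + entropy.std()
```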

EvoVLA's long-horizon memory is something interesting to look at.

Yayuan

When did they do the Rubik's cube work?

Does what people need (visual versus text) change when people are more familiar with the domain?

Uuuh, in the qualitative results... that's crocheting, not knitting.


⬅️ [20/01/2026 10:11 AM - SK Presentation Prep](<./20_01_2026 10_11 AM - SK Presentation Prep.md>) | ⬆️ [Lab Meetings](<./README.md>) | [22/01/2026 2:09 PM - Lab Meeting](<./22_01_2026 2_09 PM - Lab Meeting.md>) ➡️