15/01/2026 2:00 PM

⬅️ [12/01/2026 10:02 AM PACES](<./12_01_2026 10_02 AM PACES.md>) | ⬆️ [Lab Meetings](<./README.md>) | [20/01/2026 10:11 AM - SK Presentation Prep](<./20_01_2026 10_11 AM - SK Presentation Prep.md>) ➡️

Zhi Presentation
Don't these add a lot more work at inference time?
Where do the datasets come from?
I would like to see what the inputs are on the point control video slide.
If the prompt says "in the middle" but the auxiliary inputs somehow disagree, what should the model do?
How do you propagate the features forward?

Warping the noise before the UNet is very interesting.
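As a note to self, a minimal sketch of what noise warping could look like (this is my assumption of the idea, not necessarily the method presented): advect each frame's initial noise along a displacement field so that corresponding pixels across frames share noise, filling uncovered pixels with fresh samples so the marginal distribution stays Gaussian. The function name `warp_noise` and the integer-flow, nearest-neighbour formulation are my own simplifications.

```python
import numpy as np

def warp_noise(noise, flow):
    """Warp a per-pixel noise field along a flow field (nearest-neighbour).

    noise: (H, W) array of i.i.d. Gaussian noise for frame t.
    flow:  (H, W, 2) integer displacements (dy, dx) mapping frame t -> t+1.
    Pixels that no source pixel maps to keep fresh Gaussian noise, so the
    per-pixel marginals remain (approximately) standard normal.
    """
    H, W = noise.shape
    warped = np.random.randn(H, W)  # fresh noise where nothing maps
    ys, xs = np.mgrid[0:H, 0:W]
    ty = np.clip(ys + flow[..., 0], 0, H - 1)
    tx = np.clip(xs + flow[..., 1], 0, W - 1)
    # Fancy-index assignment: with duplicate targets, the last write
    # (row-major order) wins, mimicking occlusion.
    warped[ty, tx] = noise[ys, xs]
    return warped
```

With a uniform rightward shift of one pixel, the interior of the warped field is just the source noise shifted right, which is the temporal-consistency property the warping is after.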

Have other researchers used embeddings as part of the point motion?

Does object state change actually work? I thought you said that you do not evolve the features over time? Or do you vary them linearly with time? If there are no particles on the onion, why should the visuals of the chopped onion improve?
