21/01/2026 4:28 PM - Lecture

⬅️ [19/01/2026 3:14 PM - PACES](<./19_01_2026 3_14 PM - PACES.md>) | ⬆️ [EECS 504](<./README.md>) | [26/01/2026 2:41 PM - PACES](<./26_01_2026 2_41 PM - PACES.md>) ➡️

03 Images as Functions LIVE.pdf

Blobs!!! (And Scale Invariant Features)

Only at specific scales do curves look like corners.

Find a scale that gives an extremum of a function, commonly the Laplacian.

A derivative-of-Gaussian filter is directional (1D), so it is not rotationally invariant (bad for images). So for 2D we will often use the zero crossings of a Laplacian of Gaussian, really a second-order Gaussian derivative.

Remember why convolving the function with a second derivative of a Gaussian is equivalent to convolving with the Gaussian and then taking the second derivative.
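A quick numpy sanity check of that identity, using a discrete `[1, -2, 1]` second-difference kernel: convolution is associative, so differentiating the smoothed signal equals smoothing with a pre-differentiated kernel.

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal(128)          # arbitrary 1-D signal

sigma, radius = 3.0, 12
x = np.arange(-radius, radius + 1, dtype=float)
g = np.exp(-x**2 / (2 * sigma**2))
g /= g.sum()                          # normalized Gaussian kernel

d2 = np.array([1.0, -2.0, 1.0])       # discrete second derivative

# differentiate after smoothing ...
lhs = np.convolve(np.convolve(f, g), d2)
# ... equals smoothing with a differentiated kernel
rhs = np.convolve(f, np.convolve(g, d2))

print(np.allclose(lhs, rhs))  # True
```

The same argument works with the analytic second derivative of the Gaussian in place of the discrete difference, which is exactly what a Laplacian-of-Gaussian filter is.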

If the kernel is too small, we will miss the superstructure of edges. For example, a ridge edge/blob is just two step edges at some scales. This raises the question: why do we want edge detection? What does that give us downstream? That's not the scale that allows us to do things like detect penguins.

Blobs cause large responses in the Laplacian when the filter scale aligns with the scale of the blob. So how do we get a response invariant to the scale of the blob?

Filter pyramids

These don't give us invariance, but they are more robust. Vary sigma and look at the sequence of responses; this is called scale space. Pyramids are a discrete scale space, but continuous ones are mathematically useful objects.

Gaussian Pyramid
Take the full-resolution image. Blur with a Gaussian of fixed sigma. Resample every other pixel (downsample by 1/2). Repeat, using the same sigma each time.
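A minimal numpy sketch of that loop (the kernel radius and sigma are my choices, not from the lecture):

```python
import numpy as np

def blur(img, g):
    # separable Gaussian blur: filter rows, then columns
    out = np.apply_along_axis(lambda r: np.convolve(r, g, mode='same'), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, g, mode='same'), 0, out)

def gaussian_pyramid(img, levels, sigma=1.0):
    x = np.arange(-3, 4, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()                       # same fixed sigma at every level
    pyr = [img]
    for _ in range(levels - 1):
        blurred = blur(pyr[-1], g)
        pyr.append(blurred[::2, ::2])  # keep every other pixel
    return pyr

img = np.random.default_rng(1).random((64, 64))
pyr = gaussian_pyramid(img, 4)
print([p.shape for p in pyr])  # [(64, 64), (32, 32), (16, 16), (8, 8)]
```

Blur-then-subsample avoids aliasing: the Gaussian removes the high frequencies that dropping every other pixel cannot represent.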

Why choose powers of two for sizes?

This then makes it easy to get differences of Gaussians (which approximate the Laplacian) by upsampling the lower layer and subtracting it from the one above it. Lower down the pyramid you get coarser responses to lower frequencies.
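A self-contained sketch of that subtraction, assuming nearest-neighbour upsampling (the lecture doesn't specify the interpolation):

```python
import numpy as np

def blur(img, sigma):
    # separable Gaussian blur with a sampled kernel
    x = np.arange(-int(3 * sigma), int(3 * sigma) + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()
    out = np.apply_along_axis(lambda r: np.convolve(r, g, mode='same'), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, g, mode='same'), 0, out)

img = np.random.default_rng(2).random((32, 32))

level0 = blur(img, 1.0)
level1 = blur(level0, 1.0)[::2, ::2]   # next pyramid level, half size

# nearest-neighbour 2x upsample, then subtract from the level above
up = np.repeat(np.repeat(level1, 2, axis=0), 2, axis=1)
dog = level0 - up                      # difference of Gaussians ≈ Laplacian
print(dog.shape)  # (32, 32)
```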

Characteristic scale (of a blob) = the scale that causes the largest response to the Laplacian.
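A 1-D illustration of characteristic scale, assuming the standard scale-normalized response (multiply by sigma squared so responses at different scales are comparable). For a Gaussian bump of width sigma_b, the normalized second-derivative response peaks near sqrt(2) * sigma_b, about 11.3 here:

```python
import numpy as np

def gauss2nd(sigma):
    # sampled second derivative of a normalized 1-D Gaussian
    radius = int(4 * sigma)
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    return (x**2 / sigma**4 - 1 / sigma**2) * g

# a 1-D "blob": a Gaussian bump of width sigma_b = 8
sigma_b = 8.0
x = np.arange(-100, 101, dtype=float)
blob = np.exp(-x**2 / (2 * sigma_b**2))

sigmas = np.arange(2.0, 20.0, 0.5)
responses = []
for s in sigmas:
    # scale-normalized Laplacian response: sigma^2 * (g_sigma'' * f)
    r = np.convolve(blob, gauss2nd(s), mode='same')
    responses.append(s**2 * np.abs(r).max())

best = sigmas[np.argmax(responses)]
print(best)  # peaks near sqrt(2) * sigma_b ≈ 11.3
```

Without the sigma^2 normalization the response just shrinks as sigma grows, and there is no meaningful maximum to pick out.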

Talk

Suyog Jain

PathAI

Model the expertise of individuals.

What does this mean? In training data generation, some solutions:

Steer annotation toward the blind spots of the model.
Use a qualification task where majority voting checks who is annotating well and who is not. But really you have to be actively checking work.

Ask annotators to check the model instead of doing it from scratch.

So they were doing active learning.

Meta

Worked on EgoExo4D

They used LLMs to generate questions and answers for VQA.

People who are asked to generate questions about images often default to really broad or boring questions.
https://ai.meta.com/datasets/plm-data/

They were using probing questions, so LLM-generated questions don't degrade performance.

But this is not true in my case, since we are training specifically for human interaction.

But it was used as a base model using SFT. So I guess the "understanding" translated.

But perhaps this is an indication that fine-tuning on the human question set would make it work well even with LLM questions from a different distribution.

  • [ ] @TODO Think further about how to collect VQA questions from humans given the fact that humans generally default to easy questions.
  • [ ] @TODO Consider the fact that removing the set of answers from a constrained VQA could make the question ambiguous.

⬅️ [19/01/2026 3:14 PM - PACES](<./19_01_2026 3_14 PM - PACES.md>) | ⬆️ [EECS 504](<./README.md>) | [26/01/2026 2:41 PM - PACES](<./26_01_2026 2_41 PM - PACES.md>) ➡️