PVL 04
⬅️ [PVL 03](<./PVL 03.md>) | ⬆️ [PVL Summaries](<./README.md>) | [PVL 05](<./PVL 05.md>) ➡️
Image Stitching & Feature Detection
1. Image Transformations & Warping
- Backward vs. Forward Transformation: Backward transformation is preferred over forward transformation. Forward mapping pushes source pixels onto the target grid, which can leave "holes" whose intensities must then be filled in somehow. Backward mapping instead iterates over the target pixel locations, applies the inverse transform to find the corresponding source location, and uses interpolation to compute the sub-pixel value (see the sketch after this list).
- 2D Parametric Transformations: Different alignment models use different degrees of freedom: Translation (2 parameters), Euclidean (3 parameters), Similarity (4 parameters), Affine (6 parameters), and Projective/Homography (8 parameters).
- Least Squares Alignment: Finding the best transformation between two sets of matched features is done by minimizing the sum of squared residuals (the error between predicted and measured locations) using linear least squares.
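To make the alignment and warping bullets concrete, here is a minimal NumPy sketch (the function names, the grayscale-image assumption, and the bilinear scheme are illustration choices of mine, not code from the lecture): `fit_affine` solves the linear least-squares problem for the 6-parameter affine model, and `warp_backward` applies the result by backward mapping with interpolation.

```python
import numpy as np

def fit_affine(src_pts, dst_pts):
    """Least-squares fit of the 6 affine parameters from matched points."""
    # Each match (x, y) -> (x', y') contributes two linear equations:
    #   x' = a*x + b*y + c,   y' = d*x + e*y + f
    n = len(src_pts)
    A = np.zeros((2 * n, 6))
    rhs = np.zeros(2 * n)
    for i, ((x, y), (xp, yp)) in enumerate(zip(src_pts, dst_pts)):
        A[2 * i] = [x, y, 1, 0, 0, 0]
        A[2 * i + 1] = [0, 0, 0, x, y, 1]
        rhs[2 * i], rhs[2 * i + 1] = xp, yp
    params, *_ = np.linalg.lstsq(A, rhs, rcond=None)  # minimizes sum of squared residuals
    return params.reshape(2, 3)  # [[a, b, c], [d, e, f]]

def warp_backward(src, M, out_shape):
    """Backward mapping: visit each target pixel, sample the source (grayscale)."""
    H, W = out_shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    # Invert the transform so we map target -> source coordinates.
    M_inv = np.linalg.inv(np.vstack([M, [0, 0, 1]]))[:2]
    sx = M_inv[0, 0] * xs + M_inv[0, 1] * ys + M_inv[0, 2]
    sy = M_inv[1, 0] * xs + M_inv[1, 1] * ys + M_inv[1, 2]
    # Bilinear interpolation at the sub-pixel source location.
    sx = np.clip(sx, 0, src.shape[1] - 1.001)
    sy = np.clip(sy, 0, src.shape[0] - 1.001)
    x0, y0 = sx.astype(int), sy.astype(int)
    fx, fy = sx - x0, sy - y0
    return ((1 - fy) * ((1 - fx) * src[y0, x0] + fx * src[y0, x0 + 1]) +
            fy * ((1 - fx) * src[y0 + 1, x0] + fx * src[y0 + 1, x0 + 1]))
```

The same least-squares recipe covers the other models that are linear in their parameters (e.g., translation and similarity, with fewer columns in the design matrix); the projective case needs a homogeneous reformulation.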
2. Local Image Features
- Dense vs. Local Features: Local features are sparse points extracted from an image, whereas dense features exist at every pixel. Local features are more efficient and highly robust to clutter and occlusion.
- Key Challenges: Good features must be highly repeatable (detectable across varying conditions) and distinctive/unique.
- Invariance: A feature is invariant if it does not change under a specific set of transformations.
  - Geometric Invariance: Robustness to translation, rotation, scale, and viewpoint changes.
  - Photometric Invariance: Robustness to lighting and illumination changes.
3. Feature Detection & Evaluation
- The Goal: Find image patches (windows) that are locally unique and can be precisely localized.
- Corners vs. Edges vs. Flat Regions:
  - Corners are the best features because they contain strong gradients in at least two significantly different orientations, making them highly localizable.
  - Flat regions provide no texture and cannot be localized.
  - Edges suffer from the aperture problem, meaning the patch can only be localized normal to the edge direction, leaving ambiguity along the edge.
- Auto-Correlation / Sum of Squared Differences (SSD): Local uniqueness is measured by shifting a window slightly and computing the SSD between the original and shifted windows. A unique window will produce a high error (large change) when moved in any direction.
- Taylor Series Approximation: To estimate local uniqueness efficiently, the shifted image function is approximated using a first-order Taylor series expansion, which isolates the image gradients ($I_x$ and $I_y$).
- The Structure Tensor (Auto-Correlation Matrix): The mathematical expansion yields a $2 \times 2$ matrix (often denoted $A$ or $H$) known as the structure tensor. This matrix captures the intrinsic texture structure of the window by summing the outer products of the gradients ($I_x^2$, $I_y^2$, and $I_x I_y$) over the pixels in the window.
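Tying the three bullets above together: substituting the Taylor expansion into the SSD yields the quadratic form $E(\Delta u) \approx \Delta u^\top A \, \Delta u$, where $A$ is the structure tensor. Below is a rough sketch of computing the entries of $A$, assuming SciPy's Sobel and Gaussian filters as the gradient and window operators (those operator choices are mine, not the lecture's):

```python
from scipy.ndimage import gaussian_filter, sobel

def structure_tensor(img, sigma=1.5):
    """Per-pixel entries of the 2x2 auto-correlation matrix A."""
    img = img.astype(float)
    Ix = sobel(img, axis=1)  # horizontal gradient I_x
    Iy = sobel(img, axis=0)  # vertical gradient I_y
    # Gaussian-weighted sums of the gradient outer products over the window:
    Ixx = gaussian_filter(Ix * Ix, sigma)
    Ixy = gaussian_filter(Ix * Iy, sigma)
    Iyy = gaussian_filter(Iy * Iy, sigma)
    return Ixx, Ixy, Iyy  # A = [[Ixx, Ixy], [Ixy, Iyy]] at every pixel
```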
4. Eigenvalue Analysis & Keypoint Scoring
- Eigenvalue Analysis: Analyzing the eigenvalues ($\lambda_0, \lambda_1$) and eigenvectors of the structure tensor tells us everything about the uniqueness of the window.
  - Two large eigenvalues: The window is a corner (rich texture, highly unique).
  - One large, one small eigenvalue: The window is an edge (aperture problem).
  - Two small eigenvalues: The window is a flat region.
- Scoring Functions: Several rotationally invariant scalar formulas are used to find keypoints where both eigenvalues are large; because computing exact eigenvalues requires a square root, the Harris and harmonic-mean scores are designed to avoid it (all three are sketched after this list):
  - Shi-Tomasi: Uses the minimum eigenvalue directly: $\min(\lambda_0, \lambda_1)$.
  - Harris & Stephens: Uses the determinant and trace: $\det(A) - \alpha \text{trace}(A)^2$ (where $\alpha \approx 0.06$).
  - Harmonic Mean: Uses $\frac{\det A}{\text{tr} A}$, which behaves smoothly when the eigenvalues are roughly equal.
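A sketch of all three scores computed from the structure-tensor entries above (the `eps` guard and the closed-form 2×2 eigenvalue formula are my additions, not the lecture's code):

```python
import numpy as np

def keypoint_scores(Ixx, Ixy, Iyy, alpha=0.06, eps=1e-12):
    det = Ixx * Iyy - Ixy ** 2
    tr = Ixx + Iyy
    harris = det - alpha * tr ** 2  # Harris & Stephens
    # Eigenvalues of a symmetric 2x2 matrix are tr/2 +- sqrt((tr/2)^2 - det);
    # the smaller root is the Shi-Tomasi score.
    lam_min = tr / 2 - np.sqrt(np.maximum((tr / 2) ** 2 - det, 0))
    harmonic = det / (tr + eps)     # harmonic mean
    return harris, lam_min, harmonic
```

Keypoints are then typically taken as local maxima of the chosen score above a threshold.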
5. Spatial Distribution and Scale
- Adaptive Non-Maximal Suppression (ANMS): Standard detectors may cluster points in high-contrast areas; ANMS forces an even spatial distribution across the image by only selecting features that are a local maximum within a dynamically calculated radius $r$ (see the sketch after this list).
- Image Pyramids for Scale Invariance: To detect features robustly across different sizes, operations are performed at multiple resolutions using pyramids (sketched below, after the ANMS example).
  - Gaussian Pyramid: Repeatedly blurs and sub-samples the image.
  - Laplacian Pyramid: Stores the band-pass difference between the original image and an upsampled low-pass version, allowing for perfect reconstruction.
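A brute-force $O(n^2)$ sketch of ANMS in this spirit (the robustness factor `c` and the function shape are my assumptions): each point's suppression radius is its distance to the nearest sufficiently stronger point, and the points with the largest radii are kept.

```python
import numpy as np

def anms(points, scores, n_keep, c=0.9):
    """Keep the n_keep points with the largest suppression radii."""
    points = np.asarray(points, dtype=float)
    scores = np.asarray(scores, dtype=float)
    radii = np.full(len(points), np.inf)  # the globally strongest point keeps inf
    for i in range(len(points)):
        # Points whose down-weighted score still beats point i suppress it.
        stronger = c * scores > scores[i]
        if stronger.any():
            d2 = np.sum((points[stronger] - points[i]) ** 2, axis=1)
            radii[i] = np.sqrt(d2.min())  # distance to the nearest suppressor
    return np.argsort(-radii)[:n_keep]    # indices, largest radii first
```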
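And a compact OpenCV sketch of the two pyramids (the float conversion and the level count are arbitrary choices of mine); adding each Laplacian level back onto its upsampled low-pass neighbor reconstructs the original exactly:

```python
import cv2
import numpy as np

def build_pyramids(img, levels=4):
    g = [img.astype(np.float32)]  # float so band-pass values can go negative
    for _ in range(levels - 1):
        g.append(cv2.pyrDown(g[-1]))  # Gaussian pyramid: blur + subsample
    lap = []
    for i in range(levels - 1):
        # Laplacian level: band-pass difference to the upsampled low-pass version.
        up = cv2.pyrUp(g[i + 1], dstsize=(g[i].shape[1], g[i].shape[0]))
        lap.append(g[i] - up)
    lap.append(g[-1])  # keep the low-pass residual so reconstruction is lossless
    return g, lap
```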
6. Exam Notes & Key Takeaway
The instructor specifically mentions what to expect on an exam, as well as the main takeaway to remember from the lecture:
- What will not be on the exam: When discussing the Taylor series approximation of the image function (which involves assuming the image is infinitely differentiable and dropping higher-order terms for a linear approximation), the instructor explicitly tells students not to worry, as they will not be asked to do this mathematical derivation on an exam. It is provided purely for their understanding of the underlying concepts.
- What to remember from the class: The instructor emphasizes that the core topic students should remember after the class is how to find good feature points.
⬅️ [PVL 03](<./PVL 03.md>) | ⬆️ [PVL Summaries](<./README.md>) | [PVL 05](<./PVL 05.md>) ➡️