PVL 05
⬅️ [PVL 06](<./PVL 06.md>) | ⬆️ [PVL Summaries](<./README.md>) | [PVL 04](<./PVL 04.md>) ➡️
Images as Points
1. Core Concept: Images as Points vs. Functions
- Images as Functions: An image maps a spatial domain (pixel coordinates) to a range of values (intensity or color). This view makes some operations awkward, such as measuring the distance between two images.
- Images as Points: An alternative interpretation treating an entire image as a single point or a vector in a vector space.
2. The Vector Space of Images
- Definition: Let $\Omega$ denote the set of all images of a given size (same domain and same range space).
- Vector Space Axioms: The set $\Omega$ over the field of reals $\mathbb{R}$, along with vector addition and scalar multiplication, forms a vector space $V_\Omega$.
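- Properties: $V_\Omega$ satisfies the vector space axioms: closure under addition and scalar multiplication, commutativity and associativity of addition, an additive identity (the zero image $\mathbf{0}$), additive inverses, a multiplicative identity (the scalar $1$, so $1 \cdot I = I$), and distributivity. Closure requires real-valued pixels; a bounded range such as $[0, 255]$ is not closed under these operations.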
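
The axioms above can be checked numerically. The following is a minimal sketch, assuming images are stored as real-valued NumPy arrays; the image size, the random pixel values, and the scalars `a` and `b` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three hypothetical 4x6 grayscale images with real-valued pixels.
I, J, K = (rng.random((4, 6)) for _ in range(3))
zero = np.zeros((4, 6))  # additive identity: the zero image
a, b = 2.5, -0.5         # arbitrary real scalars

checks = {
    "commutativity": np.allclose(I + J, J + I),
    "associativity": np.allclose((I + J) + K, I + (J + K)),
    "additive identity": np.allclose(I + zero, I),
    "additive inverse": np.allclose(I + (-I), zero),
    "scalar identity": np.allclose(1.0 * I, I),
    "distributivity over images": np.allclose(a * (I + J), a * I + a * J),
    "distributivity over scalars": np.allclose((a + b) * I, a * I + b * I),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")

# Closure fails for bounded integer pixels: uint8 array addition wraps modulo 256.
wrapped = np.array([200], dtype=np.uint8) + np.array([100], dtype=np.uint8)
print(wrapped)
```

The last two lines illustrate why the vector space view assumes real-valued pixels: with 8-bit storage, the sum of two valid images is not the image one would expect.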
3. The Cartesian Basis
- Naïve Interpretation: An $m \times n$ image can be represented as a vector of $mn$ coefficients (the pixel values) $\alpha_1, \alpha_2, \ldots, \alpha_{mn}$.
- Basis Images ($V_i$): The basis image $V_i$ is zero everywhere except for a $1$ at the $i$-th pixel location. There are $mn$ such images, one per pixel.
- Reconstruction: An image $I$ can be exactly reconstructed using the weighted sum of these basis images: $I = \sum_{i=1}^{mn} V_i \alpha_i$.
- Linear Operators: Because a linear operator $Q$ (e.g., rotation) distributes over sums and scalar multiples, applying it to the image is the same as applying it to each basis image: $Q \circ I = \sum_{i=1}^{mn} (Q \circ V_i) \alpha_i$. The basis images change (e.g., they are rotated), but the coefficients $\alpha_i$ stay the same.
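
The reconstruction and the linear-operator identity can be sketched as follows. This is an illustrative example, not code from the lecture: `basis_image` is a hypothetical helper, the image is random, and `np.rot90` stands in for the operator $Q$.

```python
import numpy as np

m, n = 3, 4
rng = np.random.default_rng(0)
I = rng.random((m, n))  # a hypothetical m x n image

# Coefficients alpha_i: the mn pixel values in row-major order.
alpha = I.ravel()

def basis_image(i, shape):
    """Cartesian basis image V_i: zeros everywhere, 1 at the i-th pixel."""
    V = np.zeros(shape)
    V.flat[i] = 1.0
    return V

V = [basis_image(i, (m, n)) for i in range(m * n)]

# Reconstruction: I = sum_i alpha_i * V_i
I_rec = sum(a * Vi for a, Vi in zip(alpha, V))

# Linear operator Q (here a 90-degree rotation) applied two ways:
Q = np.rot90
QI_direct = Q(I)                                            # Q applied to the image
QI_from_basis = sum(a * Q(Vi) for a, Vi in zip(alpha, V))   # Q applied to each basis image

print(np.allclose(I_rec, I))
print(np.allclose(QI_direct, QI_from_basis))
```

Both comparisons hold because the coefficients are reused unchanged; only the basis images are rotated.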
4. Distance Between Images & The Manifold Problem
- Euclidean Distance: The simplest way to compute the distance between two image vectors $\alpha$ and $\beta$ is the $\ell_2$-norm of their difference: $\|\alpha - \beta\|_2$.
- The "Beer Foam" Analogy: The set of all possible pixel combinations is a huge, mostly unstructured space. Natural images occupy only a very small, highly structured subset of it (like the thin layer of actual beer compared with the large volume of a foamy head).
- Limitation of $\ell_2$-Norm: Because the subset of natural images is a highly nonlinear manifold, straight-line Euclidean distance can be a poor metric. An image can be numerically closer to an unrelated image than to a structurally similar one that lies nearby on the manifold.
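
A small constructed example shows the limitation. This is a toy sketch with synthetic images (a white square, the same square shifted, and a blank image), not data from the lecture; `l2_distance` is a helper name chosen here.

```python
import numpy as np

def l2_distance(a, b):
    """Euclidean distance between two images viewed as flattened vectors."""
    return np.linalg.norm(a.ravel() - b.ravel())

# A: an 8x8 white square on a black 32x32 background.
A = np.zeros((32, 32))
A[4:12, 4:12] = 1.0

# B: the same square shifted, so it no longer overlaps A's square.
B = np.zeros((32, 32))
B[16:24, 16:24] = 1.0

# C: a blank image that contains no square at all.
C = np.zeros((32, 32))

d_AB = l2_distance(A, B)  # sqrt(64 + 64) = sqrt(128)
d_AC = l2_distance(A, C)  # sqrt(64) = 8

print(f"distance to shifted copy: {d_AB:.2f}")
print(f"distance to blank image:  {d_AC:.2f}")
```

Under the $\ell_2$-norm the blank image is closer to `A` than the shifted copy is, even though the shifted copy has the same content.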
5. Alternative Manually Defined Bases
- Fourier Transform: A linear transformation converting images from the spatial domain to the frequency domain using a basis of sinusoidal waves.
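- Magnitude & Phase: The transform yields one complex coefficient per frequency. The magnitude says how strongly that frequency is present in the image; the phase says where the corresponding sinusoid sits spatially (its offset). The frequency and orientation of each sinusoid are given by the coefficient's position in the frequency plane, not by its magnitude or phase.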
- Phase Importance: Most natural images have similar magnitude spectra, so the phase carries most of the perceptually important structure (demonstrated by swapping the phase and magnitude of a cheetah image and a zebra image: the result resembles the image that contributed the phase).
- Convolution Theorem: $F(H * G) = F(H) \cdot F(G)$, where the product on the right is element-wise. Convolving image $H$ with $G$ therefore attenuates the frequencies of $H$ where the spectrum of $G$ is small and amplifies them where it is large.
- Support: Fourier bases have non-compact support (each basis vector is a function of the entire image).
- Haar Wavelet Transform: Unlike Fourier, Haar wavelets provide a basis with spatial compact support, meaning they are localized to specific regions rather than spanning the whole image.
- Gabor Filters: Sinusoidal waves multiplied by a Gaussian weighting function, which localizes them in space. The Gaussian decays quickly, so the support is compact in practice.
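
The convolution theorem and the magnitude/phase swap can be sketched with NumPy's FFT. This is a minimal illustration under stated assumptions: the images are random stand-ins (not the cheetah and zebra from the lecture), the convolution is circular (which is the form the discrete theorem holds for exactly), and `circular_convolve` and `swap_magnitude_phase` are helper names chosen here.

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.random((16, 16))   # stand-in image
G = np.ones((3, 3)) / 9.0  # 3x3 box blur kernel

def circular_convolve(H, G):
    """Direct circular convolution: sum of shifted copies of H weighted by G."""
    out = np.zeros_like(H)
    for dy in range(G.shape[0]):
        for dx in range(G.shape[1]):
            out += G[dy, dx] * np.roll(H, shift=(dy, dx), axis=(0, 1))
    return out

# Convolution theorem: F(H * G) equals the element-wise product F(H) F(G).
lhs = np.fft.fft2(circular_convolve(H, G))
rhs = np.fft.fft2(H) * np.fft.fft2(G, s=H.shape)  # G is zero-padded to H's size
print(np.allclose(lhs, rhs))

def swap_magnitude_phase(img_mag, img_phase):
    """Combine the magnitude spectrum of one image with the phase of another."""
    magnitude = np.abs(np.fft.fft2(img_mag))
    phase = np.angle(np.fft.fft2(img_phase))
    hybrid = magnitude * np.exp(1j * phase)
    return np.real(np.fft.ifft2(hybrid))

other = rng.random((16, 16))             # second stand-in image
hybrid = swap_magnitude_phase(H, other)  # magnitude of H, phase of `other`

# Sanity check: an image's own magnitude and phase reproduce the image.
print(np.allclose(swap_magnitude_phase(H, H), H))
```

With real photographs in place of the random arrays, displaying `hybrid` is how one would reproduce the swap demonstration described above.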
6. Learning Bases
- Motivation: The Cartesian basis is expensive: it needs one basis image per pixel. Since natural image gradients are highly structured, far more compact bases can be learned.
- Methods: Instead of manually specifying bases, they can be learned from data via deep learning (where convolutional layers/kernels act as learned basis images) or through methods like Eigenfaces.
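
As a rough illustration of learning a compact basis, the sketch below computes an Eigenfaces-style basis with PCA via the SVD. The data is synthetic (random mixtures of a few hidden patterns plus noise), and the sizes, `k`, and noise level are assumptions made for this example; real face images would replace `X`.

```python
import numpy as np

rng = np.random.default_rng(0)
h, w, n_images, k = 16, 16, 200, 5

# Synthetic dataset: each image is a random mix of k hidden patterns plus noise.
hidden = rng.normal(size=(k, h * w))
weights = rng.normal(size=(n_images, k))
X = weights @ hidden + 0.01 * rng.normal(size=(n_images, h * w))

# PCA: center the data, then take the top-k right singular vectors as the basis.
mean = X.mean(axis=0)
Xc = X - mean
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
basis = Vt[:k]  # k learned basis images, each flattened to length h*w

# Each image is now described by k coefficients instead of h*w pixel values.
coeffs = Xc @ basis.T
X_rec = coeffs @ basis + mean

rel_err = np.linalg.norm(X - X_rec) / np.linalg.norm(X)
print(f"{k} coefficients per image instead of {h * w}, relative error {rel_err:.4f}")
```

The reconstruction is accurate here only because the synthetic data was built to be low-dimensional; how many basis images real data needs is an empirical question.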
The instructor explicitly mentions that you do not need to know or remember the mathematical details of the Fourier transform for a future exam, as that topic is more related to signal processing. Additionally, they note that there is no need to deeply review the mathematical definitions of the Fourier and Haar bases because they are not central to computer vision.