PVL 06
Learning Image Bases & Dimensionality Reduction
1. Foundations of Learning a Linear Basis
While analytical bases (like Fourier or Haar) are predefined, learning a linear basis allows the data to define its own structure, which is useful when image content lives on a lower-dimensional subspace.
0D Representation (Sample Mean): The best single vector $x_0$ to represent $n$ samples minimizes the sum of squared distances $J_0(x_0) = \sum_{k=1}^n ||x_0 - x_k||^2$. The minimizer is the sample mean $m = \frac{1}{n}\sum_{k=1}^n x_k$.
1D Representation (Line): To represent data on a line $x = m + ae$, we minimize $J_1(e)$, which simplifies to maximizing $e^TSe$ subject to $||e||=1$.
* Scatter Matrix ($S$): Defined as $S = \sum_{k=1}^n (x_k - m)(x_k - m)^T$, which is a multiple of the sample covariance matrix.
* Eigenvalue Problem: Solving for the optimal direction $e$ via Lagrange multipliers yields $Se = \lambda e$. The best 1D estimate is the eigenvector corresponding to the largest eigenvalue of $S$.
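A minimal NumPy sketch of these two steps (function and variable names are illustrative, not from the sources):

```python
import numpy as np

def best_1d_basis(X):
    """Fit the 0D (mean) and 1D (line) representations to samples.

    X: (n, d) array, one sample per row.
    Returns the sample mean m and the unit direction e maximizing e^T S e.
    """
    m = X.mean(axis=0)                      # 0D representation: the sample mean
    centered = X - m
    S = centered.T @ centered               # scatter matrix S = sum_k (x_k - m)(x_k - m)^T
    eigvals, eigvecs = np.linalg.eigh(S)    # S is symmetric, so eigh applies
    e = eigvecs[:, -1]                      # eigenvector of the largest eigenvalue
    return m, e

# The 1D coefficient of a sample x on the line x = m + a e is a = e @ (x - m).
```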
2. Principal Component Analysis (PCA) & Eigenfaces
PCA extends the 1D line to a $d'$-dimensional subspace by taking the $d'$ eigenvectors of the scatter matrix with the largest eigenvalues.
Goal: Optimal for reconstruction and data compression from a low-dimensional basis, minimizing reconstruction error.
Limitation: PCA maximizes total scatter, which includes within-class scatter. Therefore, it retains unwanted variations (like lighting) and is not optimal for discrimination/classification.
Eigenfaces Algorithm:
1. Vectorize images, compute the mean face $\mu$, and subtract it from all training faces.
2. Compute the covariance/scatter matrix $S_T$.
3. Compute the top $m$ eigenvectors (Eigenfaces).
4. Project novel faces into this subspace (yielding coefficients $w_1 \dots w_k$) and use Nearest Neighbor for recognition.
Computational Trick: Since the number of images $N$ is much smaller than the number of pixels, compute the eigenvectors of $X^T X$ instead of the massive $XX^T$ covariance matrix (where $X$ contains the mean-subtracted images). The eigenvectors of the true covariance matrix are then $X$ times the eigenvectors of $X^T X$.
Lighting Hack: Dropping the first 3 principal components in Eigenfaces often improves recognition because they primarily capture lighting variations.
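A hedged NumPy sketch of steps 1–4 together with the small-matrix trick; the function names and the `drop_lighting` option are illustrative, not taken from the sources (here rows of `A` are images, so `A @ A.T` plays the role of $X^T X$ in the notes):

```python
import numpy as np

def eigenfaces(train_imgs, m=50, drop_lighting=0):
    """Compute Eigenfaces from training images.

    train_imgs: (N, H, W) array of N face images.
    m: number of eigenfaces to keep.
    drop_lighting: optionally skip the first few components (the "lighting hack").
    """
    N = train_imgs.shape[0]
    X = train_imgs.reshape(N, -1).astype(float)    # vectorize: (N, D) with D = H*W
    mu = X.mean(axis=0)                            # mean face
    A = X - mu                                     # mean-subtracted faces, one per row

    # Trick: eigen-decompose the small N x N matrix instead of the D x D scatter matrix.
    small = A @ A.T
    vals, vecs = np.linalg.eigh(small)             # eigenvalues in ascending order
    vecs = vecs[:, np.argsort(vals)[::-1]]         # reorder to descending eigenvalues

    U = A.T @ vecs                                 # map back to pixel space: (D, N)
    U /= np.linalg.norm(U, axis=0, keepdims=True)  # normalize each eigenface
    return mu, U[:, drop_lighting:drop_lighting + m]

def project(img, mu, U):
    """Project an (H, W) face onto the eigenface basis, yielding coefficients w_1..w_m."""
    return U.T @ (img.reshape(-1).astype(float) - mu)
```

Recognition then reduces to nearest neighbor on the projected coefficients of the training faces.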
3. Fisher Linear Discriminant (FLD) & Fisherfaces
FLD (or Linear Discriminant Analysis - LDA) seeks directions that are efficient for discrimination rather than just representation.
Goal: Try to "shape" the scatter to separate classes by maximizing the ratio of the between-class scatter ($S_B$) to the within-class scatter ($S_W$).
Scatter Matrices:
* $S_W = \sum_{i=1}^c S_i$ (measures variation within the same class).
* $S_B = \sum_{i=1}^c n_i(m_i - m)(m_i - m)^T$ (measures distance between class means).
Objective Function: Maximize the generalized Rayleigh quotient: $J(W) = \frac{|W^T S_B W|}{|W^T S_W W|}$.
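For context (a standard result, not derived in these notes), the columns of the optimal $W$ are the generalized eigenvectors with the largest eigenvalues in

$$S_B w_i = \lambda_i S_W w_i,$$

and since $S_B$ has rank at most $c-1$, there are at most $c-1$ useful projection directions.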
The Singularity Problem in Faces: For face images, $S_W$ is always singular because the number of training images is much smaller than the number of pixels (rank is at most $N-c$).
Fisherfaces Solution: Use PCA to first reduce the dimension to $N-c$ (making $S_W$ nonsingular), and then apply FLD to reduce the dimension to $c-1$.
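A minimal NumPy sketch of the Fisherfaces pipeline under these assumptions (names and the plain `solve`-based generalized eigensolver are illustrative choices, not from the sources):

```python
import numpy as np

def fisherfaces(X, labels, num_classes):
    """Fisherfaces: PCA down to N - c dimensions, then FLD down to c - 1 dimensions.

    X: (N, D) training images as rows, labels: (N,) integer class ids in [0, c).
    Returns the combined D x (c - 1) projection matrix W_opt = W_pca @ W_fld.
    """
    N, D = X.shape
    c = num_classes
    mu = X.mean(axis=0)
    A = X - mu

    # Step 1: PCA to N - c dimensions so that the projected S_W becomes nonsingular.
    vals, vecs = np.linalg.eigh(A @ A.T)            # small N x N problem
    order = np.argsort(vals)[::-1][:N - c]
    W_pca = A.T @ vecs[:, order]
    W_pca /= np.linalg.norm(W_pca, axis=0, keepdims=True)
    Y = A @ W_pca                                   # (N, N - c) reduced data

    # Step 2: FLD on the reduced data.
    m = Y.mean(axis=0)
    Sw = np.zeros((N - c, N - c))
    Sb = np.zeros((N - c, N - c))
    for i in range(c):
        Yi = Y[labels == i]
        mi = Yi.mean(axis=0)
        Sw += (Yi - mi).T @ (Yi - mi)               # within-class scatter
        Sb += len(Yi) * np.outer(mi - m, mi - m)    # between-class scatter

    # Generalized eigenproblem S_B w = lambda S_W w (S_W is invertible here).
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    top = np.argsort(evals.real)[::-1][:c - 1]
    W_fld = evecs[:, top].real

    return W_pca @ W_fld                            # project a face via (x - mu) @ W_opt
```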
Performance: Fisherfaces generally outperform Eigenfaces when faces are subject to heavy lighting variations and varying expressions.
4. Alternative Linear Methods
- Correlation (Nearest Neighbor in Image Space): Images are normalized (zero mean, unit variance) to fix lighting intensity, and classification is done by nearest neighbor. Cons: Highly computationally expensive, requires huge storage, and sensitive to noise.
- Linear Subspaces: Exploits the fact that images of a Lambertian surface under varying illumination lie in a 3D linear subspace. Reconstructs test images using a 3D basis specific to each class. Cons: Fails with self-shadowing and facial expressions, and is computationally expensive.
- IMPCA (Image PCA): Instead of vectorizing images (which leads to huge dimensions and slow computation), IMPCA computes PCA directly on 2D images using an Image Total Scatter Matrix $S_I = \sum (A_j - M)^T(A_j - M)$. Pros: Massive reduction in feature extraction time while maintaining or exceeding Eigenface/Fisherface recognition rates.
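A minimal sketch of the IMPCA image scatter matrix, under the assumption that all images share the same size (function and variable names are illustrative):

```python
import numpy as np

def impca(imgs, d=8):
    """IMPCA: PCA performed directly on 2D images via the image total scatter matrix.

    imgs: (N, H, W) array. Returns the mean image M and a (W, d) projection matrix
    whose columns are the top eigenvectors of S_I = sum_j (A_j - M)^T (A_j - M).
    """
    M = imgs.mean(axis=0)                            # mean image, shape (H, W)
    S_I = np.zeros((imgs.shape[2], imgs.shape[2]))   # W x W, far smaller than (H*W) x (H*W)
    for A in imgs:
        D = A - M
        S_I += D.T @ D
    vals, vecs = np.linalg.eigh(S_I)
    P = vecs[:, np.argsort(vals)[::-1][:d]]          # top-d eigenvectors as columns
    return M, P

# Feature extraction: each image A maps to the H x d feature matrix (A - M) @ P.
```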
5. Locally Linear Embedding (LLE)
Used when data lives on a non-linear manifold rather than a linear subspace.
Core Assumption: Each data point and its neighbors lie on or close to a locally linear patch of the manifold.
LLE Algorithm (3 Steps):
1. Neighborhoods: Find the $K$ nearest neighbors for each data point $x_i$.
2. Weights (Local Geometry): Compute weights $W_{ij}$ that best reconstruct $x_i$ from its neighbors by minimizing the cost $J_{LLE1}(W) = \sum_i ||x_i - \sum_j W_{ij}x_j||^2$, subject to $\sum_j W_{ij} = 1$. These weights are invariant to rotation, scaling, and translation.
3. Embedding (Global Coordinates): Compute the low-dimensional vectors $y_i$ that are best reconstructed by the fixed weights $W_{ij}$, minimizing $J_{LLE2}(y) = \sum_i ||y_i - \sum_j W_{ij}y_j||^2$. Solved using the bottom $d+1$ eigenvectors of the matrix $M = (I - W)^T(I - W)$, discarding the bottom (constant) eigenvector associated with eigenvalue zero.
Complexity: Nearest neighbors $O(Dn^2)$ or $O(n \log n)$ with partitioning; Weights $O(DnK^3)$; Projection $O(dn^2)$. Involves very sparse matrices for efficiency.
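A minimal dense NumPy sketch of the three steps (real implementations exploit the sparse structure noted above; the function name and the `reg` regularizer are illustrative):

```python
import numpy as np

def lle(X, K=10, d=2, reg=1e-3):
    """Locally Linear Embedding. X: (n, D) data matrix. Returns (n, d) embedding coordinates."""
    n = X.shape[0]

    # Step 1: K nearest neighbors by Euclidean distance (excluding the point itself).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = np.argsort(dists, axis=1)[:, 1:K + 1]

    # Step 2: reconstruction weights that sum to 1 for each point.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[neighbors[i]] - X[i]                  # neighbors shifted so x_i is the origin
        C = Z @ Z.T                                 # local Gram matrix (K x K)
        C += reg * np.trace(C) * np.eye(K)          # regularize in case C is singular
        w = np.linalg.solve(C, np.ones(K))
        W[i, neighbors[i]] = w / w.sum()            # enforce sum_j W_ij = 1

    # Step 3: embedding from the bottom eigenvectors of M = (I - W)^T (I - W).
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:d + 1]                         # skip the constant eigenvector (eigenvalue ~0)
```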
The sources mention a couple of specific points about what is and is not required for the class, which indicate what students will not be tested on:
- Lagrange Multipliers: When explaining how a Lagrange multiplier enforces the unit-norm constraint in the objective function, the instructor clarifies that students do not need to know the deep mathematical background, explicitly stating, "It's not critical that you know this. You're not going to be asked to do this in this class."
- Supplemental Slides: There is a slide that explicitly says "STOP!" and warns students that the subsequent material (which covers other ways of learning image bases like the Fisher Linear Discriminant) is "supplemental for 504 and not required". The only exception to this rule is the material on Eigenfaces.