PVL 03
Computer Vision / Images as Functions
1. Images as Functions: Core Operations
Images can be interpreted as functions, allowing the application of mathematical operations. There are three core types of operations on images:
1. Spatial Range Operations: Integrate information over a region.
2. Range Map Operations: Compute operations at every possible location in the image domain.
3. Domain Operations: Geometrical transformations (e.g., translation, rotation).
Spatial Range Operations
A spatial range operation $f$ maps an image window/patch $W$ to a real value: $f: W \times I \rightarrow \mathbb{R}$.
Window ($W$): A subset of the image lattice. It does not need to be rectilinear or connected (e.g., superpixels are valid arbitrarily-shaped windows).
Linearity: An operation is linear if $f_W(\alpha_i I_i + \alpha_j I_j) = \alpha_i f_W(I_i) + \alpha_j f_W(I_j)$.
* Linear Examples: Summation of all pixels in a region.
* Non-Linear Examples: Max and min operators.
Haar Operator*: Characterizes region boundaries by taking the difference of summed pixel values within regions. E.g., $haar(I, {W}; {\alpha}) = \sum_i \alpha_i \text{sum}(I, W_i)$. This is widely used in AdaBoost algorithms for face detection.
Range Map Operations
Applies a spatial range operation $f$ at every possible location in the image domain $\Lambda$ to produce a new image $J$.
Single Pixel (Intensity Transformations): E.g., Negative image $J(s) = -I(s)$, or binary thresholding operators.
Discrete Convolution ($\otimes$): A range map operation applying a kernel $\kappa$ (a matrix of weights) over the image.
* Formula: $J(s, t) = \kappa \otimes I[W] = \sum_k \sum_l \kappa(k, l)I(s - k, t - l)$.
* Convolution is a linear operator.
Discrete Image Derivatives: Approximated using finite differences because $h$ is fixed to 1 pixel.
* First Derivative ($\nabla_x$): $\frac{dI(x,y)}{dx} = I(x+1, y) - I(x, y)$.
* Achieved by convolution with row-vector kernel $\kappa = [1, -1]$ for x-direction, or its transpose for y-direction.
Image Laplacians ($\nabla^2$): The second derivative operator, $\nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$.
* Edges/intensity changes correspond to zero-crossings in the second derivative.
* Approximated by taking the Difference of Gaussians (e.g., subtracting an image convolved with a $\sigma=1$ Gaussian from one convolved with a $\sigma=2$ Gaussian).
2. Geometric Primitives
2D Primitives (Points, Lines, Conics)
- 2D Points:
- Inhomogeneous coordinates: $x = (x,y)$.
- Homogeneous coordinates: $\tilde{x} = (\tilde{x}, \tilde{y}, \tilde{w})$.
- Conversion back to inhomogeneous: Divide by $\tilde{w}$ to get $(x, y, 1)$.
- Points at infinity (Ideal points): Homogeneous points where $\tilde{w} = 0$.
- 2D Lines: Represented via homogeneous coordinates $\tilde{l} = (a, b, c)$.
- Line Equation: $\bar{x} \cdot \tilde{l} = ax + by + c = 0$.
- Normalized Line: $l = (\hat{n}_x, \hat{n}_y, d)$ where $||\hat{n}|| = 1$. $\hat{n}$ is the normal perpendicular to the line, and $d$ is distance to the origin.
- Intersection of two lines: $\tilde{x} = \tilde{l}_1 \times \tilde{l}_2$ (Cross Product).
- Line joining two points: $\tilde{l} = \tilde{x}_1 \times \tilde{x}_2$.
- 2D Conics: Represented by a quadric equation $\tilde{x}^T Q \tilde{x} = 0$.
3D Primitives (Points, Planes, Lines)
- 3D Points: Homogeneous vector $\tilde{x} = (\tilde{x}, \tilde{y}, \tilde{z}, \tilde{w})$.
- 3D Planes: Homogeneous vector $\tilde{m} = (a, b, c, d)$. Equation: $ax + by + cz + d = 0$.
- Plane at infinity: $\tilde{m} = (0, 0, 0, 1)$.
- 3D Lines: Constructed from two points $r = (1-\lambda)p + \lambda q$.
- If one point is at infinity $q = (\hat{d}, 0)$, then $\hat{d}$ is the direction of the line: $r = p + \lambda\hat{d}$.
- Can also be cleanly represented using Plücker coordinates via a $4 \times 4$ skew-symmetric matrix $L = \tilde{p}\tilde{q}^T - \tilde{q}\tilde{p}^T$, which yields 4 degrees of freedom.
3. Geometric Transformations
Transformations form a nested hierarchy of groups (closed under composition, possess an inverse).
2D Coordinate Transformations
Operate on 2D homogeneous coordinate vectors via $3 \times 3$ matrices.
1. Translation (2 DoF): Preserves orientation.
* $x' = x + t$ or via matrix $[I \;\; t]$.
2. Rigid / Euclidean (3 DoF): Rotation + Translation. Preserves lengths.
* Matrix $[R \;\; t]$ where $R$ is an orthonormal rotation matrix ($RR^T=I$).
3. Similarity (4 DoF): Scaled rotation. Preserves angles.
* Matrix $[sR \;\; t]$.
4. Affine (6 DoF): Preserves parallelism.
* Arbitrary $2 \times 3$ matrix $A$.
5. Projective / Homography (8 DoF): Preserves straight lines.
* $\tilde{x}' = \tilde{H}\tilde{x}$, where $\tilde{H}$ is an arbitrary $3 \times 3$ homogeneous matrix (defined up to scale). Requires dividing by the new last element $h_{20}x + h_{21}y + h_{22}$ to normalize back to inhomogeneous coordinates.
Transforming Co-vectors (Lines)
If a point is transformed by $\tilde{x}' = \tilde{H}x$, a line $\tilde{l}$ transforms as $\tilde{l}' = \tilde{H}^{-T}\tilde{l}$ (the transposed inverse).
3D Coordinate Transformations
Similar hierarchy to 2D but utilizes $4 \times 4$ matrices.
1. Translation: 3 DoF. Matrix $[I \;\; t]$.
2. Rigid (Euclidean): 6 DoF. Preserves lengths. Matrix $[R \;\; t]$.
3. Similarity: 7 DoF. Preserves angles. Matrix $[sR \;\; t]$.
4. Affine: 12 DoF. Preserves parallelism. Matrix $[A]$.
5. Projective: 15 DoF. Preserves straight lines. Homogeneous $4 \times 4$ matrix $[\tilde{H}]$.
Other Notable Transformations
- Planar Surface Flow: An 8-parameter transformation that works as a small motion approximation to a full homography.
- Bilinear Interpolant: An 8-parameter transformation interpolating the deformation of four non-collinear points (e.g., corners of a square), useful in spline interpolation.
Yes, the instructor specifically mentions something you need to remember for an exam regarding domain operations and geometric transformations.
Degrees of Freedom for Transformations
The instructor explicitly warns that you will be tested on the degrees of freedom for the various types of geometric transforms presented in the lecture. He emphasizes that even though a transformation might be represented by a 3x3 matrix, it is only constrained by a specific number of parameters (degrees of freedom), and says: "you will be asked at some point on some exam to give the degrees of freedom for some type of transform that's in these slides".
Based on the lecture and the course text, here are the transformations and their respective degrees of freedom in 2D:
Translation: 2 degrees of freedom ($t_x, t_y$).
Rigid Body / Euclidean Transform (Rotation + Translation): 3 degrees of freedom (1 for rotation angle $\theta$, 2 for translation).
Scaled Rotation / Similarity Transform: 4 degrees of freedom (1 for scale, 1 for rotation, 2 for translation).
Affine Transform: 6 degrees of freedom (allows for scaling, skewing, rotating, and translating).
Projective Transform / Homography:* 8 degrees of freedom.
A Past Exam Question (For Context)
Additionally, the instructor mentions a concept he used to ask on exams but no longer does. He previously asked students to explain how to define a range operator to mimic the "magic wand" tool from Photoshop. Because range map operations are inherently shift-invariant, they cannot actually implement a magic wand tool. He noted that he no longer gives this as an exam question because it "was not appreciated very well" by students, so he now just uses it as a teaching example in class instead.