
# PVL 02

⬅️ [PVL 03](<./PVL 03.md>) | ⬆️ [PVL Summaries](<./README.md>) | [Prompts](<./Prompts.md>) ➡️

## Images as Functions

### 1. The Four Views of Images

Images can be conceptualized as functions mapping a domain (pixel locations) to a range (pixel values). The space of all possible images is vast (e.g., $2^{100}$ distinct 10×10 binary images), but the overwhelming majority look like white noise. Natural images are highly structured: neighboring pixels tend to share similar or identical values.

Images are categorized into four types based on physical existence vs. abstract models, and continuous vs. discrete domains:

#### A. Perfect Images (Continuous, Physical)

  • Definition: A continuous image generated by a physical process mapping points on a plane to real numbers (e.g., $\mathbb{R}^2 \rightarrow \mathbb{R}^D$).
  • Properties: Exists only as an idealized abstraction; it cannot be practically sampled or captured directly. Outputs (light energies) are non-negative.
  • Lambertian Model: A classic model of continuous image formation.
    • Formula relies on the light source direction ($L$), surface normal ($N$), and surface properties/albedo ($\rho$).
    • Reflectance depends on the cosine of the angle of incident light: $L \cdot N = \cos \theta$ for unit vectors $L$ and $N$.
    • It is independent of the viewing direction (no specularity).
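As a concrete illustration, the Lambertian model above can be sketched in NumPy. The function name and the example geometry (a flat surface lit from 45 degrees) are illustrative, not from the lecture:

```python
import numpy as np

def lambertian_shading(normals, light_dir, albedo):
    """Lambertian image formation: I = rho * max(0, N . L).
    normals:   (H, W, 3) array of unit surface normals N
    light_dir: (3,) unit vector L pointing toward the light source
    albedo:    (H, W) surface albedo rho
    The viewing direction appears nowhere -- no specularity.
    """
    n_dot_l = np.clip(normals @ light_dir, 0.0, None)  # cos(theta), clamped at 0
    return albedo * n_dot_l

# Hypothetical example: a flat surface facing the camera, lit from 45 degrees.
H, W = 4, 4
normals = np.zeros((H, W, 3))
normals[..., 2] = 1.0                          # all normals point at the viewer
light = np.array([0.0, 1.0, 1.0]) / np.sqrt(2.0)
img = lambertian_shading(normals, light, np.full((H, W), 0.8))
# every pixel has intensity 0.8 * cos(45 deg)
```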

#### B. Digital Images (Discrete, Physical)

  • Definition: A sampled and quantized version of a perfect image. Transduces captured energy into processable signals.
  • Sampling: Specifies the domain by discretizing the continuous space ($\mathbb{R}^2$) into pixel locations on a lattice based on camera sensor properties.
  • Quantization: Specifies the range by mapping continuous energy values into discrete bins (e.g., 8-bit data, mapping values to $0-255$).
  • Bayer Filter: Hardware approach to capture RGB colors using a single sensor array.
    • Arrangement: Alternating filters, typically 2 green, 1 red, and 1 blue per 2×2 block (e.g., RGGB).
    • There are more green pixels because the human eye is more sensitive to green wavelengths.
    • Demosaicing: The algorithmic process of averaging and smoothing to convert the sparse Bayer pattern into a dense RGB color image.
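A minimal sketch of the Bayer pipeline: sampling an RGB image through an assumed RGGB mosaic, then a deliberately crude block-averaging demosaic. Real demosaicing interpolates up to full resolution; this half-resolution version only illustrates the averaging idea, and all names are hypothetical:

```python
import numpy as np

def bayer_mosaic(rgb):
    """Sample an RGB image through a hypothetical RGGB Bayer pattern,
    keeping one color value per pixel (as a single-sensor camera does)."""
    H, W, _ = rgb.shape
    mosaic = np.zeros((H, W))
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]  # red at even row, even col
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]  # green
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]  # green (2 greens per 2x2 block)
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]  # blue
    return mosaic

def demosaic_blockwise(mosaic):
    """Crude demosaic: each 2x2 RGGB block becomes one RGB pixel,
    averaging the two green samples. Output is half resolution --
    a real pipeline would interpolate a dense full-resolution image."""
    r = mosaic[0::2, 0::2]
    g = (mosaic[0::2, 1::2] + mosaic[1::2, 0::2]) / 2.0
    b = mosaic[1::2, 1::2]
    return np.stack([r, g, b], axis=-1)

rgb = np.ones((4, 4, 3)) * np.array([0.9, 0.5, 0.1])  # uniform test image
out = demosaic_blockwise(bayer_mosaic(rgb))           # (2, 2, 3), same color
```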

#### C. Discrete Models (Discrete, Abstract)

  • Definition: Direct mathematical models of digital images mapping from integer pixel locations to integer values ($\mathbb{Z}^2 \rightarrow \mathbb{Z}^D$). Can be used to extract "feature images" (e.g., gradient filters).
  • The Potts Model: An energy function from statistical physics used to enforce natural image structure (smoothness).
    • Energy function: $E(I) = \beta \sum_{s,t} \left[ \mathbb{1}(I_{s,t} \neq I_{s+1,t}) + \mathbb{1}(I_{s,t} \neq I_{s,t+1}) \right]$, summing an indicator over all horizontally and vertically adjacent pixel pairs.
    • Pays a penalty ($\beta$) every time neighboring pixels differ.
    • Application (Denoising): Used as a regularizer. An energy function is minimized: $E(J) = D(I, J) + S(J)$, where $D$ is a data term (e.g., sum of squared differences ensuring output $J$ is close to noisy input $I$) and $S$ is the Potts model ensuring local smoothness to remove spot noise.
    • Implementation Trick: Looping through pixels is slow. It can be vectorized by translating the image array by one pixel, taking the difference, and summing.
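The vectorization trick above can be sketched as follows: shift the image by one pixel along each axis, compare, and sum, giving the Potts energy with no Python loops. Function names are illustrative:

```python
import numpy as np

def potts_energy(img, beta=1.0):
    """Potts smoothness term S(J): pay beta per pair of unequal neighbors.
    Vectorized via the shift-and-compare trick -- no per-pixel loops."""
    horiz = np.sum(img[:, 1:] != img[:, :-1])  # each pixel vs. right neighbor
    vert = np.sum(img[1:, :] != img[:-1, :])   # each pixel vs. pixel below
    return beta * (horiz + vert)

def denoise_energy(noisy, candidate, beta=1.0):
    """E(J) = D(I, J) + S(J): squared-difference data term
    keeping J close to the noisy input I, plus Potts smoothness."""
    data = np.sum((noisy.astype(float) - candidate.astype(float)) ** 2)
    return data + potts_energy(candidate, beta)

flat = np.zeros((5, 5), dtype=int)
spot = flat.copy()
spot[2, 2] = 1   # a single pixel of spot noise
# flat pays no smoothness penalty; the spot creates 4 unequal neighbor pairs
```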

#### D. Continuous Models (Continuous, Abstract)

  • Definition: Maps continuous domains to continuous ranges ($\mathbb{R}^2 \rightarrow \mathbb{R}^D$).
  • Applications: Used when smooth, sub-pixel boundaries are required, such as in medical imaging segmentation or rotoscoping in video editing.
  • Continuous Analog of Potts Model: Known as the weak membrane model. Instead of discrete differences, it integrates the absolute value of the gradient of the continuous image over the domain.
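Written out, the weak membrane energy described above takes roughly this form; the weighting $\lambda$ and domain $\Omega$ are notational assumptions, and the lecture's exact notation may differ:

```latex
E(J) \;=\; \underbrace{\int_{\Omega} \big(I(x) - J(x)\big)^{2}\, dx}_{\text{data term}}
\;+\; \underbrace{\lambda \int_{\Omega} \big\lVert \nabla J(x) \big\rVert \, dx}_{\text{smoothness term}}
```

As in the discrete case, the first term keeps $J$ close to the input $I$ and the second penalizes variation, but here discontinuities can fall at sub-pixel positions.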

### 2. Interpolation

Interpolation is used to evaluate values at continuous sub-pixel locations from a discretely sampled digital grid.

  • Nearest Neighbor: Snaps to the closest sampled pixel value. Produces blocky artifacts and discontinuities when zoomed in.
  • Bilinear Interpolation: Computes a smooth value as a weighted average of the 4 nearest grid points.
    • The weight of each neighbor is proportional to the area of the opposite bounding rectangle (i.e., closer neighbors get larger weights).
    • Performed as successive 1D linear interpolations (e.g., first across the rows, then down the columns).
    • Can be formulated mathematically as a convex combination using barycentric coordinates.
    • Computed independently for each color channel.
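A minimal sketch of bilinear interpolation for an interior point of a single-channel grid (apply once per channel for RGB; the function name and example values are illustrative):

```python
import numpy as np

def bilinear(img, y, x):
    """Bilinear interpolation at a continuous interior location (y, x):
    1D linear interpolation along x on the two bracketing rows,
    then along y between those two results.
    (No bounds checking -- illustrative only.)"""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = y0 + 1, x0 + 1
    dy, dx = y - y0, x - x0
    top = (1 - dx) * img[y0, x0] + dx * img[y0, x1]     # along the upper row
    bottom = (1 - dx) * img[y1, x0] + dx * img[y1, x1]  # along the lower row
    return (1 - dy) * top + dy * bottom                 # between the two rows

grid = np.array([[0.0, 1.0],
                 [2.0, 3.0]])
center = bilinear(grid, 0.5, 0.5)   # average of all four corners: 1.5
corner = bilinear(grid, 0.0, 0.0)   # falls exactly on a sample: 0.0
```

Note how the weights $(1-dy)(1-dx)$, $(1-dy)\,dx$, $dy\,(1-dx)$, and $dy\,dx$ sum to 1, making the result the convex combination mentioned above.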

### 3. Reminders for Assessments and Assignments

The speaker specifically mentions a few key items to remember for upcoming assessments and assignments:

  • Upcoming Quiz: You will need to know how to implement the Potts model in code for a quiz "next week". The professor specifically wrote out pseudo-code during the lecture to demonstrate how to convert the mathematical energy function of the Potts model into a practical programmatic implementation.
  • Homework 2: You will likely have to implement a Bayer demosaicing algorithm (which combines sparsely sampled red, green, and blue channels into a dense color image) for your second homework assignment.
  • Course Project: You must remember to come up with a "startup pitch and prototype" for an actual application for the course project. The professor suggested ideas like automatic rotoscoping or computing the plane of a football end zone.
  • What NOT to memorize: When explaining bilinear interpolation, the professor explicitly stated that you do not need to remember the specific mathematical breakdown for calculating the weights (alpha 1 through alpha 4) as barycentric coordinates.

⬅️ [PVL 03](<./PVL 03.md>) | ⬆️ [PVL Summaries](<./README.md>) | [Prompts](<./Prompts.md>) ➡️