PVL 08

⬅️ [PVL 09](<./PVL 09.md>) | ⬆️ [PVL Summaries](<./README.md>) | [PVL 07](<./PVL 07.md>) ➡️

Image Segmentation as Graphs

1. Problem Definition & Energy Functions

Goal: Perform two-class (foreground/background) segmentation on an image, acting like an automatic magic wand. Given an image with $mn$ pixels, we want to find a configuration of binary variables $x_1, \dots, x_{mn}$ where $x_u = 1$ if pixel $u$ is foreground, and $0$ if it is background.
Unary Energy ($E_1$): Evaluates pixels independently using a foreground feature function $f(I(u);\theta)$ and a background function $b(I(u);\phi)$.
Formula: $E_1(x) = \sum_u x_u f(I(u);\theta) + \sum_u (1-x_u) b(I(u);\phi)$.
The Myopia Problem: Minimizing $E_1$ independently at each pixel is computationally easy, but it leads to noisy, nearsighted results because it ignores neighboring pixel contexts.
Pairwise Energy ($E_2$): Adds a regularizing term to enforce spatial continuity and model a Markov Random Field.
Formula: $E_2(x) = \sum_u x_u f(I(u);\theta) + \sum_u (1-x_u) b(I(u);\phi) + \sum_{u \sim v} |x_u - x_v|(1 - R(u,v))$.
The affinity function $R(u,v)$ characterizes the agreement or similarity between neighboring pixels $u$ and $v$.
A penalty is paid only when neighboring pixels are assigned different labels ($x_u \neq x_v$).
The penalty is higher when the pixels have similar intensities, forcing the model to reconsider placing a boundary between them.
Finding the optimal configuration $x^*$ to minimize $E_2$ is a complex combinatorial problem because changing one pixel's label has global rippling effects across its neighbors.

2. Graph Formulation

Graph Setup: The image is represented as a directed graph $G = (V, E)$.
Nodes: Every pixel is a node, plus two additional special nodes: a Source ($S$, representing foreground) and a Sink ($T$, representing background).
Edges:
Source to pixels: Directed edges with capacities based on the unary foreground/background costs.
Pixels to Sink: Directed edges with capacities based on the unary background/foreground costs.
Pixel to Pixel: Directed edges in both directions between neighboring pixels, with capacities based on the regularizer $R(u,v)$.

3. Flow Networks

Flow Definition: A real-valued function $f: V \times V \to \mathbb{R}$.
Flow Constraints:
Capacity Constraint: Flow on any edge must be non-negative and cannot exceed its capacity: $0 \le f(u,v) \le c(u,v)$.
Flow Conservation: For all nodes except $S$ and $T$, the total flow coming in must exactly equal the total flow going out.
Value of Flow $|f|$: The total net flow out of the source minus the flow into the source.
Maximum Flow Problem: Find the flow $f$ that maximizes the value $|f|$ from $S$ to $T$.

4. Ford-Fulkerson Algorithm & Residual Networks

Algorithm Idea: Initialize flow to 0, and iteratively increase the flow value by finding augmenting paths in a residual network until no more paths exist.
Residual Network ($G_f$): A dynamic graph that tracks available (residual) capacities after some flow has been pushed through the network.
Residual capacity formula: $c_f(u,v) = c(u,v) - f(u,v)$ if $(u,v) \in E$, and $c_f(u,v) = f(v,u)$ if $(v,u) \in E$.
Reversing flow: To maintain flow conservation, pushing flow in one direction requires adding "hallucinated" edges with residual capacity in the opposite direction, which allows for flow cancellation later.
Augmenting Path: A simple path from $S$ to $T$ in the residual network $G_f$ where every edge has residual capacity $c_f > 0$.
The flow can be increased by the minimum residual capacity found along that path.
Path Ordering: The order in which augmenting paths are chosen does not affect reaching the final global optimum, but it can significantly impact the speed and number of iterations required to finish.

5. Max-Flow Min-Cut Theorem

Cut: A partition of the graph's nodes into two sets, $S_{set}$ and $T_{set}$, where the source is in $S_{set}$ and the sink is in $T_{set}$.
Min-Cut: The cut across which the capacity of the severed edges is minimized over all possible cuts.
Theorem: The value of the maximum flow in the network is exactly bounded by and equal to the capacity of the minimum cut.
Extracting the Segmentation: Once the max flow is found, there are no more augmenting paths in the residual network.
The set of nodes still reachable from the Source in the residual network becomes the foreground segment ($x_u = 1$).
The unreachable nodes belong to the background segment ($x_u = 0$).
This Min-Cut mathematically guarantees the global minimum of the complex pairwise energy function $E_2$.

The sources do not explicitly list specific things that must be remembered for a test. However, the speaker does make a few comments regarding what will and will not be assessed:

General Exam Background: The speaker notes that the class is "heading towards the exam" and that the lecture material—specifically focusing on images as graphs and two-class foreground/background segmentation—will give students a "good background" for the exam and their course project.
What Not to Memorize: When briefly showing how the min cut problem can be formulated as a linear problem, the speaker explicitly tells the audience, "Don't worry about the details here here. You'll never be asked about it".

⬅️ [PVL 09](<./PVL 09.md>) | ⬆️ [PVL Summaries](<./README.md>) | [PVL 07](<./PVL 07.md>) ➡️