PVL 10

⬅️ [PVL 11](<./PVL 11.md>) | ⬆️ [PVL Summaries](<./README.md>) | [PVL 09](<./PVL 09.md>) ➡️

Stereo Vision and 3D Reconstruction

1. Single-View Reconstruction & The Scale Ambiguity

  • The Projection Problem: You cannot reconstruct 3D depth/shape from a single calibrated view alone because of scale ambiguity. All points on a ray from the camera center project to the exact same pixel, meaning a large object far away and a small object nearby look identical.
  • Constraining Scale: To overcome this in a single view, you must know prior information about the scene, such as the exact physical size of an object (e.g., a mailbox or a traffic light) to compute the scale of the reconstruction. (Note: The lecturer explicitly mentioned this concept as a great exam question)
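A minimal sketch of both points above, assuming a pinhole camera. The intrinsics `K`, the point coordinates, and the mailbox numbers are made up for illustration and do not come from the lecture.

```python
import numpy as np

def project(K, X):
    """Project a 3D point X (camera coordinates) to pixel coordinates."""
    x = K @ X
    return x[:2] / x[2]

# Assumed intrinsics: focal length 800 px, principal point (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

# Scale ambiguity: a point and a 5x farther point on the same ray.
X_near = np.array([0.5, 0.2, 2.0])
X_far = 5.0 * X_near
pixel_near = project(K, X_near)
pixel_far = project(K, X_far)
print(pixel_near, pixel_far)  # both land on the same pixel

# Constraining scale with a known size (hypothetical numbers):
# an object of real height H metres that spans h pixels lies at Z = f * H / h.
f_px = K[1, 1]
H_m = 1.0     # assumed real height of the object
h_px = 100.0  # assumed height of the object in the image
Z_m = f_px * H_m / h_px
print(Z_m)
```

The second part only holds for an object roughly parallel to the image plane. It is meant to show why one known dimension is enough to fix the scale.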

2. Stereo Vision & Triangulation

  • Triangulation: The process of intersecting the rays cast from the two camera centers ($O_1$ and $O_2$) through the matched image points to find the 3D location of a point.
  • Disparity: The offset between where a point projects in the first camera's image and where it projects in the second camera's image.
  • The Depth/Disparity Equation: For simple, perfectly parallel (non-verged) stereo cameras, Depth ($Z$) is calculated as: $Z = \frac{f \cdot T}{d}$ where $f$ is focal length, $T$ is the baseline distance, and $d$ is disparity.
  • Key Relationship: Disparity is inversely proportional to depth. Objects closer to the camera shift more between the two views than objects far away. (Note: The lecturer explicitly promised this will be on the exam)
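A small worked example of $Z = \frac{f \cdot T}{d}$, assuming a focal length in pixels, a baseline in metres, and disparities in pixels. The numbers and the function name `depth_from_disparity` are illustrative only.

```python
import numpy as np

def depth_from_disparity(f_px, baseline_m, disparity_px):
    """Depth Z = f * T / d for parallel stereo cameras (d must be non-zero)."""
    d = np.asarray(disparity_px, dtype=float)
    return f_px * baseline_m / d

# Assumed rig: f = 700 px, baseline T = 0.12 m.
disparities = [84.0, 42.0, 21.0]
depths = depth_from_disparity(700.0, 0.12, disparities)
for d, Z in zip(disparities, depths):
    print(f"disparity {d:5.1f} px -> depth {Z:.2f} m")
```

Each time the disparity halves, the depth doubles, which is the inverse relationship stated above.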

3. Epipolar Geometry

Epipolar geometry is the 3D geometry that describes the relationship between two camera views and a 3D scene point.
Baseline: The vector/line connecting the two camera optical centers ($O_1$ and $O_2$).
Epipolar Plane: The plane formed by the baseline and the 3D point $X$ out in the world. It intersects both image planes.
Epipole ($e$): The point where the baseline intersects the image plane. It is also the exact projection of one camera's optical center into the other camera's view.
Epipolar Line: The 2D line formed by the intersection of the epipolar plane with the image plane. All epipolar lines in an image intersect at that image's epipole.
The Epipolar Constraint: The corresponding match for a point $x_1$ in the first image must lie somewhere along its corresponding epipolar line in the second image. This reduces the problem of finding point matches from a 2D image search to a 1D line search.
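The constraint can be checked numerically without any matrix machinery. The sketch below assumes the second camera's coordinates are related to the first by $X_1 = R X_0 + t$ (an assumed convention; `R`, `t`, and the sample point are made up). It walks along the ray through one point in the first view and projects each candidate depth into the second view.

```python
import numpy as np

# Assumed relative pose: 10 degree rotation about the y axis plus a translation.
theta = np.deg2rad(10.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([-0.5, 0.0, 0.05])

# One normalized image point in the first view; its depth is unknown.
x0 = np.array([0.1, -0.05, 1.0])

# Project every candidate depth along that ray into the second view.
depths = [1.0, 2.0, 5.0, 20.0]
proj1 = []
for Z in depths:
    X1 = R @ (Z * x0) + t
    proj1.append(X1 / X1[2])
proj1 = np.array(proj1)

# The first camera's center maps to t in the second camera's frame,
# so the epipole in the second view is the projection of t.
e1 = t / t[2]

# Line through the first and last projections (homogeneous cross product).
line = np.cross(proj1[0], proj1[-1])
offsets = proj1 @ line    # ~0 for every candidate: they are collinear
epipole_offset = e1 @ line  # ~0: the line passes through the epipole
print(offsets, epipole_offset)
```

All candidates fall on one line through the epipole, so a match only needs to be searched for along that line.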

4. The Essential Matrix ($E$)

  • Use Case: Used when cameras are calibrated (i.e., the intrinsic camera matrix $K$ is known).
  • Definition: $E = [t]_{\times} R$, where $R$ and $t$ are the rotation and translation relating the two cameras, and $[t]_{\times}$ is the skew-symmetric matrix of $t$.
  • Algebraic Epipolar Constraint: $x_1^T E x_0 = 0$, where $x_0$ and $x_1$ are the corresponding normalized image points in the two views.
  • Properties: It is a $3 \times 3$ matrix, has a rank of 2, and has 5 degrees of freedom.
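  • Epipolar Line: With the constraint written as above, $E x_0$ is the epipolar line in the second view on which $x_1$ must lie, and $E^T x_1$ is the epipolar line in the first view on which $x_0$ must lie.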
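The properties above can be checked on a synthetic example. This sketch assumes the convention $X_1 = R X_0 + t$ between the two camera frames, which is consistent with $x_1^T E x_0 = 0$. The pose, the 3D point, and the helper name `skew` are illustrative.

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix [t]_x such that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Assumed relative pose between the two cameras.
theta = np.deg2rad(10.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([-0.5, 0.0, 0.05])
E = skew(t) @ R

# One 3D point expressed in each camera frame, then normalized image points.
X0 = np.array([0.3, -0.2, 4.0])
X1 = R @ X0 + t
x0 = X0 / X0[2]
x1 = X1 / X1[2]

residual = x1 @ E @ x0           # algebraic epipolar constraint, ~0
line1 = E @ x0                   # epipolar line (a, b, c) in the second view
rank = np.linalg.matrix_rank(E)  # 2
print(residual, line1 @ x1, rank)
```

The singular values of `E` come out as two equal values and one zero, which is another way to see the rank-2 property.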

5. The Fundamental Matrix ($F$)

  • Use Case: Used when cameras are uncalibrated (i.e., operating directly in pixel coordinates without knowing $K$).
  • Definition: $F = K_1^{-T} E K_0^{-1}$.
  • Algebraic Epipolar Constraint: $p'^T F p = 0$, where $p$ and $p'$ are the corresponding pixel coordinates in the left and right images.
  • Properties: It is a $3 \times 3$ matrix, has a rank of 2, and has 7 degrees of freedom.
  • Like the Essential matrix, $F p$ gives the epipolar line in the right image on which $p'$ must lie, and $F^T p'$ gives the epipolar line in the left image.
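A sketch of the definition $F = K_1^{-T} E K_0^{-1}$ in pixel coordinates, assuming the same pose convention $X_1 = R X_0 + t$. The two intrinsic matrices and the helper names are made up for illustration.

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix [t]_x."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_from_essential(E, K0, K1):
    """F = K1^{-T} E K0^{-1}."""
    return np.linalg.inv(K1).T @ E @ np.linalg.inv(K0)

def point_line_distance(line, p):
    """Distance in pixels from homogeneous point p (w = 1) to line (a, b, c)."""
    return abs(line @ p) / np.hypot(line[0], line[1])

# Assumed pose and intrinsics (different for each camera).
theta = np.deg2rad(10.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([-0.5, 0.0, 0.05])
K0 = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
K1 = np.array([[750.0, 0.0, 310.0], [0.0, 750.0, 250.0], [0.0, 0.0, 1.0]])

F = fundamental_from_essential(skew(t) @ R, K0, K1)

# Project one 3D point into both images (homogeneous pixels, w = 1).
X0 = np.array([0.3, -0.2, 4.0])
X1 = R @ X0 + t
p = K0 @ (X0 / X0[2])
p_prime = K1 @ (X1 / X1[2])

line = F @ p  # epipolar line in the second image
dist = point_line_distance(line, p_prime)
print(p_prime @ F @ p, dist)  # both ~0
```

In practice `F` is usually estimated from correspondences because `K` is unknown. Building it from `E` here is only a way to check the algebra.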

6. The Normalized Eight-Point Algorithm

This is the standard algebraic method used to estimate the Fundamental matrix $F$ (or Essential matrix $E$) from point correspondences.
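1. Normalization (Crucial Step): You must first translate the points so their centroid is at the origin and scale them (often to unit variance, or so that the mean distance from the origin is $\sqrt{2}$) before solving. Without this, the system is highly ill-conditioned, because pixel coordinates in the hundreds are mixed with the homogeneous coordinate 1.
2. Setup Linear System: Each correspondence provides one equation that is linear in the nine entries of $F$. Stacking at least 8 of them forms the $A$ matrix of a homogeneous system $A f = 0$.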
3. Solve via SVD: Solve the homogeneous linear system using Singular Value Decomposition, taking the singular vector corresponding to the smallest singular value.
4. Enforce Rank-2 Constraint: Because $F$ and $E$ must be rank 2, take the SVD of the resulting matrix, set the smallest singular value in the diagonal matrix to 0, and recompose.
5. Denormalize: Undo the coordinate transformations applied in step 1 to get the final matrix.
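The five steps can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation. It assumes the convention $p_1^T F p_0 = 0$, scales to a mean distance of $\sqrt{2}$ (one common choice), and uses noise-free synthetic data with made-up camera parameters. The function names are hypothetical.

```python
import numpy as np

def normalize_points(pts):
    """Step 1: zero centroid, mean distance sqrt(2).
    Returns homogeneous points (N x 3) and the 3x3 transform T."""
    centroid = pts.mean(axis=0)
    mean_dist = np.linalg.norm(pts - centroid, axis=1).mean()
    s = np.sqrt(2.0) / mean_dist
    T = np.array([[s, 0.0, -s * centroid[0]],
                  [0.0, s, -s * centroid[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return pts_h @ T.T, T

def eight_point(pts0, pts1):
    """Estimate F with p1^T F p0 = 0 from N >= 8 matches (N x 2 arrays)."""
    n0, T0 = normalize_points(pts0)
    n1, T1 = normalize_points(pts1)
    # Step 2: one row per match; kron(p1, p0) pairs with row-major vec(F).
    A = np.stack([np.kron(b, a) for a, b in zip(n0, n1)])
    # Step 3: singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Step 4: enforce rank 2.
    U, S, Vt2 = np.linalg.svd(F)
    S[2] = 0.0
    F = U @ np.diag(S) @ Vt2
    # Step 5: denormalize.
    F = T1.T @ F @ T0
    return F / np.linalg.norm(F)

def project(K, X):
    """Project N x 3 camera-frame points to N x 2 pixel coordinates."""
    x = X @ K.T
    return x[:, :2] / x[:, 2:3]

# Synthetic scene: 20 random points in front of both cameras (assumed setup).
rng = np.random.default_rng(0)
X0 = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(20, 3))
theta = np.deg2rad(5.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([-0.3, 0.02, 0.05])
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
pts0 = project(K, X0)
pts1 = project(K, X0 @ R.T + t)

F = eight_point(pts0, pts1)

# Check: distance of each point in image 1 to its epipolar line F @ p0.
h0 = np.column_stack([pts0, np.ones(len(pts0))])
h1 = np.column_stack([pts1, np.ones(len(pts1))])
lines1 = h0 @ F.T
dist = np.abs(np.sum(lines1 * h1, axis=1)) / np.hypot(lines1[:, 0], lines1[:, 1])
print(dist.max())  # should be close to zero on noise-free data
```

With real, noisy matches the distances will not be zero, and the estimate is usually wrapped in a robust scheme such as RANSAC, which is outside the scope of this sketch.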

7. Stereo Image Rectification

  • Rectification is the process of applying a homography to each of the two images to artificially align their scanlines and make their image planes parallel.
  • Geometrically, this forces the epipoles to infinity.
  • Purpose: It makes programming correspondence matching much easier. Instead of searching along angled epipolar lines, you only need to search horizontally along parallel image scanlines.
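To see why a rectified pair has horizontal epipolar lines, the sketch below builds the essential matrix for an already-rectified setup: no rotation and a purely horizontal translation. The baseline value and the sample point are assumptions, and no actual rectifying homography is computed here.

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix [t]_x."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Rectified geometry: identity rotation, translation along x only.
baseline = 0.12  # assumed, in metres
E = skew(np.array([-baseline, 0.0, 0.0])) @ np.eye(3)

# Epipolar line in the second view for a normalized point in the first view.
x0 = np.array([0.25, -0.1, 1.0])
a, b, c = E @ x0
y_on_line = -c / b  # the line is a*x + b*y + c = 0 with a == 0
print(a, y_on_line)  # a is 0 and the line is y = x0[1]: same scanline

# The epipole is the null vector of E: (1, 0, 0), a point at infinity.
epipole = np.array([1.0, 0.0, 0.0])
print(E @ epipole)
```

Because the line reduces to $y = y_0$, the match for a pixel lies on the same row in the other image, which is the horizontal search described above.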

Specific concepts to remember for the test:
Disparity is inversely proportional to depth: The speaker emphasizes that this is a "fundamental equation to remember" and explicitly states it is a "take-home message" that they promised to ask about on the exam.
Solving the scale problem in single-camera reconstruction: When discussing how to overcome the scale ambiguity caused by single-view projection, the speaker notes that using objects of known standard dimensions (such as a mailbox or traffic light) to constrain the scale is a "great exam question".

Topics explicitly excluded from the test:
Seeing hidden images in stereograms: When showing a random dot stereogram of a "flower pot", the speaker jokes that students will fail the test if they cannot see it, but quickly clarifies that the exercise is "just a fun thing" and will not be on the final exam.
Stereo image rectification: The speaker briefly introduces the concept of stereo image rectification—the process of mathematically aligning scan lines to simplify finding correspondences—but assures the class, "I won't ask you about this on the exam or anything like that".

