Question 3 example 1

⬅️ [Question 3 example 2](<./Question 3 example 2.md>) | ⬆️ [Generated Test Questions](<./README.md>) | [Question 2 example 2](<./Question 2 example 2.md>) ➡️

Problem 3: Comprehension and Application - 3D Vision and Stereo (25 pts)

Context:
You have been hired to design the visual perception system for a new autonomous delivery rover on North Campus. The rover needs to accurately estimate the distance to obstacles (like pedestrians or light poles) to navigate safely.

(A) (7 pts) Initially, your prototype uses a single, fully calibrated camera. The camera successfully detects a bounding box around a stop sign in its view. However, you find that you cannot determine the exact 3D distance to the stop sign from this single frame alone. Conceptually explain why you cannot reconstruct the 3D depth from a single view. Then, describe one piece of prior information (a constraint) about the scene you could provide to the system to overcome this ambiguity and estimate the distance.

(B) (6 pts) To improve the rover, you upgrade to a parallel stereo camera setup with a known baseline distance ($T$) and focal length ($f$). State the relationship between the measured disparity ($d$) of the stop sign and its depth ($Z$) in the real world. If a pedestrian suddenly walks much closer to the rover, what will happen to the disparity of the pedestrian in your stereo images?

(C) (8 pts) To calculate disparity, your system must find matching points (correspondences) between the left and right camera images. Explain how epipolar geometry fundamentally reduces the computational complexity of this matching problem. In your answer, define the "epipolar constraint" and explain how it restricts the search area for a corresponding point.

(D) (4 pts) Before running your correspondence matching algorithm, you apply a geometric transformation (a homography) to both images that artificially aligns their scanlines. What is the name of this process, and how does it make the correspondence search from Part C even easier to program?


Answer Key:

(A) Single View Depth & Scale Ambiguity (7 pts)
Explanation (4 pts): You cannot reconstruct 3D depth from a single view because of scale ambiguity. All points along a 3D ray extending from the camera's optical center project to the exact same 2D pixel on the image plane. Therefore, a small object very close to the camera and a large object very far away will produce the exact same image projection, making depth unrecoverable.
Constraint/Solution (3 pts): To constrain the scale in a single view, you must use prior knowledge about the exact physical size of the object in the scene. For example, if you know the standard physical dimensions of a stop sign (or a mailbox/traffic light), you can use its apparent size in the image to compute the scale and estimate its depth.
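The known-size constraint can be sketched as a one-line pinhole computation. This is an illustrative sketch, not part of the answer key: the focal length (800 px) and stop-sign width (0.75 m) below are assumed example values, not calibrated measurements.

```python
# Sketch: single-view depth from a known object size under a pinhole model.
# ASSUMED example values: 800 px focal length, 0.75 m stop-sign width.

def depth_from_known_size(focal_px: float, real_width_m: float,
                          pixel_width: float) -> float:
    """Pinhole projection gives pixel_width = focal_px * real_width_m / Z,
    so solving for depth: Z = focal_px * real_width_m / pixel_width."""
    return focal_px * real_width_m / pixel_width

# A 0.75 m stop sign spanning 60 px with an 800 px focal length:
# Z = 800 * 0.75 / 60 = 10 m
print(depth_from_known_size(800.0, 0.75, 60.0))
```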

(B) Disparity and Depth (6 pts)
Relationship (3 pts): Disparity is inversely proportional to depth. The depth equation is $Z = \frac{f \cdot T}{d}$.
Application (3 pts): Because disparity is inversely proportional to depth, if a pedestrian moves closer to the cameras (depth $Z$ decreases), their disparity ($d$) will increase. They will appear to move a greater distance across the visual field between the two cameras.
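The inverse relationship $Z = fT/d$ can be checked numerically. A minimal sketch, assuming example values for the focal length (700 px) and baseline (0.12 m):

```python
# Sketch of the stereo depth equation Z = f*T/d and its inverse.
# ASSUMED example rig: f = 700 px, T = 0.12 m.

def depth_from_disparity(f_px: float, baseline_m: float,
                         disparity_px: float) -> float:
    """Z = f * T / d: depth falls as disparity grows."""
    return f_px * baseline_m / disparity_px

def disparity_from_depth(f_px: float, baseline_m: float,
                         depth_m: float) -> float:
    """d = f * T / Z: the same relation solved for disparity."""
    return f_px * baseline_m / depth_m

# A pedestrian walking from 8 m to 2 m away quadruples their disparity:
d_far  = disparity_from_depth(700.0, 0.12, 8.0)   # 10.5 px
d_near = disparity_from_depth(700.0, 0.12, 2.0)   # 42.0 px
print(d_far, d_near)
```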

(C) Epipolar Geometry & Correspondences (8 pts)
Epipolar Constraint Definition (4 pts): The epipolar constraint states that for any given point in the first image, its corresponding matching point in the second image must lie somewhere along a specific 1D line, known as the epipolar line.
Complexity Reduction (4 pts): This reduces the correspondence matching problem from a computationally expensive 2D search across the entire second image to a much simpler 1D search strictly along the epipolar line.
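The epipolar line itself comes from the fundamental matrix via $l' = F\mathbf{x}$. A minimal sketch: the matrix `F` below is an assumed toy example (the fundamental matrix of a pure horizontal translation with identity intrinsics), not from a real calibration.

```python
# Sketch: the epipolar constraint x'^T F x = 0 restricts the match for x
# to the line l' = F x in the second image.
# ASSUMED toy F: pure horizontal translation, identity intrinsics, so
# F is the skew-symmetric matrix of the epipole direction (1, 0, 0).

def epipolar_line(F, x):
    """l' = F x: coefficients (a, b, c) of the line a*u + b*v + c = 0
    in the second image along which the match for x must lie."""
    return [sum(F[i][j] * x[j] for j in range(3)) for i in range(3)]

F = [[0.0, 0.0,  0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0,  0.0]]

x = [320.0, 240.0, 1.0]          # homogeneous pixel in the left image
a, b, c = epipolar_line(F, x)
# Line: 0*u - 1*v + 240 = 0, i.e. v = 240 -- a horizontal scanline,
# so the 2D search collapses to a 1D search along row 240.
print(a, b, c)
```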

(D) Image Rectification (4 pts)
Name (2 pts): Stereo image rectification.
Simplification (2 pts): Rectification forces the epipoles to infinity and makes the image planes parallel. This means you no longer have to search along angled epipolar lines; instead, you only need to search horizontally along perfectly aligned parallel image scanlines.
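After rectification, that horizontal search is easy to program as a 1D sum-of-squared-differences (SSD) scan. A minimal sketch using toy 1D "rows" (a real matcher would compare 2D patches across many rows); the window and disparity-range parameters are illustrative assumptions:

```python
# Sketch: 1D block matching along a rectified scanline.
# The rows below are toy data; window and max_disp are ASSUMED parameters.

def best_disparity(left_row, right_row, x, window=1, max_disp=4):
    """Return the disparity d minimising the SSD between a window around
    left_row[x] and a window around right_row[x - d] (for parallel
    cameras, the match shifts left in the right image)."""
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        if x - d - window < 0:        # window would fall off the row
            break
        cost = sum((left_row[x + k] - right_row[x - d + k]) ** 2
                   for k in range(-window, window + 1))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

left  = [0, 0, 9, 5, 9, 0, 0, 0]
right = [9, 5, 9, 0, 0, 0, 0, 0]   # same pattern shifted 2 px left
print(best_disparity(left, right, x=3))
```

Because the scanlines are aligned, the entire search is the single `for d in range(...)` loop; without rectification the same matcher would need a 2D search or line-walking logic.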
