# Question 4 example 2

⬆️ [Generated Test Questions](<./README.md>) | [Question 4 example 1](<./Question 4 example 1.md>) ➡️

## Problem 4: Integration and System Design (25 pts)

**Instructor Note:** As hinted in the exam review, this question is highly unconstrained and requires a conceptual understanding of the entire course. There are many possible correct answers, provided you can mathematically and conceptually justify your design choices using the techniques discussed this semester.

**Scenario:**
Imagine you have been hired by a robotics startup to develop a visual navigation system for an autonomous package-delivery drone operating on the University of Michigan's North Campus. The drone is equipped with a stereo pair of uncalibrated cameras.

To successfully deliver a package, the drone's vision system must simultaneously accomplish three tasks:
1. Landing Zone Segmentation: Identify and isolate a clear, obstacle-free landing zone (e.g., a grassy area or sidewalk) from the cluttered background of moving students, trees, and buildings.
2. 3D Depth & Positioning: Determine the drone's 3D position and depth relative to the ground and surrounding obstacles using the two uncalibrated cameras.
3. Building Recognition: Recognize which specific building it is looking at to confirm it has reached the correct delivery address, despite massive variations in lighting and weather.

**Your Task:**
Describe an end-to-end pipeline that accomplishes these three tasks. For each of the three tasks, you must:
* Identify the specific algorithm(s), mathematical models, or network architectures from this course that you would use.
* Provide a conceptual justification for why your chosen method is appropriate for this unconstrained outdoor environment.
* State at least one potential failure case or limitation of your chosen approach.


## Solution / Grading Rubric

Because this question is open-ended, graders will accept a wide variety of answers as long as they correctly apply course concepts. Below is an example of an ideal, full-credit response:

**1. Landing Zone Segmentation (8 pts)**
* **Algorithm:** Formulate the segmentation as a discrete labeling problem (foreground vs. background) over a Markov Random Field (MRF) and solve it with the max-flow/min-cut algorithm.
* **Justification:** Local patch-based representations alone can group disconnected regions, but graph-based methods enforce spatial continuity. Define an energy function $E(f)$ with a unary term (evaluating how well a pixel's color matches the expected grass/concrete appearance) and a pairwise term (penalizing label boundaries that do not align with visible image edges). For this binary labeling with submodular pairwise terms, the max-flow min-cut theorem guarantees that the minimum cut is the exact global minimum of $E(f)$.
* **Failure Case:** The method may fail if the landing zone and the background obstacles share highly similar color or intensity, neutralizing the edge-aligned pairwise boundary penalty.
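The graph-cut formulation above can be sketched end to end with a pure-Python Edmonds-Karp max-flow on a toy 1-D strip of pixels. The per-pixel "grass-likeness" scores and the smoothness weight `lam` are made-up illustration values, not course-specified parameters; unary costs become terminal (source/sink) capacities and pairwise costs become neighbor-edge capacities:

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp max-flow; returns (flow value, residual capacity matrix)."""
    n = len(cap)
    residual = [row[:] for row in cap]
    flow = 0.0
    while True:
        # BFS for a shortest augmenting path in the residual graph
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:          # no augmenting path: flow is maximal
            break
        aug, v = float("inf"), t     # bottleneck capacity along the path
        while v != s:
            aug = min(aug, residual[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:                # push flow, update residual capacities
            u = parent[v]
            residual[u][v] -= aug
            residual[v][u] += aug
            v = u
        flow += aug
    return flow, residual

# Toy 1-D "image": per-pixel grass-likeness scores (illustrative numbers).
pixels = [0.9, 0.8, 0.2, 0.1]
n = len(pixels)
S, T = n, n + 1                      # source = landing zone, sink = background
N = n + 2
cap = [[0.0] * N for _ in range(N)]
for i, p in enumerate(pixels):       # unary terms as terminal capacities
    cap[S][i] = p                    # cost of labeling pixel i background
    cap[i][T] = 1.0 - p              # cost of labeling pixel i landing zone
lam = 0.5                            # pairwise smoothness weight (assumed)
for i in range(n - 1):
    cap[i][i + 1] = lam
    cap[i + 1][i] = lam

flow, residual = max_flow(cap, S, T)

# Pixels still reachable from S in the residual graph lie on the source
# side of the min cut, i.e. they are labeled "landing zone".
reachable, q = {S}, deque([S])
while q:
    u = q.popleft()
    for v in range(N):
        if v not in reachable and residual[u][v] > 1e-12:
            reachable.add(v)
            q.append(v)
labels = [i in reachable for i in range(n)]   # [True, True, False, False]
```

Note how the pairwise weight keeps the labeling spatially coherent: the cut prefers to pass between pixels 1 and 2, where the unary scores change sharply, rather than isolating any single pixel.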

**2. 3D Depth & Positioning (9 pts)**
* **Algorithm:** Because the cameras are uncalibrated, we cannot use the essential matrix; instead, we estimate the fundamental matrix $F$ with the Normalized Eight-Point Algorithm. We would extract robust local features (e.g., SIFT keypoints or Harris corners) and match them between the two views. Once $F$ is found, we can rectify the stereo pair so that epipolar lines align with image scanlines, and then compute disparity.
* **Justification:** The Normalized Eight-Point Algorithm solves the algebraic epipolar constraint ($p'^T F p = 0$) for uncalibrated cameras. Normalizing the points (centering and scaling) before the SVD is crucial; otherwise the linear system is highly ill-conditioned. After finding correspondences, we recover depth $Z$ using the fact that disparity is inversely proportional to depth.
* **Failure Case:** The aperture problem makes matching ambiguous when features lie in flat regions or along straight edges. Furthermore, reconstruction from uncalibrated views inherently suffers from scale ambiguity, so without an object of known physical size in the scene the drone can only recover depth up to a scale factor.
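A minimal NumPy sketch of the Normalized Eight-Point Algorithm is below. It follows the standard recipe (translate each point set so its centroid is at the origin, scale so the mean distance is $\sqrt{2}$, solve $Af = 0$ by SVD, enforce rank 2, denormalize); the intrinsics and correspondences in any surrounding demo would be synthetic placeholders:

```python
import numpy as np

def normalize(pts):
    """Center the points and scale so the mean distance from the origin is sqrt(2)."""
    c = pts.mean(axis=0)
    d = np.sqrt(((pts - c) ** 2).sum(axis=1)).mean()
    s = np.sqrt(2.0) / d
    T = np.array([[s, 0.0, -s * c[0]],
                  [0.0, s, -s * c[1]],
                  [0.0, 0.0, 1.0]])
    ph = np.hstack([pts, np.ones((len(pts), 1))])
    return (T @ ph.T).T, T

def eight_point(p1, p2):
    """Estimate the fundamental matrix F with p2^T F p1 = 0 from >= 8 matches."""
    x1, T1 = normalize(np.asarray(p1, float))
    x2, T2 = normalize(np.asarray(p2, float))
    # Each correspondence gives one row of A f = 0; the coefficient of
    # F_ij is x2_i * x1_j, i.e. the Kronecker product kron(x2, x1).
    A = np.stack([np.kron(q2, q1) for q1, q2 in zip(x1, x2)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)         # null vector = flattened F (row-major)
    U, S, Vt = np.linalg.svd(F)      # enforce rank 2: zero smallest singular value
    S[2] = 0.0
    F = U @ np.diag(S) @ Vt
    return T2.T @ F @ T1             # undo the normalizing transforms
```

After rectification, depth follows from the inverse relationship the justification cites: with focal length $f$ and baseline $B$, $Z = fB/d$ for disparity $d$, which is exactly why the absolute scale of $Z$ is unknown when $B$ is not measured.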

**3. Building Recognition (8 pts)**
* **Algorithm:** Pass the segmented building region through a deep Convolutional Neural Network (CNN), such as ResNet or VGG16, with a softmax linear classifier at the end that outputs class scores for the different North Campus buildings.
* **Justification:** Deep architectures like ResNet are well suited here because linear classifiers fail when class boundaries are complex and non-linear. Convolutional layers provide shift invariance, while max pooling condenses spatial scale. Methods like PCA/Eigenfaces would be a poor choice because they maximize total scatter (including lighting variation), making them poorly suited for discrimination under changing outdoor weather.
* **Failure Case:** Deep networks require large amounts of annotated training data and can fail to generalize if the training set does not capture the specific seasonal or lighting variations the drone encounters at inference time.
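The softmax classifier head mentioned above can be sketched in NumPy. The building names, weight matrix, and feature vector here are placeholders for illustration; in a real system `features` would come from the penultimate layer of a trained backbone such as ResNet:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)   # shift for stability; result unchanged
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def classify_building(features, W, b, class_names):
    """Linear classifier head: probabilities = softmax(features @ W + b)."""
    probs = softmax(features @ W + b)
    return class_names[int(probs.argmax())], probs

# Hypothetical example: 2-D features, three candidate buildings.
names = ["BuildingA", "BuildingB", "BuildingC"]
W = np.array([[2.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])
b = np.zeros(3)
label, probs = classify_building(np.array([1.0, 0.0]), W, b, names)
```

The max subtraction in `softmax` does not change the output but prevents overflow when logits are large, which is the standard trick for this head.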
