CS 180 Programming Project 4
Part A
Shoot the Pictures
(Please refer to the pictures in the section Blend the images into a Mosaic)
Recover Homographies
To recover the homographies, we need a mapping between the two sets of corresponding points. Using the source and target coordinates, we construct a system of linear equations of the form Ah = b. We then solve it with least squares to obtain the homography vector h, which we reshape into a 3x3 matrix H.
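Below is a minimal sketch of how this system can be set up, assuming the correspondences are given as (x, y) arrays; `compute_homography` is just an illustrative name, not the exact function from the project code:

```python
import numpy as np

def compute_homography(src_pts, dst_pts):
    """Estimate the 3x3 homography H mapping src_pts -> dst_pts with least squares.

    src_pts, dst_pts: (N, 2) arrays of (x, y) correspondences, N >= 4.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(src_pts, dst_pts):
        # Two equations per correspondence, with the last entry h9 fixed to 1.
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    A, b = np.array(A), np.array(b)
    h, *_ = np.linalg.lstsq(A, b, rcond=None)   # 8-vector [h1 .. h8]
    return np.append(h, 1).reshape(3, 3)        # reshape into the 3x3 matrix H
```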
Warp the Images
To warp the images, we use a similar approach as in Project 3. Basically, we use meshgrid to generate the x and y coordinates, convert the sample points into homogeneous coordinates, and multiply them by the homography matrix.
Since the result of the multiplication is [wx, wy, w], we need to normalize it to [x, y, 1] and drop the last term. From this, we use scipy's map_coordinates function to sample the source image and place the values at their new coordinates.
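A rough sketch of this inverse-warping step, assuming H maps source coordinates to target coordinates, `out_shape` is the (height, width) of the output canvas, and the image is a color (H, W, 3) array; `warp_image` is an illustrative name:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_image(img, H, out_shape):
    """Inverse-warp img onto a canvas of size out_shape using homography H.

    For every pixel of the output we map back through H^-1 and sample the
    source image with scipy's map_coordinates.
    """
    H_inv = np.linalg.inv(H)
    ys, xs = np.meshgrid(np.arange(out_shape[0]), np.arange(out_shape[1]),
                         indexing='ij')
    ones = np.ones_like(xs)
    coords = np.stack([xs.ravel(), ys.ravel(), ones.ravel()])  # homogeneous coords
    src = H_inv @ coords
    src /= src[2]                      # normalize [wx, wy, w] -> [x, y, 1]
    sx, sy = src[0], src[1]
    out = np.zeros(out_shape + (img.shape[2],))
    for c in range(img.shape[2]):
        # points that fall outside the source image stay 0 (mode='constant')
        out[..., c] = map_coordinates(img[..., c], [sy, sx],
                                      order=1, cval=0).reshape(out_shape)
    return out
```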
Image Rectification
To rectify an image, we first need to calculate the transformation matrix from the corresponding points (manually selected pairs). The H matrix transforms the coordinates (x, y) into (xw, yw, w); after normalizing by w, we get the transformed perspective.
To retrieve the H matrix, we use the least-squares approach to calculate the estimate. In addition, we also modified the warping function from Project 3 to perform the sampling in the target image space from the source image. The key difference is that we need to avoid sampling if the corresponding point lies outside the source image domain.
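A hypothetical usage example, reusing the `compute_homography` and `warp_image` sketches above; the filename, corner coordinates, and output size here are made-up placeholders, not the values actually used:

```python
import numpy as np
from skimage import io

# Load the photo and pick four corners of a known planar rectangle (illustrative values).
img = io.imread("tennis_court.jpg") / 255.0
src_corners = np.array([[312, 410], [905, 398], [980, 760], [250, 775]])  # clicked (x, y) corners
dst_corners = np.array([[0, 0], [600, 0], [600, 300], [0, 300]])          # axis-aligned target rectangle

H = compute_homography(src_corners, dst_corners)      # sketch from "Recover Homographies"
rectified = warp_image(img, H, out_shape=(300, 600))  # sketch from "Warp the Images"
```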
Tennis Court Examples
Original Image | Rectified Version |
---|---|
Tennis Court (Picture taken by myself) | Rectified Tennis Court |
Tennis Court 2 (Picture from the Internet) | Rectified Tennis Court 2 |
We can see that, because the first image was taken from a more extreme viewing angle, the edges appear slightly blurry after rectification. However, the rectification generally makes sense.
Blend the images into a Mosaic
In this part, we use one picture as the target image and transform the other images to match the perspective of the target. Additionally, we add padding to the canvas for better results.
For image blending, we utilize the distance transform to assign more weight to pixels that are farther from the edges. The key idea is that the edges of each picture are likely to overlap with other images, so they should carry less weight. To blend the images based on distance, we use the following formula:
$$ \text{blended\_pixel} = \frac{\text{distance}_1 \times \text{source} + \text{distance}_2 \times \text{target}}{\text{distance}_1 + \text{distance}_2} $$
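A minimal sketch of this weighting scheme using `scipy.ndimage.distance_transform_edt`, assuming both warped images and their valid-pixel masks already live on the same padded canvas; `blend_pair` is an illustrative name:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def blend_pair(img1, img2, mask1, mask2):
    """Blend two warped images on a shared canvas with distance-transform weights.

    mask1/mask2 are boolean arrays marking where each warped image has valid pixels.
    Pixels far from an image's border get a larger weight, so the overlapping
    edges of each picture contribute less to the blend.
    """
    d1 = distance_transform_edt(mask1)
    d2 = distance_transform_edt(mask2)
    w = d1 + d2
    w[w == 0] = 1   # avoid division by zero where neither image has pixels
    return (d1[..., None] * img1 + d2[..., None] * img2) / w[..., None]
```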
Image 1
Image 2
Image 3
Part B Feature Matching
In the second part, we aim to implement feature matching for autostitching. Building on the essential steps outlined in the MMOP paper, we will create an autostitching system. This write-up includes intermediate results for each step and details of the implementations.
Corner Detection
To find the features, we first use the Harris Interest Point Detector. We directly use the course's sample code to extract the interest points.
All points (over 20k) | Top 300 points |
---|---|
The Harris Interest Point Detector produces over 20,000 interest points, which makes performing feature matching on all of them impractical. To address this, we can keep only the top k points ranked by corner strength. However, as shown in the image on the right, the Harris interest points are not evenly distributed spatially. This uneven distribution makes Adaptive Non-Maximal Suppression (ANMS) particularly useful for selecting key points with better spatial coverage.
Adaptive Non-Maximal Suppression (ANMS)
For me, reading the MMOP paper can sometimes be confusing, but the ANMS algorithm is quite powerful in ensuring that interest points are evenly distributed across the image. The paper introduces the concept of a suppression radius: within its suppression radius, a point is not dominated by any significantly stronger point (it's like selecting a local maximum). To find the suppression radius of a point, we look at the subset of interest points whose corner strength h is sufficiently higher (scaled by c_robust); the radius is the distance to the nearest such point. If this subset is empty, we assign infinity as the suppression radius.
In practice, we sort the suppression radii in descending order, with larger values indicating points that are more important locally. We then select the top k points from this sorted list and include them in our set, as in the sketch below.
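A simple O(n²) sketch of ANMS under these definitions; the function name and default parameter values are illustrative assumptions, not the exact values used in the project:

```python
import numpy as np

def anms(coords, strengths, k=500, c_robust=0.9):
    """Adaptive Non-Maximal Suppression.

    coords: (N, 2) interest point coordinates; strengths: (N,) Harris responses.
    Returns the k points with the largest suppression radii.
    """
    n = len(coords)
    radii = np.full(n, np.inf)
    for i in range(n):
        # points that sufficiently dominate point i
        stronger = strengths[i] < c_robust * strengths
        if np.any(stronger):
            d = np.linalg.norm(coords[stronger] - coords[i], axis=1)
            radii[i] = d.min()          # distance to the nearest dominating point
    keep = np.argsort(-radii)[:k]       # larger radius = locally more important
    return coords[keep]
```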
The following is the result when we use c_robust = 0.9 and extract only the top 30 interest points. We can see that the distribution is much more even across the image.

Feature Descriptor Extraction
The feature descriptor is simply an 8x8 patch around each interest point. One useful trick is to sample from a larger window and downsample it, using transform.resize with anti-aliasing enabled. This allows the feature descriptor to cover a larger region, making it more robust.
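A sketch of the descriptor extraction, assuming a 40x40 window downsampled to 8x8 (the window size is an assumption; the write-up only says the descriptor should cover a larger region), with illustrative names:

```python
import numpy as np
from skimage import transform

def extract_descriptors(img_gray, coords, window=40, patch=8):
    """Extract axis-aligned descriptors: crop a window around each point and downsample."""
    half = window // 2
    descs, kept = [], []
    for x, y in coords:
        x, y = int(round(x)), int(round(y))
        # skip points whose window would fall outside the image
        if not (half <= y < img_gray.shape[0] - half and
                half <= x < img_gray.shape[1] - half):
            continue
        win = img_gray[y - half:y + half, x - half:x + half]
        small = transform.resize(win, (patch, patch), anti_aliasing=True)
        descs.append(small.ravel())
        kept.append((x, y))
    return np.array(descs), np.array(kept)
```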
Feature Matching
For each feature in the source image, we aim to find a matching feature in the target image. Similarity is measured using the L2 distance.
As outlined in the MMOP paper, we need to examine the ratio between the best match and the second-best match. To ensure accuracy, the best match should be significantly better than the second-best candidate, thereby increasing the likelihood of identifying correct matches. Additionally, to prevent multiple features from the source image from being mapped to the same feature in the target image, we use a hashset to store all previously matched features.
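A minimal sketch of this ratio-test matching; the brute-force distance computation and names are illustrative:

```python
import numpy as np

def match_features(desc1, desc2, ratio=0.7):
    """Match descriptors with the best/second-best ratio test.

    desc1, desc2: (N1, D) and (N2, D) descriptor arrays (assumes N2 >= 2).
    Returns index pairs (i, j); a target feature is consumed at most once.
    """
    # pairwise L2 distances between every source and target descriptor
    dists = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=-1)
    matches, used = [], set()
    for i in range(len(desc1)):
        order = np.argsort(dists[i])
        best, second = order[0], order[1]
        # accept only if the best match is clearly better than the runner-up
        if dists[i, best] < ratio * dists[i, second] and best not in used:
            matches.append((i, best))
            used.add(best)
    return matches
```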
Here are the results when we choose r = 0.7:
 
4-Point RANSAC
To further reduce the errors, we implement the RANSAC algorithm, following these steps (see the sketch after this list):
1. Randomly choose 4 pairs of correspondences.
2. Estimate the homography from these 4 points.
3. Apply it to the remaining correspondences, count the inliers, and keep the largest inlier set.
4. Repeat steps 1-3 for many iterations, then recalculate the homography using all inliers.
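A sketch of this loop, reusing the `compute_homography` helper sketched in Part A; the iteration count and pixel threshold are illustrative assumptions:

```python
import numpy as np

def ransac_homography(src_pts, dst_pts, n_iters=1000, eps=2.0):
    """4-point RANSAC: keep the homography supported by the most inliers.

    src_pts, dst_pts: (N, 2) matched points; eps is the inlier threshold in pixels.
    """
    best_inliers = np.zeros(len(src_pts), dtype=bool)
    for _ in range(n_iters):
        sample = np.random.choice(len(src_pts), 4, replace=False)
        H = compute_homography(src_pts[sample], dst_pts[sample])
        # project all source points through H and measure reprojection error
        pts = np.column_stack([src_pts, np.ones(len(src_pts))])
        proj = pts @ H.T
        proj = proj[:, :2] / proj[:, 2:3]
        errors = np.linalg.norm(proj - dst_pts, axis=1)
        inliers = errors < eps
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers              # keep the largest inlier set
    # final fit using all inliers
    H_final = compute_homography(src_pts[best_inliers], dst_pts[best_inliers])
    return H_final, best_inliers
```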
Result
Here's the result:
Image 1
Auto | Manual |
---|---|
Image 2
Auto | Manual |
---|---|
Image 3
Auto | Manual |
---|---|
As we can see, the automatic approach produces results very similar to the manual one. Our feature matching for autostitching performs relatively well on our dataset.
What have you learned?
I've learned the concept of selecting good features. Sometimes, absolute metrics don't make sense; for example, if we manually set a threshold for the L2 distance in feature matching, it can be challenging to adapt it across different images. Instead, we compare the first match to the second match. This same idea applies to the RANSAC algorithm, where we use a voting mechanism to select the best matching pairs. Understanding these algorithmic approaches has really improved my understanding of optimization and feature extraction.