Problem 1
---------
For checking your fitting code:
-Make a few points that are correct by construction (e.g., generate a few points at random and then translate them by some vector t) and fit those first. Make sure the correct transformation is recovered. If it doesn't work on this data, then the code is wrong.
-Add noise (e.g., np.random.normal) to your points and make sure you get roughly the same thing.
-When checking whether a homography is correct, remember that homographies are only defined up to scale because they map homogeneous coordinates. You can get a "canonical" homography by dividing by the last element.
-Make sure you divide by that last coordinate!
-You can compute the homography by computing the eigenvector with the minimum eigenvalue, but when you get the eigenvectors, pay attention to which dimension they're stored in: for np.linalg.eig it's the ith column, not the ith row. You can also get it with SVD (see the sketch after this list).

You may find the following to be useful supplemental material:
https://cseweb.ucsd.edu/classes/wi07/cse252a/homography_estimation/homography_estimation.pdf
and Szeliski Appendix A.2.
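If it helps, here is a minimal sketch of the SVD route. The name fit_homography and the N x 2 array-of-(x, y)-points format are illustrative assumptions, not a required interface:

    import numpy as np

    def fit_homography(pts1, pts2):
        # Fit H mapping pts1 -> pts2 (both Nx2 arrays, N >= 4) with the
        # direct linear transform: each match contributes two rows to A,
        # and the right singular vector with the smallest singular value
        # minimizes ||Ah|| subject to ||h|| = 1.
        rows = []
        for (x, y), (xp, yp) in zip(pts1, pts2):
            rows.append([-x, -y, -1, 0, 0, 0, x * xp, y * xp, xp])
            rows.append([0, 0, 0, -x, -y, -1, x * yp, y * yp, yp])
        _, _, Vt = np.linalg.svd(np.array(rows))
        H = Vt[-1].reshape(3, 3)
        # Canonical scale: divide by the last element (assumes it's nonzero).
        return H / H[2, 2]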
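And a quick synthetic check in the spirit of the first two bullets, assuming the fit_homography sketch above:

    import numpy as np

    rng = np.random.default_rng(0)
    pts1 = rng.uniform(0, 100, size=(8, 2))
    pts2 = pts1 + np.array([20.0, -5.0])   # translated by a known t

    # For a pure translation, the canonical H should be
    # [[1, 0, 20], [0, 1, -5], [0, 0, 1]] up to numerical error.
    print(np.round(fit_homography(pts1, pts2), 3))

    # With noise you should get roughly the same thing.
    noisy = pts2 + rng.normal(scale=0.5, size=pts2.shape)
    print(np.round(fit_homography(pts1, noisy), 3))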
Problem 2
---------
General hints:
a) One easy debugging method is to take an image, call it I, and translate it 20 pixels over with np.roll. You should then be able to recover a homography that corresponds to a 20-pixel translation. You should also get perfect matches (apart from where the roll wraps around).
b) If you don't want to tear your hair out from tedium, you may want to cache intermediate results as you complete each part. If you're comfortable with Jupyter, this may be easy for you. If you're not using Jupyter, you can cache the descriptors in a pickle file that you save per image. For example, you may want to do something like:

    # sorry -- corners and their descriptors are typically called keypoints,
    # but knowing this might help you read Szeliski and other resources.
    import os, pickle

    def loadKeypoints(filename):
        # loadImage and computeKeypoints stand in for whatever you use
        # for image I/O and keypoint detection.
        if os.path.exists(filename + ".keys.pck"):
            with open(filename + ".keys.pck", "rb") as f:
                return pickle.load(f)
        image = loadImage(filename)
        keypoints = computeKeypoints(image)
        with open(filename + ".keys.pck", "wb") as f:
            pickle.dump(keypoints, f)
        return keypoints

You can cache the matches similarly. See https://docs.python.org/3/library/pickle.html if you've never used pickles before.
c) We suggest functions to verify your implementation against. However, be aware that if your parameters are wrong, a totally correct implementation can still work poorly. General strategies for picking the parameters:
-Remember that the matches are based on local features, so you should expect some intrinsic noise.
-It's easier to remove an outlier than to deal with too few inliers, so be generous about which matches you include in RANSAC.
-The pixels line up very nicely when the images are aligned, so your definition of an inlier should reflect this close alignment (OK, maybe not 1 pixel).
-Make sure you understand the units when setting the threshold. A threshold in pixels^2 is not at all the same as a threshold in pixels.

Specific hints:
2) You should use SIFT from OpenCV to start. This will make your life easier. Using your own detector adds a confounding factor: more satisfying when it works, but also more difficult.
3) Again, SIFT from OpenCV will make your life easier. If you've been having issues with the homework, definitely use SIFT. Again, your own descriptor is really satisfying but, again, more challenging.
4) Start with standard Euclidean distance. You can do this brute-force with a loop through a list.
5) Nearest to second-nearest neighbor ratio recap: given a list of distances D (a list of floats) from a keypoint in image 1 to all the keypoints in image 2, you can compute the ratio as:

    D.sort(); ratio = D[0] / D[1]

You should check that a few of these matches make sense before proceeding. Plot them. For structuring your code, you may want to write a function of the form:

    def matchK1toK2(D1, D2):
        # given descriptors D1 and descriptors D2, return indices
        # [(i, j)] where D1[i] matches D2[j]
        # do both matching and verification in one step
        return [(i, j), ...]

6) For debugging your implementation: open up the image in some sort of image editor that lets you see what pixel your cursor is over (e.g., MS Paint). Find four matches yourself. Then:
-Your implementation of fitting H should have zero error if you fit H to those four matches. You should have checked this in Problem 1, but it's good to verify.
-Your fitted H should yield many matches within some threshold in pixels. If it doesn't, then either your estimator is wrong or your per-pixel error measurement is wrong.
For RANSAC, it's probably better to start out with too many iterations rather than too few. Figure out how many you think you want and then just double it.
7) Make sure you use the correct transformation for this: the homography from image 1 to image 2 is the inverse of the homography from image 2 to image 1, so if you use the wrong one you'll have issues.
8) You can compute how big the output should be by warping the corners of the image that you're warping from (i.e., (0,0), (0,W), (H,0), (H,W) -- the boundaries of the image, not Harris corners). Then find the min() and max() of all the corner locations across all the images (i.e., the coordinates of image 1 and the warped corners from image 2).

Problem 3
---------
General advice:
-As the homework suggests, you want to compute the keypoints just once.
-If you want to keep track of whether a pixel is "valid" and comes from a picture or was created by the border fill-in, you can create a mask of all ones and warp that: anything that's one is definitely from the image; anything that's zero is definitely a fill-in value; anything in between you can decide what to do with.
1) You can use your keypoint matcher code here. There are no magical thresholds (which is true in vision in general), so you'll have to play with the settings.
2) You can do this problem in one of two ways: method (a) is probably simplest, but you may find method (b) simpler depending on how you think.
a) You can do this recursively. You just need a function that merges two images and their keypoints. The merging is as before, and you'll have to do bookkeeping to make sure the locations of the features/corners/keypoints are correct after you merge them. You may also need bookkeeping to keep track of whether a pixel is an actual original pixel or the black fill-in value from the warping.
b) You could also align them in one go to one image:
i) Align all of the images that can be aligned to each other: you should have homographies that relate each image to each other image.
ii) Use the homographies to find a transformation from each of the images to a single coordinate frame. You can pick any image arbitrarily as this coordinate frame. Remember that you can chain homographies by multiplying the matrices. Some of the corners of the images may warp to negative values; you can make a final coordinate frame by also applying a translation to all the homographies so that all the image corners (i.e., (0,0), (0,W), ...) warp to positive values. Again, remember that applying one homography after another involves matrix multiplication. A sketch of this bookkeeping is below.
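Since the bookkeeping here trips people up, here is one possible sketch of step ii, which also previews the final warp-and-merge step, using OpenCV's warpPerspective. The names (warp_corners, stitch, Hs_to_ref) are illustrative, and the crude "last image wins" merge is a stand-in for whatever blending you prefer; the warped mask of ones is the validity trick from the general advice:

    import numpy as np
    import cv2

    def warp_corners(H, h, w):
        # Corners in (x, y) = (column, row) order, as OpenCV expects.
        corners = np.array([[0, 0], [w, 0], [0, h], [w, h]], dtype=np.float64)
        p = H @ np.vstack([corners.T, np.ones(4)])   # 3x4 homogeneous
        return (p[:2] / p[2]).T                      # divide by the last coordinate!

    def stitch(images, Hs_to_ref):
        # Hs_to_ref[i] maps image i into the chosen reference frame; build
        # these by chaining pairwise homographies via matrix multiplication.
        all_corners = np.vstack([warp_corners(H, *img.shape[:2])
                                 for img, H in zip(images, Hs_to_ref)])
        x_min, y_min = np.floor(all_corners.min(axis=0)).astype(int)
        x_max, y_max = np.ceil(all_corners.max(axis=0)).astype(int)
        # A translation that pushes every warped corner to positive values;
        # composing it with each H is just another matrix multiplication.
        T = np.array([[1, 0, -x_min],
                      [0, 1, -y_min],
                      [0, 0, 1]], dtype=np.float64)
        out_w, out_h = x_max - x_min, y_max - y_min
        out = np.zeros((out_h, out_w, 3))
        for img, H in zip(images, Hs_to_ref):
            warped = cv2.warpPerspective(img.astype(np.float64), T @ H,
                                         (out_w, out_h))
            # Warp a mask of ones too: 1 = definitely a real pixel,
            # 0 = definitely border fill-in.
            mask = cv2.warpPerspective(np.ones(img.shape[:2]), T @ H,
                                       (out_w, out_h))
            out[mask > 0.5] = warped[mask > 0.5]   # crude "last image wins"
        return out.astype(np.uint8)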
iii) Then warp all the images and merge them.

Problem 4
---------
This is meant to be something where you can copy/paste a formula from the slides and get an answer.

For picking thresholds and checking the quality of the fitted plane, you may want to consider the number of points in the plane. Remember that each point is a pixel in the image: what fraction of the image is it? When setting this threshold, keep in mind that a rectangle spanning 10% of the image on each side covers only 1% of the pixels in the image.
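If it helps to make the units and fractions concrete, here is a tiny sketch. The name plane_inlier_fraction is illustrative; it assumes one 3D point per image pixel and a plane (a, b, c, d) whose normal (a, b, c) has unit length, so the threshold is a distance rather than a squared distance:

    import numpy as np

    def plane_inlier_fraction(points, plane, dist_thresh):
        # points: Nx3 array, one 3D point per pixel in the image.
        # With a unit normal, |ax + by + cz + d| is point-to-plane distance.
        normal, d = np.asarray(plane[:3]), plane[3]
        dist = np.abs(points @ normal + d)
        return np.mean(dist < dist_thresh)   # fraction of the image's pixels

    # A plane covering a rectangle that spans 10% of the width and 10% of
    # the height is only 0.1 * 0.1 = 1% of the pixels, so the minimum
    # acceptable fraction is smaller than you might first guess, e.g.:
    # ok = plane_inlier_fraction(points, plane, dist_thresh) > 0.01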