Tracking and Structure from Motion

Implemented by Andrew Scheff (ajscheff) for CSCI1430, Fall 2011

"It is better to be blind than to see things from only one point of view."
                                                 -Sabrina Jeffries, Romance Novelist

In this assignment, we were given the task of implementing a system for determining the 3D structure of an object and a possible camera path from a series of 2D video frames. The assignment breaks down into three parts: finding interest points, tracking those features throughout the video, and then determining 3D structure from the tracked points.

Finding Interest Points

I used the provided Harris corner detector to find interest points in the first frame, and then took the 500 strongest of those interest points. Below they are shown on the first video frame:
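The detector itself was provided, but its computation can be sketched in a few lines. The version below is a minimal numpy/scipy sketch, not the stencil code; the `k`, `sigma`, and `n_keep` values are illustrative defaults.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_corners(img, k=0.04, sigma=1.0, n_keep=500):
    """Harris corner response; return the n_keep strongest (row, col) points."""
    # Image gradients via central differences.
    Iy, Ix = np.gradient(img.astype(float))
    # Entries of the second-moment matrix, smoothed with a Gaussian window.
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    # Harris response R = det(M) - k * trace(M)^2.
    R = (Sxx * Syy - Sxy ** 2) - k * (Sxx + Syy) ** 2
    # Indices of the n_keep largest responses.
    idx = np.argsort(R, axis=None)[::-1][:n_keep]
    return np.column_stack(np.unravel_index(idx, R.shape))
```

The response is large only where the gradient varies in two directions, which is why edges (one strong eigenvalue) score near zero and corners score high.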

Tracking Interest Points

To track those interest points over the video, we cannot simply re-detect interest points in every frame, because detections in consecutive frames would not correspond pairwise as we would like. Instead, we need to compute optical flow between each pair of frames and use the flow values to move the interest points through the video. I implemented the Kanade-Lucas-Tomasi tracker as described in the handout. For each pixel, it uses a first-order Taylor approximation of the change in image intensity in a window around that pixel and minimizes the difference between the predicted and actual change in intensity between consecutive frames. The minimizer of this least-squares problem is the displacement of that pixel from one frame to the next.
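The per-pixel solve described above can be sketched as a single-window Lucas-Kanade step. This is an illustrative numpy version (window size and interface are my own, not the handout's): the Taylor expansion gives the linear system A d = -It, where the rows of A are the spatial gradients in the window and It is the temporal difference.

```python
import numpy as np

def lk_flow_at(I0, I1, y, x, w=7):
    """Lucas-Kanade flow (u, v) for the window centered at (y, x).

    Linearizing brightness constancy, I1 ~ I0 + Ix*u + Iy*v + It,
    and minimizing the residual over the window gives a 2-unknown
    least-squares problem in the displacement (u, v)."""
    Iy, Ix = np.gradient(I0.astype(float))
    It = I1.astype(float) - I0.astype(float)
    win = (slice(y - w, y + w + 1), slice(x - w, x + w + 1))
    ix, iy, it = Ix[win].ravel(), Iy[win].ravel(), It[win].ravel()
    A = np.column_stack([ix, iy])
    # Least squares also handles the rank-deficient (aperture-problem) case.
    uv, *_ = np.linalg.lstsq(A, -it, rcond=None)
    return uv  # (u, v): displacement in x and y
```

Shifting a textured image by one pixel and solving at an interior window recovers a flow close to (1, 0), which is a quick sanity check for the sign conventions.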

Once optical flow is computed for every pair of frames, we can track the Harris corner features from the first frame across the video by simply looking up the flow at each feature's position and advancing the position for each frame. Below are the paths that 20 randomly chosen tracked features take across the video, overlaid on the first frame.
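The lookup-and-advance loop is simple enough to sketch directly. This assumes (my convention, not the stencil's) that each per-frame flow field is an H x W x 2 array of (row, col) displacements, sampled at the nearest pixel to each sub-pixel feature position:

```python
import numpy as np

def track_features(points, flows):
    """Advect (row, col) feature points through a list of per-frame flow fields.

    flows[t] has shape (H, W, 2) holding (d_row, d_col) displacements from
    frame t to frame t+1; returns an array of shape (T+1, N, 2) of paths."""
    paths = [np.asarray(points, dtype=float)]
    for flow in flows:
        pts = paths[-1]
        # Sample the flow at the nearest pixel to each sub-pixel position.
        r = np.clip(np.round(pts[:, 0]).astype(int), 0, flow.shape[0] - 1)
        c = np.clip(np.round(pts[:, 1]).astype(int), 0, flow.shape[1] - 1)
        paths.append(pts + flow[r, c])
    return np.stack(paths)
```

Bilinear interpolation of the flow would be a natural refinement over nearest-pixel sampling, at slightly more cost.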

One issue is dealing with points that move off the edge of the image as they are tracked with optical flow. I chose simply to discard those features; below is an image showing all the features I removed and the paths that led them off the edge.
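The discard rule amounts to a bounds check over each feature's whole path. A small sketch under the same (T+1, N, 2) path layout assumed above:

```python
import numpy as np

def in_bounds_mask(paths, H, W):
    """True for features whose entire (row, col) path stays inside H x W."""
    rows, cols = paths[..., 0], paths[..., 1]
    ok = (rows >= 0) & (rows <= H - 1) & (cols >= 0) & (cols <= W - 1)
    # A feature is kept only if it is in bounds at every frame.
    return ok.all(axis=0)
```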

Structure from Motion

I implemented the structure from motion algorithm exactly as given in the stencil code. To eliminate the affine ambiguity, I used the factorization method from "A Sequential Factorization Method for Recovering Shape and Motion from Image Streams" without modification. The results, shown below, are reasonably good.
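The core of the factorization approach (Tomasi-Kanade style) is a rank-3 SVD of the centered measurement matrix. The sketch below shows only that step, in numpy under an assumed 2F x N measurement layout; it stops at the affine motion/shape pair, before the metric constraints that remove the remaining affine ambiguity.

```python
import numpy as np

def factor_measurements(W):
    """Rank-3 factorization of a 2F x N measurement matrix.

    Rows hold the x then y image coordinates of N tracked points over
    F frames. Returns affine motion M (2F x 3) and shape S (3 x N),
    defined up to an affine ambiguity M A, A^-1 S."""
    # Per-frame translation equals the row mean; subtract it out.
    Wc = W - W.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(Wc, full_matrices=False)
    # Under the affine camera model, Wc has rank 3 (up to noise),
    # so keep only the top three singular values.
    M = U[:, :3] * np.sqrt(s[:3])
    S = np.sqrt(s[:3])[:, None] * Vt[:3]
    return M, S
```

On noise-free synthetic measurements generated by an affine camera, the product M S reproduces the centered measurement matrix exactly, which is a useful check before adding the metric-upgrade step.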
Thanks for the great semester. I had a lot of fun!