Project 5: Structure from Motion

Hari Narayanan

Goals

This project will recover the 3D structure of an object by analyzing a video of its movement. The basic pipeline consists of corner detection, point tracking, and finally a structure from motion algorithm.

Algorithm

The first step of the process is to select a number of keypoints and track them through the animation. We use a Harris corner detector to select feature points, and use the Kanade-Lucas-Tomasi algorithm to track them. To do this, we need to compute optic flow between consecutive frames of the animation. This process makes three central assumptions about the input:

Brightness constancy: a point has the same brightness in each frame it appears in.
Smoothness: a point doesn't move far from its position in the previous frame.
Spatial coherence: each point's neighbors move with a similar direction and magnitude.
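As a rough illustration of the corner-selection step, here is a minimal Harris detector in Python with NumPy and SciPy. The Sobel gradients, the Gaussian window width `sigma`, the sensitivity constant `k = 0.04`, the suppression radius `min_distance`, and the function name `harris_corners` are all assumptions made for this sketch, not details taken from this project.

```python
import numpy as np
from scipy import ndimage


def harris_corners(img, sigma=1.0, k=0.04, num_points=500, min_distance=5):
    """Return (rows, cols) of the strongest Harris corners in a grayscale image.

    Sketch only: sigma, k, and min_distance are illustrative defaults.
    """
    img = img.astype(np.float64)
    # Image gradients via Sobel filters (axis=1 is x, axis=0 is y)
    Ix = ndimage.sobel(img, axis=1)
    Iy = ndimage.sobel(img, axis=0)
    # Entries of the second-moment matrix, weighted by a Gaussian window
    Sxx = ndimage.gaussian_filter(Ix * Ix, sigma)
    Syy = ndimage.gaussian_filter(Iy * Iy, sigma)
    Sxy = ndimage.gaussian_filter(Ix * Iy, sigma)
    # Harris response R = det(M) - k * trace(M)^2
    R = Sxx * Syy - Sxy ** 2 - k * (Sxx + Syy) ** 2
    # Non-maximum suppression: keep positive responses that are the
    # maximum within a (2*min_distance+1)^2 neighborhood
    local_max = ndimage.maximum_filter(R, size=2 * min_distance + 1)
    rows, cols = np.nonzero((R == local_max) & (R > 0))
    # Keep the num_points strongest responses
    order = np.argsort(R[rows, cols])[::-1][:num_points]
    return rows[order], cols[order]
```

For example, on an image containing a bright square on a dark background, the detections cluster at the four corners of the square, while points along its straight edges score negative and are rejected.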

Optic flow

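Consider a point (x,y,t). If the point moves by a displacement <u,v> between frames t and t+1, then by the brightness constancy assumption,

$$I(x, y, t) = I(x + u, y + v, t + 1)$$

This equation is nonlinear in u and v, so we can't solve it as is. Instead we make a linear approximation, expanding the right-hand side as a first-order Taylor series:

$$I(x + u, y + v, t + 1) \approx I(x, y, t) + I_x u + I_y v + I_t$$

Substituting this into the brightness constancy equation gives the linear constraint

$$I_x u + I_y v + I_t = 0$$

where I_x, I_y, and I_t are the partial derivatives of the image intensity with respect to x, y, and t.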

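As we have two unknowns and only one equation, we use spatial coherence and assume that each point in a 15x15 neighborhood of (x,y) also moves with displacement <u,v>, giving us 225 equations and two unknowns. We solve this overdetermined system by least squares to find an approximate solution for u and v. This is the optic flow of the animation at point (x,y,t). The system is only well conditioned when the window contains intensity gradients in two different directions, which is why we track corners rather than arbitrary pixels.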
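The least-squares step can be sketched for a single pixel as follows. This is a minimal sketch, assuming two grayscale frames as NumPy arrays, central-difference spatial gradients from `np.gradient`, a simple frame difference for I_t, and a point far enough from the border that the full window fits; the function name `lucas_kanade_point` is hypothetical.

```python
import numpy as np


def lucas_kanade_point(I0, I1, x, y, half_window=7):
    """Estimate the optic flow (u, v) at integer pixel (x, y) from frame I0 to I1.

    half_window=7 gives a 15x15 window, i.e. 225 equations in two unknowns.
    Assumes the window lies entirely inside the image.
    """
    I0 = I0.astype(np.float64)
    I1 = I1.astype(np.float64)
    # np.gradient returns derivatives along axis 0 (y) then axis 1 (x)
    Iy, Ix = np.gradient(I0)
    # Temporal derivative approximated by the frame difference
    It = I1 - I0
    win = (slice(y - half_window, y + half_window + 1),
           slice(x - half_window, x + half_window + 1))
    # One constraint Ix*u + Iy*v = -It per pixel in the window
    A = np.stack([Ix[win].ravel(), Iy[win].ravel()], axis=1)
    b = -It[win].ravel()
    # Least-squares solution of the overdetermined system A [u, v]^T = b
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v
```

Because the Taylor expansion is only first order, this estimate is accurate for sub-pixel to few-pixel motions; larger motions are usually handled by iterating or by working on an image pyramid.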

Point tracking

Treating optic flow as a vector field, we use Newton's method to find where each interest point lands in the next frame. Because tracked positions generally fall between pixel centers, we interpolate the flow field at each point's sub-pixel location. Finally, if a point moves outside the image boundary, we discard its entire track.
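A simplified version of this bookkeeping is sketched below. It assumes a dense flow field (u, v) has already been computed for every consecutive frame pair, samples it with bilinear interpolation via `scipy.ndimage.map_coordinates`, and takes a single step per frame instead of the iterative Newton refinement described above. The helper names `advance_points` and `track_points` are hypothetical.

```python
import numpy as np
from scipy import ndimage


def advance_points(xs, ys, u, v, shape):
    """Move points by the flow (u, v), bilinearly interpolated at sub-pixel positions.

    Returns the new coordinates and a mask of points still inside the image.
    """
    coords = np.vstack([ys, xs])  # map_coordinates expects (row, col) order
    du = ndimage.map_coordinates(u, coords, order=1, mode='nearest')
    dv = ndimage.map_coordinates(v, coords, order=1, mode='nearest')
    new_xs, new_ys = xs + du, ys + dv
    h, w = shape
    inside = (new_xs >= 0) & (new_xs <= w - 1) & (new_ys >= 0) & (new_ys <= h - 1)
    return new_xs, new_ys, inside


def track_points(flows, xs, ys, shape):
    """Track points through a list of (u, v) flow fields, one per frame pair.

    Returns F x P arrays of x and y positions, keeping only points that
    stayed inside the image in every frame.
    """
    X = [np.asarray(xs, dtype=float)]
    Y = [np.asarray(ys, dtype=float)]
    alive = np.ones(len(X[0]), dtype=bool)
    for u, v in flows:
        nx, ny, inside = advance_points(X[-1], Y[-1], u, v, shape)
        alive &= inside
        X.append(nx)
        Y.append(ny)
    X, Y = np.array(X), np.array(Y)
    # Drop the entire track of any point that ever left the frame
    return X[:, alive], Y[:, alive]
```

Dropping whole tracks, rather than truncating them, keeps the output a complete F x P grid, which is the form the factorization step needs.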

Structure from motion

This section of the project follows the method described in Morita and Kanade, 1997. We build a 2F x P measurement matrix from the x and y coordinates of the P tracked points over F frames, and decompose it into the product of a 2F x 3 motion (camera rotation) matrix M and a 3 x P structure matrix S.
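The decomposition can be illustrated with the batch factorization method of Tomasi and Kanade; Morita and Kanade's method is a sequential variant that updates the same kind of decomposition as frames arrive, so the sketch below is a stand-in rather than the sequential algorithm itself. It assumes an orthographic (affine) camera, subtracts each row's mean to remove translation, keeps the rank-3 part of the SVD, and then applies metric constraints (unit-length, orthogonal camera axes) solved linearly for a symmetric matrix L = A A^T. The name `factorize` is hypothetical, and the Cholesky step assumes L comes out positive definite, which can fail on noisy tracks.

```python
import numpy as np


def _metric_row(a, b):
    """Coefficients of a^T L b in the 6 unique entries of a symmetric 3x3 L."""
    return np.array([a[0] * b[0], a[0] * b[1] + a[1] * b[0],
                     a[0] * b[2] + a[2] * b[0], a[1] * b[1],
                     a[1] * b[2] + a[2] * b[1], a[2] * b[2]])


def factorize(X, Y):
    """Rank-3 factorization of tracked points into motion M and structure S.

    X, Y: arrays of shape (F, P) with tracked x and y coordinates.
    Returns M (2F x 3) and S (3 x P) such that M @ S ~= the centered W.
    """
    F, P = X.shape
    W = np.vstack([X, Y])                  # 2F x P measurement matrix
    W = W - W.mean(axis=1, keepdims=True)  # remove per-frame translation
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Keep the three largest singular values (rank 3 under orthography)
    sqrt_s = np.sqrt(s[:3])
    M_hat = U[:, :3] * sqrt_s
    S_hat = sqrt_s[:, None] * Vt[:3]
    # Metric constraints on rows i_f, j_f of M = M_hat @ A:
    # i_f.i_f = 1, j_f.j_f = 1, i_f.j_f = 0, linear in L = A A^T
    I_rows, J_rows = M_hat[:F], M_hat[F:]
    G = np.array([_metric_row(i, i) for i in I_rows]
                 + [_metric_row(j, j) for j in J_rows]
                 + [_metric_row(i, j) for i, j in zip(I_rows, J_rows)])
    c = np.concatenate([np.ones(F), np.ones(F), np.zeros(F)])
    l, *_ = np.linalg.lstsq(G, c, rcond=None)
    L = np.array([[l[0], l[1], l[2]],
                  [l[1], l[3], l[4]],
                  [l[2], l[4], l[5]]])
    A = np.linalg.cholesky(L)  # assumes L is positive definite
    M = M_hat @ A
    S = np.linalg.inv(A) @ S_hat
    return M, S
```

The recovered shape is unique only up to a global rotation (and possibly a reflection). Under this camera model, each row of `np.cross(M[:F], M[F:])` gives the viewing direction of the camera in frame f, which is one way to derive per-frame camera orientation from M.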

Notes

The corner detector flagged a number of points outside the hotel structure as interest points, so I pruned these points before tracking.

Results

Plot of 20 random points and their tracking paths:

Plot of the points that go off-camera and their tracking paths:

Three different views of the 3D structure of the object:

X, Y, and Z plots of camera movement over time: