Optical Flow and Structure From Motion

John Thickstun

We implement a structure from motion algorithm by computing the optical flow of a video across multiple still frames, and then using this flow to derive the 3D structure of the scene along with the per-frame camera motion.

We first run the Harris corner detector on the video's first frame to collect a set of keypoints that we will track through successive frames:
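The corner detection step can be sketched as follows. This is a minimal NumPy implementation of the Harris response, assuming grayscale frames stored as 2D arrays; the function name and parameters (`window`, `k`, `num_keypoints`) are illustrative, not taken from the original code:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def harris_corners(img, k=0.05, window=5, num_keypoints=25):
    """Return (row, col) keypoints with the strongest Harris corner response."""
    # spatial image gradients
    Iy, Ix = np.gradient(img.astype(float))

    # windowed sums of gradient products: the entries of the structure tensor M
    def winsum(a):
        return sliding_window_view(a, (window, window)).sum(axis=(2, 3))
    Sxx, Syy, Sxy = winsum(Ix * Ix), winsum(Iy * Iy), winsum(Ix * Iy)

    # Harris corner response: det(M) - k * trace(M)^2
    R = Sxx * Syy - Sxy ** 2 - k * (Sxx + Syy) ** 2

    # take the top responses; offset indices back into image coordinates
    idx = np.argsort(R, axis=None)[::-1][:num_keypoints]
    rows, cols = np.unravel_index(idx, R.shape)
    off = window // 2
    return np.stack([rows + off, cols + off], axis=1)
```

In practice one would also apply non-maximum suppression so that the keypoints do not cluster on a few strong corners.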

We then compute optical flow from the spatial and temporal derivatives of our video. For each pixel in each frame, we compute the x/y flow vector that best explains the local image gradients (within a small window), as determined by least squares. Using this optical flow data, we can compute per-frame updates for our initial keypoints, tracking them through successive frames:
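The per-window least-squares step described above is the classic Lucas-Kanade formulation: brightness constancy gives one constraint Ix·dx + Iy·dy + It = 0 per pixel, and stacking the constraints in a window yields an overdetermined system. A minimal sketch, assuming grayscale frames as NumPy arrays (the function name and `window` parameter are illustrative):

```python
import numpy as np

def lucas_kanade(frame1, frame2, point, window=7):
    """Estimate the flow (dx, dy) at point=(row, col) via windowed least squares."""
    Iy, Ix = np.gradient(frame1.astype(float))     # spatial derivatives
    It = frame2.astype(float) - frame1.astype(float)  # temporal derivative

    r, c = int(point[0]), int(point[1])
    h = window // 2
    sl = (slice(r - h, r + h + 1), slice(c - h, c + h + 1))

    # one brightness-constancy constraint per pixel in the window:
    #   [Ix Iy] @ [dx dy]^T = -It
    A = np.stack([Ix[sl].ravel(), Iy[sl].ravel()], axis=1)
    b = -It[sl].ravel()
    flow, *_ = np.linalg.lstsq(A, b, rcond=None)
    return flow  # (dx, dy)
```

Applying this at each keypoint and adding the resulting (dx, dy) to its position gives the per-frame tracking update.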

Some keypoints drift outside the boundaries of the video frame. These points are discarded:
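The pruning step is a simple bounds check; a sketch, with an illustrative function name:

```python
import numpy as np

def prune_keypoints(points, height, width):
    """Keep only tracked (row, col) points that remain inside the frame."""
    points = np.asarray(points)
    inside = ((points[:, 0] >= 0) & (points[:, 0] < height) &
              (points[:, 1] >= 0) & (points[:, 1] < width))
    return points[inside]
```

Note that a point discarded in one frame must be dropped from every frame's track, so that the measurement matrix built below stays rectangular.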

To compute structure from motion, we then combine these tracked keypoints into a measurement matrix. A singular value decomposition of this matrix allows us to compute a set of 3D coordinates and a camera matrix. Triangulating these coordinates and texture-mapping the original image onto them produces a 3D mesh:
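The factorization step above follows the Tomasi-Kanade pattern: stack the 2D tracks into a 2F x P measurement matrix, subtract each row's mean to remove per-frame translation, and take a rank-3 SVD to split the result into a camera (motion) matrix and a 3D structure matrix. A minimal sketch, recovering the factors only up to the usual affine ambiguity (the metric upgrade step is omitted):

```python
import numpy as np

def factor_measurements(W):
    """Rank-3 factorization of a 2F x P measurement matrix into motion and structure."""
    # subtract per-row means so each frame's measurements are centered
    W_c = W - W.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(W_c, full_matrices=False)
    # split the rank-3 approximation between the two factors
    M = U[:, :3] * np.sqrt(s[:3])        # 2F x 3 camera matrix
    S = np.sqrt(s[:3])[:, None] * Vt[:3]  # 3 x P 3D coordinates
    return M, S
```

For noiseless orthographic projections the centered measurement matrix is exactly rank 3, so the product of the recovered factors reproduces it.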

We then take the cross product of each frame's x and y camera rows to obtain that frame's z (viewing-direction) vector, and plot how it changes from frame to frame:
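Given the 2F x 3 camera matrix from the factorization, this amounts to crossing each frame's pair of rows; a sketch with an illustrative function name:

```python
import numpy as np

def viewing_directions(M):
    """Per-frame optical-axis direction: cross product of the x/y camera rows."""
    F = M.shape[0] // 2
    z = np.stack([np.cross(M[2 * f], M[2 * f + 1]) for f in range(F)])
    # normalize each direction to unit length
    return z / np.linalg.norm(z, axis=1, keepdims=True)
```

Plotting these unit vectors over the frame index visualizes the camera's rotation through the sequence.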