Project 5

aabouche

Given an animated sequence of images, it is possible to track features and construct a 3D model from their motion. The Harris corner detector and optical flow are used to identify the movement of selected points, and matrix factorization under an orthographic projection model gives an estimate of the 3D coordinates of the feature points.


Selecting points

Corners are the best feature points to track because, if they move in any direction, the patch containing the corner changes. With edges, motion parallel to the edge is not identifiable, and motion over a patch of solid color produces no change in any direction. The Harris corner detector is run over the first image of the sequence, and the strongest, most corner-like features are selected. Next, any point too close to the border is removed: all points within 7 pixels of the border were dropped from the selection. About 444 feature points remained (plotted below).
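As a sketch of this selection step in Python with NumPy (the function names `box_filter`, `harris_response`, and `select_corners` are my own, and the window radius and Harris k are assumed defaults; only the 7-pixel margin comes from the write-up):

```python
import numpy as np

def box_filter(a, r):
    """Mean of a over a (2r+1)x(2r+1) window, via padded cumulative sums."""
    h, w = a.shape
    win = 2 * r + 1
    p = np.pad(a, r, mode='edge')
    c = np.pad(np.cumsum(np.cumsum(p, axis=0), axis=1), ((1, 0), (1, 0)))
    return (c[win:win + h, win:win + w] - c[:h, win:win + w]
            - c[win:win + h, :w] + c[:h, :w]) / win ** 2

def harris_response(img, r=2, k=0.04):
    """Harris response R = det(M) - k*trace(M)^2 from smoothed gradient products."""
    Iy, Ix = np.gradient(img.astype(float))
    Sxx = box_filter(Ix * Ix, r)
    Syy = box_filter(Iy * Iy, r)
    Sxy = box_filter(Ix * Iy, r)
    return Sxx * Syy - Sxy ** 2 - k * (Sxx + Syy) ** 2

def select_corners(img, n=500, margin=7, r=2):
    """Strongest responses first, excluding points within `margin` px of the border."""
    R = harris_response(img, r)
    ys, xs = np.unravel_index(np.argsort(R, axis=None)[::-1], R.shape)
    h, w = img.shape
    keep = ((ys >= margin) & (ys < h - margin) &
            (xs >= margin) & (xs < w - margin))
    return np.stack([xs[keep][:n], ys[keep][:n]], axis=1)   # (x, y) pairs
```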

Optical Flow

Next, the optical flow of the image over time is calculated for each pixel. Points near each other tend to have similar motion, so neighboring pixels should influence each other's optical flow. The optical flow is calculated for each pair of consecutive images.

First, the x and y gradients of the image are calculated. The temporal gradient is acquired by subtracting the second image from the first. To save time in the loop over the pixels, the matrices Ixx, Iyy, Ixy (= Iyx), Ixt, and Iyt are computed ahead of time. Each of these is a matrix in which every element is a box filter applied to the product of the corresponding derivatives of the original image's pixels. The box filter used is 15x15 pixels.

Next, each pixel is looped over. Inside the loop u and v, the flow of x and y, are calculated for that pixel.
[u; v] = -(1 / (Ixx*Iyy - Ixy^2)) * [Iyy, -Ixy; -Ixy, Ixx] * [Ixt; Iyt]
Below, the optical flow is displayed for each consecutive pair of images. The red parts of the image travel the most and the blue parts the least. The noisy areas are regions where no motion actually occurs.
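The precomputation and per-pixel solve described above can be sketched as follows (a Python/NumPy sketch, not the original code; I use the convention It = im2 - im1 so the sign matches the closed-form solve, and the inner loop is vectorized away):

```python
import numpy as np

def box_filter(a, r):
    """Mean of a over a (2r+1)x(2r+1) window, via padded cumulative sums."""
    h, w = a.shape
    win = 2 * r + 1
    p = np.pad(a, r, mode='edge')
    c = np.pad(np.cumsum(np.cumsum(p, axis=0), axis=1), ((1, 0), (1, 0)))
    return (c[win:win + h, win:win + w] - c[:h, win:win + w]
            - c[win:win + h, :w] + c[:h, :w]) / win ** 2

def lucas_kanade(im1, im2, r=7):
    """Dense Lucas-Kanade flow; r=7 gives the 15x15 window from the text."""
    Iy, Ix = np.gradient(im1.astype(float))
    It = im2.astype(float) - im1.astype(float)        # temporal gradient
    # Box-filtered gradient products, computed once ahead of the solve.
    Sxx = box_filter(Ix * Ix, r); Syy = box_filter(Iy * Iy, r)
    Sxy = box_filter(Ix * Iy, r)
    Sxt = box_filter(Ix * It, r); Syt = box_filter(Iy * It, r)
    det = Sxx * Syy - Sxy ** 2
    det = np.where(np.abs(det) < 1e-12, 1e-12, det)   # guard untextured patches
    # [u; v] = -(1/det) * [Syy, -Sxy; -Sxy, Sxx] @ [Sxt; Syt]
    u = -(Syy * Sxt - Sxy * Syt) / det
    v = -(-Sxy * Sxt + Sxx * Syt) / det
    return u, v
```

On a synthetic Gaussian blob translated by one pixel in x, the recovered flow near the blob is close to (1, 0), which is a quick sanity check for the sign conventions.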

Tracking Features

The optical flow between each pair of frames is added to the positions of the chosen features. The animation below shows where the features should fall according to the optical flow computed above. The tracking works fairly well for the corner points on the house. The features selected in the lower left were not tracked well: their displacement between consecutive frames was probably too large to be captured by the flow field, or their neighbors did not move enough for the 15x15 window estimate to be correct.
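A minimal sketch of this propagation step, assuming per-frame flow fields are already computed (the function name `track` is my own; flow is sampled at the nearest pixel, though bilinear sampling would be smoother):

```python
import numpy as np

def track(points, flows):
    """Propagate (x, y) feature points through a sequence of (u, v) flow fields.
    points: (N, 2) float array; flows: list of (u, v) image pairs."""
    pts = points.astype(float).copy()
    history = [pts.copy()]
    for u, v in flows:
        # Sample the flow at each feature's nearest pixel and step forward.
        xi = np.clip(np.rint(pts[:, 0]).astype(int), 0, u.shape[1] - 1)
        yi = np.clip(np.rint(pts[:, 1]).astype(int), 0, u.shape[0] - 1)
        pts = pts + np.stack([u[yi, xi], v[yi, xi]], axis=1)
        history.append(pts.copy())
    return np.stack(history)          # (frames, N, 2) track of every feature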


Since the box filter for flow field is of size 15x15 pixels, points that fall within 7 pixels of the border do not have a flow vector computed for them (not enough neighbors). So if a feature point travels into the margins or off the edge of the image, it is discarded. The feature points in the image below were discarded points. If you observe the animation above they travel close to or over the edge.
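The discard rule can be expressed as a small mask over the tracked points (a sketch; `in_bounds` is a hypothetical helper, with the 7-pixel margin from the text):

```python
import numpy as np

def in_bounds(pts, shape, margin=7):
    """True for (x, y) points whose 15x15 flow window fits inside the image."""
    h, w = shape
    return ((pts[:, 0] >= margin) & (pts[:, 0] < w - margin) &
            (pts[:, 1] >= margin) & (pts[:, 1] < h - margin))
```

A feature is dropped from the track as soon as this mask goes false for it in any frame.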

Structure From Motion

To find the 3D coordinates of the points from their positions over the sequence of N frames, the features are first centered in each frame: the centroid of the features in a frame is subtracted from each of their coordinates. Next, the measurement matrix D = [FeaturesY; FeaturesX] is constructed, where each features block contains one coordinate (x or y) per tracked feature, with one row per image in the sequence.
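The centering and stacking above can be sketched as (a NumPy sketch; `measurement_matrix` is my own name, and I assume tracks are stored as an (F, N, 2) array of (x, y) positions):

```python
import numpy as np

def measurement_matrix(tracks):
    """Build the centered 2F x N matrix D = [FeaturesY; FeaturesX]
    from tracks of shape (F, N, 2) holding (x, y) per feature per frame."""
    x = tracks[:, :, 0].astype(float)
    y = tracks[:, :, 1].astype(float)
    x -= x.mean(axis=1, keepdims=True)   # subtract each frame's centroid
    y -= y.mean(axis=1, keepdims=True)
    return np.vstack([y, x])             # y rows first, matching the text
```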

Using singular value decomposition, matrices U, W, and V are found from D. The motion (M) and shape (S) matrices are acquired from parts of U, W, and V. However, the factorization is not unique: D = M1*S1 = (M1*A)*(inv(A)*S1) = M2*S2 for any invertible 3x3 matrix A, so a different motion matrix and shape matrix could produce the same measurements. To pin down the right M and S, an invertible matrix B is found such that M2 = M*B and S2 = inv(B)*S. Applying the metric constraints — each frame's two motion rows must be unit length and mutually orthogonal — yields the symmetric matrix C = B*B' (six unknowns). Factorizing C with SVD provides the B that corrects M and S.
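The rank-3 factorization step can be sketched as follows (a Python/NumPy sketch giving the affine reconstruction only; the metric upgrade that solves C = B*B' for B from the per-frame constraints is not shown, and the even split of singular values between M and S is one common convention, not the only one):

```python
import numpy as np

def factor(D):
    """Rank-3 factorization D ~= M @ S via SVD.
    M (2F x 3) is the motion matrix, S (3 x N) the shape matrix,
    each determined only up to an invertible 3x3 transform."""
    U, w, Vt = np.linalg.svd(D, full_matrices=False)
    sq = np.sqrt(w[:3])              # split the top singular values evenly
    M = U[:, :3] * sq                # motion, 2F x 3
    S = sq[:, None] * Vt[:3]         # shape, 3 x N
    return M, S
```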