Hang Su
Email: hangsu@cs.brown.edu
Dec. 9th, 2011
Figure 1: Four frames from the image sequence
Figure 2: Reconstructed 3D structure (mirror image; for the correct solution see README)
The goal of this project is to reliably track keypoints through an image sequence and recover the corresponding 3D structure. The pipeline consists of three major components: keypoint selection, feature tracking, and structure from motion.
Keypoints are selected using Harris corner detection. For the KLT tracker used in this project, Harris corners are a natural choice because both methods rely on the second-moment (structure) matrix of the local image gradients having two large eigenvalues, which indicates significant appearance change in two perpendicular directions. I select the 500 strongest Harris corners as keypoints; Figure 3 shows them overlaid on the first frame. Other choices of keypoints are tried and compared in section 4.
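As an illustration of this step, the following NumPy sketch computes the Harris response on a toy image and keeps the strongest responses as keypoints; the window size, k = 0.04, and the synthetic image are illustrative assumptions, not the exact settings used in the project.

```python
import numpy as np

def harris_corners(img, k=0.04, n_keypoints=500, win=2):
    """Harris corner response on a grayscale float image (a sketch).
    Gradients via central differences, structure tensor summed over a
    (2*win+1)^2 box with uniform weights, R = det(M) - k * trace(M)^2."""
    Iy, Ix = np.gradient(img)
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy

    def box(a):
        # Box-sum the gradient products over the local window.
        out = np.zeros_like(a)
        for dy in range(-win, win + 1):
            for dx in range(-win, win + 1):
                out += np.roll(np.roll(a, dy, 0), dx, 1)
        return out

    Sxx, Syy, Sxy = box(Ixx), box(Iyy), box(Ixy)
    R = (Sxx * Syy - Sxy**2) - k * (Sxx + Syy) ** 2
    # Keep the strongest responses as keypoints, returned as (row, col).
    idx = np.argsort(R, axis=None)[::-1][:n_keypoints]
    return np.column_stack(np.unravel_index(idx, R.shape)), R

# Toy example: a white square on black; corners respond positively,
# edges negatively, flat regions not at all.
img = np.zeros((64, 64))
img[20:40, 20:40] = 1.0
pts, R = harris_corners(img, n_keypoints=4)
```

In practice one would add non-maximum suppression so the selected keypoints do not cluster around a single strong corner.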
Figure 3: 500 Harris corners overlaid on the first frame (marker size indicates corner strength)
I implemented a Kanade-Lucas-Tomasi (KLT) tracker for the keypoints detected in the previous step. The KLT tracker makes point tracking a solvable problem through three assumptions: brightness constancy, small motion between frames, and coherent motion within a local window. Using the KLT tracker to find corresponding pixels between successive frames, we obtain the optical flow for each frame (Figure 4, Figure 5).
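A single Lucas-Kanade update for one window can be sketched as follows; the 7-pixel window and the synthetic blob are illustrative assumptions, and the solve is the standard 2x2 normal-equation step rather than the project's exact implementation.

```python
import numpy as np

def lk_step(I0, I1, y, x, win=7):
    """One Lucas-Kanade update for the point (y, x), a minimal sketch.
    Under brightness constancy and small motion, each pixel in the window
    gives Ix*u + Iy*v = -It; local coherence lets us solve the stacked
    2x2 normal equations G d = e for the displacement d = (u, v)."""
    half = win // 2
    ys, xs = slice(y - half, y + half + 1), slice(x - half, x + half + 1)
    Iy, Ix = np.gradient(I0)
    It = I1 - I0                                # temporal derivative
    gx, gy, gt = Ix[ys, xs].ravel(), Iy[ys, xs].ravel(), It[ys, xs].ravel()
    G = np.array([[gx @ gx, gx @ gy],
                  [gx @ gy, gy @ gy]])          # structure tensor over window
    e = -np.array([gx @ gt, gy @ gt])
    return np.linalg.solve(G, e)                # displacement (u, v)

# Toy example: a smooth blob shifted right by one pixel.
yy, xx = np.mgrid[0:41, 0:41]
I0 = np.exp(-((yy - 20) ** 2 + (xx - 20) ** 2) / 30.0)
I1 = np.exp(-((yy - 20) ** 2 + (xx - 21) ** 2) / 30.0)
d = lk_step(I0, I1, 20, 20)
```

Note that G is exactly the matrix whose eigenvalues the Harris and Shi-Tomasi criteria examine, which is why corner-like keypoints track well.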
Figure 4: Optical flow visualization
Figure 5: Motion magnitudes
By finding corresponding keypoints between successive frames, we obtain the 2D path of each keypoint through the whole sequence (Figure 6). Some keypoints move out of frame at some point along the sequence (Figure 7).
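The chaining and out-of-frame bookkeeping can be sketched as below; the `flow` callable is a stand-in for the per-frame KLT estimate, and the bounds check is the obvious one.

```python
import numpy as np

def track_sequence(points, flows, h, w):
    """Chain per-frame displacements into 2D paths, a bookkeeping sketch.
    Each element of `flows` maps a point to its displacement between
    frames t and t+1 (here a plain function; in the real pipeline it is
    the KLT estimate). A point that leaves the image stays invalid for
    the rest of the sequence and its path is filled with NaN."""
    pts = np.asarray(points, float)
    valid = np.ones(len(pts), bool)
    paths = [pts.copy()]
    for flow in flows:
        pts = pts + np.array([flow(p) for p in pts])
        valid &= (pts[:, 0] >= 0) & (pts[:, 0] < w) \
               & (pts[:, 1] >= 0) & (pts[:, 1] < h)
        paths.append(np.where(valid[:, None], pts, np.nan))
    return np.stack(paths), valid

# Toy example: constant rightward flow of 3 px/frame in a 20 px wide frame.
start = [[1.0, 5.0], [15.0, 5.0]]        # (x, y) start positions
flows = [lambda p: (3.0, 0.0)] * 4       # 4 frames of motion
paths, valid = track_sequence(start, flows, h=10, w=20)
```

The second point drifts past the right border after two frames and is dropped, which is exactly the behavior visible in Figure 7.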
Figure 6: 2D paths of 20 random keypoints
Figure 7: Keypoints that move out of frame during the sequence
Though not apparent in this sequence, large motion cannot be tracked reliably with the basic KLT tracker alone. I implemented an iterative refinement method and a pyramid KLT tracker to address this problem. Performance comparisons are shown in section 4.
The tracked keypoints provide stereo information from which we can reconstruct the scene's 3D structure. Here we use the factorization method described in [1] and [2].
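On synthetic data, the core of the factorization method of [1] fits in a few lines: stack the tracked 2D coordinates into a 2F x P measurement matrix, subtract each row's mean to cancel the per-frame translations, and take a rank-3 SVD. This sketch stops at the affine reconstruction; the metric upgrade that enforces orthonormal camera rows is omitted, so M and S are recovered only up to a 3x3 ambiguity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic setup (assumption: orthographic projection, as in [1]).
P, F = 30, 8                              # points, frames
S_true = rng.normal(size=(3, P))          # ground-truth 3D shape
W_rows = []
for f in range(F):
    a = 0.1 * f
    R = np.array([[np.cos(a), 0.0, np.sin(a)],   # rotate about the y axis,
                  [0.0,       1.0, 0.0]])        # keep the two image rows
    t = rng.normal(size=(2, 1))                  # per-frame 2D translation
    W_rows.append(R @ S_true + t)
W = np.vstack(W_rows)                     # 2F x P measurement matrix

# Tomasi-Kanade factorization: register, then rank-3 SVD.
W_reg = W - W.mean(axis=1, keepdims=True)        # removes the translations
U, s, Vt = np.linalg.svd(W_reg, full_matrices=False)
M = U[:, :3] * np.sqrt(s[:3])             # 2F x 3 motion (camera) matrix
S = np.sqrt(s[:3])[:, None] * Vt[:3]      # 3 x P shape, up to a 3x3 ambiguity

residual = np.linalg.norm(W_reg - M @ S)  # ~0 for noiseless data
```

After the metric upgrade, the two rows of M belonging to a frame span its image plane, and their cross product gives the per-frame viewing direction of the kind plotted in Figure 8.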
Figure 8: Estimated 3D points (upper row) and fitted model (lower row) in three views. Red lines in the lower figures show the estimated camera direction for each frame.
Because we use an orthographic camera model, there is no camera position to recover, only a camera direction. The camera directions are plotted as lines in Figure 8, and Figure 9 shows the normalized camera direction in each dimension.
Figure 9: Camera direction estimation
4.1 Other keypoint selection methods
[3] claims that requiring the smaller eigenvalue of the structure matrix to exceed a threshold extracts keypoints that are more reliable for tracking than the criterion used in Harris corner detection. As shown in Figure 10, however, the detection results of Harris corners and Good Features to Track are quite similar, both in keypoint locations and in their strengths, and so are the reconstruction results.
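The two criteria can be compared directly on a 2x2 structure tensor; the tensor values below are made up for illustration.

```python
import numpy as np

def corner_scores(Sxx, Syy, Sxy, k=0.04):
    """Evaluate both cornerness criteria on one 2x2 structure tensor M.
    Harris:      R = det(M) - k * trace(M)^2
    Shi-Tomasi:  the smaller eigenvalue of M (thresholded in [3])."""
    tr, det = Sxx + Syy, Sxx * Syy - Sxy**2
    harris = det - k * tr**2
    lam_min = tr / 2 - np.sqrt((tr / 2) ** 2 - det)
    return harris, lam_min

# Corner-like tensor: two large eigenvalues, both scores high.
h1, l1 = corner_scores(10.0, 8.0, 1.0)
# Edge-like tensor: one large and one tiny eigenvalue, both scores low.
h2, l2 = corner_scores(10.0, 0.1, 0.0)
```

Both scores rank the corner above the edge, which is consistent with the similar detections observed in Figure 10; the criteria differ mainly near the decision boundary.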
Figure 10: Comparison between Harris corners and Good Features to Track
Other types of features, such as SIFT, are not appropriate for this task (Figure 11).
Figure 11: SIFT features
4.2 Iterative refinement
The original KLT tracker uses only a first-order Taylor approximation. Iterative refinement can be used to reduce the error this introduces, following the algorithm from the lecture slides. I terminate the iteration once the average update magnitude is no longer smaller than 90% of that in the previous round, i.e., once the updates stop shrinking. Figure 12 compares the original results with those obtained using iterative refinement; the tracked keypoints are clearly much more reliable with refinement.
Figure 12: Comparison of the original KLT tracker (upper row) with the iteratively refined version (lower row). Columns show the first frame, the last frame, and the reconstructed 3D structure.
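The refinement loop itself is generic, so the sketch below shows only the loop with the 90% termination rule; the toy `step_fn` stands in for the per-iteration KLT solve.

```python
import numpy as np

def refine_iteratively(step_fn, d0, max_iter=20, ratio=0.9):
    """Generic iterative-refinement loop, a sketch of the scheme in 4.2.
    `step_fn(d)` returns the next incremental update given the current
    total displacement (in KLT, one linearized solve after warping by d).
    Iteration stops once the average update magnitude fails to shrink
    below `ratio` (90%) of the previous round's."""
    d = np.asarray(d0, float)
    prev = np.inf
    for _ in range(max_iter):
        delta = np.asarray(step_fn(d), float)
        d = d + delta
        mag = np.linalg.norm(delta)
        if mag >= ratio * prev:   # updates stopped shrinking: converged
            break
        prev = mag
    return d

# Toy example: each step closes 60% of the remaining gap to 2.0, so the
# updates shrink geometrically and the loop runs to convergence.
target = 2.0
d_hat = refine_iteratively(lambda d: (0.6 * (target - d[0]),), [0.0])
```

The same loop with a real warp-and-solve step is what produces the lower row of Figure 12.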
4.3 Pyramid KLT tracking
Using an image pyramid can improve tracking accuracy when there are large movements in the sequence. The framework is shown in Figure 13.
Figure 13: Pyramid tracking (figure taken from Learning OpenCV, O'Reilly 2008)
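The coarse-to-fine idea can be illustrated in one dimension. In this sketch a correlation-peak search over +/-1 sample stands in for the per-level LK solve (mimicking its small-motion limit), and the shift estimate is doubled and refined at each finer level; all signals and sizes are illustrative.

```python
import numpy as np

def downsample(sig):
    """Halve resolution with a small 3-tap blur (a crude pyramid level)."""
    blurred = np.convolve(sig, [0.25, 0.5, 0.25], mode="same")
    return blurred[::2]

def pyramid_shift(a, b, levels=3):
    """Coarse-to-fine shift estimation, a sketch of pyramid KLT's idea.
    Large motion is found at the coarse level, where it looks small, and
    the estimate is doubled and refined at each finer level."""
    pyr = [(a, b)]
    for _ in range(levels - 1):
        pyr.append((downsample(pyr[-1][0]), downsample(pyr[-1][1])))
    shift = 0
    for pa, pb in reversed(pyr):               # coarsest level first
        shift *= 2                             # propagate to finer level
        pb_back = np.roll(pb, -shift)          # undo the current estimate
        corr = [np.dot(pa, np.roll(pb_back, -s)) for s in (-1, 0, 1)]
        shift += (-1, 0, 1)[int(np.argmax(corr))]   # small residual step
    return shift

# Toy example: a bump moved 6 samples, too far for a +/-1 search at full
# resolution but easily recovered through the pyramid.
x = np.arange(128)
a = np.exp(-((x - 40) ** 2) / 40.0)
b = np.exp(-((x - 46) ** 2) / 40.0)
est = pyramid_shift(a, b)
```

A single-level search limited to +/-1 would fail here, which mirrors why the plain tracker loses points on the sparsified sequence below.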
Our dataset does not contain large movements, so to test the pyramid tracker I use only every other image in the sequence, making the motion between successive frames larger. Figure 14 compares the original tracker and the pyramid tracker on this sparser sequence.
Figure 14: Pyramid KLT tracker compared with the original tracker (rows: without and with the pyramid; columns: first frame and last frame)
[1] C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: a factorization method. IJCV, 1992.
[2] T. Morita and T. Kanade. A sequential factorization method for recovering shape and motion from image streams. PAMI, 1997.
[3] J. Shi and C. Tomasi. Good features to track. In Proc. CVPR, 1994.
Last update: Dec. 9th, 2011