3D reconstruction from two calibrated cameras - where is the error in this pipeline?

There are many reports of 3D reconstruction from stereo images with known internal calibration, some of which are excellent. I have read a lot of them, and based on what I read, I am attempting my own reconstruction of a 3D scene using the pipeline/algorithm below. I will outline the method and then ask specific questions at the end.

0. Calibrate cameras:

  • This means obtaining the camera calibration matrices K1 and K2 for cameras 1 and 2. These are 3x3 matrices encapsulating each camera's internal parameters: focal length and principal point / image center offset. They do not change, so you only need to do this once per camera, as long as you do not change the resolution you record at.
  • Do it offline. Do not argue.
  • I use OpenCV's calibrateCamera() and the checkerboard routines, but this functionality is also included in the MATLAB Camera Calibration Toolbox. The OpenCV routines seem to work well (a minimal sketch follows this list).
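For concreteness, here is a minimal calibration sketch in Python/OpenCV; the checkerboard inner-corner count and the image folder name are placeholders for whatever your setup actually uses:

```python
import glob
import cv2
import numpy as np

pattern = (9, 6)  # checkerboard inner-corner count (assumed, adjust to yours)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for fname in glob.glob("calib_cam1/*.png"):  # hypothetical image folder
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# K1 is the 3x3 intrinsic matrix; dist1 holds the lens distortion
# coefficients, which you will want for undistorting points later.
rms, K1, dist1, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
```

Repeat per camera to obtain K2 and its distortion coefficients.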

1. The fundamental matrix F:

  • Now that your cameras are set up as a stereo rig, determine the fundamental matrix (3x3) of that configuration using point correspondences between the two images/views.
  • How you get the match is up to you and will greatly depend on the scene itself.
  • I use OpenCV's findFundamentalMat() to get F, which supports several methods (8-point algorithm, RANSAC, LMedS).
  • You can test the resulting matrix by plugging it into the defining equation of the fundamental matrix, x'Fx = 0, where x' and x are corresponding raw image points (x, y) in homogeneous coordinates (x, y, 1), one of the two three-vectors being transposed so that the multiplication makes sense. The closer this is to zero for each correspondence, the better F obeys its defining relation. This is equivalent to checking how well the derived F maps points in one image to epipolar lines in the other. I get an average deviation of ~2px using the 8-point algorithm (see the sketch after this list).
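A sketch of both the estimation and the residual check, assuming pts1 and pts2 are Nx2 arrays of matched pixel coordinates (hypothetical names):

```python
import cv2
import numpy as np

F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_8POINT)

x1 = np.hstack([pts1, np.ones((len(pts1), 1))])  # homogeneous (x, y, 1)
x2 = np.hstack([pts2, np.ones((len(pts2), 1))])

# Algebraic residual of the epipolar constraint x2^T F x1 = 0.
alg = np.abs(np.sum(x2 * (F @ x1.T).T, axis=1))

# Geometric version in pixels: distance from x2 to the epipolar line F x1,
# which is the quantity you can meaningfully quote as "~2px".
lines = (F @ x1.T).T  # each row is an epipolar line (a, b, c)
dist = alg / np.hypot(lines[:, 0], lines[:, 1])
print("mean epipolar distance (px):", dist.mean())
```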

2. The essential matrix E:

  • Compute the essential matrix directly from F and the calibration matrices.
  • E = K2ᵀ · F · K1
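In code this step is a single line; the ordering of the two calibration matrices is worth a comment, because swapping them is an easy mistake (K1, K2, F are the NumPy arrays from the previous steps):

```python
# Transposed calibration of camera 2 on the left, camera 1 on the right;
# x2^T E x1 = 0 then holds for normalized correspondences x1, x2.
E = K2.T @ F @ K1
```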

3. The internal constraint on E:

  • E must obey certain internal constraints. In particular, if it is decomposed by SVD into U·S·Vᵀ, its singular values must be (a, a, 0): the first two diagonal elements of S must be equal and the third must be zero.
  • I was surprised to read here that if it does not obey this when you test it, you can choose to fabricate a new essential matrix from the previous decomposition as E_new = U · diag(1, 1, 0) · Vᵀ, which is of course guaranteed to obey the constraint. You are essentially setting S = diag(1, 1, 0) artificially (a sketch of this follows the list).
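A sketch of that projection step, assuming E is the matrix from step 2:

```python
import numpy as np

U, S, Vt = np.linalg.svd(E)
print("singular values before correction:", S)  # ideally (a, a, 0)

# Force the internal constraint; diag(1, 1, 0) also fixes the overall
# scale, which is arbitrary for an essential matrix anyway.
E_new = U @ np.diag([1.0, 1.0, 0.0]) @ Vt
```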

4. Full camera projection matrices:

  • There are two camera projection matrices, P1 and P2. These are 3x4 and obey the relation x = PX. Also, P = K[R|t], and therefore K⁻¹P = [R|t], where the effect of the camera calibration has been removed.
  • The first matrix P1 (excluding the calibration matrix K) can be set to [I|0]; then P2 (again excluding K) is [R|t].
  • Compute the rotation and translation between the two cameras, R and t, from the decomposition of E. There are two possible ways to calculate R (U·W·Vᵀ and U·Wᵀ·Vᵀ) and two ways to calculate t (± the third column of U), which makes four combinations of R, t, only one of which is valid.
  • Compute all four combinations and select the one that geometrically corresponds to the situation where a reconstructed point is in front of both cameras. I actually do this by forming each resulting P2 = [R|t] and triangulating the 3D positions of a few correspondences in normalized coordinates to check that they have positive depth (z-coordinate); see the sketch after this list.
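A sketch of this selection step, under the assumption that x1n and x2n are 2xN float arrays of normalized coordinates (hypothetical names; K has already been removed from the pixel coordinates). Note that newer OpenCV releases also offer cv2.recoverPose, which performs this disambiguation internally:

```python
import cv2
import numpy as np

def decompose_essential(E):
    """Return the four candidate (R, t) pairs from an essential matrix."""
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0:    # force proper rotations (det = +1)
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])
    R1, R2 = U @ W @ Vt, U @ W.T @ Vt
    t = U[:, 2]
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]

def count_in_front(R, t, x1n, x2n):
    """Count triangulated points with positive depth in both cameras."""
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([R, t.reshape(3, 1)])
    X = cv2.triangulatePoints(P1, P2,
                              x1n.astype(np.float64), x2n.astype(np.float64))
    X = X[:3] / X[3]                      # dehomogenize to 3xN
    z2 = (R @ X + t.reshape(3, 1))[2]     # depth in the second camera
    return int(np.sum((X[2] > 0) & (z2 > 0)))

# Keep the combination that puts the most points in front of both cameras.
R, t = max(decompose_essential(E),
           key=lambda rt: count_in_front(*rt, x1n, x2n))
```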

5. Triangulation in 3D

  • Finally, combine the reconstructed 3x4 projection matrices with their respective calibration matrices: P'1 = K1·P1 and P'2 = K2·P2.
  • Then triangulate the 3D coordinates of each pair of corresponding 2D points, for which I use the LinearLS method from here (an alternative using cv2.triangulatePoints is sketched after this list).
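As an alternative to the linked LinearLS routine, here is the same step with cv2.triangulatePoints; pts1 and pts2 are 2xN arrays of matched pixel coordinates (hypothetical names), and R, t, K1, K2 come from the earlier steps:

```python
import cv2
import numpy as np

P1_full = K1 @ np.hstack([np.eye(3), np.zeros((3, 1))])  # K1 [I|0]
P2_full = K2 @ np.hstack([R, t.reshape(3, 1)])           # K2 [R|t]

pts1 = np.asarray(pts1, np.float64)
pts2 = np.asarray(pts2, np.float64)
X_h = cv2.triangulatePoints(P1_full, P2_full, pts1, pts2)  # 4xN homogeneous
X = (X_h[:3] / X_h[3]).T                                   # Nx3 Euclidean
```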

QUESTIONS:

  • Are there any gaps and/or errors in this method?
  • My F matrix is apparently accurate (0.22% deviation in the mapping compared to typical coordinate values), but when testing E against x'Ex = 0 using normalized image correspondences, the typical error in that test is >100% of the normalized coordinates themselves. Is testing E against x'Ex = 0 valid, and if so, where is this jump in error coming from?
  • The error in my fundamental matrix estimation is much worse when using RANSAC than the 8-point algorithm, ±50px in the mapping between x and x'. This bothers me deeply.
  • "Enforcing the internal constraint" still sits very uneasily with me - how can it be valid to just create a new essential matrix from part of the decomposition of the original?
  • Is there a more efficient way of determining which combination of R and t to use than computing P and triangulating some normalized coordinates?
  • My final re-projection error is hundreds of pixels in 720p images. Am I likely looking at problems in the calibration, in the determination of the P matrices, or in the triangulation?
1 answer

The error in my fundamental matrix estimation is much worse when using RANSAC than the 8-point algorithm, ±50px in the mapping between x and x'. This bothers me deeply.

Using the 8-point algorithm does not preclude using the RANSAC principle. When you use the 8-point algorithm directly, which points do you use? You have to choose 8 (good) points by yourself.

In theory, you can calculate the fundamental matrix from any set of point correspondences, but you will often get a degenerate fundamental matrix because the linear equations are not independent. Another point is that the 8-point algorithm uses an overdetermined system of linear equations, so that a single outlier destroys the fundamental matrix.

Have you tried using the RANSAC result? I bet it represents one of the correct solutions for F.
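One common pattern, sketched here under the assumption that pts1 and pts2 are the Nx2 matched pixel coordinates: let RANSAC identify the inliers, then re-estimate F from the inliers alone with the 8-point algorithm:

```python
import cv2
import numpy as np

# 1px inlier threshold and 99% confidence; tune these for your data.
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)
inliers = mask.ravel().astype(bool)

# Refit on the inlier set only; outliers no longer poison the solution.
F_refined, _ = cv2.findFundamentalMat(pts1[inliers], pts2[inliers],
                                      cv2.FM_8POINT)
```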

My F matrix is apparently accurate (0.22% deviation in the mapping compared to typical coordinate values), but when testing E against x'Ex = 0 using normalized image correspondences, the typical error in that test is >100% of the normalized coordinates themselves. Is testing E against x'Ex = 0 valid, and if so, where is this jump in error coming from?

Again, if F is degenerate, x'Fx = 0 can hold for every point correspondence.

Another possible reason for a wrong E is swapping the cameras (computing K1ᵀ·F·K2 instead of K2ᵀ·F·K1). Remember to check the result against x'Ex = 0 (a quick sketch follows).
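A quick way to run that check, assuming x1 and x2 are Nx3 homogeneous pixel coordinates from images 1 and 2 (hypothetical names); if swapping K1 and K2 here makes the residual drop, the cameras were switched:

```python
import numpy as np

x1n = (np.linalg.inv(K1) @ x1.T).T  # normalized coordinates, image 1
x2n = (np.linalg.inv(K2) @ x2.T).T  # normalized coordinates, image 2

residuals = np.abs(np.sum(x2n * (E @ x1n.T).T, axis=1))
print("mean |x2n^T E x1n|:", residuals.mean())
```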

"Enforcing the internal constraint" still sits very uneasily with me - how can it be valid to just create a new essential matrix from part of the decomposition of the original?

This is explained in "Multiple View Geometry in Computer Vision" by Hartley and Zisserman. As far as I know, it is related to minimizing the Frobenius norm of F.
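For reference, the result in question (stated here from memory of the Hartley and Zisserman treatment): given the SVD of the estimated matrix, the closest essential matrix in Frobenius norm averages the first two singular values and zeroes the third,

```latex
E = U\,\operatorname{diag}(\sigma_1, \sigma_2, \sigma_3)\,V^\top
\;\Longrightarrow\;
\hat{E} = U\,\operatorname{diag}\!\left(\tfrac{\sigma_1+\sigma_2}{2},\;
\tfrac{\sigma_1+\sigma_2}{2},\; 0\right) V^\top
```

and since the overall scale of E is arbitrary, diag(1, 1, 0) is an equally valid choice.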

You can google it; there are PDF resources out there.

Is there a more efficient way of determining which combination of R and t to use than computing P and triangulating some normalized coordinates?

No, as far as I know.

My final re-projection error is hundreds of pixels in 720p images. Am I likely looking at problems in the calibration, in the determination of the P matrices, or in the triangulation?

Your rigid body transform P2 is wrong because E is wrong.
