3D reconstruction from two calibrated cameras - where is the error in this pipeline?

There are many reports of 3D reconstruction from stereo images with known internal calibration, some of which are excellent. I have read a lot of them, and based on what I read, I am attempting my own reconstruction of a 3D scene using the pipeline/algorithm below. I will outline the method and then ask specific questions at the end.

0. Calibrate cameras:

  • This means obtaining the camera calibration matrices K1 and K2 for cameras 1 and 2. These are 3x3 matrices encapsulating each camera's internal parameters: focal length and principal point / image center offset. They do not change, so you only need to do this once per camera, as long as you do not change the resolution you record at.
  • Do it offline. Do not argue.
  • I use OpenCV's calibrateCamera() and the checkerboard routines, but this functionality is also included in the MATLAB Camera Calibration Toolbox. The OpenCV routines seem to work well (a minimal sketch follows this list).
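For concreteness, here is a minimal calibration sketch in Python/OpenCV; the checkerboard inner-corner count and the image folder name are placeholders for whatever your setup actually uses:

```python
import glob
import cv2
import numpy as np

pattern = (9, 6)  # checkerboard inner-corner count (assumed, adjust to yours)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for fname in glob.glob("calib_cam1/*.png"):  # hypothetical image folder
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# K1 is the 3x3 intrinsic matrix; dist1 holds the lens distortion
# coefficients, which you will want for undistorting points later.
rms, K1, dist1, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
```

Repeat per camera to obtain K2 and its distortion coefficients.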

1. The fundamental matrix F:

  • Now that your cameras are set up as a stereo rig, determine the fundamental matrix (3x3) of that configuration using point correspondences between the two images/views.
  • How you get the match is up to you and will greatly depend on the scene itself.
  • I use OpenCV's findFundamentalMat() to get F, which supports several methods (8-point algorithm, RANSAC, LMedS).
  • You can test the resulting matrix by plugging it into the defining equation of the fundamental matrix, x'Fx = 0, where x' and x are corresponding raw image points (x, y) in homogeneous coordinates (x, y, 1), one of the two three-vectors being transposed so that the multiplication makes sense. The closer this is to zero for each correspondence, the better F obeys its defining relation. This is equivalent to checking how well the derived F maps points in one image to epipolar lines in the other. I get an average deviation of ~2px using the 8-point algorithm (see the sketch after this list).
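A sketch of both the estimation and the residual check, assuming pts1 and pts2 are Nx2 arrays of matched pixel coordinates (hypothetical names):

```python
import cv2
import numpy as np

F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_8POINT)

x1 = np.hstack([pts1, np.ones((len(pts1), 1))])  # homogeneous (x, y, 1)
x2 = np.hstack([pts2, np.ones((len(pts2), 1))])

# Algebraic residual of the epipolar constraint x2^T F x1 = 0.
alg = np.abs(np.sum(x2 * (F @ x1.T).T, axis=1))

# Geometric version in pixels: distance from x2 to the epipolar line F x1,
# which is the quantity you can meaningfully quote as "~2px".
lines = (F @ x1.T).T  # each row is an epipolar line (a, b, c)
dist = alg / np.hypot(lines[:, 0], lines[:, 1])
print("mean epipolar distance (px):", dist.mean())
```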

2. The essential matrix E:

  • Compute the essential matrix directly from F and the calibration matrices.
  • E = K2ᵀ · F · K1
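In code this step is a single line; the ordering of the two calibration matrices is worth a comment, because swapping them is an easy mistake (K1, K2, F are the NumPy arrays from the previous steps):

```python
# Transposed calibration of camera 2 on the left, camera 1 on the right;
# x2^T E x1 = 0 then holds for normalized correspondences x1, x2.
E = K2.T @ F @ K1
```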

3. The internal constraint on E:

  • E must obey certain internal constraints. In particular, if it is decomposed by SVD into U·S·Vᵀ, its singular values must be (a, a, 0): the first two diagonal elements of S must be equal and the third must be zero.
  • I was surprised to read here that if it does not obey this when you test it, you can choose to fabricate a new essential matrix from the previous decomposition as E_new = U · diag(1, 1, 0) · Vᵀ, which is of course guaranteed to obey the constraint. You are essentially setting S = diag(1, 1, 0) artificially (a sketch of this follows the list).
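A sketch of that projection step, assuming E is the matrix from step 2:

```python
import numpy as np

U, S, Vt = np.linalg.svd(E)
print("singular values before correction:", S)  # ideally (a, a, 0)

# Force the internal constraint; diag(1, 1, 0) also fixes the overall
# scale, which is arbitrary for an essential matrix anyway.
E_new = U @ np.diag([1.0, 1.0, 0.0]) @ Vt
```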

4. Full camera projection matrices:

  • There are two camera projection matrices, P1 and P2. These are 3x4 and obey the relation x = PX. Also, P = K[R|t], and therefore K⁻¹P = [R|t], where the effect of the camera calibration has been removed.
  • The first matrix P1 (excluding the calibration matrix K) can be set to [I|0]; then P2 (again excluding K) is [R|t].
  • Compute the rotation and translation between the two cameras, R and t, from the decomposition of E. There are two possible ways to calculate R (U·W·Vᵀ and U·Wᵀ·Vᵀ) and two ways to calculate t (± the third column of U), which makes four combinations of R, t, only one of which is valid.
  • Compute all four combinations and select the one that geometrically corresponds to the situation where a reconstructed point is in front of both cameras. I actually do this by forming each resulting P2 = [R|t] and triangulating the 3D positions of a few correspondences in normalized coordinates to check that they have positive depth (z-coordinate); see the sketch after this list.
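A sketch of this selection step, under the assumption that x1n and x2n are 2xN float arrays of normalized coordinates (hypothetical names; K has already been removed from the pixel coordinates). Note that newer OpenCV releases also offer cv2.recoverPose, which performs this disambiguation internally:

```python
import cv2
import numpy as np

def decompose_essential(E):
    """Return the four candidate (R, t) pairs from an essential matrix."""
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0:    # force proper rotations (det = +1)
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])
    R1, R2 = U @ W @ Vt, U @ W.T @ Vt
    t = U[:, 2]
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]

def count_in_front(R, t, x1n, x2n):
    """Count triangulated points with positive depth in both cameras."""
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([R, t.reshape(3, 1)])
    X = cv2.triangulatePoints(P1, P2,
                              x1n.astype(np.float64), x2n.astype(np.float64))
    X = X[:3] / X[3]                      # dehomogenize to 3xN
    z2 = (R @ X + t.reshape(3, 1))[2]     # depth in the second camera
    return int(np.sum((X[2] > 0) & (z2 > 0)))

# Keep the combination that puts the most points in front of both cameras.
R, t = max(decompose_essential(E),
           key=lambda rt: count_in_front(*rt, x1n, x2n))
```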

5. Triangulation in 3D

  • Finally, combine the reconstructed 3x4 projection matrices with their respective calibration matrices: P'1 = K1·P1 and P'2 = K2·P2.
  • Then triangulate the 3D coordinates of each pair of corresponding 2D points, for which I use the LinearLS method from here (an alternative using cv2.triangulatePoints is sketched after this list).
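As an alternative to the linked LinearLS routine, here is the same step with cv2.triangulatePoints; pts1 and pts2 are 2xN arrays of matched pixel coordinates (hypothetical names), and R, t, K1, K2 come from the earlier steps:

```python
import cv2
import numpy as np

P1_full = K1 @ np.hstack([np.eye(3), np.zeros((3, 1))])  # K1 [I|0]
P2_full = K2 @ np.hstack([R, t.reshape(3, 1)])           # K2 [R|t]

pts1 = np.asarray(pts1, np.float64)
pts2 = np.asarray(pts2, np.float64)
X_h = cv2.triangulatePoints(P1_full, P2_full, pts1, pts2)  # 4xN homogeneous
X = (X_h[:3] / X_h[3]).T                                   # Nx3 Euclidean
```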

QUESTIONS:

  • Are there any gaps and/or errors in this method?
  • My F matrix is apparently accurate (0.22% deviation in the mapping compared to typical coordinate values), but when testing E against x'Ex = 0 using normalized image correspondences, the typical error in that test is >100% of the normalized coordinates themselves. Is testing E against x'Ex = 0 valid, and if so, where is this jump in error coming from?
  • The error in my fundamental matrix estimation is much worse when using RANSAC than the 8-point algorithm, ±50px in the mapping between x and x'. This bothers me deeply.
  • "Enforcing the internal constraint" still sits very uneasily with me - how can it be valid to just create a new essential matrix from part of the decomposition of the original?
  • Is there a more efficient way of determining which combination of R and t to use than computing P and triangulating some normalized coordinates?
  • My final re-projection error is hundreds of pixels in 720p images. Am I likely looking at problems in the calibration, in the determination of the P matrices, or in the triangulation?
1 answer

The error in my fundamental matrix estimation is much worse when using RANSAC than the 8-point algorithm, ±50px in the mapping between x and x'. This bothers me deeply.

Using the 8-point algorithm does not preclude using the RANSAC principle. When you use the 8-point algorithm directly, which points do you use? You have to choose 8 (good) points by yourself.

In theory, you can calculate the fundamental matrix from any set of point correspondences, but you will often get a degenerate fundamental matrix because the linear equations are not independent. Another point is that the 8-point algorithm uses an overdetermined system of linear equations, so that a single outlier destroys the fundamental matrix.

Have you tried using the RANSAC result? I bet it represents one of the correct solutions for F.
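One common pattern, sketched here under the assumption that pts1 and pts2 are the Nx2 matched pixel coordinates: let RANSAC identify the inliers, then re-estimate F from the inliers alone with the 8-point algorithm:

```python
import cv2
import numpy as np

# 1px inlier threshold and 99% confidence; tune these for your data.
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)
inliers = mask.ravel().astype(bool)

# Refit on the inlier set only; outliers no longer poison the solution.
F_refined, _ = cv2.findFundamentalMat(pts1[inliers], pts2[inliers],
                                      cv2.FM_8POINT)
```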

My F matrix is apparently accurate (0.22% deviation in the mapping compared to typical coordinate values), but when testing E against x'Ex = 0 using normalized image correspondences, the typical error in that test is >100% of the normalized coordinates themselves. Is testing E against x'Ex = 0 valid, and if so, where is this jump in error coming from?

Again, if F is degenerate, x'Fx = 0 can hold for every point correspondence.

Another possible reason for a wrong E is swapping the cameras (computing K1ᵀ·F·K2 instead of K2ᵀ·F·K1). Remember to check the result against x'Ex = 0 (a quick sketch follows).
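A quick way to run that check, assuming x1 and x2 are Nx3 homogeneous pixel coordinates from images 1 and 2 (hypothetical names); if swapping K1 and K2 here makes the residual drop, the cameras were switched:

```python
import numpy as np

x1n = (np.linalg.inv(K1) @ x1.T).T  # normalized coordinates, image 1
x2n = (np.linalg.inv(K2) @ x2.T).T  # normalized coordinates, image 2

residuals = np.abs(np.sum(x2n * (E @ x1n.T).T, axis=1))
print("mean |x2n^T E x1n|:", residuals.mean())
```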

"Enforcing the internal constraint" still sits very uneasily with me - how can it be valid to just create a new essential matrix from part of the decomposition of the original?

This is explained in "Multiple View Geometry in Computer Vision" by Hartley and Zisserman. As far as I know, it is related to minimizing the Frobenius norm of F.
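For reference, the result in question (stated here from memory of the Hartley and Zisserman treatment): given the SVD of the estimated matrix, the closest essential matrix in Frobenius norm averages the first two singular values and zeroes the third,

```latex
E = U\,\operatorname{diag}(\sigma_1, \sigma_2, \sigma_3)\,V^\top
\;\Longrightarrow\;
\hat{E} = U\,\operatorname{diag}\!\left(\tfrac{\sigma_1+\sigma_2}{2},\;
\tfrac{\sigma_1+\sigma_2}{2},\; 0\right) V^\top
```

and since the overall scale of E is arbitrary, diag(1, 1, 0) is an equally valid choice.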

You can google it; there are PDF resources out there.

Is there a more efficient way of determining which combination of R and t to use than computing P and triangulating some normalized coordinates?

No, as far as I know.

My final re-projection error is hundreds of pixels in 720p images. Am I likely looking at problems in the calibration, in the determination of the P matrices, or in the triangulation?

Your rigid body transform P2 is wrong because E is wrong.
