This question is quite old, but it is interesting and may be useful to someone.
Firstly, here is how I understood the problem presented in the question:
You have two images, I<sub>1</sub> and I<sub>2</sub>, acquired by the same digital camera at two different positions. Both images show a set of markers that all lie in a common plane p<sub>m</sub>. There is also a measured object whose visible surface lies in a plane p<sub>o</sub>, parallel to the marker plane but slightly offset from it. You computed the homography H<sup>m</sup><sub>12</sub>, which maps the marker positions in I<sub>1</sub> to the corresponding marker positions in I<sub>2</sub>, and you measured the offset d<sub>mo</sub> between the planes p<sub>o</sub> and p<sub>m</sub>. From this, you would like to compute the homography H<sup>o</sup><sub>12</sub>, which maps points on the measured object in I<sub>1</sub> to the corresponding points in I<sub>2</sub>.
A few notes on this issue:
First, note that a homography is a relation between image points, whereas the distance between the marker plane and the object plane is a distance in world coordinates. Using the latter to infer something about the former requires a metric estimate of the camera poses, i.e. you need to determine the Euclidean, correctly scaled relative position and orientation of the camera for each of the two images. The Euclidean requirement implies that the digital camera must be calibrated, which should not be a problem for an "optical measurement system". The scale requirement implies that the true 3D distance between two given 3D points must be known. For example, you need to know the true distance l<sub>0</sub> between two arbitrary markers.
Since we only need the relative pose of the camera for each image, we can choose a 3D coordinate system centered on and aligned with the camera coordinate system for I<sub>1</sub>. Therefore, we denote the projection matrix for I<sub>1</sub> by P<sub>1</sub> = K<sub>1</sub> * [I | 0]. Then we denote the projection matrix for I<sub>2</sub> (in the same 3D coordinate system) by P<sub>2</sub> = K<sub>2</sub> * [R<sub>2</sub> | t<sub>2</sub>]. We will also denote by D<sub>1</sub> and D<sub>2</sub> the lens distortion coefficients for I<sub>1</sub> and I<sub>2</sub>, respectively.
Since a single digital camera acquired both I<sub>1</sub> and I<sub>2</sub>, you can assume that K<sub>1</sub> = K<sub>2</sub> = K and D<sub>1</sub> = D<sub>2</sub> = D. However, if I<sub>1</sub> and I<sub>2</sub> were acquired with a long delay between acquisitions (or with a different zoom, etc.), it is more accurate to consider two different camera matrices and two sets of distortion coefficients.
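To make the notation concrete, here is a minimal NumPy sketch of the two projection matrices. The intrinsics, the second pose and the 3D point are made-up illustration values, and `projection_matrix` and `project` are hypothetical helper names, not part of any library.

```python
import numpy as np

def projection_matrix(K, R, t):
    """Return the 3x4 projection matrix P = K [R | t]."""
    t = np.asarray(t, dtype=float).reshape(3, 1)
    return K @ np.hstack([R, t])

def project(P, X):
    """Project a 3D point X (length 3) to pixel coordinates (length 2)."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Assumed intrinsics: focal length 800 px, principal point (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

# Camera 1 defines the 3D coordinate system: P1 = K [I | 0].
P1 = projection_matrix(K, np.eye(3), np.zeros(3))

# Assumed second pose (pure translation, for readability): P2 = K [R2 | t2].
R2 = np.eye(3)
t2 = np.array([-0.1, 0.0, 0.0])
P2 = projection_matrix(K, R2, t2)

X = np.array([0.0, 0.0, 2.0])   # a 3D point 2 units in front of camera 1
x1 = project(P1, X)             # lands on the principal point of I1
x2 = project(P2, X)             # shifted horizontally in I2
```

Both images are assumed to be already corrected for lens distortion here, so D<sub>1</sub> and D<sub>2</sub> do not appear.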
Here's how you might approach this problem:
The steps for estimating P<sub>1</sub> and P<sub>2</sub> are as follows:
1. Estimate K<sub>1</sub>, K<sub>2</sub> and D<sub>1</sub>, D<sub>2</sub> by calibrating the digital camera.
2. Use D<sub>1</sub> and D<sub>2</sub> to correct images I<sub>1</sub> and I<sub>2</sub> for lens distortion, then determine the marker positions in the corrected images.
3. Compute the fundamental matrix F<sub>12</sub> (mapping points in I<sub>1</sub> to epilines in I<sub>2</sub>) from the corresponding marker positions and derive the essential matrix E<sub>12</sub> = K<sub>2</sub><sup>T</sup> * F<sub>12</sub> * K<sub>1</sub>.
4. Derive R<sub>2</sub> and t<sub>2</sub> from E<sub>12</sub> and one point correspondence (see this answer to a related question). At this point, the camera poses are known only up to an unknown scale, since t<sub>2</sub> has unit norm.
5. Use the measured distance l<sub>0</sub> between two arbitrary markers to infer the correct norm for t<sub>2</sub>.
6. For better accuracy, you can refine P<sub>1</sub> and P<sub>2</sub> using bundle adjustment, with K<sub>1</sub> and ||t<sub>2</sub>|| fixed, based on the corresponding marker positions in I<sub>1</sub> and I<sub>2</sub>.
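The essential-matrix step and the scale-recovery step can be sketched as follows with NumPy. This is only an illustration under assumptions: the pose, intrinsics and marker positions are synthetic, the helper names are hypothetical, R<sub>2</sub> and the unit-norm t<sub>2</sub> are taken as already known (the decomposition of E<sub>12</sub> is not shown), and the scale is found by triangulating two markers with a plain linear (DLT) method and comparing their distance with l<sub>0</sub>.

```python
import numpy as np

def essential_from_fundamental(F, K1, K2):
    """E12 = K2^T * F12 * K1."""
    return K2.T @ F @ K1

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point seen at x1 in I1 and x2 in I2."""
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    X = np.linalg.svd(A)[2][-1]      # null vector of A
    return X[:3] / X[3]

def metric_translation(K1, K2, R2, t2_unit, pts1, pts2, l0):
    """Rescale the unit-norm t2 so that two triangulated markers are l0 apart."""
    P1 = K1 @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K2 @ np.hstack([R2, t2_unit.reshape(3, 1)])
    A = triangulate(P1, P2, pts1[0], pts2[0])
    B = triangulate(P1, P2, pts1[1], pts2[1])
    return t2_unit * (l0 / np.linalg.norm(A - B))

# --- synthetic check with assumed values ---
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
c, s = np.cos(0.1), np.sin(0.1)
R2 = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t2_true = np.array([-0.3, 0.02, 0.05])

# Essential matrix E = [t]x R and the fundamental matrix it corresponds to.
tx = np.array([[0.0, -t2_true[2], t2_true[1]],
               [t2_true[2], 0.0, -t2_true[0]],
               [-t2_true[1], t2_true[0], 0.0]])
E_true = tx @ R2
F = np.linalg.inv(K).T @ E_true @ np.linalg.inv(K)
E = essential_from_fundamental(F, K, K)

# Two markers, their (undistorted) image positions, and their true distance l0.
markers = np.array([[-0.2, 0.1, 2.0], [0.3, -0.1, 2.2]])
P1_true = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2_true = K @ np.hstack([R2, t2_true.reshape(3, 1)])
pts1 = [project(P1_true, M) for M in markers]
pts2 = [project(P2_true, M) for M in markers]
l0 = np.linalg.norm(markers[0] - markers[1])

t2_unit = t2_true / np.linalg.norm(t2_true)
t2 = metric_translation(K, K, R2, t2_unit, pts1, pts2, l0)
```

With noise-free data, `t2` comes back equal to the translation used to generate the images. With real measurements, averaging the scale over several marker pairs would be more robust than relying on a single pair.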
At this point, you have an accurate metric estimate of the camera poses P<sub>1</sub> = K<sub>1</sub> * [I | 0] and P<sub>2</sub> = K<sub>2</sub> * [R<sub>2</sub> | t<sub>2</sub>]. Now the steps to estimate H<sup>o</sup><sub>12</sub> are as follows:
1. Use D<sub>1</sub> and D<sub>2</sub> to correct images I<sub>1</sub> and I<sub>2</sub> for lens distortion, then determine the marker positions in the corrected images (the same as 2. above, no need to redo it) and estimate H<sup>m</sup><sub>12</sub> from these corresponding positions.
2. Compute the 3x1 vector v describing the marker plane p<sub>m</sub> by solving this linear equation: Z * H<sup>m</sup><sub>12</sub> = K<sub>2</sub> * (R<sub>2</sub> - t<sub>2</sub> * v<sup>T</sup>) * K<sub>1</sub><sup>-1</sup> (see chapter 13 of HZ00, result 13.5 and equation 13.2 for reference), where Z is a scaling factor. Comparing with equation 13.2 gives v = n / d<sub>m</sub>, so infer the distance to the origin d<sub>m</sub> = 1 / ||v|| and the normal n = v / ||v||, which describe the marker plane p<sub>m</sub> in 3D.
3. Since the object plane p<sub>o</sub> is parallel to p<sub>m</sub>, they have the same normal n. Therefore, you can infer the distance to the origin d<sub>o</sub> for p<sub>o</sub> from the distance to the origin d<sub>m</sub> for p<sub>m</sub> and from the measured plane offset d<sub>mo</sub>, as follows: d<sub>o</sub> = d<sub>m</sub> ± d<sub>mo</sub> (the sign depends on the relative position of the planes: positive if p<sub>m</sub> is closer to the camera for I<sub>1</sub> than p<sub>o</sub>, negative otherwise).
4. From n and d<sub>o</sub>, which describe the object plane in 3D, infer the homography H<sup>o</sup><sub>12</sub> = K<sub>2</sub> * (R<sub>2</sub> - t<sub>2</sub> * n<sup>T</sup> / d<sub>o</sub>) * K<sub>1</sub><sup>-1</sup> (see chapter 13 of HZ00, equation 13.2).
The homography H<sup>o</sup><sub>12</sub> maps points on the measured object in I<sub>1</sub> to the corresponding points in I<sub>2</sub>, where both I<sub>1</sub> and I<sub>2</sub> are assumed to be corrected for lens distortion. If you need to map points from and to the original distorted images, be sure to use the distortion coefficients D<sub>1</sub> and D<sub>2</sub> to convert the input and output points of H<sup>o</sup><sub>12</sub>.
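The plane recovery and the final homography can be sketched with NumPy as below. This is a minimal sketch under assumptions, not a reference implementation: the pose, intrinsics, plane and offset are synthetic, the function names are hypothetical, and the plane is written as n<sup>T</sup> X + d = 0 in the camera-1 frame, so that v = n / d and d = 1 / ||v||. The unknowns (Z, v) are found by linear least squares on the nine entries of the matrix equation.

```python
import numpy as np

def plane_homography(K1, K2, R2, t2, n, d):
    """H = K2 (R2 - t2 n^T / d) K1^-1 for the plane n^T X + d = 0."""
    return K2 @ (R2 - np.outer(t2, n) / d) @ np.linalg.inv(K1)

def plane_from_homography(H, K1, K2, R2, t2):
    """Solve Z * H = K2 (R2 - t2 v^T) K1^-1 for (Z, v).

    Returns the unit normal n = v / ||v|| and the distance d = 1 / ||v||.
    """
    G = np.linalg.inv(K2) @ H @ K1          # now Z * G + t2 v^T = R2
    A = np.zeros((9, 4))                    # unknowns: [Z, v0, v1, v2]
    for i in range(3):
        for j in range(3):
            A[3 * i + j, 0] = G[i, j]
            A[3 * i + j, 1 + j] = t2[i]
    sol = np.linalg.lstsq(A, R2.reshape(9), rcond=None)[0]
    v = sol[1:]
    return v / np.linalg.norm(v), 1.0 / np.linalg.norm(v)

# --- synthetic check with assumed values ---
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
c, s = np.cos(0.1), np.sin(0.1)
R2 = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t2 = np.array([-0.3, 0.02, 0.05])

# Marker plane Z = 2 in the camera-1 frame: n = (0, 0, -1), d_m = 2.
n_true, d_m_true = np.array([0.0, 0.0, -1.0]), 2.0
# A measured homography is only defined up to scale, hence the arbitrary 3.7.
H_m = 3.7 * plane_homography(K, K, R2, t2, n_true, d_m_true)

n, d_m = plane_from_homography(H_m, K, K, R2, t2)

d_mo = 0.05                  # object plane assumed farther from camera 1
d_o = d_m + d_mo
H_o = plane_homography(K, K, R2, t2, n, d_o)

# Check on a 3D point lying on the object plane Z = 2.05.
X = np.array([0.1, -0.2, 2.05])
x1 = K @ X
x2 = K @ (R2 @ X + t2)
x2 = x2[:2] / x2[2]
y = H_o @ (x1 / x1[2])
mapped = y[:2] / y[2]        # should coincide with x2
```

Solving for Z together with v avoids having to normalize the measured homography first. With noisy data the nine equations are no longer exactly consistent, and the least-squares solution is then only an approximation.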
Reference used:
[HZ00] "Multiple View Geometry in Computer Vision", R. Hartley and A. Zisserman, 2000.