OP is here. I seem to have found the answer. I have not implemented it yet, and if someone comes with a less uncomfortable solution (or finds something wrong with this), I will choose my answer instead.
Problem Statement
Let A be the source image, B be the target image, and M be a map from A-coordinates to B-coordinates, i.e.
B[k, l, :] == A(M[k, l, 0], M[k, l, 1], :) for all k, l in B coords.
... where the square brackets indicate a search for an array with integer indices, and the curly brackets indicate a search for bilinear interpolation with floating point indices. We repeat this above using more economical notation:
B = A(M)
We want to find the inverse map N that best maps B back to A:
Find N st A \approx B(N)
The problem can be indicated without reference to A or B:
Find N = argmin_N || M(N) - I_n ||
... where ||*|| indicates the Frobenius norm, and I_n is the identity map with the same dimensions as N, i.e. mapping, where:
I_n[i, j, :] == [i, j] for all i, j
Naive decision
If the values of M are integers and M is an isomorphism, then you can build N directly as:
N[M[k, l, 0], M[k, l, 1], :] = [k, l] for all k, l
Or in our simplified notation:
N[M] = I_m
... where I_m is the identity map with the same dimensions as M.
There are two problems:
- M is not an isomorphism, therefore the above will leave “holes” in N at N [i, j ::] for any [i, j] not among the values in M.
- M values are floating point coordinates [i, j], not integer coordinates. We cannot just assign a value to the bilinear interpolated value N (i, j, :) for float-valued i, j. To achieve an equivalent effect, we should instead set the values [i, j] of the four surrounding angles N [floor (i), floor (j) ,:], N [floor (i), ceil (j) ,:], N [ ceil (i), floor (j) ,:], N [ceil (i), ceil (j) ,:] such that the interpolated value N (i, j, :) is equal to the desired value [k, l], for all pixel mappings [i, j] → [k, l] in M.
Decision
Construct an empty N as a three-dimensional tensor of floats:
N = zeros(size=(A.shape[0], A.shape[1], 2))
For each coordinate [i, j] in the A-coordinate space do:
- Find the 2x2 A-coordinate grid in M that [i, j] is inside. Compute the homography matrix H, which maps these A-coordinates to the corresponding B-coordinates (given by 2x2 grid pixel indices).
- Set N [i, j ,:] = matmul (H, [i, j])
A potentially expensive step here would be to search in step 1 for a 2x2 grid of A-coordinates in M that surrounds [i, j]. The brute force search will do this whole algorithm O (n * m), where n is the number of pixels in A and m is the number of pixels in B.
To reduce this to O (n), instead, one could run a scanning line algorithm in each A-coordinate tetrahedron to identify all the integer values [i, j] that it contains. This could be pre-computed as a hash map that maps the integer values of A-coords [i, j] to the upper left corner of its circumferential four-sided B-coords [k, l].