I am trying to create a static augmented reality scene over a photograph, given 4 defined correspondences between coplanar points on a plane and the image.
Here is the step-by-step flow (a minimal scene-setup sketch follows the list):
- The user adds an image using the device’s camera. Suppose that it contains a rectangle captured with some perspective.
- The user determines the physical size of the rectangle lying in the horizontal plane (XOZ in SceneKit terms). Suppose that the center is the world origin (0, 0, 0), so we can easily find (x, y, z) for each corner.
- The user determines the uv coordinates in the image coordinate system for each corner of the rectangle.
- A SceneKit scene is created with a rectangle of the same size, visible from the same point of view.
- Other nodes can be added and moved in the scene.
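For completeness, here is a minimal sketch of the scene setup described above (the function and parameter names are placeholders, not my exact code):

    import SceneKit
    import UIKit

    // Minimal scene setup: a rectangle of the measured physical size lying in the
    // horizontal plane, centered at the world origin, plus a camera node whose
    // pose still has to be computed (which is the subject of this question).
    func makeScene(rectangleWidth: CGFloat, rectangleHeight: CGFloat) -> (SCNScene, SCNNode) {
        let scene = SCNScene()

        // SCNPlane is created in the XY plane, so rotate it to lie in the horizontal plane.
        let plane = SCNPlane(width: rectangleWidth, height: rectangleHeight)
        plane.firstMaterial?.diffuse.contents = UIColor.blue.withAlphaComponent(0.5)
        let planeNode = SCNNode(geometry: plane)
        planeNode.eulerAngles.x = -.pi / 2
        scene.rootNode.addChildNode(planeNode)

        let cameraNode = SCNNode()
        cameraNode.camera = SCNCamera()
        scene.rootNode.addChildNode(cameraNode)

        return (scene, cameraNode)
    }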

I also measured the position of the iPhone camera relative to the center of the A4 paper. For this shot, the position was (0, 14, 42.5), measured in cm. My iPhone was also slightly tilted towards the table (5-10 degrees).
Using this data, I configured SCNCamera to get the desired perspective of the blue plane in the third image:
    let camera = SCNCamera()
    camera.xFov = 66
    camera.zFar = 1000
    camera.zNear = 0.01
    cameraNode.camera = camera

    cameraAngle = -7 * CGFloat.pi / 180
    cameraNode.rotation = SCNVector4(x: 1, y: 0, z: 0, w: Float(cameraAngle))
    cameraNode.position = SCNVector3(x: 0, y: 14, z: 42.5)
This gives me a reference to compare my result with.
To build the AR with SceneKit, I need to:
- Adjust the SCNCamera fov to match the fov of the real camera (a minimal sketch of this conversion follows this list).
- Calculate the position and rotation of the camera node using the 4 correspondences between world points (x, 0, z) and image points (u, v).
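Assuming a pinhole camera model, the first point boils down to deriving the horizontal fov from fx and the image width; a minimal sketch (the function name is mine):

    import Foundation
    import SceneKit

    // Derive the SceneKit camera's horizontal fov from pinhole intrinsics:
    // xFov = 2 * atan(imageWidth / (2 * fx)), converted to degrees.
    func setHorizontalFov(of camera: SCNCamera, fx: Double, imageWidth: Double) {
        camera.xFov = 2 * atan(imageWidth / (2 * fx)) * 180 / .pi
    }

The second point is what the rest of this question is about.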

For points on the model plane, the homography factors (up to scale) as:

    H ≅ K [r1  r2  t]

where H is the homography, K is the intrinsic matrix, and [R | t] is the extrinsic matrix (r1 and r2 are the first two columns of R).
I tried two approaches to find the transformation matrix for the camera: using solvePnP from OpenCV, and manually decomposing the homography computed from the 4 coplanar points.
Manual approach:
1. Find the homography

This step was successful: the UV coordinates of the world origin look correct.
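For reference, here is a minimal sketch of how such a 4-point homography can be computed with simd (the projective-basis construction and the function name are illustrative, not the exact code I use):

    import CoreGraphics
    import simd

    // 4-point homography via the projective basis: build the matrix that maps the
    // canonical basis e1, e2, e3, (1,1,1) onto each quadruple of points, then compose them.
    func homography(from source: [CGPoint], to target: [CGPoint]) -> matrix_float3x3? {
        guard source.count == 4, target.count == 4 else { return nil }

        func basisMatrix(_ p: [CGPoint]) -> matrix_float3x3? {
            let h = p.map { simd_float3(Float($0.x), Float($0.y), 1) }
            let m = matrix_float3x3(columns: (h[0], h[1], h[2]))
            guard m.determinant != 0 else { return nil }   // first three points must not be collinear
            let c = m.inverse * h[3]                       // coefficients so that c.x*h0 + c.y*h1 + c.z*h2 = h3
            return matrix_float3x3(columns: (c.x * h[0], c.y * h[1], c.z * h[2]))
        }

        guard let a = basisMatrix(source), let b = basisMatrix(target), a.determinant != 0 else { return nil }
        return b * a.inverse   // maps source (x, y, 1) to target, up to scale
    }

Mapping the model-plane origin (0, 0, 1) through the resulting H is exactly the check mentioned above: it should land on the UV coordinates of the world origin.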
2. The intrinsic matrix
To get the intrinsic matrix of the iPhone 6, I used this application, which gave me the following result from 100 images at 640x480 resolution:

Assuming the input image has a 4:3 aspect ratio, I can scale the above matrix according to the resolution:

I'm not sure, but this looks like a potential problem. I used cv::calibrationMatrixValues to check fovx for the calculated intrinsic matrix, and the result was ~50°, while it should be close to 60°.
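For what it is worth, here is a minimal sketch of the scaling I apply (the fx/fy/cx/cy values are placeholders standing in for the calibration output above; the fovx check is essentially 2 * atan(w / (2 * fx)), which is what cv::calibrationMatrixValues reports):

    import CoreGraphics
    import simd

    // Scale the 640x480 calibration result to the actual image resolution.
    // Same 4:3 aspect ratio, so one scale factor applies to the focal lengths
    // and to the principal point. fx/fy/cx/cy below are placeholders.
    func scaledIntrinsicMatrix(for imageSize: CGSize) -> matrix_float3x3 {
        let calibrationWidth: Float = 640
        let fx: Float = 520, fy: Float = 520, cx: Float = 320, cy: Float = 240

        let scale = Float(imageSize.width) / calibrationWidth
        return matrix_float3x3(columns: (
            simd_float3(fx * scale, 0, 0),            // first column: (fx, 0, 0)
            simd_float3(0, fy * scale, 0),            // second column: (0, fy, 0)
            simd_float3(cx * scale, cy * scale, 1)    // third column: (cx, cy, 1)
        ))
    }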
3. Camera view matrix
    func findCameraPose(homography h: matrix_float3x3, size: CGSize) -> matrix_float4x3? {
        guard let intrinsic = intrinsicMatrix(imageSize: size),
              let intrinsicInverse = intrinsic.inverse else { return nil }

        // Scale factors recovered from the first two columns of K^-1 * H
        let l1 = 1.0 / (intrinsicInverse * h.columns.0).norm
        let l2 = 1.0 / (intrinsicInverse * h.columns.1).norm
        let l3 = (l1 + l2) / 2

        // First two columns of the rotation; the third completes the basis
        let r1 = l1 * (intrinsicInverse * h.columns.0)
        let r2 = l2 * (intrinsicInverse * h.columns.1)
        let r3 = cross(r1, r2)

        // Translation from the third column of the homography
        let t = l3 * (intrinsicInverse * h.columns.2)

        return matrix_float4x3(columns: (r1, r2, r3, t))
    }
Result:

Since I measured the approximate position and orientation for this particular image, I know the transformation matrix that would give the expected result, and it is completely different:

I am also slightly concerned about the 2-3 element of the reference rotation matrix, which is -9.1, while it should be close to zero instead, since there is very little rotation.
OpenCV approach:
OpenCV has a solvePnP function, so I tried to use it instead of reinventing the wheel.
OpenCV in Objective-C++:
    typedef struct CameraPose {
        SCNVector4 rotationVector;
        SCNVector3 translationVector;
    } CameraPose;

    + (CameraPose)findCameraPose:(NSArray<NSValue *> *)objectPoints imagePoints:(NSArray<NSValue *> *)imagePoints size:(CGSize)size {
        vector<Point3f> cvObjectPoints = [self convertObjectPoints:objectPoints];
        vector<Point2f> cvImagePoints = [self convertImagePoints:imagePoints withSize:size];

        cv::Mat distCoeffs(4, 1, cv::DataType<double>::type, 0.0);
        cv::Mat rvec(3, 1, cv::DataType<double>::type);
        cv::Mat tvec(3, 1, cv::DataType<double>::type);
        cv::Mat cameraMatrix = [self intrinsicMatrixWithImageSize:size];

        cv::solvePnP(cvObjectPoints, cvImagePoints, cameraMatrix, distCoeffs, rvec, tvec);

        SCNVector4 rotationVector = SCNVector4Make(rvec.at<double>(0), rvec.at<double>(1), rvec.at<double>(2), norm(rvec));
        SCNVector3 translationVector = SCNVector3Make(tvec.at<double>(0), tvec.at<double>(1), tvec.at<double>(2));
        CameraPose result = CameraPose{rotationVector, translationVector};

        return result;
    }

    + (vector<Point2f>)convertImagePoints:(NSArray<NSValue *> *)array withSize:(CGSize)size {
        vector<Point2f> points;
        for (NSValue *value in array) {
            CGPoint point = [value CGPointValue];
            points.push_back(Point2f(point.x - size.width / 2, point.y - size.height / 2));
        }
        return points;
    }

    + (vector<Point3f>)convertObjectPoints:(NSArray<NSValue *> *)array {
        vector<Point3f> points;
        for (NSValue *value in array) {
            CGPoint point = [value CGPointValue];
            points.push_back(Point3f(point.x, 0.0, -point.y));
        }
        return points;
    }

    + (cv::Mat)intrinsicMatrixWithImageSize:(CGSize)imageSize {
        double f = 0.84 * max(imageSize.width, imageSize.height);
        Mat result(3, 3, cv::DataType<double>::type);
        cv::setIdentity(result);
        result.at<double>(0) = f;
        result.at<double>(4) = f;
        return result;
    }
Usage in Swift:
    func testSolvePnP() {
        let source = modelPoints().map { NSValue(cgPoint: $0) }
        let destination = perspectivePicker.currentPerspective.map { NSValue(cgPoint: $0) }
        let cameraPose = CameraPoseDetector.findCameraPose(source, imagePoints: destination, size: backgroundImageView.size)

        cameraNode.rotation = cameraPose.rotationVector
        cameraNode.position = cameraPose.translationVector
    }
Output:

The result is better, but far from my expectations.
Some other things I also tried:
- This question is very similar, although I do not understand how the accepted answer works without the intrinsics.
- decomposeHomographyMat also did not give me the expected result
I am really stuck with this problem, so any help would be greatly appreciated.