The actual physics of the lens is explained, for example, on this website at Georgia State University.
See this illustration for an explanation of how you can use either the linear magnification or the focal length relations to determine the size of an object from the image size:

In particular, -i / h' = o / h , and this o / h ratio is valid for all similar triangles (i.e. an object of size 2h at a distance of 2o has the same size h' in the image). So, as you can see, even with the full equation you cannot determine both the distance o and the size h of the object, but knowing one gives you the other.
On the other hand, two objects at the same distance o will have image sizes h1' and h2' proportional to their real-life sizes h1 and h2 , since h1' / h1 = M = h2' / h2 .
Therefore, if you know both o and h for one object, you know M . Then, knowing the size of an object on the film, you can deduce its size from its distance and vice versa.
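A minimal sketch of this reasoning, assuming image sizes are measured in pixels, signs are ignored (only magnitudes are used), and the image distance i stays fixed between shots so that M scales as 1 / o . The class and method names are hypothetical and only serve the illustration.

```java
public class ReferenceScale {
    /** Magnification M = h' / h, measured on a reference object of known real size. */
    public static double magnification(double refImageSize, double refRealSize) {
        if (refRealSize <= 0) {
            throw new IllegalArgumentException("reference size must be positive");
        }
        return refImageSize / refRealSize;
    }

    /**
     * Real size of an object at a known distance, given M measured at distance oRef.
     * Assumes M is proportional to 1 / o, i.e. h = h' * o / (M_ref * o_ref).
     */
    public static double sizeFromDistance(double imageSize, double distance,
                                          double mRef, double oRef) {
        return imageSize * distance / (mRef * oRef);
    }

    /** Distance of an object of known real size, under the same assumption. */
    public static double distanceFromSize(double imageSize, double realSize,
                                          double mRef, double oRef) {
        return mRef * oRef * realSize / imageSize;
    }

    public static void main(String[] args) {
        // Reference: a 0.5 m object spans 200 px when it is 2 m away.
        double m = magnification(200.0, 0.5);
        // Another object spans 100 px and is known to be 4 m away.
        System.out.println(sizeFromDistance(100.0, 4.0, m, 2.0));
        // The same object, now with known size 0.5 m and unknown distance.
        System.out.println(distanceFromSize(100.0, 0.5, m, 2.0));
    }
}
```

With the reference values in main, the first call recovers a size of 0.5 and the second a distance of 4, in the same length unit as the reference.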
The ratio -i / h' is naturally expressed for the maximal h' . If the object exactly fills the image, it fills the field of view, and then the ratio of half its size to its distance is tan(α/2) = (l / 2) / d (note that in the notation of the image below, d = o and l = 2 * h ).

This α is what you call theta in your example. Now, from the image size you can get the angle under which you see the picture, that is, what size l the picture would cover if it were at a distance d . From there, you can deduce the size of an object from its distance and vice versa.
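As a small illustration of the relation tan(α/2) = (l / 2) / d , the sketch below computes the full extent l covered by the field of view at a distance d , and the inverse. The class and method names are hypothetical, and the view angle is assumed to be given in degrees.

```java
public class FieldOfView {
    /** Full extent l covered at distance d: l = 2 * d * tan(alpha / 2). */
    public static double extentAtDistance(double viewAngleDeg, double distance) {
        return 2.0 * distance * Math.tan(Math.toRadians(viewAngleDeg) / 2.0);
    }

    /** View angle alpha (degrees) under which an extent l is seen at distance d. */
    public static double viewAngleDeg(double extent, double distance) {
        return Math.toDegrees(2.0 * Math.atan((extent / 2.0) / distance));
    }

    public static void main(String[] args) {
        // A 90 degree view angle covers roughly twice the distance: about 4 at d = 2.
        System.out.println(extentAtDistance(90.0, 2.0));
        // Round trip back to the angle, about 90 degrees.
        System.out.println(viewAngleDeg(4.0, 2.0));
    }
}
```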
Algorithm Steps:
- Get the ratio
r = size of object in image (in px) / total size of image (in px) .
Do this along the axis for which you know, or intend to get, the real size of the object, of course.
- Get the corresponding field-of-view angle and multiply r by the tangent of half this angle:
r *= tan(camera.getParameters().getXXXXViewAngle() / 2)
(The Android getters getHorizontalViewAngle() and getVerticalViewAngle() return degrees, so convert to radians before taking the tangent.)
- r is now the tangent of the half-angle under which you see the object, so the following relations hold: r = (l / 2) / d = h / o (using the notation of the drawings).
- If you know the distance d to the object, its size is l = 2 * r * d .
- If you know the size l of the object, it is at a distance d = l / (2 * r) .
This works for objects the camera is actually pointing at; if they are not centered, the math may be off.
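The steps above can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, and the view angle is passed in as a parameter in degrees (on Android it would come from camera.getParameters().getHorizontalViewAngle() or getVerticalViewAngle() ), so the code runs without a camera.

```java
public class SizeDistanceEstimator {
    /**
     * Steps 1 and 2: pixel ratio times tan(viewAngle / 2).
     * The result is the tangent of the half-angle under which the object is seen.
     */
    public static double halfAngleTangent(double objectPx, double imagePx,
                                          double viewAngleDeg) {
        if (objectPx <= 0 || imagePx <= 0) {
            throw new IllegalArgumentException("pixel sizes must be positive");
        }
        double r = objectPx / imagePx;
        // Math.tan expects radians, so convert the angle first.
        return r * Math.tan(Math.toRadians(viewAngleDeg) / 2.0);
    }

    /** Known distance d, unknown size: l = 2 * r * d. */
    public static double sizeFromDistance(double r, double distance) {
        return 2.0 * r * distance;
    }

    /** Known size l, unknown distance: d = l / (2 * r). */
    public static double distanceFromSize(double r, double size) {
        return size / (2.0 * r);
    }

    public static void main(String[] args) {
        // Object spans 500 of 1000 px along an axis with a 90 degree view angle.
        double r = halfAngleTangent(500.0, 1000.0, 90.0);
        System.out.println("size at d = 3: " + sizeFromDistance(r, 3.0));
        System.out.println("distance for l = 3: " + distanceFromSize(r, 3.0));
    }
}
```

With these sample values r is about 0.5, so an object 3 units away is about 3 units wide, and the reverse lookup returns about 3 as well. As noted, this assumes the object is centered in the frame.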