I am trying to extract the image contents from a file created by a Hamamatsu NanoZoomer slide scanner. The NDPI file uses a modified TIFF structure and stores the image as a single large JPEG stream. Using the StripOffsets and StripByteCounts tags, I can extract data that should be a complete JPEG file.
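For reference, reading the strip that StripOffsets (tag 273) and StripByteCounts (tag 279) point to can be sketched in Python as below. This is a minimal sketch assuming a little-endian classic TIFF with 32-bit offsets and a single strip in the first IFD. It does not attempt any NDPI-specific handling (for example of offsets beyond 4 GB), and `read_first_strip` is a hypothetical helper name.

```python
import struct

STRIP_OFFSETS = 273
STRIP_BYTE_COUNTS = 279

# TIFF field type -> (struct format code, size in bytes); only SHORT and LONG are needed here
_TYPES = {3: ("H", 2), 4: ("I", 4)}


def read_first_strip(data: bytes) -> bytes:
    """Return the bytes of the first strip described by the first IFD.

    Assumes a little-endian classic TIFF (32-bit offsets). NDPI-specific
    extensions are not handled.
    """
    if data[:4] != b"II*\x00":
        raise ValueError("not a little-endian classic TIFF")
    (ifd_offset,) = struct.unpack_from("<I", data, 4)
    (n_entries,) = struct.unpack_from("<H", data, ifd_offset)

    tags = {}
    for i in range(n_entries):
        entry = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, ftype, count = struct.unpack_from("<HHI", data, entry)
        if ftype not in _TYPES:
            continue
        code, size = _TYPES[ftype]
        # Values that fit in 4 bytes are stored inline; otherwise the field holds an offset
        if size * count <= 4:
            value_pos = entry + 8
        else:
            (value_pos,) = struct.unpack_from("<I", data, entry + 8)
        tags[tag] = struct.unpack_from("<%d%s" % (count, code), data, value_pos)

    offset = tags[STRIP_OFFSETS][0]
    length = tags[STRIP_BYTE_COUNTS][0]
    return data[offset:offset + length]
```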
The data stream has the expected JPEG markers: it starts with FFD8 (SOI, start of image) and ends with FFD9 (EOI, end of image). If the image is smaller than 65500 × 65500 pixels, I can save the data stream to a .jpg file and open it without problems.
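A quick sanity check of the extracted stream before writing it to disk might look like this. It is a small sketch; the function names are hypothetical.

```python
from pathlib import Path

SOI = b"\xff\xd8"  # Start Of Image marker
EOI = b"\xff\xd9"  # End Of Image marker


def looks_like_jpeg(stream: bytes) -> bool:
    """True if the stream starts with SOI and ends with EOI."""
    return len(stream) >= 4 and stream.startswith(SOI) and stream.endswith(EOI)


def save_jpeg(stream: bytes, path: str) -> None:
    """Write the stream to a .jpg file after checking its delimiting markers."""
    if not looks_like_jpeg(stream):
        raise ValueError("stream does not start with SOI and end with EOI")
    Path(path).write_bytes(stream)
```

Passing this check only shows that the stream is delimited like a JPEG; it says nothing about whether the frame header inside it is valid.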
In the SOF0 (start of frame) segment that begins with the FFC0 marker, the image height is the 16-bit big-endian value that follows the 2-byte segment length and the 1-byte sample precision, and the width is the next 16-bit value. However, for images larger than 65500 × 65500 pixels (this one is actually 122880 × 78848 pixels), these four bytes, which are supposed to hold the height and width, are all zeros. I changed them to 255, 220, 255, 220 (lines 255-263), i.e. 0xFFDC = 65500 for both height and width. When I then checked the file's properties in Windows (right-click, Properties, Details), it reported dimensions of 65500 × 65500, even though these are not the actual pixel dimensions. The problem is that when I tried to open the image, it was apparently decoded incorrectly.
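To make the header layout concrete, here is a minimal Python sketch that locates the SOF0 segment by walking the marker segments (rather than searching for the bytes FF C0, which can also occur inside other segments such as quantization tables) and then reads or overwrites the height and width. It assumes a baseline JPEG whose frame header is SOF0 and appears before the first SOS (FFDA) marker; the helper names are hypothetical. Note that 65500 is also libjpeg's default JPEG_MAX_DIMENSION, and that neither 122880 nor 78848 fits in these 16-bit fields, so the true size cannot be written here at all.

```python
import struct

SOF0 = 0xC0  # baseline start-of-frame marker
SOS = 0xDA   # start-of-scan marker; entropy-coded data follows it


def find_sof0(stream: bytes) -> int:
    """Return the offset of the FFC0 marker by walking segments after SOI."""
    pos = 2  # skip SOI (FFD8)
    while pos + 4 <= len(stream):
        if stream[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = stream[pos + 1]
        if marker == 0xFF:  # fill byte before a marker; skip it
            pos += 1
            continue
        if marker == SOF0:
            return pos
        if marker == SOS:
            break
        (seg_len,) = struct.unpack_from(">H", stream, pos + 2)
        pos += 2 + seg_len  # the length counts itself but not the 2-byte marker
    raise ValueError("no SOF0 segment found before the scan data")


def read_dimensions(stream: bytes) -> tuple:
    """Return (height, width) from the SOF0 segment."""
    sof = find_sof0(stream)
    # Layout after FF C0: length (2 bytes), precision (1), height (2), width (2)
    return struct.unpack_from(">HH", stream, sof + 5)


def patch_dimensions(stream: bytes, height: int, width: int) -> bytes:
    """Return a copy of the stream with the SOF0 height and width overwritten."""
    if not (0 < height <= 0xFFFF and 0 < width <= 0xFFFF):
        raise ValueError("SOF0 stores each dimension in 16 bits")
    sof = find_sof0(stream)
    out = bytearray(stream)
    struct.pack_into(">HH", out, sof + 5, height, width)
    return bytes(out)
```

Because a decoder derives the MCU grid from these header values, a header claiming 65500 × 65500 for data encoded as 122880 × 78848 would likely place the decoded blocks on the wrong grid, which is consistent with the garbled result.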
So my question is: how can I open such a JPEG file correctly? Alternatively, how can I correctly decode the entire image into memory?
For now I am exploring the file structure in MATLAB. Eventually I plan to use Python + OpenCV (or, if necessary, Python + Cython + libjpeg-turbo) to read the entire image into memory.
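Before choosing between OpenCV and libjpeg-turbo, it may help to estimate the size of the decoded buffer. A back-of-the-envelope sketch, assuming 3 channels (RGB) at 8 bits per channel; `decoded_size_bytes` is a hypothetical helper:

```python
def decoded_size_bytes(width: int, height: int, channels: int = 3) -> int:
    """Bytes needed for an uncompressed image at 8 bits per channel."""
    return width * height * channels


size = decoded_size_bytes(122880, 78848)
print(size)                    # 29066526720
print(round(size / 2**30, 2))  # 27.07 (GiB)
```

So a full decode needs roughly 27 GiB of RAM for the pixel data alone, under these assumptions.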