How to determine the resolution (DPI) of images embedded in a PDF document?

I have a PDF document that also contains images.

Now I want to know the resolution of these images.

The first step would be to somehow get the images from the PDF document. But how?

Is this possible with anything provided by Cocoa?

+8
objective-c pdf cocoa quartz-graphics poppler
5 answers

Take a look at this answer to another question:

  • Can a PDF contain images with different DPIs?

Basically, you can now use the (new) -list option of Poppler's pdfimages command-line utility (it will NOT work with the XPDF version of pdfimages!).

It will report the size of each image appearing on the requested pages.

(You can also use it to extract images from a PDF: pdfimages -png -f 3 -l 5 some.pdf prefix--- will extract all images as PNGs from the PDF file, starting with the first page 3 (-f 3) and ending with the last page 5 (-l 5), using the filename prefix prefix--- for each image. But that does not seem to be the main concern of your question...)

Example:

 pdfimages -list -f 1 -l 3 /Users/kurtpfeifle/Downloads/ct-magazin-14-14-2012.pdf

   page  num type  width height color comp bpc enc   interp object ID
   ------------------------------------------------------------------
      1    0 image  1247   1738 rgb      3   8 jpx   no        3053 0
      2    1 image   582    839 gray     1   8 jpeg  no        2080 0
      2    2 image   344    364 gray     1   8 jpx   no        2079 0
      3    3 image   581    838 rgb      3   8 jpeg  no           7 0
      3    4 image  1088    776 rgb      3   8 jpx   no           8 0
      3    5 image     6      6 rgb      3   8 image no           9 0
      3    6 image     8      6 rgb      3   8 image no          10 0
      3    7 image     4      6 rgb      3   8 image no          11 0
      3    8 image   212    106 rgb      3   8 jpx   no          12 0
      3    9 image   150     68 rgb      3   8 jpx   no          13 0
      3   10 image     6      6 rgb      3   8 image no          14 0
      3   11 image     4      4 rgb      3   8 image no          15 0

It does not report the DPI directly, but you can easily calculate it from the "width" and "height" dimensions: measure the width of the image on screen with a ruler (in inches), then divide the pixel "width" by the number you measured...

Do you find this strange, because the result depends on your current zoom level? Yes, it does!

The concept of "resolution" always depends on the context. A so-called "hi-res" image is simply one that is many pixels wide and high. That lets it keep its quality (or "resolution") when it has to be displayed or printed at a higher zoom level.
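The ruler calculation described above is a single division. Here is a minimal Python sketch of it; the function name and the two measured widths are hypothetical, chosen only to show how the same 1247-pixel-wide image yields different values at different display sizes.

```python
def pixels_per_inch(pixel_count: int, measured_inches: float) -> float:
    """Effective resolution along one axis: pixels divided by displayed inches."""
    if measured_inches <= 0:
        raise ValueError("measured size must be positive")
    return pixel_count / measured_inches


# Image 0 in the listing is 1247 pixels wide.
# The displayed widths below are made-up ruler measurements (assumptions).
for inches in (8.25, 16.5):
    ppi = pixels_per_inch(1247, inches)
    print(f"{inches} in wide -> {ppi:.1f} ppi")
```

Doubling the displayed width halves the result, which is exactly the zoom dependence discussed above.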


Update

Meanwhile, there is a newer version of (Poppler's) pdfimages:

 $ pdfimages -version
 pdfimages version 0.33.0
 [....]

It also reports the resolution of embedded images in PPI (pixels per inch), in the horizontal (x-ppi) and vertical (y-ppi) directions:

 page  num type  width height color comp bpc enc   interp object ID x-ppi y-ppi  size ratio
 ------------------------------------------------------------------------------------------
    1    0 image  1247   1738 rgb      3   8 jpx   no        3053 0   151   151  228K  3.6%
    2    1 image   582    839 gray     1   8 jpeg  no        2080 0    72    72  319B  0.1%
    2    2 image   344    364 gray     1   8 jpx   no        2079 0   150   150 4325B  3.5%
    3    3 image   581    838 rgb      3   8 jpeg  no           7 0    73    73 1980B  0.1%
    3    4 image  1088    776 rgb      3   8 jpx   no           8 0   150   151  106K  4.3%
    3    5 image     6      6 rgb      3   8 image no           9 0   150   150  108B  100%
    3    6 image     8      6 rgb      3   8 image no          10 0   150   150  158B  110%
    3    7 image     4      6 rgb      3   8 image no          11 0   150   150   73B  101%
    3    8 image   212    106 rgb      3   8 jpx   no          12 0   150   150 2396B  3.6%
    3    9 image   150     68 rgb      3   8 jpx   no          13 0   150   150 1878B  6.1%
    3   10 image     6      6 rgb      3   8 image no          14 0   150   150   81B   75%
    3   11 image     4      4 rgb      3   8 image no          15 0   150   150   50B  104%

This new feature first appeared in Poppler version 0.25 (released on December 11, 2013). It also reports the size (file size) and the ratio (compression ratio) of the embedded images.
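Because the listing is plain whitespace-separated text, it is easy to post-process. The following Python sketch parses it into records. The names ImageRow and parse_pdfimages_list are hypothetical, and the parser assumes every data row has exactly the 16 fields of the layout shown above (the object ID counts as two fields); other Poppler versions may print a different layout.

```python
from typing import List, NamedTuple


class ImageRow(NamedTuple):
    page: int
    num: int
    width: int
    height: int
    object_id: str
    x_ppi: float
    y_ppi: float


def parse_pdfimages_list(text: str) -> List[ImageRow]:
    """Parse `pdfimages -list` text that includes the x-ppi/y-ppi columns."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 16 fields and start with the page number;
        # this skips the header and the dashed separator line.
        if len(fields) != 16 or not fields[0].isdigit():
            continue
        rows.append(ImageRow(
            page=int(fields[0]),
            num=int(fields[1]),
            width=int(fields[3]),
            height=int(fields[4]),
            object_id=f"{fields[10]} {fields[11]}",
            x_ppi=float(fields[12]),
            y_ppi=float(fields[13]),
        ))
    return rows


# Two rows copied from the listing above.
SAMPLE = """\
page num type width height color comp bpc enc interp object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------
   1   0 image 1247 1738 rgb  3 8 jpx  no 3053 0 151 151 228K 3.6%
   2   1 image  582  839 gray 1 8 jpeg no 2080 0  72  72 319B 0.1%
"""

for row in parse_pdfimages_list(SAMPLE):
    print(row.page, row.object_id, f"{row.width}x{row.height}", row.x_ppi, row.y_ppi)
```

To feed it real data, the text could be captured with subprocess.run(["pdfimages", "-list", path], capture_output=True, text=True).stdout, assuming Poppler's pdfimages is on the PATH.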

Limitations of pdfimages -list

Perhaps I should also tell you about the limitations of the pdfimages utility and give an example where its report is not entirely correct.

One example is this hand-coded PDF file from my (recently created) GitHub repository of PDFs, which is meant to help beginners learn the syntax of PDF source code.

I originally created this PDF file to demonstrate a bug in Mozilla's PDF.js renderer. Here is a screenshot of how it looks in PDF.js (left) and how it should look when rendered correctly (right, as rendered by Ghostscript and Adobe Reader):

[Screenshots: PDF.js rendering (left); correct rendering by Ghostscript and Adobe Reader (right)]


The PDF file contains a 2×2 pixel image that is embedded only once (with object ID 5 0) but is displayed on the page several times with different settings. Each time, the image is placed ...

  • ... in a different position,
  • ... with a different scaling,
  • ... with a different rotation,
  • ... even with a different skew.

In these extreme circumstances, pdfimages -list falls flat on its face when it tries to determine the resolution of some instances of this image:

 page  num type  width height color comp bpc enc   interp object ID x-ppi y-ppi  size ratio
 ------------------------------------------------------------------------------------------
    1    0 image     2      2 rgb      3   8 image no           5 0     4     4   13B  108%
    1    1 image     2      2 rgb      3   8 image no           5 0     5     3   13B  108%
    1    2 image     2      2 rgb      3   8 image no           5 0     3     5   13B  108%
    1    3 image     2      2 rgb      3   8 image no           5 0     6     3   13B  108%
    1    4 image     2      2 rgb      3   8 image no           5 0     3    10   13B  108%
    1    5 image     2      2 rgb      3   8 image no           5 0     4 72000   13B  108%
    1    6 image     2      2 rgb      3   8 image no           5 0     4     2   13B  108%
    1    7 image     2      2 rgb      3   8 image no           5 0     2     4   13B  108%
    1    8 image     2      2 rgb      3   8 image no           5 0 14401     1   13B  108%
    1    9 image     2      2 rgb      3   8 image no           5 0     1     2   13B  108%
    1   10 image     2      2 rgb      3   8 image no           5 0 0.950     4   13B  108%
    1   11 image     2      2 rgb      3   8 image no           5 0     4 0.950   13B  108%
    1   12 image     2      2 rgb      3   8 image no           5 0 0.950     4   13B  108%
    1   13 image     2      2 rgb      3   8 image no           5 0     1     4   13B  108%
    1   14 image     2      2 rgb      3   8 image no           5 0 0.950     4   13B  108%
    1   15 image     2      2 rgb      3   8 image no           5 0 0.950     4   13B  108%
    1   16 image     2      2 rgb      3   8 image no           5 0     4 0.950   13B  108%

pdfimages -list gets most of the values right if there is no rotation and/or skew involved. It is not surprising that there are discrepancies when the image is rotated or skewed: after all, how would you reliably determine x-ppi and y-ppi values in such cases? This explains the (completely wrong) values of 72000 y-ppi for image no. 5 and 14401 x-ppi for image no. 8.

As you can easily see, pdfimages is smart enough to determine the other image properties correctly:

  • It correctly reports the same object ID 5 0 for all instances of the displayed image, indicating that this image is embedded once but displayed several times on the page.
  • It correctly reports the image dimensions as 2x2 pixels.
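On the rotation question raised above: one possible approach (I do not know how pdfimages computes its values, so this is not a description of its algorithm) is to use the lengths of the image's transformed edges instead of its bounding box. In PDF, an image is painted into a unit square that the current transformation matrix [a b c d e f] maps onto the page, so the edges become the vectors (a, b) and (c, d). The Python sketch below assumes the default user space of 72 units per inch; the function name and the sample matrices are hypothetical.

```python
import math

POINTS_PER_INCH = 72.0  # assumption: default PDF user space, 1 unit = 1/72 inch


def effective_ppi(width_px, height_px, ctm):
    """ctm = (a, b, c, d, e, f): matrix mapping the image's unit square to the page."""
    a, b, c, d, _e, _f = ctm
    # The unit square's edges map to the vectors (a, b) and (c, d).
    # Their lengths are the drawn edge lengths in points and do not
    # change when the image is merely rotated.
    width_pt = math.hypot(a, b)
    height_pt = math.hypot(c, d)
    if width_pt == 0 or height_pt == 0:
        raise ValueError("degenerate matrix: image has no extent")
    return (width_px * POINTS_PER_INCH / width_pt,
            height_px * POINTS_PER_INCH / height_pt)


# A 2x2 pixel image drawn 36x36 points large (made-up placement).
print(effective_ppi(2, 2, (36, 0, 0, 36, 100, 100)))   # (4.0, 4.0)
# The same image rotated by 90 degrees gives the same values.
print(effective_ppi(2, 2, (0, 36, -36, 0, 100, 100)))  # (4.0, 4.0)
```

For a skewed image the edge lengths are still well defined, but a single pair of ppi numbers no longer fully describes how the pixels are sampled, so any such value remains an approximation.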
+8

This is not easy, but it is possible. Although you cannot do this with PDFDocument, you can use the CGPDF* functions in Quartz instead. In short: you will need to use CGPDFPageGetDictionary() to get the dictionary of the page that the image is on, and then get the information about its XObject (assuming the image is not inlined in the content stream) from that dictionary. Even this is not easy: you will need to consult the PDF standard to understand how an XObject can be formatted, and then use the various CG* routines to pull out what you need.

I should add that the default resolution ("user unit") of a PDF is 72 units per inch. In addition, many images in PDF files are vector graphics, so they do not really have a DPI at all.

+6

You need the dimensions of the raw image XObject that is invoked by the Do operator.

+1

The answer is, of course, no, because PDF documents do not really have an internal resolution. The resolution ultimately depends on whatever is rendering the document and its elements at the time. It may even vary with the zoom level you use in Adobe Acrobat.

For example, I created a 16x16 pixel 2D barcode and scaled it to one inch wide and one inch high before adding it to the document. It looks very crisp (i.e. a lot of pixels per square element) in Adobe Acrobat Reader, but when I send the resulting PDF to a fax service, it ends up at a resolution of about 100x200. When I print the same document on a laser printer, it ends up at about 400 dpi. When I click on the barcode image in Acrobat Reader and copy/paste it into Gimp, it appears as a tiny 16x16 bitmap.
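The numbers in the barcode example follow from simple arithmetic, sketched below in Python. The function name is hypothetical; the figures (16 pixels, one inch, roughly 100x200 for the fax, about 400 dpi for the printer) are the ones quoted above.

```python
def device_dots_per_image_pixel(image_px: int, placed_inches: float,
                                device_dpi: float) -> float:
    """How many device dots cover one image pixel along one axis."""
    image_ppi = image_px / placed_inches  # resolution implied by the placement
    return device_dpi / image_ppi


# 16 pixels placed across one inch is 16 ppi, whatever the output device.
print(device_dots_per_image_pixel(16, 1.0, 400))  # laser printer: 25.0
print(device_dots_per_image_pixel(16, 1.0, 100))  # fax, horizontal: 6.25
print(device_dots_per_image_pixel(16, 1.0, 200))  # fax, vertical: 12.5
```

The embedded bitmap stays at 16x16 pixels in every case; only the number of device dots spent on each pixel changes.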

+1

This answer is intended to complement @Kurt Pfeifle's answer and works outside of Objective-C.

As an alternative:

If you are on Windows and do not have a compiler, this is the easiest way: download the Windows XPDF binaries, use pdfimages to extract the images, convert them to BMP format, and then mspaint will tell you the resolution. The advantages of this method are as follows:

  • You get the exact resolution without having to estimate it by measuring the size of the image;

  • It WILL work with the XPDF version of pdfimages.

Disadvantages:

  • It takes a bit more work, including converting the file to a format that you can open without changing the resolution;

  • You have to do this for each image file separately, instead of getting a list;

  • It gives the resolution of the images themselves, not the resolution at which they appear in the PDF file (thanks to Kurt Pfeifle's comment).

-1
