Display the first page of a PDF as an image

Question


I am creating a web application where I display images and PDFs as thumbnails. Clicking a thumbnail opens the corresponding image or PDF in a new window.

For PDFs I have the following (this is the code of the new window):

<iframe src="images/testes.pdf" width="800" height="200" />

Using this, I see the entire PDF in the web browser. However, for thumbnail purposes, I want to display only the first page of the PDF as an image.

I tried:

  <h:graphicImage value="images/testes.pdf" width="800" height="200" />

However, it does not work. Any idea how to do this?

Update 1

The path to the PDF file above is only an example. In reality, the files are stored in the database, and my actual code is:

 <iframe src="#{PersonalInformationDataBean.myAttachmentString}" width="800" height="200" />

Update 2

For image thumbnails, I use:

  <h:graphicImage height="200" width="200" value="...." />

I need to achieve the same for PDFs.

I hope it is clear what I am after.


html image pdf jsf-2 iframe

Fahim parkar Aug 6 '12 at 12:43


4 answers

I'm not sure all browsers display your embedded PDF (done via <h:graphicImage value="some.pdf" ... />) equally well.

Extract the first page as a PDF

If you insist on using PDF, I would recommend one of these two command-line tools to extract the first page of any PDF:

pdftk
Ghostscript

Both are available for Linux, Mac OS X, and Windows.

pdftk command

 pdftk input.pdf cat 1 output page-1-of-input.pdf

Ghostscript command

 gs -o page-1-of-input.pdf -sDEVICE=pdfwrite -dLastPage=1 input.pdf

(On Windows, use gswin32c.exe or gswin64c.exe instead of gs.)
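If you drive this from the Java web application itself (the question uses JSF), one option is to start Ghostscript as an external process. The following is a minimal sketch, assuming Ghostscript is installed and on the PATH and a 64-bit install on Windows; the class and method names are hypothetical. It uses -dLastPage=1 (the same option the JPEG commands below use to stop after page 1) and picks the Windows executable name from the note above. Java 9+ is assumed for List.of.

```java
import java.io.IOException;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: builds and runs the Ghostscript command that copies page 1 into a new PDF.
final class FirstPageExtractor {

    // "gs" on Linux/macOS; on Windows the console executable is gswin64c.exe
    // (assumption: 64-bit Ghostscript; use gswin32c.exe for a 32-bit install).
    static String ghostscriptExecutable(String osName) {
        return osName.toLowerCase(Locale.ROOT).startsWith("windows") ? "gswin64c.exe" : "gs";
    }

    // Equivalent of: gs -o <output> -sDEVICE=pdfwrite -dLastPage=1 <input>
    static List<String> firstPageToPdf(String osName, String input, String output) {
        return List.of(
                ghostscriptExecutable(osName),
                "-o", output,
                "-sDEVICE=pdfwrite",
                "-dLastPage=1",
                input);
    }

    // Starts the process, forwards its console output, and returns Ghostscript's exit code.
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }
}
```

Usage: `FirstPageExtractor.run(FirstPageExtractor.firstPageToPdf(System.getProperty("os.name"), "input.pdf", "page-1-of-input.pdf"))`; an exit code of 0 normally indicates success.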

~~pdftk is slightly faster than Ghostscript when it comes to page extraction, but for a single page this difference is probably negligible.~~ As of the latest released Ghostscript version, v9.05, the previous sentence is no longer true. I found that Ghostscript (including all startup overhead) takes ~1 second to extract the 1st page from the 756-page PDF specification, while pdftk takes ~11 seconds.

Convert 1st page to JPEG

If you want to be sure that even older browsers can render your first page well, convert it to JPEG. Ghostscript is your friend here (ImageMagick cannot do this on its own; it needs Ghostscript's help too):

 gs -o page-1-of-input-PDF.jpeg -sDEVICE=jpeg -dLastPage=1 input.pdf

If you need page 33, you can do it like this:

 gs -o page-33-of-input-PDF.jpeg -sDEVICE=jpeg -dFirstPage=33 -dLastPage=33 input.pdf

If you need a range of pages, such as pages 17-23, try the following:

 gs -o page-16+%03d-of-input-PDF.jpeg -sDEVICE=jpeg -dFirstPage=17 -dLastPage=23 input.pdf

Note that the %03d placeholder is replaced by a counter that is incremented with each page processed, starting from 1. Thus, your first JPEG will be named page-16+001-of-input-PDF.jpeg.
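The naming trick works because 16 + counter equals the real page number (16+001 is page 17). A minimal sketch, assuming the output pattern contains exactly one %03d-style placeholder and no other % characters; the class and method names are hypothetical. It builds the range command for either device used in this answer (jpeg or pngalpha) and predicts the resulting file names with String.format, which interprets %03d the same way.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for Ghostscript page-range rendering.
final class PageRangeRenderer {

    // Equivalent of: gs -o <pattern> -sDEVICE=<device> -dFirstPage=<first> -dLastPage=<last> <input>
    static List<String> command(String input, String pattern, String device, int first, int last) {
        if (first < 1 || last < first) {
            throw new IllegalArgumentException("invalid page range " + first + "-" + last);
        }
        return List.of("gs", "-o", pattern, "-sDEVICE=" + device,
                "-dFirstPage=" + first, "-dLastPage=" + last, input);
    }

    // The counter starts at 1 for the first page written, so the names are
    // pattern formatted with 1 .. (last - first + 1).
    static List<String> expectedOutputNames(String pattern, int first, int last) {
        List<String> names = new ArrayList<>();
        for (int n = 1; n <= last - first + 1; n++) {
            names.add(String.format(pattern, n));
        }
        return names;
    }
}
```

For pages 17-23 this predicts seven files, from page-16+001-of-input-PDF.jpeg to page-16+007-of-input-PDF.jpeg.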

Maybe PNG is better?

Remember that JPEG is not well suited for images with high-contrast black-and-white content and sharp edges, such as text pages. PNG is much better for this.

Creating a PNG from the 1st PDF page with Ghostscript is easy:

 gs -o page-1-of-input-PDF.png -sDEVICE=pngalpha -dLastPage=1 input.pdf

The same page-range options as for JPEG apply here.


Kurt pfeifle Aug 6 '12 at 18:47


Warning: Do not use the Ma9ic script (posted in another answer) unless you want to ...

... make the PDF->JPEG conversion consume much more time and resources than necessary,
... completely give up control over the PDF->JPEG conversion process.

While it may work for you, there are many problems in these 8 short lines of Bash.

First,
it uses identify to extract the number of pages from the input PDF. However, identify (part of ImageMagick) cannot process PDF files on its own. It has to run Ghostscript as a "delegate" to handle PDF input. It would be much more efficient to use Ghostscript directly rather than launching it indirectly through ImageMagick.

Second,
it uses convert for the PDF->JPEG conversion. Same remark as above: convert uses Ghostscript anyway, so why not run Ghostscript directly?

Third,
it loops over the pages and starts a separate convert process for each PDF page, i.e. 100 convert runs for a 100-page PDF. That means it also runs 100 Ghostscript commands to create 100 JPEGs.

Fourth,
Fahim Parkar asked for a thumbnail of the first page of the PDF only, not of all pages.

The script runs at least 201 commands for a 100-page PDF, when all of this can be done with a single command. If you use Ghostscript directly, ...

... it will not only run faster and more efficiently,
... but also give you finer control over JPEG quality settings.
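The figure of 201 follows from the loop structure: one identify call, one convert call per page, and one Ghostscript delegate per convert call. For an N-page PDF:

```latex
\text{processes}(N) \;\ge\; \underbrace{1}_{\texttt{identify}}
  + \underbrace{N}_{\texttt{convert}}
  + \underbrace{N}_{\text{Ghostscript delegates}}
  = 2N + 1,
\qquad
\text{processes}(100) \;\ge\; 201
```

A direct Ghostscript call is one process regardless of N. The Ghostscript delegate that identify itself starts can only add to this count, which is why it is a lower bound.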

Use the right tool for the job and use it correctly!

Update:

Since I was asked, here is my alternative implementation of the Ma9ic script.

 #!/bin/bash
 infile="${1}"

 gs -q -o "$(basename "${infile}")_p%04d.jpeg" -sDEVICE=jpeg "${infile}"

 # To get thumbnail JPEGs with a width of 200 pixels, use the following command:
 # gs -q -o name_200px_p%04d.jpg -sDEVICE=jpeg -dPDFFitPage -g200x400 "${infile}"

 # To get higher-quality (but also larger) JPEGs with a
 # resolution of 300 dpi, use the following command:
 # gs -q -o name_300dpi_p%04d.jpg -sDEVICE=jpeg -dJPEGQ=100 -r300 "${infile}"

 echo "Done"
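The commented thumbnail command fixes the output raster with -g200x400 and lets -dPDFFitPage scale the page into it. If you would rather match the raster to the page's aspect ratio, you can compute the height for a 200-pixel width yourself, or pick a resolution instead: PDF page sizes are given in points (1/72 inch), so rendering at 72 * 200 / widthInPoints dpi yields a 200-pixel-wide image. A minimal sketch; the class and method names are hypothetical, and reading the page size from the PDF (e.g. its MediaBox) is assumed to happen elsewhere.

```java
// Hypothetical helper: derives Ghostscript -g / -r values for a thumbnail of a given width
// from the page size in points (1/72 inch), keeping the page's aspect ratio.
final class ThumbnailGeometry {

    static int heightForWidth(double pageWidthPt, double pageHeightPt, int targetWidthPx) {
        if (pageWidthPt <= 0 || pageHeightPt <= 0 || targetWidthPx <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (int) Math.round(targetWidthPx * pageHeightPt / pageWidthPt);
    }

    // Builds the -g argument (width x height in pixels).
    static String gArgument(double pageWidthPt, double pageHeightPt, int targetWidthPx) {
        return "-g" + targetWidthPx + "x" + heightForWidth(pageWidthPt, pageHeightPt, targetWidthPx);
    }

    // Alternative: the -r resolution (dpi) at which the page renders targetWidthPx pixels wide.
    static double resolutionForWidth(double pageWidthPt, int targetWidthPx) {
        return 72.0 * targetWidthPx / pageWidthPt;
    }
}
```

For a US Letter page (612 x 792 pt) this gives -g200x259, or about -r23.53; for A4 (595 x 842 pt) it gives -g200x283.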

I even ran a test. I converted the 756-page PDF-1.7 specification to JPEG files with both scripts:

The Ma9ic version takes 1413 seconds to generate 756 JPEGs.
My version saves 93% of that time and takes 91 seconds.
In addition, on my system the Ma9ic script creates mostly black JPEG images, while mine come out fine.
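Spelled out, the time saving and the corresponding speed-up from these two measurements are:

```latex
\frac{1413\,\text{s} - 91\,\text{s}}{1413\,\text{s}}
  = \frac{1322}{1413} \approx 0.936 \;(\approx 93.6\,\%),
\qquad
\frac{1413\,\text{s}}{91\,\text{s}} \approx 15.5
```

So the 93% quoted above is rounded down, and the direct Ghostscript call is roughly 15 times faster on this input.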


Kurt pfeifle Aug 6 '12 at 16:33


Here's a bash script that converts pages to JPEG images.

 #!/bin/bash
 PDF='doc.pdf'
 NUMPAGES=`identify -format %n "$PDF"`
 for (( IDX=0; IDX<$NUMPAGES; IDX++ ))
 do
   PAGE=$(($IDX+1))
   convert -resize 1200x900 "$PDF[$IDX]" `echo "$PDF" | sed "s/\.pdf$/-page$PAGE.jpg/"`
 done
 echo "Done"


Ma9ic Aug 6 '12 at 13:00


Fahim parkar · Accepted Answer · 2013-02-12T12:36:40+0000

This is what I used (the classes come from the ICEpdf library):

 import java.awt.image.BufferedImage;
 import java.awt.image.RenderedImage;
 import java.io.BufferedInputStream;
 import java.io.File;
 import java.io.FileInputStream;
 import java.io.FileNotFoundException;
 import java.io.IOException;
 import java.io.InputStream;
 import javax.imageio.ImageIO;
 import org.icepdf.core.exceptions.PDFException;
 import org.icepdf.core.exceptions.PDFSecurityException;
 import org.icepdf.core.pobjects.Document;
 import org.icepdf.core.pobjects.Page;
 import org.icepdf.core.util.GraphicsRenderingHints;

 Document document = new Document();
 try {
     document.setFile(myProjectPath);
     System.out.println("Parsed successfully...");
 } catch (PDFException ex) {
     System.out.println("Error parsing PDF document " + ex);
 } catch (PDFSecurityException ex) {
     System.out.println("Error encryption not supported " + ex);
 } catch (FileNotFoundException ex) {
     System.out.println("Error file not found " + ex);
 } catch (IOException ex) {
     System.out.println("Error handling PDF document " + ex);
 }

 // save page captures to file.
 float scale = 1.0f;
 float rotation = 0f;
 System.out.println("scale == " + scale);

 // Paint the first page's content to an image and write the image to file
 InputStream fis2 = null;
 File file = null;
 for (int i = 0; i < 1; i++) {
     BufferedImage image = (BufferedImage) document.getPageImage(i,
             GraphicsRenderingHints.SCREEN, Page.BOUNDARY_CROPBOX, rotation, scale);
     RenderedImage rendImage = image;
     // capture the page image to file
     try {
         System.out.println("\t capturing page " + i);
         file = new File(myProjectActualPath + "myImage.png");
         ImageIO.write(rendImage, "png", file);
         fis2 = new BufferedInputStream(new FileInputStream(myProjectActualPath + "myImage.png"));
     } catch (IOException ioe) {
         System.out.println("IOException :: " + ioe);
     } catch (Exception e) {
         System.out.println("Exception :: " + e);
     }
     image.flush();
 }
 // release the resources held by the ICEpdf document
 document.dispose();
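The code above renders the page at scale 1.0 and writes it to a file before reading it back. Since the question asks for 200x200 thumbnails and keeps its files in a database (Update 1), a variant that scales the rendered BufferedImage down and encodes it in memory may be handier. A minimal sketch using only the JDK (java.awt and javax.imageio), assuming `image` is the BufferedImage returned by getPageImage above; the class and method names are hypothetical.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper: shrinks a rendered page so it fits into a box (e.g. 200x200)
// while keeping its aspect ratio, then encodes it as PNG bytes in memory.
final class PageThumbnail {

    static BufferedImage fitInto(BufferedImage page, int maxWidth, int maxHeight) {
        double scale = Math.min((double) maxWidth / page.getWidth(),
                                (double) maxHeight / page.getHeight());
        scale = Math.min(scale, 1.0); // never enlarge pages that already fit
        int w = Math.max(1, (int) Math.round(page.getWidth() * scale));
        int h = Math.max(1, (int) Math.round(page.getHeight() * scale));
        BufferedImage thumb = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = thumb.createGraphics();
        try {
            // white background, so transparent areas of the source do not turn black
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, w, h);
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                               RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(page, 0, 0, w, h, null);
        } finally {
            g.dispose();
        }
        return thumb;
    }

    // PNG rather than JPEG, because page images are mostly text with sharp edges.
    static byte[] toPng(BufferedImage image) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(image, "png", out)) {
            throw new IOException("no PNG writer available");
        }
        return out.toByteArray();
    }
}
```

For example, `PageThumbnail.toPng(PageThumbnail.fitInto(image, 200, 200))` yields bytes you could store or stream instead of writing myImage.png; a 612x792 page becomes a 155x200 thumbnail. How those bytes are served to h:graphicImage (e.g. through a servlet) is outside this sketch.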
