Convert PDF to image (with proper formatting)

I have a PDF file (attached). My goal is to convert a PDF page to an image using PDFBox exactly AS IT IS (as if I had captured it with the Windows Snipping Tool). The PDF has all kinds of forms and text.

I am using the following code:

    PDDocument doc = PDDocument.load("Hello World.pdf");
    PDPage firstPage = (PDPage) doc.getDocumentCatalog().getAllPages().get(67);
    BufferedImage bufferedImage = firstPage.convertToImage(imageType, screenResolution);
    ImageIO.write(bufferedImage, "png", new File("out.png"));

[Image: the PDF page I want to convert]

When I use this code, the resulting image is completely wrong (out.png is attached).

[Image: the same page as converted by PDFBox]

How can I make PDFBox produce something like a direct snapshot?

Also, I noticed that png image quality is not so good, is there a way to increase the resolution of the generated image?

EDIT: here is the pdf (see page 68) https://drive.google.com/file/d/0B0ZiP71EQHz2NVZUcElvbFNreEU/edit?usp=sharing

EDIT 2: It seems that all the text is removed. I also tried using the PDFImageWriter class:

    test.writeImage(doc, "png", null, 68, 69, "final.png", TYPE_USHORT_GRAY, 200);

The result is the same.

+3
android format image pdf pdfbox
Mar 11 '14 at 17:56
3 answers

It turns out that JPedal (LGPL) performs the conversion perfectly (just like a snapshot).

Here is what I used:

    try {
        PdfDecoder decode_pdf = new PdfDecoder(true);
        FontMappings.setFontReplacements();
        decode_pdf.openPdfFile("Hello World.pdf");
        decode_pdf.setExtractionMode(0, 800, 3);
        try {
            // render 40 consecutive pages, starting with page number 2
            for (int i = 0; i < 40; i++) {
                BufferedImage img = decode_pdf.getPageAsImage(2 + i);
                ImageIO.write(img, "png", new File(String.valueOf(i) + "out.png"));
            }
        } catch (IOException ex) {
            Logger.getLogger(NewJFrame.class.getName()).log(Level.SEVERE, null, ex);
        }
        decode_pdf.closePdfFile();
    } catch (PdfException e) {
        e.printStackTrace();
    }

It works great.

+2
Mar 14 '14 at 6:41

PDFRenderer lets you convert a PDF page to an image.

Convert a PDF page to an image in Java using PDFRenderer. Required JAR: PDFRenderer-0.9.0

    package com.pdfrenderer.examples;

    import java.awt.Graphics2D;
    import java.awt.Image;
    import java.awt.Rectangle;
    import java.awt.image.BufferedImage;
    import java.io.File;
    import java.io.RandomAccessFile;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import javax.imageio.ImageIO;
    import com.sun.pdfview.PDFFile;
    import com.sun.pdfview.PDFPage;

    public class PdfToImage {

        public static void main(String[] args) {
            try {
                String sourceDir = "C:/Documents/Chemistry.pdf"; // PDF file must be placed in DataGet folder
                String destinationDir = "C:/Documents/Converted/"; // Converted PDF page saved in this folder

                File sourceFile = new File(sourceDir);
                File destinationFile = new File(destinationDir);

                String fileName = sourceFile.getName().replace(".pdf", "_cover");
                if (sourceFile.exists()) {
                    if (!destinationFile.exists()) {
                        destinationFile.mkdir();
                        System.out.println("Folder created in: " + destinationFile.getCanonicalPath());
                    }

                    RandomAccessFile raf = new RandomAccessFile(sourceFile, "r");
                    FileChannel channel = raf.getChannel();
                    ByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
                    PDFFile pdf = new PDFFile(buf);
                    int pageNumber = 62; // which PDF page to be convert
                    PDFPage page = pdf.getPage(pageNumber);

                    System.out.println("Total pages:" + pdf.getNumPages());

                    // create the image
                    Rectangle rect = new Rectangle(0, 0, (int) page.getBBox().getWidth(),
                            (int) page.getBBox().getHeight());
                    BufferedImage bufferedImage = new BufferedImage(rect.width, rect.height,
                            BufferedImage.TYPE_INT_RGB);

                    Image image = page.getImage(rect.width, rect.height, // width & height
                            rect, // clip rect
                            null, // null for the ImageObserver
                            true, // fill background with white
                            true  // block until drawing is done
                    );
                    Graphics2D bufImageGraphics = bufferedImage.createGraphics();
                    bufImageGraphics.drawImage(image, 0, 0, null);

                    // change file format here. Ex: .png, .jpg, .jpeg, .gif, .bmp
                    File imageFile = new File(destinationDir + fileName + "_" + pageNumber + ".png");

                    ImageIO.write(bufferedImage, "png", imageFile);

                    System.out.println(imageFile.getName() + " File created in: "
                            + destinationFile.getCanonicalPath());
                } else {
                    System.err.println(sourceFile.getName() + " File not exists");
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

Converted image:

Chemistry_cover_62

+4
Mar 13 '14 at 11:23

I get the same result as the OP using PDFBox version 1.8.4. However, in version 2.0.0-SNAPSHOT, it looks better:

[Image: the page as rendered by 2.0.0-SNAPSHOT]

Here, only some arrows are thinner, and some arrow parts are drawn incorrectly, as boxes.

Thus,

How can I make PDFBox produce something like a direct snapshot?

The released versions (up to 1.8.4) seem to have major deficiencies when rendering PDFs as images. You can switch to the current development version (the current trunk, 2.0.0-SNAPSHOT) or wait until the improvements are released.

Furthermore, there are some minor deficiencies even in 2.0.0-SNAPSHOT. You might want to present your sample document to the PDFBox people (i.e., create a corresponding issue in their JIRA) so that they can further improve PDFBox to suit your needs.

Also, I noticed that png image quality is not so good, is there a way to increase the resolution of the generated image?

There is a convertToImage overload with a resolution parameter. Your current code already uses it, passing screenResolution as the resolution. Increase that value.
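To get a feeling for what a higher resolution value means, here is a rough sketch of the arithmetic. It assumes the default PDF user space of 72 units per inch and a US Letter page of 612 x 792 points; the class and method names (RenderSize, toPixels) are made up for illustration and are not part of PDFBox.

```java
// Illustrative helper, not a PDFBox class: estimates the pixel size of a
// rendered page for a given resolution.
class RenderSize {

    // Default PDF user space: 72 units ("points") per inch.
    static final float POINTS_PER_INCH = 72f;

    // Number of pixels needed to draw a length given in points at the given DPI.
    public static int toPixels(float points, float dpi) {
        return Math.round(points * dpi / POINTS_PER_INCH);
    }

    public static void main(String[] args) {
        // Assumed page size: US Letter, 612 x 792 points.
        float widthPt = 612f;
        float heightPt = 792f;

        // Print the resulting image size for a few candidate resolutions.
        for (int dpi : new int[] {72, 96, 300}) {
            System.out.println(dpi + " dpi -> "
                    + toPixels(widthPt, dpi) + " x " + toPixels(heightPt, dpi) + " px");
        }
    }
}
```

So a typical screen resolution of 72 or 96 gives an image well under 1100 pixels tall for such a page, while 300 gives 2550 x 3300 pixels. Memory use grows with the square of the resolution, so there is a trade-off.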

PS: The code for rendering a PDF page to an image has been refactored in 2.0.0-SNAPSHOT. Instead of

    BufferedImage image = page.convertToImage();

you now do

    BufferedImage image = RenderUtil.convertToImage(page);

I assume this was done to remove direct AWT references from the core classes, since AWT is not available on, for example, Android.




PPS: The SNAPSHOT I used last year in this answer was just that, a snapshot, subject to change. Version 2.0.0 is still under development and a lot has changed. In particular, there is no RenderUtil class anymore. Instead, you currently need to use the PDFRenderer in the org.apache.pdfbox.rendering package ...

+3
Mar 12 '14 at 16:43


