If your notes are in a PDF file, I would use Apache PDFBox to extract the images from that PDF, find the coordinates of the whole stroke you need, and then use those coordinates to crop the selected image and manipulate it until you get the desired result.
    // PDFBox 1.x API
    PDDocument document = PDDocument.load(inFile);
    try {
        List pages = document.getDocumentCatalog().getAllPages();
        Iterator iter = pages.iterator();
        while (iter.hasNext()) {
            PDPage page = (PDPage) iter.next();
            PDResources resources = page.getResources();
            Map pageImages = resources.getImages();
            if (pageImages != null) {
                Iterator imageIter = pageImages.keySet().iterator();
                while (imageIter.hasNext()) {
                    String key = (String) imageIter.next();
                    PDXObjectImage image = (PDXObjectImage) pageImages.get(key);
                    // write2OutputStream() needs an OutputStream argument;
                    // write2file() appends the image's file suffix to the name
                    image.write2file("image-" + key);
                }
            }
        }
    } finally {
        document.close();
    }
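Once an image has been extracted, the stroke coordinates mentioned above could be found with a simple scan over the pixels. The following is only a minimal sketch using the JDK, not part of PDFBox: the class name `StrokeBounds`, the method `find`, and the brightness threshold are all assumptions made for illustration, and it treats every pixel darker than the threshold as part of a stroke.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

// Hypothetical helper (not part of PDFBox): locates the bounding box of dark strokes.
final class StrokeBounds {

    private StrokeBounds() {
    }

    // Returns the smallest rectangle that contains every pixel whose average
    // brightness is below the threshold (0-255), or null if there is no such pixel.
    // Assumption: strokes are dark on a light background; alpha is ignored.
    static Rectangle find(BufferedImage image, int threshold) {
        int minX = Integer.MAX_VALUE;
        int minY = Integer.MAX_VALUE;
        int maxX = -1;
        int maxY = -1;
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                int rgb = image.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                int brightness = (r + g + b) / 3;
                if (brightness < threshold) {
                    minX = Math.min(minX, x);
                    minY = Math.min(minY, y);
                    maxX = Math.max(maxX, x);
                    maxY = Math.max(maxY, y);
                }
            }
        }
        if (maxX < 0) {
            return null; // no dark pixel found
        }
        return new Rectangle(minX, minY, maxX - minX + 1, maxY - minY + 1);
    }

    public static void main(String[] args) throws Exception {
        // Usage: java StrokeBounds.java extracted-image.png
        BufferedImage image = javax.imageio.ImageIO.read(new java.io.File(args[0]));
        System.out.println(find(image, 128));
    }
}
```

The returned rectangle can then serve as the crop region. A threshold of 128 is an arbitrary starting point; scanned notes with a grey background would probably need a lower value.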
Here is the beginning of the ExtractImages sample code available in Apache PDFBox:
    import java.io.File;
    import java.io.IOException;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Map;

    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.pdmodel.PDResources;
    import org.apache.pdfbox.pdmodel.encryption.AccessPermission;
    import org.apache.pdfbox.pdmodel.encryption.StandardDecryptionMaterial;
    import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObject;
    import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectForm;
    import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage;

    public class ExtractImages
    {
        private int imageCounter = 1;

        private static final String PASSWORD = "-password";
        private static final String PREFIX = "-prefix";
        private static final String ADDKEY = "-addkey";
        private static final String NONSEQ = "-nonSeq";

        private ExtractImages()
        {
        }

        public static void main( String[] args ) throws Exception
        {
            ExtractImages extractor = new ExtractImages();
            extractor.extractImages( args );
        }

        private void extractImages( String[] args ) throws Exception
        {
            if( args.length < 1 || args.length > 4 )
            {
                usage();
            }
            else
            {
                String pdfFile = null;
                String password = "";
                String prefix = null;
                boolean addKey = false;
                boolean useNonSeqParser = false;
                for( int i=0; i<args.length; i++ )
                {
                    if( args[i].equals( PASSWORD ) )
                    {
                        i++;
                        if( i >= args.length )
                        {
                            usage();
                        }
                        password = args[i];
                    }
                    else if( args[i].equals( PREFIX ) )
                    {
                        i++;
                        if( i >= args.length )
                        {
                            usage();
                        }
                        prefix = args[i];
                    }
                    else if( args[i].equals( ADDKEY ) )
                    {
                        addKey = true;
                    }
                    else if( args[i].equals( NONSEQ ) )
                    {
                        useNonSeqParser = true;
                    }
                    else
                    {
                        if( pdfFile == null )
                        {
                            pdfFile = args[i];
                        }
                    }
                }
                if(pdfFile == null)
                {
                    usage();
                }
                else
                {
                    if( prefix == null && pdfFile.length() >4 )
                    {
                        prefix = pdfFile.substring( 0, pdfFile.length() -4 );
                    }

                    PDDocument document = null;
                    try
                    {
                        if (useNonSeqParser)
                        {
                            document = PDDocument.loadNonSeq(new File(pdfFile), null, password);
                        }
                        else
                        {
                            document = PDDocument.load( pdfFile );
                            if( document.isEncrypted() )
                            {
                                StandardDecryptionMaterial spm = new StandardDecryptionMaterial(password);
                                document.openProtection(spm);
                            }
                        }
                        AccessPermission ap = document.getCurrentAccessPermission();
                        if( ! ap.canExtractContent() )
                        {
                            throw new IOException( "Error: You do not have permission to extract images." );
                        }

                        List pages = document.getDocumentCatalog().getAllPages();
                        Iterator iter = pages.iterator();
                        while( iter.hasNext() )
                        {
                            PDPage page = (PDPage)iter.next();
                            PDResources resources = page.getResources();
Now, to crop the image, you can use:
    public InputStream cropAndScale(InputStream mainImageStream, CropRectangle crop) {
        try {
            RenderedOp mainImage = loadImage(mainImageStream);
            RenderedOp opaqueImage = makeImageOpaque(mainImage);
            RenderedOp croppedImage = cropImage(opaqueImage, crop);
            RenderedOp scaledImage = scaleImage(croppedImage);
            byte[] jpegBytes = encodeAsJpeg(scaledImage);
            return new ByteArrayInputStream(jpegBytes);
        } catch (Exception e) {
            throw new IllegalStateException("Failed to scale the image", e);
        }
    }
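which is available on this page. The helper methods it calls (loadImage, makeImageOpaque, cropImage, scaleImage, encodeAsJpeg) and the CropRectangle type are not shown here.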
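If you would rather not depend on the helpers behind cropAndScale, the cropping step alone can be done with the JDK's `BufferedImage.getSubimage`. This is only a rough sketch under that assumption, not the code from the page referenced above: `ImageCropper` and `crop` are hypothetical names, and the method clamps the requested region to the image bounds before cropping.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

// Hypothetical helper: crops a region out of an image using only the JDK.
final class ImageCropper {

    private ImageCropper() {
    }

    // Crops the given region, clamped to the image bounds, and returns an
    // independent copy so later edits do not touch the source image.
    static BufferedImage crop(BufferedImage source, Rectangle region) {
        Rectangle bounds = new Rectangle(0, 0, source.getWidth(), source.getHeight());
        Rectangle clipped = region.intersection(bounds);
        if (clipped.isEmpty()) {
            throw new IllegalArgumentException("Crop region lies outside the image: " + region);
        }
        // getSubimage returns a view that shares pixel data with the source
        BufferedImage view = source.getSubimage(
                clipped.x, clipped.y, clipped.width, clipped.height);
        // Copy the pixels into a fresh image to detach it from the source
        BufferedImage copy = new BufferedImage(
                clipped.width, clipped.height, BufferedImage.TYPE_INT_ARGB);
        int[] pixels = view.getRGB(0, 0, clipped.width, clipped.height, null, 0, clipped.width);
        copy.setRGB(0, 0, clipped.width, clipped.height, pixels, 0, clipped.width);
        return copy;
    }
}
```

The result can be saved with `javax.imageio.ImageIO.write(cropped, "png", file)`. Scaling and JPEG encoding, which cropAndScale also performs, are left out of this sketch.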
There is another option for parsing images inside a PDF file: look at this code, specifically this part.
cMinor