I am trying to write a function that takes a PDF file provided by the user, extracts each page as an image, and stores those images in an array. I found several examples that combine all pages into a single image, but none of them does what I need.
This is what I have, but it returns an empty array:
    function PdfToImg($pdf_in) {
        $img_array = array();
        $im = new imagick();
        $im->readimageblob($pdf_in); // reading image from binary string
        $num_pages = $im->getnumberimages();
        $im->setimageformat("png");
        for ($x = 1; $x <= $num_pages; $x++) {
            $img = $im->previousimage();
            $img_array .= $img;
        }
        return $img_array;
    }
One caveat here is that I cannot write these files to disk; I must work with strings / arrays. I looked through the ImageMagick manual and found nothing about outputting multiple images to an array, only to a series of files stored on disk.
UPDATE: (06/13/2012) I found a way to achieve what I need. It is ugly, inefficient, and I'm sure it is slow, but there seems to be no other way.
    function PdfToImg3($pdf_in) {
        $img_array = array();
        $im = new imagick();
        $im->readimageblob($pdf_in);
        $num_pages = $im->getnumberimages();
        $i = 0;
        for ($x = 1; $x <= $num_pages; $x++) {
            $im = new imagick();
            $im->readimageblob($pdf_in);
            $im->setiteratorindex($i);
            $im->setimageformat('png');
            $img_array[$x] = $im->getimageblob();
            $im->destroy();
            $i++;
        }
        $im->destroy();
        return $img_array;
    }
This produces an array named $img_array whose elements hold the pages of the incoming PDF as PNG image data strings.
There must be a better way. Why won't nextImage() work? Why can't I use setIteratorIndex without creating a new Imagick object every time? I must be missing something, but there are gaping holes in the documentation, and Google, the ImageMagick forums, and StackOverflow don't turn up anything about this.
TEST: Extremely slow. A simple 17-page PDF file takes almost a minute.
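For comparison with the setIteratorIndex question above, here is a minimal sketch of what reading the PDF once and stepping through the pages of a single Imagick object might look like. The function name pdfToPngPages is hypothetical, and I have not timed it against the 17-page file; it only assumes that setIteratorIndex selects the current page and that getImageBlob returns just that page.

```php
<?php
// Sketch only: read the PDF once, then walk the pages of one Imagick object.
// pdfToPngPages is a hypothetical name, not part of the code above.
function pdfToPngPages($pdf_in)
{
    $img_array = array();
    $im = new Imagick();
    $im->readImageBlob($pdf_in);              // parse the PDF a single time
    $num_pages = $im->getNumberImages();

    for ($i = 0; $i < $num_pages; $i++) {
        $im->setIteratorIndex($i);            // select page $i (zero-based)
        $im->setImageFormat('png');           // set the format on the current page
        $img_array[$i + 1] = $im->getImageBlob(); // keys start at 1, as in PdfToImg3
    }

    $im->clear();                             // release the object's resources
    return $img_array;
}
```

If this behaves as expected, the cost of parsing the PDF is paid once instead of once per page, which is where most of the time in PdfToImg3 appears to go.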
UPDATE 2: (07/11/2012) Having finished a major project that included this code, I decided to revisit a few spots and improve performance. Here is what I came up with:
    $img_array = array();
    $im = new imagick();
    $im->readimageblob($pdf_in);
    $num_pages = $im->getnumberimages();
    $im->destroy();
    $i = 0;
    for ($x = 1; $x <= $num_pages; $x++) {
        $im = new imagick();
        $im->readimageblob($pdf_in);
        $im->setResolution(300,300);
        $im->setiteratorindex($i);
        $im->setimageformat('png');
        $img_array[$x] = $im->getimageblob();
        $im->destroy();
        $i++;
    }
    return $img_array;
This change brought the conversion of a complex 4-page PDF down from 21-25 seconds to about 2-3 seconds. I understand why some of the changes helped; others are less clear to me. Hope someone finds this helpful.
UPDATE 3: I found out why the performance improved so much. With setResolution moved below readImageBlob, the DPI setting is ignored and the default of 72 is used. Noting this, I moved the call back above readImageBlob and reduced it to 150, and got similar performance with a much better result. See the notes on php.net.
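The ordering described above can be sketched as follows. The helper name loadPdfAtDpi and its default of 150 are only illustrative; the point is that setResolution is called on the empty object before the blob is read.

```php
<?php
// Hypothetical helper: loadPdfAtDpi and the $dpi default are illustrative only.
function loadPdfAtDpi($pdf_in, $dpi = 150)
{
    $im = new Imagick();
    $im->setResolution($dpi, $dpi);   // must come before the PDF is read
    $im->readImageBlob($pdf_in);      // the pages are rasterized at $dpi here
    return $im;
}
```

Calling setResolution after readImageBlob has no effect on the pages already rendered, which is why the 300 DPI version above ran fast but actually produced 72 DPI output.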