How to ensure the loading of images before creating a PDF?

Question


I have a PHP loop that does the following:

1. Log in to the web page via cURL
2. Capture an internal page requiring login
3. Save the HTML page to a local file
4. Using wkhtmltopdf, render the page as a PDF

The problem I am facing is that from time to time (maybe ~30% of the time) the images are not displayed in the PDF file. If I open one of the saved HTML files, I find that I need to manually refresh the page to display the images.

Any ideas on how to programmatically ensure that the images load? Things I tried:

1. sleep(n) between each line
2. Adding --javascript-delay 30000 to my wkhtmltopdf call to provide enough time to load any images.

#1 made it worse, and #2 did nothing.

Thanks!

+7

php curl pdf-generation wkhtmltopdf

Chords May 07, '12 at 15:32


5 answers

I have never done this, but maybe you can find out whether loading has finished by repeatedly calling curl_getinfo() and reading the value of CURLINFO_SIZE_DOWNLOAD until that value no longer changes?

0

Del pedro May 11 '12 at 14:15


What if you fetch the HTML with cURL, loop through each img element in PHP, read in the binary data of the image file, and replace the src attribute of the img with the base64-encoded image data, for example:

'<img src="data:image/jpeg;base64,' . base64_encode($imagedata) . '"/>'

If the base64 image data is hard-coded in the page, then this gives you a programmatic way to check that all the images are "loaded", and it avoids the problem of the PDF conversion starting before all the images have been downloaded.
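A minimal sketch of that idea, under some assumptions: inlineImages() and the $fetch callable are hypothetical names, the MIME type is guessed from the file extension, and a regex stands in for a real HTML parser to keep the example short.

```php
<?php
// Hypothetical helper: inline every <img> as a base64 data: URI.
// $fetch is any callable that returns the image bytes for a URL, or false on failure.
function inlineImages(string $html, callable $fetch): string
{
    // Small extension-to-MIME map (an assumption; extend as needed).
    $mimeTypes = [
        'jpg' => 'image/jpeg', 'jpeg' => 'image/jpeg',
        'png' => 'image/png',  'gif'  => 'image/gif',
    ];

    // Matches the quoted src attribute of each <img> tag.
    // A real HTML parser would be more robust than this regex.
    $pattern = '/(<img\b[^>]*?\ssrc\s*=\s*)(["\'])(.*?)\2/is';

    return preg_replace_callback($pattern, function (array $m) use ($fetch, $mimeTypes) {
        $src = html_entity_decode($m[3]);
        if (strpos($src, 'data:') === 0) {
            return $m[0]; // already inlined, leave untouched
        }
        $data = $fetch($src);
        if ($data === false || $data === '') {
            // Fail loudly so the PDF is never built with a missing image.
            throw new RuntimeException("Could not load image: $src");
        }
        $path = (string) parse_url($src, PHP_URL_PATH);
        $ext  = strtolower(pathinfo($path, PATHINFO_EXTENSION));
        $mime = $mimeTypes[$ext] ?? 'application/octet-stream';

        return $m[1] . $m[2] . "data:$mime;base64," . base64_encode($data) . $m[2];
    }, $html);
}
```

In the loop from the question, $fetch could wrap the same logged-in cURL handle, so that images behind the login are requested with the session cookies. If any image fails to download, the exception tells you before wkhtmltopdf is ever called.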

0

Webchemist May 11 '12 at 16:19


Could you add an onload handler to the images, so that you know when they have loaded? Something like:

 <img src='foo.jpg' onload='callbackFunction();'/>

0

irenkai Jul 20 '12 at 0:18


Perhaps you can process the downloaded HTML, look for img tags, then download the images to local storage and replace the src attributes. That way, you generate the PDF only after all the images are available.

0

Muc Dec 01 '12 at 12:08


stevecomrie · Accepted Answer · 2013-01-11T19:59:58+0000

Between steps 3 and 4 of your example, you could parse the HTML file for all image links and download them separately using curl, save them locally, and then update the links in the saved HTML file to point to the new local image resources instead of the remote ones.

This should greatly improve image loading time when rendering HTML to PDF.
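A rough sketch of that step, with some assumptions: localizeImages() and the $fetch callable are hypothetical names, files are named after an MD5 hash of the image URL, and a regex is used in place of a proper HTML parser for brevity.

```php
<?php
// Hypothetical helper: save every <img> locally and point src at the local copy.
// $fetch is any callable that returns the image bytes for a URL, or false on failure.
function localizeImages(string $html, string $dir, callable $fetch): string
{
    if (!is_dir($dir) && !mkdir($dir, 0777, true)) {
        throw new RuntimeException("Could not create directory: $dir");
    }

    // Matches the quoted src attribute of each <img> tag.
    $pattern = '/(<img\b[^>]*?\ssrc\s*=\s*)(["\'])(.*?)\2/is';

    return preg_replace_callback($pattern, function (array $m) use ($dir, $fetch) {
        $src  = html_entity_decode($m[3]);
        $data = $fetch($src);
        if ($data === false || $data === '') {
            // Stop here rather than render a PDF with a missing image.
            throw new RuntimeException("Could not download image: $src");
        }
        // Hash the URL for a stable file name and keep the original extension.
        $ext  = pathinfo((string) parse_url($src, PHP_URL_PATH), PATHINFO_EXTENSION);
        $file = $dir . '/' . md5($src) . ($ext !== '' ? ".$ext" : '');
        if (file_put_contents($file, $data) === false) {
            throw new RuntimeException("Could not write file: $file");
        }
        // Rewrite only the src value; the rest of the tag is left as-is.
        return $m[1] . $m[2] . htmlspecialchars($file, ENT_QUOTES) . $m[2];
    }, $html);
}
```

The rewritten HTML would then be saved and handed to wkhtmltopdf as before. Depending on the wkhtmltopdf version, reading local image files may need the --enable-local-file-access option.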
