How to ensure the loading of images before creating a PDF?

I have a PHP loop that does the following:

  • Log in to the web page via cURL
  • Capture an internal page that requires login
  • Save the HTML page to a local file
  • Render the saved page as a PDF using WKHTMLTOPDF

The problem I am facing is that some of the time (roughly 30%) the images do not appear in the PDF file. If I open one of the saved HTML files in a browser, I have to refresh the page manually before the images display.

Any ideas on how to programmatically ensure that the images have loaded? Things I tried:

  • sleep(n) between each line
  • Adding --javascript-delay 30000 to my WKHTMLTOPDF call to provide enough time to load any images.

#1 made it worse, and #2 did nothing.

Thanks!

+7
5 answers

Between steps 3 and 4 of your example, you could parse the saved HTML file for all image links, download each image separately using cURL, save the images locally, and then update the links in the saved HTML file so that they point to the new local copies instead of the remote ones.

This should greatly improve image loading time when rendering HTML to PDF.
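A minimal sketch of that idea, under some assumptions: `localizeImages` and the `$fetch` callable are hypothetical names, not part of any library. `$fetch` is expected to take an image URL and return the raw bytes (or `false` on failure). Relative URLs and character-encoding quirks are not handled here.

```php
<?php
/**
 * Download every <img> referenced in $html and point its src at a local copy.
 *
 * $fetch is a hypothetical callable (string $url): string|false that returns
 * the image bytes. It is injected so that it can reuse a logged-in session.
 */
function localizeImages(string $html, string $dir, callable $fetch): string
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely strict; collect parser warnings silently.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src === '' || strpos($src, 'data:') === 0) {
            continue; // nothing to download
        }

        $bytes = $fetch($src);
        if (!is_string($bytes) || $bytes === '') {
            // Fail loudly instead of producing a PDF with missing images.
            throw new RuntimeException("Could not download image: $src");
        }

        // Keep the original extension (if any) and derive a stable file name.
        $ext   = pathinfo((string) parse_url($src, PHP_URL_PATH), PATHINFO_EXTENSION);
        $local = $dir . '/' . md5($src) . ($ext !== '' ? '.' . $ext : '');
        if (file_put_contents($local, $bytes) === false) {
            throw new RuntimeException("Could not write file: $local");
        }

        $img->setAttribute('src', $local);
    }

    return $doc->saveHTML();
}
```

If the images themselves sit behind the login, `$fetch` should probably reuse the same cURL handle (and therefore the same cookies) that fetched the page, with `CURLOPT_RETURNTRANSFER` enabled so that `curl_exec()` returns the bytes. Also note that recent wkhtmltopdf releases may need `--enable-local-file-access` before they will read local image files.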

+1

I have never done this, but maybe you can find out whether loading has finished by repeatedly calling curl_getinfo() and reading the value of CURLINFO_SIZE_DOWNLOAD until that value no longer changes?

0

What if you fetch the HTML with cURL, loop over each img element in PHP, read in the binary data of the image file, and replace the src attribute of the img element with the base64-encoded contents of that image file, for example:

'<img src="data:image/jpeg;base64,' . base64_encode($imagedata) . '"/>'

If the base64 image data is hard-coded into the page, then this gives you a programmatic way to check that all the images are "loaded", and it prevents the PDF conversion from starting before all the images have been downloaded.
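Here is a rough sketch of how that could look, assuming a hypothetical `inlineImages` function and a hypothetical `$fetch` callable that returns the raw bytes for a given image URL. It uses `getimagesizefromstring()` to detect the MIME type, which also catches the case where the server returns something that is not an image (a login page, for instance).

```php
<?php
/**
 * Replace every <img src> in $html with an inline base64 data URI.
 *
 * $fetch is a hypothetical callable (string $url): string|false that returns
 * the image bytes, e.g. a wrapper around an already logged-in cURL handle.
 */
function inlineImages(string $html, callable $fetch): string
{
    $doc = new DOMDocument();
    // Tolerate sloppy markup without emitting warnings.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src === '' || strpos($src, 'data:') === 0) {
            continue; // already inline, or nothing to fetch
        }

        $bytes = $fetch($src);
        if (!is_string($bytes) || $bytes === '') {
            throw new RuntimeException("Could not download image: $src");
        }

        // Detect the real type from the bytes; false means "not an image".
        $info = @getimagesizefromstring($bytes);
        if ($info === false) {
            throw new RuntimeException("Response is not an image: $src");
        }

        $img->setAttribute(
            'src',
            'data:' . $info['mime'] . ';base64,' . base64_encode($bytes)
        );
    }

    return $doc->saveHTML();
}
```

Because the function throws as soon as one image cannot be fetched, a successful return means every image is embedded in the HTML before the PDF step runs. The trade-off is a larger HTML file, since base64 adds roughly a third to the size of each image.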

0

Could you add an onload handler to the images so that you know when they have finished loading? Something like:

 <img src='foo.jpg' onload='callbackFunction();'/> 
0

Perhaps you can process the downloaded HTML, look for img tags, then download the images to local storage and replace the src attributes. That way, you generate the PDF only after all the images are available.
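The "only after all the images are available" part could be enforced with a small check before calling wkhtmltopdf. The sketch below assumes a hypothetical helper, `missingLocalImages`, which reports every img src that is empty, still remote, or does not resolve to a non-empty local file.

```php
<?php
/**
 * Return the src values of all <img> elements that are NOT available locally.
 * Relative paths are resolved against $baseDir. An empty result means it is
 * safe to start the PDF conversion.
 */
function missingLocalImages(string $html, string $baseDir): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate non-strict HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $missing = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');

        if (strpos($src, 'data:') === 0) {
            continue; // inline image, always available
        }
        if ($src === '' || preg_match('#^[a-z][a-z0-9+.-]*://#i', $src)) {
            $missing[] = $src; // empty, or still pointing at a remote URL
            continue;
        }

        $path = ($src[0] === '/') ? $src : $baseDir . '/' . $src;
        if (!is_file($path) || filesize($path) === 0) {
            $missing[] = $src; // file absent or empty
        }
    }

    return $missing;
}
```

A caller might retry the downloads while the returned array is non-empty, and run wkhtmltopdf only once it comes back empty.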

0