wicked_pdf shows an unknown character when converting Unicode to PDF (Ruby)

I am trying to create a PDF file from an HTML page using the wicked_pdf (version 1.1) and wkhtmltopdf-binary gems. My HTML page contains a calendar emoji that displays correctly in the browser with whatever font I use:

    <!DOCTYPE html>
    <html>
    <head>
      <meta charset="utf-8">
      <meta http-equiv='content-type' content='text/html; charset=utf-8' />
      <style>
        unicode {
          font-family: 'OpenSansEmoji', sans-serif;
        }
        @font-face {
          font-family: 'OpenSansEmoji';
          src: url(data:font/truetype;charset=utf-8;base64,<-- encoded_font_base64_string-->) format('truetype');
        }
      </style>
    </head>
    <body>
      <div><unicode>&#128197;</unicode></div>
    </body>
    </html>

However, when I generate a PDF with the gem's WickedPdf.new.pdf_from_html_file method from the Rails console,

  File.open(File.expand_path('~/<--pdf_filename-->.pdf'), 'wb+') {|f| f.write WickedPdf.new.pdf_from_html_file('<--absolute_path_of_html_file-->')} 

I get the following result:

PDF result with unknown character

As you can see, the first calendar icon is displayed correctly, but it is followed by a second symbol, and I do not know where that one comes from.

I tried UTF-8 and UTF-16 encodings and a surrogate pair, as suggested in this related post, https://stackoverflow.com/a/464939/ ... , and looked at this wkhtmltopdf_git_issue , but the extra character still won't go away!

Any tips are more than welcome.
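For reference, the different encodings of the calendar emoji can be inspected from plain Ruby. This is only a minimal sketch: the helper names `utf8_hex` and `utf16_hex` are hypothetical and not part of wicked_pdf; only the code point 128197 comes from the markup above.

```ruby
# Hypothetical helpers (not part of wicked_pdf) for inspecting one code point.

# UTF-8 bytes of a code point, as uppercase hex strings.
def utf8_hex(code_point)
  [code_point].pack('U').bytes.map { |b| format('%02X', b) }
end

# UTF-16 code units of a code point, as uppercase hex strings.
# Code points above U+FFFF come out as a surrogate pair.
def utf16_hex(code_point)
  [code_point].pack('U').encode('UTF-16BE').unpack('n*').map { |u| format('%04X', u) }
end

calendar = 128_197 # &#128197; from the question, i.e. U+1F4C5
puts format('U+%04X', calendar)
puts "UTF-8:  #{utf8_hex(calendar).join(' ')}"
puts "UTF-16: #{utf16_hex(calendar).join(' ')}"
```

All of these forms name the same single character, so switching between them in the HTML should not, by itself, add or remove a glyph.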

Thanks in advance for your help!

EDIT

Following the comments of Eric Duminil and petkov.np, I can confirm that the above code works correctly for me on Linux. So this seems to be a difference between Linux and macOS. Can anyone suggest what the root of the problem on macOS is, and whether it can be fixed?

ruby ruby-on-rails unicode wkhtmltopdf wicked-pdf
1 answer

I have edited this answer several times; see the notes at the end, as well as the comments.

I am using macOS 10.12.2 and have the same problem. I am listing all browser versions etc., although I suspect that the biggest factor is the OS / the wkhtmltopdf build.

  • Chrome: version 55.0.2883.95 (64-bit)
  • Safari: version 10.0.2 (12602.3.12.0.1)
  • wkhtmltopdf: 0.12.3 (with patched qt)

I use the following sample snippet:

    <html>
    <head>
      <meta http-equiv="Content-Type" content="text/html" charset="utf-8">
      <style type="text/css">
        p {
          font-family: 'EmojiSymbols', sans-serif;
        }
        @font-face {
          font-family: 'EmojiSymbols';
          src: local('EmojiSymbols-Regular.woff'),
               url('EmojiSymbols-Regular.woff') format('woff');
        }
        span:before {
          content: '\01F60B';
        }
      </style>
    </head>
    <body>
      <p>
        😋
        <span></span>
        &#x1F60B;
        &#128523;
        &#xf0;&#x9f;&#x98;&#x8b;
      </p>
    </body>
    </html>

I call wkhtmltopdf with the option --encoding 'UTF-8'.
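From Ruby, that call could be assembled roughly as follows. This is a sketch under assumptions: `wkhtmltopdf_command` is a hypothetical helper and the file names are placeholders; only the `--encoding` flag is taken from the line above.

```ruby
# Hypothetical helper: builds the argument list for the wkhtmltopdf CLI.
# Only the --encoding flag comes from the answer; the paths are placeholders.
def wkhtmltopdf_command(input_html, output_pdf, encoding: 'UTF-8')
  ['wkhtmltopdf', '--encoding', encoding, input_html, output_pdf]
end

cmd = wkhtmltopdf_command('snippet.html', 'snippet.pdf')
puts cmd.join(' ')

# To actually run it (requires wkhtmltopdf on the PATH):
#   system(*cmd) or abort('wkhtmltopdf failed')
```

Passing the arguments as an array to `system` avoids shell quoting issues. When going through wicked_pdf instead, I assume an `encoding: 'UTF-8'` option is forwarded to the binary in the same way, but that should be checked against the gem's README.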

You can see the result here (sorry for the poor-quality screenshot). Some brief conclusions:

  • At first I thought Safari did not render the "raw" UTF-8 bytes properly, treating them the same way as the raw byte sequence (the last line in the HTML paragraph). That was wrong: Safari is doing fine (see the first EDIT below).
  • Chrome renders everything perfectly.
  • With the above option, wkhtmltopdf renders the raw bytes more or less correctly, but does not render the CSS content property. Each "correct" occurrence of the Unicode character is accompanied by this strange phantom character.
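One note on the last line of the test paragraph: numeric character references name code points, not bytes, so spelling out the UTF-8 bytes of U+1F60B as four references yields four separate characters rather than the emoji. A small Ruby sketch of the difference (the helper `hex_bytes` is hypothetical):

```ruby
# Hypothetical helper: UTF-8 bytes of a string, as lowercase hex strings.
def hex_bytes(str)
  str.bytes.map { |b| format('%02x', b) }
end

# One code point, as in &#x1F60B; or &#128523;
emoji = "\u{1F60B}"

# Four code points (U+00F0, U+009F, U+0098, U+008B),
# as in &#xf0;&#x9f;&#x98;&#x8b;
as_refs = [0xf0, 0x9f, 0x98, 0x8b].pack('U*')

puts hex_bytes(emoji).join(' ') # the UTF-8 bytes of the single emoji
puts emoji.length               # number of characters in the emoji string
puts as_refs.length             # number of characters in the reference form
puts emoji == as_refs           # whether the two strings are the same
```

So that last entry is not expected to match the other three, whichever renderer is used.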

I tried literally everything, but the results are the same. To me, the fact that even Safari doesn't render the raw bytes suggests a system-level issue that is specific to macOS. It is not clear to me whether this should be reported as a wkhtmltopdf issue or whether there is a faulty dependency in the macOS build.

EDIT: Safari seems to be working fine; my markup was broken.

EDIT: A CSS workaround may do the trick; please see the comments below.

FINAL EDIT: As shown in the comments, the CSS hack that solves the problem is text-rendering: optimizeLegibility; . This is apparently only required on macOS / OS X.

From my comment below:

I just found this issue. At first glance it seems unrelated, but adding text-rendering: optimizeLegibility; to my styles removed the duplicate characters (on macOS). Why this happens is beyond me. Since the question author also uses OS X, there is evidently some problem with the wkhtmltopdf build for this OS.
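If the HTML is generated or loaded in Ruby, the workaround can be injected before rendering. This is a minimal sketch, not the gem's API: `add_text_rendering_workaround` is a hypothetical helper, and applying the rule to `body` is my assumption, since the comment does not say which selector was used.

```ruby
# Hypothetical helper (not part of wicked_pdf): adds the workaround rule from
# the answer to an HTML document before it is handed to the PDF renderer.
# Applying the rule to `body` is an assumption; adjust the selector as needed.
WORKAROUND_STYLE = '<style>body { text-rendering: optimizeLegibility; }</style>'.freeze

def add_text_rendering_workaround(html)
  if html =~ %r{</head>}i
    # Insert the style just before the first closing head tag.
    html.sub(%r{</head>}i) { |closing| "#{WORKAROUND_STYLE}#{closing}" }
  else
    # No closing head tag found: prepend the style so the rule is still present.
    WORKAROUND_STYLE + html
  end
end

html = '<html><head><meta charset="utf-8"></head><body><p>&#128197;</p></body></html>'
puts add_text_rendering_workaround(html)
```

The patched string can then be passed to WickedPdf.new.pdf_from_string in place of the original HTML.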
