How to use Imagick annotateImage for Chinese text?

I need to annotate an image with Chinese text, and I am currently using the Imagick library.

An example of the Chinese text is

这是中文

The Chinese font file I used is this one.

The file was originally called 华文黑体.ttf

It can also be found on Mac OS X under /Library/Fonts

I renamed it to the English name STHeiTi.ttf to make it easier to reference the file in PHP code.

In particular, I am using the Imagick::annotateImage function.

I am also using the answer from "How can I draw wrapped text using Imagick in PHP?".

I use it because it works for English text, and my application needs to annotate both English and Chinese, although not at the same time.

The problem is that when I run annotateImage using Chinese text, I get an annotation that looks like 罍

Code is included here.

+7
3 answers

The complete solution is here:

https://gist.github.com/2971092/232adc3ebfc4b45f0e6e8bb5934308d9051450a4

Key ideas:

You must set the HTML character encoding and PHP's internal encoding to UTF-8, both on the form page and on the processing page:

    header('Content-Type: text/html; charset=utf-8');
    mb_internal_encoding('utf-8');

These lines should be at the top of the PHP files.

Use this function to determine if the text is Chinese and use the correct font file.

    function isThisChineseText($text)
    {
        return preg_match("/\p{Han}+/u", $text);
    }

For more information, visit https://stackoverflow.com/a/166778/
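As a minimal sketch of how that check could drive the font choice: the helper names `containsHan` and `chooseFontFile` and the Latin font file name `Arial.ttf` are assumptions made for illustration, not part of the original gist.

```php
<?php
// Returns true when the text contains at least one Han (Chinese) character.
function containsHan(string $text): bool
{
    return preg_match('/\p{Han}/u', $text) === 1;
}

// Hypothetical helper: pick a font file depending on the script of the text.
function chooseFontFile(string $text, string $chineseFont, string $latinFont): string
{
    return containsHan($text) ? $chineseFont : $latinFont;
}

// 'Arial.ttf' is only a placeholder for whatever Latin font you use.
echo chooseFontFile('这是中文', 'STHeiTi.ttf', 'Arial.ttf'), "\n";
echo chooseFontFile('This is English', 'STHeiTi.ttf', 'Arial.ttf'), "\n";
```

The returned path can then be handed to `$draw->setFont()` on the ImagickDraw object before annotating.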

Set the text encoding correctly on the ImagickDraw object:

    $draw = new ImagickDraw();
    // set utf 8 format
    $draw->setTextEncoding('UTF-8');

Note the uppercase UTF. Walter Tross helpfully pointed this out to me in his answer: https://stackoverflow.com/a/464829/

Use preg_match_all to split the text into English words, Chinese characters and spaces:

    // separate the text by chinese characters or words or spaces
    preg_match_all('/([\w]+)|(.)/u', $text, $matches);
    $words = $matches[0];

Inspired by this answer https://stackoverflow.com>

This also works with English text.

+2
source

The problem is that you are feeding ImageMagick the output of a "line splitter" (wordWrapAnnotation), to which you pass text that has gone through utf8_decode. This is wrong if you are dealing with Chinese text. utf8_decode can only handle UTF-8 text that can be converted to ISO-8859-1 (the most common 8-bit extension of ASCII).
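To make the limitation concrete, here is a small sketch that checks whether a UTF-8 string could survive a conversion to ISO-8859-1 at all. ISO-8859-1 only covers code points U+0000 to U+00FF, so any Chinese character falls outside it. The function name `fitsInLatin1` is made up for this illustration.

```php
<?php
// True when every character lies in U+0000..U+00FF,
// i.e. when ISO-8859-1 is able to represent the whole string.
function fitsInLatin1(string $utf8Text): bool
{
    return preg_match('/^[\x{00}-\x{FF}]*$/u', $utf8Text) === 1;
}

var_dump(fitsInLatin1('café'));     // accented Latin text fits
var_dump(fitsInLatin1('这是中文')); // Chinese text does not
```

Text for which this check fails cannot pass through utf8_decode intact, which is consistent with the garbled annotation described in the question.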

I assume that your text is UTF-8 encoded. If it is not, you may be able to convert it like this:

 $text = mb_convert_encoding($text, 'UTF-8', 'BIG-5'); 

or like this:

 $text = mb_convert_encoding($text, 'UTF-8', 'GB18030'); // only PHP >= 5.4.0 

(in your code, $text would instead be $text1 and $text2).
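If you are not sure whether the input is already UTF-8, the conversion could be guarded as in the following sketch. The function name `toUtf8` and the idea of passing a fallback encoding are assumptions for illustration. Note that a legacy-encoded byte string can occasionally happen to be valid UTF-8, so the check is a heuristic, not a guarantee.

```php
<?php
// Hypothetical helper: leave valid UTF-8 untouched, otherwise convert
// from the encoding the caller believes the text is in.
function toUtf8(string $text, string $fallbackEncoding): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    return mb_convert_encoding($text, 'UTF-8', $fallbackEncoding);
}

// Simulate legacy input by encoding a UTF-8 literal as GB18030 first.
$legacy = mb_convert_encoding('中文', 'GB18030', 'UTF-8');
var_dump(toUtf8($legacy, 'GB18030') === '中文');
```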

Then there are (at least) two things to fix in your code:

  • pass the text "as is" (without utf8_decode) to wordWrapAnnotation,
  • change the setTextEncoding argument from "utf-8" to "UTF-8", as per the specification.

I hope all the variables in your code are initialized in some missing part. With the two changes above (the second may not be needed, but you never know...), and with the missing parts in place, I see no reason why your code should not work, unless your TTF file is broken or the Imagick library is broken (ImageMagick, which Imagick is based on, is a great library, so I consider this latter possibility unlikely).

EDIT:

Following your request, I am updating my answer with

a) the fact that setting mb_internal_encoding('utf-8') is very important for the solution, as you state in your own answer, and

b) my suggestion for a better line splitter that works acceptably for Western languages and for Chinese, and that is probably a good starting point for other languages using Han logograms (Japanese kanji and Korean hanja):

    function wordWrapAnnotation(&$image, &$draw, $text, $maxWidth)
    {
        $regex = '/( |(?=\p{Han})(?<!\p{Pi})(?<!\p{Ps})|(?=\p{Pi})|(?=\p{Ps}))/u';
        $cleanText = trim(preg_replace('/[\s\v]+/', ' ', $text));
        $strArr = preg_split($regex, $cleanText, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
        $linesArr = array();
        $lineHeight = 0;
        $goodLine = '';
        $spacePending = false;
        foreach ($strArr as $str) {
            if ($str == ' ') {
                $spacePending = true;
            } else {
                if ($spacePending) {
                    $spacePending = false;
                    $line = $goodLine.' '.$str;
                } else {
                    $line = $goodLine.$str;
                }
                $metrics = $image->queryFontMetrics($draw, $line);
                if ($metrics['textWidth'] > $maxWidth) {
                    if ($goodLine != '') {
                        $linesArr[] = $goodLine;
                    }
                    $goodLine = $str;
                } else {
                    $goodLine = $line;
                }
                if ($metrics['textHeight'] > $lineHeight) {
                    $lineHeight = $metrics['textHeight'];
                }
            }
        }
        if ($goodLine != '') {
            $linesArr[] = $goodLine;
        }
        return array($linesArr, $lineHeight);
    }

In words: the input is first cleaned up by replacing every run of whitespace, including newlines, with a single space, except for leading and trailing whitespace, which is removed. It is then split either at spaces, or right before Han characters that are not preceded by "leading" characters (such as opening parentheses or opening quotation marks), or right before "leading" characters. Lines are assembled so that they do not render wider than $maxWidth pixels, unless the splitting rules make this impossible (in which case the final rendering will probably overflow). A modification that forces a split in overflow cases is not difficult. Note that, for example, Chinese punctuation is not classified as Han in Unicode, so, with the exception of "leading" punctuation, the algorithm cannot insert a line break before it.
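The splitting step can be tried in isolation, without Imagick or a font file. The sketch below reuses the regular expressions from the function above; the wrapper name `splitForWrapping` is an assumption introduced only for this illustration.

```php
<?php
// Hypothetical wrapper around the cleaning and splitting steps of
// wordWrapAnnotation, so the tokens can be inspected on their own.
function splitForWrapping(string $text): array
{
    $regex = '/( |(?=\p{Han})(?<!\p{Pi})(?<!\p{Ps})|(?=\p{Pi})|(?=\p{Ps}))/u';
    // Collapse every run of whitespace into one space and trim the ends.
    $cleanText = trim(preg_replace('/[\s\v]+/', ' ', $text));
    return preg_split($regex, $cleanText, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
}

// English words stay whole, each Han character becomes its own token,
// and an opening parenthesis stays attached to the character after it.
print_r(splitForWrapping("Hello   世界"));
print_r(splitForWrapping('a (世界)'));
```

Keeping the opening parenthesis glued to the following character is what prevents a line from ending on a dangling "(".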

+5
source

I'm afraid you will have to choose a TTF font that supports Chinese code points. There are many sources for this; here are two:

http://www.wazu.jp/gallery/Fonts_ChineseTraditional.html

http://wildboar.net/multilingual/asian/chinese/language/fonts/unicode/non-microsoft/non-microsoft.html

+3
source
