Determine if a Unicode character is full-width or half-width in C++

I am writing a terminal (console) application that needs to wrap arbitrary Unicode text.

Terminals usually use a monospaced (fixed-width) font, so wrapping text is little more than counting characters, checking whether a word still fits on the line, and acting accordingly.

The problem is that the Unicode table contains full-width characters that occupy two character cells in the terminal.

Counting sees one Unicode character, but the printed character is as wide as two "normal" (half-width) characters. This breaks the wrapping routine, because it is not aware of characters that take twice the width.

As an example, this is a full-width character (U+3004, the JIS symbol):

  〄
 12

Here it may not occupy the full width of two characters, even though the text is preformatted, but in the terminal it does take twice the width of a Western character.

To deal with this, I have to distinguish between full-width and half-width characters, but I cannot find a way to do this in C++. Do I really need to know all the full-width characters in the Unicode table to get around the problem?

+6
2 answers

You should use ICU's u_getIntPropertyValue with the UCHAR_EAST_ASIAN_WIDTH property.

For instance:

    #include <unicode/uchar.h>  // ICU character properties

    // True if c is East Asian Fullwidth or Wide, i.e. takes two cells.
    bool is_fullwidth(UChar32 c) {
        int width = u_getIntPropertyValue(c, UCHAR_EAST_ASIAN_WIDTH);
        return width == U_EA_FULLWIDTH || width == U_EA_WIDE;
    }

Note that if your terminal or graphics library supports combining characters, you will also need to take them into account when deciding how many cells a sequence uses; for example, e followed by U+0301 COMBINING ACUTE ACCENT occupies only one cell.
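To make the point about combining characters concrete, here is a minimal sketch of counting cells for a whole string. The name cell_count and the two predicate parameters are hypothetical, introduced only for illustration; the predicates are passed in so that the sketch does not depend on any particular library.

```cpp
#include <functional>
#include <string>

// Hypothetical helper (not part of ICU): counts the terminal cells used by
// a UTF-32 string. The caller supplies both classification predicates.
int cell_count(const std::u32string& text,
               const std::function<bool(char32_t)>& is_wide,
               const std::function<bool(char32_t)>& is_combining)
{
    int cells = 0;
    for (char32_t c : text) {
        if (is_combining(c)) {
            continue;                 // drawn on top of the previous cell
        }
        cells += is_wide(c) ? 2 : 1;  // wide and full-width take two cells
    }
    return cells;
}
```

With ICU, is_wide could be the is_fullwidth function above, and one plausible choice for is_combining is a check that u_charType(c) returns U_NON_SPACING_MARK or U_ENCLOSING_MARK. That choice is an assumption and may not match every terminal's behaviour.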

+6

There is no need to build the tables yourself; this has already been done from the Unicode data:

http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c

The same code is used in terminal emulators such as xterm [1] and konsole [2], and most likely in others as well.
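As a rough sketch of how a wcwidth-style function could be plugged into the wrapping routine from the question: the names wrap and WidthFn are hypothetical, and the callback is only assumed to follow the wcwidth convention of returning 2 for wide characters, 1 for ordinary ones, 0 for combining characters and -1 for non-printable ones. Using char32_t rather than wchar_t is also an assumption made for portability.

```cpp
#include <string>
#include <vector>

// Callback in the style of wcwidth(): number of cells used by c.
using WidthFn = int (*)(char32_t);

// Hypothetical helper: greedy wrap of space-separated words so that no line
// exceeds max_cells cells. A single word wider than max_cells is left
// unbroken on its own line.
std::vector<std::u32string> wrap(const std::u32string& text,
                                 int max_cells, WidthFn width)
{
    std::vector<std::u32string> lines;
    std::u32string line, word;
    int line_cells = 0, word_cells = 0;

    auto flush_word = [&] {
        if (word.empty()) return;
        // A separating space costs one cell unless the line is empty.
        int needed = word_cells + (line.empty() ? 0 : 1);
        if (!line.empty() && line_cells + needed > max_cells) {
            lines.push_back(line);    // the word does not fit: start a new line
            line.clear();
            line_cells = 0;
            needed = word_cells;
        }
        if (!line.empty()) line += U' ';
        line += word;
        line_cells += needed;
        word.clear();
        word_cells = 0;
    };

    for (char32_t c : text) {
        if (c == U' ') {
            flush_word();
        } else {
            word += c;
            int w = width(c);
            if (w > 0) word_cells += w;  // skip zero-width and non-printable
        }
    }
    flush_word();
    if (!line.empty()) lines.push_back(line);
    return lines;
}
```

The only difference from a naive wrapper is that the line budget is counted in cells instead of characters, so a word made of full-width characters uses up the line twice as fast.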

+2
