Japanese word wrap algorithms

Question

In a recent web application that I built, I was pleasantly surprised when one of our users decided to use it to create something entirely in Japanese. However, the text wrapped strangely and awkwardly. Browsers apparently do not cope well with wrapping Japanese text, probably because it contains few spaces, as each character can form a whole word. That is not a completely safe assumption, though: some words are made up of several characters, and some groups of characters should not be split across lines.

Googling did not help me understand the problem any better. It seems to me that you would need a dictionary of unbreakable patterns and would assume that a break is safe everywhere else. But I'm afraid I don't know enough about Japanese to know all the words, which, from what I gather from my searches, is quite a complicated matter.

How would you approach this problem? Do you know of any existing libraries or algorithms that handle this satisfactorily?

+6

algorithm unicode internationalization word-wrap cjk

Breton Jan 19 '10 at 0:45

1 answer

Michael Borgwardt · Accepted Answer · 2010-01-19T00:57:03+0000

Japanese line-breaking rules are called kinsoku shori and are surprisingly simple. In fact, they are mainly concerned with punctuation characters and do not try to keep words intact at all.

I just checked a Japanese novel, and indeed, both words written in the syllabic kana script and words consisting of several Chinese ideograms are broken mid-word with impunity.
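
To make the idea concrete, here is a minimal sketch of a punctuation-based wrapper along these lines. It is not taken from any library: the function name `kinsoku_wrap` and the two character sets are assumptions made for illustration, the sets are small subsets of what a real kinsoku table would contain, and the sketch counts every character as one column wide. It breaks anywhere, including mid-word, and only moves a break earlier when the next line would otherwise start with a closing mark (or small kana), or the current line would end with an opening bracket.

```python
# Illustrative subsets only (an assumption); real kinsoku tables are larger.
NO_LINE_START = set("、。，．・：；？！）］｝」』】〉》ー々ぁぃぅぇぉっゃゅょァィゥェォッャュョ")
NO_LINE_END = set("（［｛「『【〈《")


def _bad_break(text: str, cut: int) -> bool:
    """True if breaking before text[cut] would violate one of the two rules."""
    return text[cut] in NO_LINE_START or text[cut - 1] in NO_LINE_END


def kinsoku_wrap(text: str, width: int) -> list[str]:
    """Split text into lines of at most `width` characters.

    Breaks may fall in the middle of a word. A break is only moved
    earlier when it would put a forbidden character at the start or
    end of a line.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    lines = []
    start = 0
    while len(text) - start > width:
        cut = start + width
        # Walk the break point left until it no longer violates a rule.
        while cut > start + 1 and _bad_break(text, cut):
            cut -= 1
        if _bad_break(text, cut):
            # No legal break on this line: fall back to a hard break.
            cut = start + width
        lines.append(text[start:cut])
        start = cut
    lines.append(text[start:])
    return lines


if __name__ == "__main__":
    for line in kinsoku_wrap("これは「テスト」です。日本語の文章。", 4):
        print(line)
```

Moving the break earlier leaves some lines shorter than `width`. Real typesetting may instead let punctuation hang past the margin or adjust spacing, which this sketch does not attempt.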
