Efficient way to check if a Unicode string is NFC in Python?

Question

I want to check if a string is already in NFC form. I am currently doing:

unicodedata.normalize('NFC', s) == s

I do this for a large number of lines, so I would like it to be efficient. The above method seems wasteful: it converts the string to NFC and then performs a string comparison.

Is there a more efficient way to do this? I thought of:

len(unicodedata.normalize('NFC', s)) == len(s)

This avoids the string comparison, but I'm not sure it is always correct. It works only if NFC normalization always changes the length of a string that is not already in NFC. Is this a valid assumption?

Any other ideas?

python unicode normalization unicode-normalization python-unicode

user2771609 Sep 01 '15 at 19:35

1 answer

一二三 · Accepted Answer · 2015-09-02T01:10:54+0000

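No, that assumption is not valid. For example, 'Ω' (U+2126) is NFC-normalized to 'Ω' (U+03A9): the string changes but its length does not.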
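The counterexample can be checked directly with the standard library. This is a minimal sketch; the variable names are only illustrative.

```python
import unicodedata

ohm = "\u2126"  # OHM SIGN, not in NFC
omega = unicodedata.normalize("NFC", ohm)  # GREEK CAPITAL LETTER OMEGA

# The length is unchanged, so the length-only check reports "already NFC"...
same_length = len(omega) == len(ohm)
# ...even though normalization actually changed the string.
same_string = omega == ohm

print(same_length, same_string, hex(ord(omega)))
```

Any character whose canonical decomposition is a single other character behaves the same way, so comparing lengths can give false positives.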

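The Unicode standard defines a "quick check" property that can be used to determine efficiently whether a string is already normalized but, unfortunately, Python's unicodedata module does not expose it. Internally, however, unicodedata.normalize() does use this property, so in most cases it is fast when the string is already normalized.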
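Since this answer was written, Python 3.8 added unicodedata.is_normalized(), which performs this check without the caller having to build and compare the normalized string. A minimal sketch, assuming a hypothetical helper named is_nfc (not part of any library) that falls back to the comparison from the question on older interpreters:

```python
import unicodedata


def is_nfc(s: str) -> bool:
    """Return True if s is already in NFC (is_nfc is an illustrative name)."""
    # unicodedata.is_normalized() exists only on Python 3.8 and later.
    checker = getattr(unicodedata, "is_normalized", None)
    if checker is not None:
        return checker("NFC", s)
    # Fallback for older versions: normalize and compare, as in the question.
    return unicodedata.normalize("NFC", s) == s


print(is_nfc("abc"))        # plain ASCII
print(is_nfc("\u2126"))     # OHM SIGN
print(is_nfc("e\u0301"))    # 'e' followed by COMBINING ACUTE ACCENT
```

Whether this is measurably faster than normalize-and-compare depends on the data, so it is worth timing on a representative sample of lines.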

If that is not fast enough, you could use a different Unicode library for Python (for example, PyICU).
