I want to check if a string is already in NFC form. I am currently doing:
unicodedata.normalize('NFC', s) == s
I do this for a large number of lines, so I would like it to be efficient. The above method seems wasteful: it builds a normalized copy of the string and then performs a full string comparison.
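One cheap improvement, assuming many of the lines are plain ASCII, is to skip normalization for them entirely. A pure-ASCII string is always in NFC, because no ASCII character has a decomposition and no pair of ASCII characters composes into something else. The sketch below uses a hypothetical helper name `is_nfc` and `str.isascii()` (Python 3.7+). For non-ASCII lines it falls back to the original normalize-and-compare test.

```python
import unicodedata


def is_nfc(s: str) -> bool:
    """Return True if s is already in NFC form (hypothetical helper)."""
    # Pure-ASCII text is always NFC: no ASCII character decomposes,
    # and no two ASCII characters compose, so normalization is a no-op.
    if s.isascii():
        return True
    # Non-ASCII text: build the normalized copy and compare, as before.
    return unicodedata.normalize('NFC', s) == s
```

With this helper, `is_nfc("plain ascii")` and `is_nfc("caf\u00e9")` are true, while `is_nfc("cafe\u0301")` (an `e` followed by a combining acute accent) is false. How much this saves depends entirely on the share of ASCII lines in the input.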
Is there a more efficient way to do this? I thought of:
len(unicodedata.normalize('NFC', s)) == len(s)
This avoids the string comparison, but I'm not sure it is always correct. It only works if NFC normalization always changes the length of any string that is not already in NFC. Is this a valid assumption?
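The assumption does not hold in general. Some characters have a singleton canonical decomposition, so NFC replaces them with a different character of the same length. For example, U+212B ANGSTROM SIGN normalizes to U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE (and U+2126 OHM SIGN to U+03A9 GREEK CAPITAL LETTER OMEGA). The helper names `len_check` and `full_check` below are hypothetical, chosen only to contrast the two tests.

```python
import unicodedata


def len_check(s: str) -> bool:
    # The proposed shortcut: compare lengths only.
    return len(unicodedata.normalize('NFC', s)) == len(s)


def full_check(s: str) -> bool:
    # The original test: compare the normalized string with the input.
    return unicodedata.normalize('NFC', s) == s


# U+212B ANGSTROM SIGN decomposes canonically to U+00C5, which NFC keeps
# composed, so the string changes but its length stays 1.
s = "\u212b"
print(hex(ord(unicodedata.normalize('NFC', s))))  # 0xc5
print(len_check(s))   # True  (wrongly reports "already NFC")
print(full_check(s))  # False
```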
Any other ideas?
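On Python 3.8 and later, the standard library provides `unicodedata.is_normalized(form, unistr)`, which answers this question directly. As far as I know, CPython first runs the Unicode "quick check" property and only normalizes and compares when that check is inconclusive, so it can avoid building a copy for many inputs. The wrapper name `is_nfc` is hypothetical; the version check is there only to keep the sketch working on older interpreters.

```python
import sys
import unicodedata


def is_nfc(s: str) -> bool:
    """Return True if s is already in NFC form (hypothetical wrapper)."""
    if sys.version_info >= (3, 8):
        # Standard-library check, available since Python 3.8.
        return unicodedata.is_normalized('NFC', s)
    # Older interpreters: fall back to normalize-and-compare.
    return unicodedata.normalize('NFC', s) == s
```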