Consider the following snippet:
    >>> import unicodedata
    >>> from unicodedata import normalize, name
    >>> normalize('NFKD', u'\xb4')
    u' \u0301'
    >>> normalize('NFKD', u'a\xb4a')
    u'a \u0301a'
    >>> normalize('NFKC', u'a\xb4a')
    u'a \u0301a'
    >>> name(u'\xb4'), name(u'\u0301')
    ('ACUTE ACCENT', 'COMBINING ACUTE ACCENT')
I am trying to figure out whether translating u'\xb4' to u' \u0301' is correct behavior. Why does it combine an acute accent with a space? And why is u'\xb4' translated at all?
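A small Python 3 sketch for inspecting this with only standard `unicodedata` calls. In the Unicode Character Database, U+00B4 carries a *compatibility* decomposition (tagged `<compat>`) to SPACE + COMBINING ACUTE ACCENT. That is the usual way Unicode relates a spacing copy of a diacritic to its combining form. The canonical forms (NFD/NFC) ignore compatibility mappings and leave the character alone. The compatibility forms (NFKD/NFKC) apply them.

```python
import unicodedata

ch = u'\xb4'  # U+00B4 ACUTE ACCENT

# Raw mapping from the Unicode Character Database; the "<compat>" tag
# marks it as a compatibility mapping rather than a canonical one.
mapping = unicodedata.decomposition(ch)
print(mapping)  # <compat> 0020 0301

# Canonical forms ignore <compat> mappings, so U+00B4 is left unchanged.
nfd = unicodedata.normalize('NFD', ch)
nfc = unicodedata.normalize('NFC', ch)
print(nfd == ch, nfc == ch)  # True True

# Compatibility forms apply the mapping: SPACE + COMBINING ACUTE ACCENT.
# NFKC recomposes afterwards, but there is no precomposed "space with
# acute", so NFKC and NFKD give the same two code points here.
nfkd = unicodedata.normalize('NFKD', ch)
nfkc = unicodedata.normalize('NFKC', ch)
print(nfkd == u' \u0301', nfkc == u' \u0301')  # True True
```

If the goal is to keep U+00B4 intact, NFC or NFD is the form to use. NFKC/NFKD deliberately trade such distinctions away.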
On fileformat.info, ACUTE ACCENT is listed with its former name, SPACING ACUTE. I thought this meant that the cursor should advance right away instead of waiting for the next character to be entered.
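The character properties support that reading. This sketch (Python 3, standard `unicodedata` only) compares the two accents. U+00B4 is a modifier symbol (`Sk`) with canonical combining class 0, so it stands on its own. U+0301 is a nonspacing mark (`Mn`) with combining class 230, so it attaches to the preceding base character. In the NFKD result, the leading space is the base that supplies the advance width. Note that `unicodedata.name()` returns only the current name, so "SPACING ACUTE" does not appear here.

```python
import unicodedata

spacing = u'\xb4'      # U+00B4 ACUTE ACCENT (old Unicode 1.0 name: SPACING ACUTE)
combining = u'\u0301'  # U+0301 COMBINING ACUTE ACCENT

for ch in (spacing, combining):
    print('U+%04X' % ord(ch),
          unicodedata.name(ch),
          unicodedata.category(ch),   # Sk = modifier symbol, Mn = nonspacing mark
          unicodedata.combining(ch))  # canonical combining class (0 = does not combine)
# U+00B4 ACUTE ACCENT Sk 0
# U+0301 COMBINING ACUTE ACCENT Mn 230
```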
UPD: in case anyone is interested, here is a list of Unicode characters that get a leading space after NFKC normalization: http://pastebin.com/Z99r5AK9
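A list like that can be regenerated locally. The following is a minimal Python 3 sketch, not necessarily how the linked list was produced. It scans every code point and keeps those whose NFKC form begins with U+0020. The helper name `chars_with_leading_space` is hypothetical. Lone surrogates are skipped, and plain U+0020 is excluded. Characters such as NO-BREAK SPACE, which normalize to a bare space, are included, so the output may differ from the pastebin list. The result also depends on the Unicode version bundled with the interpreter (`unicodedata.unidata_version`). Under Python 2, `chr` would have to be `unichr` on a wide build.

```python
import sys
import unicodedata


def chars_with_leading_space(form='NFKC'):
    """Hypothetical helper: every code point except U+0020 whose
    normalized form (default NFKC) starts with SPACE."""
    found = []
    for cp in range(sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:  # skip lone surrogates
            continue
        ch = chr(cp)
        if ch != u' ' and unicodedata.normalize(form, ch).startswith(u' '):
            found.append(ch)
    return found


result = chars_with_leading_space()
for ch in result:
    target = unicodedata.normalize('NFKC', ch)
    # Show the target as hex code points so combining marks stay readable.
    print('U+%04X %-35s -> %s' % (ord(ch),
                                  unicodedata.name(ch, '<unnamed>'),
                                  ' '.join('%04X' % ord(c) for c in target)))
```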
python unicode
newtover