Given an accented Unicode word, for example u'́', I need to remove the acute accent (giving u''), and also convert the accent to the notation u'+', where '+' marks an acute accent over the preceding letter.
Right now I use a dictionary that maps accented characters to their unaccented counterparts:
accented_list = [u'́', u'́', u'́', u'́', u'́', u'́', u'́', u'́', u'́']
regular_list = [u'', u'', u'', u'', u'', u'', u'', u'', u'']
accent_dict = dict(zip(accented_list, regular_list))
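A minimal sketch of a dictionary-free approach, assuming the stress mark is U+0301 COMBINING ACUTE ACCENT (a separate code point that follows its base letter). Under that assumption each "accented letter" is two code points, and both transformations reduce to a single `str.replace` on U+0301. The example word кни́га, the helper names `strip_acute` and `to_plus_format`, and the NFD/NFC round trip (an added precaution for input that arrives precomposed, such as Latin á) are all illustrative assumptions, not part of the original code. Written for Python 3.

```python
import unicodedata

COMBINING_ACUTE = u'\u0301'  # U+0301 COMBINING ACUTE ACCENT


def _nfd(text):
    # Decompose precomposed letters (e.g. Latin U+00E1) into base + mark.
    return unicodedata.normalize('NFD', text)


def _nfc(text):
    # Recompose what is left, so letters such as U+0439 (й) come back whole.
    return unicodedata.normalize('NFC', text)


def strip_acute(text):
    """Remove every acute accent, leaving the bare letters."""
    return _nfc(_nfd(text).replace(COMBINING_ACUTE, u''))


def to_plus_format(text):
    """Replace each acute accent with '+' right after the letter it marks.

    U+0301 already follows its base letter, so a plain replace puts
    the '+' exactly where it belongs.
    """
    return _nfc(_nfd(text).replace(COMBINING_ACUTE, u'+'))


# Hypothetical example word "кни́га", written with escapes so the
# combining mark is visible: stress falls on the third letter.
word = u'\u043a\u043d\u0438\u0301\u0433\u0430'
print(len(u'\u0438\u0301'))   # 2: base letter + combining accent
print(strip_acute(word))      # книга
print(to_plus_format(word))   # кни+га
```

Because the replace targets only U+0301, other marks (such as the diaeresis inside ё) are left alone and recomposed by the final NFC step.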
I want to do something like this:
def changeAccentFormat(word):
    for letter in accent_dict:
        if letter in word:
            its_index = word.index(letter)
            word = word[:its_index + 1] + u'+' + word[its_index + 1:]
    return word
But of course, this does not work as desired. I noticed that this code:
>>> word = u'́'
>>> for letter in word:
...     print letter
gives
´
(I did not expect a seemingly empty character to show up as its own item, but there it is.) So what is the easiest way to produce [u'', u'', u'́', u'', u'']? Or is there a way to solve my problem without doing this?
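The "empty" item is consistent with iterating over code points: the combining accent is yielded on its own, after its base letter. A minimal sketch of regrouping them, assuming marks like U+0301 should stay attached to the preceding character. It uses `unicodedata.combining`, which returns a nonzero canonical combining class for such marks. This only approximates user-perceived characters: marks with combining class 0 (some spacing vowel signs in Indic scripts, for instance) would not be attached, and full extended grapheme clusters are available through `\X` in the third-party `regex` module. The function name `split_letters`, the example word, and `demo_accent_dict` are hypothetical.

```python
import unicodedata


def split_letters(text):
    """Group each base character with the combining marks that follow it."""
    letters = []
    for ch in text:
        if letters and unicodedata.combining(ch):
            letters[-1] += ch      # attach the mark to the previous letter
        else:
            letters.append(ch)     # start a new letter
    return letters


word = u'\u043a\u043d\u0438\u0301\u0433\u0430'   # hypothetical: "кни́га"
letters = split_letters(word)   # 5 items; the third is U+0438 + U+0301

# Hypothetical two-entry stand-in for accent_dict: once letters are
# grouped like this, a per-item dictionary lookup works as intended.
demo_accent_dict = {u'\u0438\u0301': u'\u0438', u'\u0430\u0301': u'\u0430'}
plain = u''.join(demo_accent_dict.get(l, l) for l in letters)
print(plain)   # книга
```

Joining the list back with `u''.join(letters)` returns the original word unchanged, so the grouping loses nothing.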
python unicode python-unicode
Frauhahnhen