Python Pandas Replace Special Character

For some reason, I cannot get this simple operation to work with ñ. It seems to work for other characters, but not for this one. Any ideas?

 DF['NAME']=DF['NAME'].str.replace("ñ","n") 

thanks

1 answer

I assume you are using Python 2.x here, and this is probably a Unicode problem. Do not worry, you are not alone. Unicode is complex in general and especially awkward in Python 2, which is why Unicode strings became the default in Python 3.

If ñ is the only character bothering you, decode the strings from UTF-8 and then replace that one character.

It looks something like this:

 DF['NAME'] = DF['NAME'].str.decode('utf-8').str.replace(u'\xf1', u'n') 

As an example:

 >>> "sureño".decode("utf-8").replace(u"\xf1", "n")
 u'sureno'

If your string is already Unicode, you can (and should actually) skip the decode step:

 >>> u"sureño".replace(u"\xf1", "n")
 u'sureno'

Note that u'\xf1' is a hex escape for the character in question: ñ is the code point U+00F1.
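To make the decode step more concrete, here is a minimal sketch written for Python 3, where `str` is already Unicode and byte strings are a separate `bytes` type. The variable names are only illustrative. It shows that ñ takes two bytes in UTF-8 but is a single character once decoded, which is why the replacement has to happen on the decoded text.

```python
# Python 3 sketch: bytes must be decoded to text before a character-level replace.
raw = "sure\u00f1o".encode("utf-8")    # UTF-8 bytes; the n-tilde is stored as 0xC3 0xB1
text = raw.decode("utf-8")             # Unicode text; the n-tilde is one code point, U+00F1
cleaned = text.replace("\xf1", "n")    # "\xf1" and "\u00f1" denote the same character

print(len(raw), len(text), cleaned)    # 7 6 sureno
```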

Update

I was told in the comments that .str.replace is a pandas Series method, which I had not realized. In that case, the answer might be something like the following:

 DF['NAME'] = DF['NAME'].map(lambda x: x.decode('utf-8').replace(u'\xf1', u'n')) 

or something along those lines, applying the decode and replace to each value in the column.

Another update

It actually occurred to me that your problem might be simple:

 DF['NAME']=DF['NAME'].str.replace(u"ñ","n") 

Notice how I added u before the string literal to make it a Unicode string.
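For comparison, here is a rough sketch of the same replacement on Python 3 with a reasonably recent pandas, where every string literal is Unicode and no u prefix is needed. The sample data is made up for illustration, and `regex=False` is passed only to make it explicit that the pattern is a literal character rather than a regular expression.

```python
import pandas as pd

# Hypothetical sample data; "\u00f1" is the n-tilde character.
df = pd.DataFrame({"NAME": ["sure\u00f1o", "ni\u00f1a", "plain"]})

# Literal (non-regex) substring replacement on every value in the column.
df["NAME"] = df["NAME"].str.replace("\u00f1", "n", regex=False)

print(df["NAME"].tolist())  # ['sureno', 'nina', 'plain']
```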
