How to remove accents from values in columns?

How to change special characters to regular letters of the alphabet? This is my data frame:

    In [56]: cities
    Out[56]:
    Table Code  Country        Year    City       Value
    240         Åland Islands  2014.0  MARIEHAMN  11437.0   1
    240         Åland Islands  2010.0  MARIEHAMN  5829.5    1
    240         Albania        2011.0  Durrës     113249.0
    240         Albania        2011.0  TIRANA     418495.0
    240         Albania        2011.0  Durrës     56511.0

I want it to look like this:

    In [56]: cities
    Out[56]:
    Table Code  Country        Year    City       Value
    240         Aland Islands  2014.0  MARIEHAMN  11437.0   1
    240         Aland Islands  2010.0  MARIEHAMN  5829.5    1
    240         Albania        2011.0  Durres     113249.0
    240         Albania        2011.0  TIRANA     418495.0
    240         Albania        2011.0  Durres     56511.0
+6
4 answers

Use this code:

    df['Country'] = df['Country'].str.replace(u"Å", "A")
    df['City'] = df['City'].str.replace(u"ë", "e")

Of course, you have to repeat this for each special character and each column.
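
Rather than writing one replace call per character, the same per-character idea can be driven by a single mapping. A minimal sketch using only the standard library (Python 3); the names `ACCENT_MAP`, `ACCENT_TABLE` and `replace_accents` are hypothetical, and the mapping covers only the two characters from the question:

```python
# Hypothetical mapping: extend it with every special character in your data.
ACCENT_MAP = {"Å": "A", "ë": "e"}

# str.maketrans builds a translation table usable by str.translate.
ACCENT_TABLE = str.maketrans(ACCENT_MAP)


def replace_accents(text):
    """Replace each mapped character; everything else is left unchanged."""
    return text.translate(ACCENT_TABLE)


print(replace_accents("Åland Islands"))  # Aland Islands
print(replace_accents("Durrës"))         # Durres
```

With pandas, the same table could be passed to a text column through the vectorized string accessor, for example `df['City'] = df['City'].str.translate(ACCENT_TABLE)`. Characters missing from the mapping pass through untouched, so nothing is lost silently.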

+1

The pandas way is to use the vectorized str.normalize in combination with str.encode and str.decode:

    In [60]: df['Country'].str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8')
    Out[60]:
    0    Aland Islands
    1    Aland Islands
    2          Albania
    3          Albania
    4          Albania
    Name: Country, dtype: object

To do this for every string (object dtype) column:

    In [64]: cols = df.select_dtypes(include=['object']).columns
             df[cols] = df[cols].apply(lambda x: x.str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8'))
             df
    Out[64]:
       Table Code  Country        Year    City       Value
    0  240         Aland Islands  2014.0  MARIEHAMN  11437.0   1
    1  240         Aland Islands  2010.0  MARIEHAMN  5829.5    1
    2  240         Albania        2011.0  Durres     113249.0
    3  240         Albania        2011.0  TIRANA     418495.0
    4  240         Albania        2011.0  Durres     56511.0
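
To see why this chain works, and where it can lose data, the same three steps can be traced on a single string with the standard library. This is only a sketch for Python 3; the "Tromsø" sample is an illustrative assumption and not part of the question's data:

```python
import unicodedata

text = "Durr\u00ebs"  # "Durrës" with a precomposed e-diaeresis

# Step 1: NFKD splits the accented letter into a base letter plus a combining mark.
decomposed = unicodedata.normalize("NFKD", text)
print(len(text), len(decomposed))  # 6 7

# Step 2: encoding to ASCII with errors="ignore" drops the non-ASCII combining mark.
ascii_bytes = decomposed.encode("ascii", errors="ignore")

# Step 3: decoding turns the bytes back into a str.
print(ascii_bytes.decode("utf-8"))  # Durres

# Caveat: letters with no decomposition (here "ø") are dropped entirely.
lossy = unicodedata.normalize("NFKD", "Troms\u00f8")
print(lossy.encode("ascii", errors="ignore").decode("utf-8"))  # Troms
```

The last case suggests checking the result when the data may contain letters such as "ø" or "ß", since they disappear rather than being mapped to a close ASCII letter.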
+8

This is for Python 2.7. To convert to ASCII, you can try:

    import unicodedata
    unicodedata.normalize('NFKD', u"Durrës Åland Islands").encode('ascii', 'ignore')
    # returns 'Durres Aland Islands'
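
On Python 3, `encode` returns `bytes`, so one more `decode` is needed to get a `str` back. A minimal sketch of a Python 3 equivalent; the function name `to_ascii` is hypothetical:

```python
import unicodedata


def to_ascii(text):
    """Decompose accented characters, then discard anything outside ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("Durrës Åland Islands"))  # Durres Aland Islands
```

Applied to a pandas column that holds only strings, this could be used as `df['City'].map(to_ascii)`; missing values would need to be handled separately, because `unicodedata.normalize` accepts only `str`.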
+1

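Pandas example using the unidecode package:

    import unidecode

    def remove_accents(a):
        # decode() assumes byte strings (Python 2); on Python 3, pass the str directly
        return unidecode.unidecode(a.decode('utf-8'))

    df['column'] = df['column'].apply(remove_accents)

Here unidecode transliterates each value to its closest ASCII equivalent.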

+1