How to remove accents from values in columns?

Question

How can I change special characters to regular letters of the alphabet? This is my data frame:

    In [56]: cities
    Out[56]:
    Table Code  Country        Year    City       Value
    240         Åland Islands  2014.0  MARIEHAMN  11437.0   1
    240         Åland Islands  2010.0  MARIEHAMN  5829.5    1
    240         Albania        2011.0  Durrës     113249.0
    240         Albania        2011.0  TIRANA     418495.0
    240         Albania        2011.0  Durrës     56511.0

I want it to look like this:

    In [56]: cities
    Out[56]:
    Table Code  Country        Year    City       Value
    240         Aland Islands  2014.0  MARIEHAMN  11437.0   1
    240         Aland Islands  2010.0  MARIEHAMN  5829.5    1
    240         Albania        2011.0  Durres     113249.0
    240         Albania        2011.0  TIRANA     418495.0
    240         Albania        2011.0  Durres     56511.0

+6

python pandas dataframe

Marius Jun 20 '16 at 15:25


4 answers

The pandas way is to use the vectorized str.normalize in combination with str.encode and str.decode:

    In [60]: df['Country'].str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8')
    Out[60]:
    0    Aland Islands
    1    Aland Islands
    2          Albania
    3          Albania
    4          Albania
    Name: Country, dtype: object

So, to do this for every string (object dtype) column:

    In [64]: # np.object was removed in NumPy 1.24; the string 'object' selects the same columns
             cols = df.select_dtypes(include=['object']).columns
             df[cols] = df[cols].apply(lambda x: x.str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8'))
             df
    Out[64]:
       Table Code  Country        Year    City       Value
    0  240         Aland Islands  2014.0  MARIEHAMN  11437.0   1
    1  240         Aland Islands  2010.0  MARIEHAMN  5829.5    1
    2  240         Albania        2011.0  Durres     113249.0
    3  240         Albania        2011.0  TIRANA     418495.0
    4  240         Albania        2011.0  Durres     56511.0
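For anyone who wants to try the same idea on Python 3 without the original data, here is a minimal sketch. The sample frame and the helper name strip_accents_frame are made up for illustration, and passing both "object" and "string" to select_dtypes is my assumption for covering text columns across pandas versions, not something from the answer above.

```python
import pandas as pd


def strip_accents_frame(df):
    """Return a copy of df with accents removed from every text column."""
    out = df.copy()
    # Assumption: text columns are stored as object or pandas string dtype.
    cols = out.select_dtypes(include=["object", "string"]).columns
    for col in cols:
        out[col] = (
            out[col]
            .str.normalize("NFKD")                 # split letters from combining marks
            .str.encode("ascii", errors="ignore")  # drop everything that is not ASCII
            .str.decode("ascii")                   # bytes back to str
        )
    return out


# Hypothetical sample data modelled on the question.
cities = pd.DataFrame({
    "Country": ["Åland Islands", "Albania"],
    "Year": [2014.0, 2011.0],
    "City": ["MARIEHAMN", "Durrës"],
})
clean = strip_accents_frame(cities)
print(clean)
```

Numeric columns such as Year are left alone because they are never selected, and the original frame is not modified since the helper works on a copy.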

+8

Edchum Jun 20 '16 at 15:39


This is for Python 2.7. To convert to ASCII, you can try:

    import unicodedata
    unicodedata.normalize('NFKD', u"Durrës Åland Islands").encode('ascii', 'ignore')
    # 'Durres Aland Islands'
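On Python 3 the same call returns bytes, so it needs a final decode to get a str back. A small sketch of that, with a helper name (strip_accents) chosen here for illustration; the second call shows a limitation worth knowing about: letters with no Unicode decomposition, such as Ł, are dropped entirely instead of being mapped to a base letter.

```python
import unicodedata


def strip_accents(text):
    """Decompose accented letters (NFKD) and drop whatever is not ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(strip_accents("Durrës Åland Islands"))  # Durres Aland Islands
print(strip_accents("Łódź"))                  # odz (Ł has no decomposition, so it is dropped)
```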

+1

advance512 Jun 20 '16 at 15:29


Pandas example

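    import unidecode

    def remove_accents(a):
        if isinstance(a, bytes):  # byte strings (Python 2 str) must be decoded first
            a = a.decode('utf-8')
        return unidecode.unidecode(a)

    df['column'] = df['column'].apply(remove_accents)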

In this case, unidecode transliterates each value to plain ASCII.

+1

Caio andrian Jan 6 '18 at 14:35


Blind0ne · Accepted Answer · 2016-06-20T15:45:56+0000

Use this code:

    df['Country'] = df['Country'].str.replace(u"Å", "A")
    df['City'] = df['City'].str.replace(u"ë", "e")

Of course, you have to do this for each special character and each column.
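If the explicit-replacement route is preferred, the repetition can be reduced with a single translation table applied through str.translate. This is only a sketch: the REPLACEMENTS mapping and the sample frame are hypothetical, and the mapping still has to list every character that occurs in the data.

```python
import pandas as pd

# Hypothetical mapping; extend it with whatever characters occur in the data.
REPLACEMENTS = {"Å": "A", "ë": "e"}
TABLE = str.maketrans(REPLACEMENTS)

# Hypothetical sample data modelled on the question.
df = pd.DataFrame({
    "Country": ["Åland Islands", "Albania"],
    "City": ["MARIEHAMN", "Durrës"],
})

# One pass per column replaces every mapped character at once.
for col in ["Country", "City"]:
    df[col] = df[col].str.translate(TABLE)

print(df)
```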
