Pandas replace multiple values at once

I am trying to clean some data that I have from an Excel file. The file contains 7400 rows and 18 columns: a list of customers with their respective addresses and other data. The problem I am facing is that some city names are misspelled, which distorts the information and makes further processing difficult.

       SURNAME  | ADDRESS          | CITY
    0  Jenson   | 252 Des Chênes   | D.DO
    1  Jean     | 236 Gouin        | DOLLARD
    2  Denis    | 993 Boul. Gouin  | DOLLARD-DES-ORMEAUX
    3  Bradford | 1690 Dollard #7  | DDO
    4  Alisson  | 115 Du Buisson   | IL PERROT
    5  Abdul    | 9877 Boul. Gouin | Pierrefonds
    6  O'Neil   | 5 Du College     | Ile Bizard
    7  Bundy    | 7345 Sherbrooke  | ILLE Perot
    8  Darcy    | 8671 Anthony #2  | ILE Perrot
    9  Adams    | 845 Georges      | Pierrefonds

In the above example, D.DO, DOLLARD, DDO should be written DOLLARD-DES-ORMEAUX and IL PERROT, ILLE PEROT, ILE PERROT should be written ILE-PERROT.

I managed to replace the values using:

    df["CITY"].replace(to_replace={"D.DO", "DOLLARD", "DDO"}, value="DOLLARD-DES-ORMEAUX", regex=True)
    df["CITY"].replace(to_replace={"IL PERROT", "ILLE PEROT", "ILE PERROT"}, value="ILE-PERROT", regex=True)

Is there a way to combine the above operations into one? I tried:

 df["CITY"].replace({to_replace={"D.DO", "DOLLARD", "DDO"}, value="DOLLARD-DES-ORMEAUX", to_replace={"IL PERROT", "ILLE PEROT", "ILE PERROT"}, value="ILE-PERROT"}, regex=True) 

but I had no luck.

2 answers

Try the .replace({}, regex=True) method with a nested dictionary that maps each column to its pattern/replacement pairs:

    replacements = {
        'CITY': {
            r'(D.*DO|DOLLARD.*)': 'DOLLARD-DES-ORMEAUX',
            r'I[lL]*[eE]*.*': 'ILLE Perot'}
    }

    df.replace(replacements, regex=True, inplace=True)
    print(df)

Output:

        SURNAME           ADDRESS                 CITY
    0    Jenson    252 Des Chênes  DOLLARD-DES-ORMEAUX
    1      Jean         236 Gouin  DOLLARD-DES-ORMEAUX
    2     Denis   993 Boul. Gouin  DOLLARD-DES-ORMEAUX
    3  Bradford   1690 Dollard #7  DOLLARD-DES-ORMEAUX
    4   Alisson    115 Du Buisson           ILLE Perot
    5     Abdul  9877 Boul. Gouin          Pierrefonds
    6    O'Neil      5 Du College           ILLE Perot
    7     Bundy   7345 Sherbrooke           ILLE Perot
    8     Darcy   8671 Anthony #2           ILLE Perot
    9     Adams       845 Georges          Pierrefonds
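Note that in the output above row 6 (Ile Bizard) was rewritten as well, and the replacement value is 'ILLE Perot' rather than the ILE-PERROT spelling the question asks for. A tighter variant might look like the sketch below. The anchored, case-insensitive patterns are my own assumption about which misspellings occur, based only on the ten sample rows, so they would need checking against the full 7400-row file.

```python
import pandas as pd

# Sample CITY values copied from the question.
df = pd.DataFrame({
    "CITY": ["D.DO", "DOLLARD", "DOLLARD-DES-ORMEAUX", "DDO", "IL PERROT",
             "Pierrefonds", "Ile Bizard", "ILLE Perot", "ILE Perrot",
             "Pierrefonds"]
})

# Assumed patterns: (?i) ignores case, ^...$ forces the whole cell to match,
# so "Ile Bizard" and "Pierrefonds" are left alone.
replacements = {
    "CITY": {
        r"(?i)^(D\.?DO|DOLLARD.*)$": "DOLLARD-DES-ORMEAUX",
        r"(?i)^IL+E?[ -]PER+OT$": "ILE-PERROT",
    }
}

# Returns a new DataFrame; df itself is not modified.
cleaned = df.replace(replacements, regex=True)
print(cleaned)
```

Escaping the dot (`\.`) matters here: an unescaped `.` matches any character, which makes the pattern broader than it looks.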

You can create a dictionary that maps each correct name to its misspellings, then iterate over it and use loc to overwrite the matching rows.

    target_for_values = {
        'DOLLARD-DES-ORMEAUX': ['D.DO', 'DOLLARD', 'DDO'],
        'ILE-PERROT': ['IL PERROT', 'ILLE PEROT', 'ILE PERROT']}

    # items() works on Python 3; iteritems() exists only on Python 2
    for k, v in target_for_values.items():
        # compare in upper case so 'ILLE Perot' matches 'ILLE PEROT'
        df.loc[df.CITY.str.upper().isin(v), 'CITY'] = k

    >>> df.CITY
                      CITY
    0  DOLLARD-DES-ORMEAUX
    1  DOLLARD-DES-ORMEAUX
    2  DOLLARD-DES-ORMEAUX
    3  DOLLARD-DES-ORMEAUX
    4           ILE-PERROT
    5          Pierrefonds
    6           Ile Bizard
    7           ILE-PERROT
    8           ILE-PERROT
    9          Pierrefonds
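The same lookup can probably be applied in a single vectorized step instead of a loop, by inverting the dictionary so that each misspelling points to its correct name. This is only a sketch under that assumption: `variant_to_target` is a name introduced here for illustration, and the sample data is copied from the question.

```python
import pandas as pd

# Sample CITY values copied from the question.
df = pd.DataFrame({
    "CITY": ["D.DO", "DOLLARD", "DOLLARD-DES-ORMEAUX", "DDO", "IL PERROT",
             "Pierrefonds", "Ile Bizard", "ILLE Perot", "ILE Perrot",
             "Pierrefonds"]
})

target_for_values = {
    "DOLLARD-DES-ORMEAUX": ["D.DO", "DOLLARD", "DDO"],
    "ILE-PERROT": ["IL PERROT", "ILLE PEROT", "ILE PERROT"],
}

# Invert to {misspelling: correct name} (hypothetical helper name).
variant_to_target = {
    variant: target
    for target, variants in target_for_values.items()
    for variant in variants
}

# map() yields NaN for cities that are not in the lookup, so fall back
# to the original value for those rows.
df["CITY"] = df["CITY"].str.upper().map(variant_to_target).fillna(df["CITY"])
print(df["CITY"])
```

Unlike the regex approach, this does exact matching only, so every misspelling has to be listed explicitly in the dictionary.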