I am trying to clean some data from an Excel file. The file contains 7400 rows and 18 columns and holds a list of customers with their addresses and other data. The problem is that some city names are misspelled, which distorts the information and makes further processing difficult.
   SURNAME  | ADDRESS          | CITY
0  Jenson   | 252 Des Chênes   | D.DO
1  Jean     | 236 Gouin        | DOLLARD
2  Denis    | 993 Boul. Gouin  | DOLLARD-DES-ORMEAUX
3  Bradford | 1690 Dollard #7  | DDO
4  Alisson  | 115 Du Buisson   | IL PERROT
5  Abdul    | 9877 Boul. Gouin | Pierrefonds
6  O'Neil   | 5 Du College     | Ile Bizard
7  Bundy    | 7345 Sherbrooke  | ILLE Perot
8  Darcy    | 8671 Anthony #2  | ILE Perrot
9  Adams    | 845 Georges      | Pierrefonds
In the above example, D.DO, DOLLARD and DDO should be written as DOLLARD-DES-ORMEAUX, and IL PERROT, ILLE PEROT and ILE PERROT should be written as ILE-PERROT.
I managed to replace the values using:
df["CITY"].replace(to_replace={"D.DO", "DOLLARD", "DDO"}, value="DOLLARD-DES-ORMEAUX", regex=True)
df["CITY"].replace(to_replace={"IL PERROT", "ILLE PEROT", "ILE PERROT"}, value="ILE-PERROT", regex=True)
Is there a way to combine the above operations into one? I tried:
df["CITY"].replace({to_replace={"D.DO", "DOLLARD", "DDO"}, value="DOLLARD-DES-ORMEAUX", to_replace={"IL PERROT", "ILLE PEROT", "ILE PERROT"}, value="ILE-PERROT"}, regex=True)
but it did not work.
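One possible way to do this in a single call, sketched under some assumptions: `Series.replace` accepts a dict that maps each old value to its new value, so both corrections can go into one lookup. The `variants` table and the sample `df` below are hypothetical, built only from the CITY values shown in the question. The upper-casing step is also an assumption, added so that mixed-case entries such as `ILLE Perot` match; note that it changes the case of the other cities as well.

```python
import pandas as pd

# Hypothetical sample: only the CITY column from the rows in the question.
df = pd.DataFrame({
    "CITY": [
        "D.DO", "DOLLARD", "DOLLARD-DES-ORMEAUX", "DDO", "IL PERROT",
        "Pierrefonds", "Ile Bizard", "ILLE Perot", "ILE Perrot", "Pierrefonds",
    ]
})

# Hypothetical lookup table: canonical spelling -> known misspellings.
variants = {
    "DOLLARD-DES-ORMEAUX": ["D.DO", "DOLLARD", "DDO"],
    "ILE-PERROT": ["IL PERROT", "ILLE PEROT", "ILE PERROT"],
}

# Invert it into a flat {misspelling: canonical} dict.
mapping = {
    wrong: canonical
    for canonical, wrongs in variants.items()
    for wrong in wrongs
}

# Assumption: normalise whitespace and case first so "ILLE Perot" matches.
city = df["CITY"].str.strip().str.upper()

# One replace call; without regex=True only whole-value matches are replaced.
df["CITY"] = city.replace(mapping)

print(df["CITY"].tolist())
```

Exact matching is used here on purpose. With `regex=True`, the `.` in `D.DO` is a wildcard, and a pattern such as `DOLLARD` can also match inside the already correct `DOLLARD-DES-ORMEAUX`. The result also has to be assigned back (or otherwise stored), since `replace` returns a new Series by default.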
python pandas
Lukasz