This code works as expected, but it takes a long time on large data frames.
    from difflib import SequenceMatcher

    # Compare every Excel name with every MySQL name (case-insensitive)
    for i in excel_df['name_of_college_school']:
        for y in mysql_df['college_name']:
            if SequenceMatcher(None, i.lower(), y.lower()).ratio() > 0.8:
                excel_df.loc[excel_df['name_of_college_school'] == i, 'dupmark4'] = y
I don't think I can use a function in a join condition to compare values like these. How can I vectorize this?
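For reference, here is a minimal sketch of one way to cut the work down. It is not true vectorization, since `SequenceMatcher` compares one pair at a time. It assumes all names are non-null strings, and the sample frames and the helper `last_match` are made up for illustration. Each distinct name is compared only once, the candidates are lowercased once, the cheaper `real_quick_ratio()` and `quick_ratio()` upper bounds are checked before `ratio()`, and the result is written with a single `Series.map` call instead of a `.loc` mask per match.

```python
from difflib import SequenceMatcher

import pandas as pd

# Hypothetical sample data standing in for the real frames.
excel_df = pd.DataFrame({'name_of_college_school': [
    'Delhi University', 'delhi university', 'Unknown Institute', 'Delhi University']})
mysql_df = pd.DataFrame({'college_name': ['Delhi Universty', 'Mumbai University']})

# Lowercase every candidate once, keeping the original spelling alongside.
candidates = [(y, y.lower()) for y in mysql_df['college_name']]


def last_match(name, threshold=0.8):
    """Return the last candidate scoring above threshold (same result as the loop)."""
    lowered = name.lower()
    found = None
    for original, cand in candidates:
        m = SequenceMatcher(None, lowered, cand)
        # The two quick ratios are upper bounds on ratio(), so they can only
        # rule a pair out early; they never change which pairs pass.
        if (m.real_quick_ratio() > threshold
                and m.quick_ratio() > threshold
                and m.ratio() > threshold):
            found = original
    return found


names = excel_df['name_of_college_school']
# One comparison pass per distinct name, then a single lookup for all rows.
lookup = {name: last_match(name) for name in names.unique()}
excel_df['dupmark4'] = names.map(lookup)
```

The cost is still proportional to (distinct Excel names) x (MySQL names), so this helps most when the Excel column contains many repeated values.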
Update:
Is it possible to update with only the highest-scoring match? This loop overwrites the previous match, and an earlier match may have been more relevant than the current one.
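A possible sketch of keeping only the best-scoring match, using the standard library's `difflib.get_close_matches`, which returns candidates ordered from most to least similar. The sample frames and the helper `best_match` are assumptions for illustration. Two differences from the loop above are worth noting: `cutoff` is inclusive (a score of exactly 0.8 passes), and the function passes the candidate as the first sequence to `SequenceMatcher`, so scores can differ slightly from the original argument order.

```python
from difflib import get_close_matches

import pandas as pd

# Hypothetical sample data; the exact match comes first on purpose, so a
# "last match wins" loop would pick the misspelled entry instead.
excel_df = pd.DataFrame({'name_of_college_school': [
    'Delhi University', 'Unknown Institute']})
mysql_df = pd.DataFrame({'college_name': ['Delhi university', 'Delhi Universty']})

# Map lowercased candidate -> original spelling (last spelling wins on ties).
by_lower = {y.lower(): y for y in mysql_df['college_name']}
choices = list(by_lower)


def best_match(name, cutoff=0.8):
    """Return the single most similar candidate at or above cutoff, else None."""
    hits = get_close_matches(name.lower(), choices, n=1, cutoff=cutoff)
    return by_lower[hits[0]] if hits else None


names = excel_df['name_of_college_school']
lookup = {name: best_match(name) for name in names.unique()}
excel_df['dupmark4'] = names.map(lookup)
```

With this data the first row gets `'Delhi university'`, the closest candidate, rather than whichever candidate happened to be compared last.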
pandas
shantanuo