I just wrote the same thing for myself, but in pandas:
    import pandas as pd
    import numpy as np
    from fuzzywuzzy import fuzz
    from fuzzywuzzy import process

    d1={1:'Tim','2':'Ted',3:'Sally',4:'Dick',5:'Ethel'}
    d2={1:'Tam','2':'Tid',3:'Sally',4:'Dicky',5:'Aardvark'}

    df1=pd.DataFrame.from_dict(d1,orient='index')
    df2=pd.DataFrame.from_dict(d2,orient='index')

    df1.columns=['Name']
    df2.columns=['Name']

    def match(Col1,Col2):
        overall=[]
        for n in Col1:
            result=[(fuzz.partial_ratio(n, n2),n2)
                    for n2 in Col2 if fuzz.partial_ratio(n, n2)>50
                   ]
            if len(result):
                result.sort()
                print('result {}'.format(result))
                print("Best M={}".format(result[-1][1]))
                overall.append(result[-1][1])
            else:
                overall.append(" ")
        return overall

    print(match(df1.Name,df2.Name))
Here I used a threshold of 50, but you can change that to whatever suits your data.
Dataframe1 looks like
        Name
    1    Tim
    2    Ted
    3  Sally
    4   Dick
    5  Ethel
And Dataframe2 looks like
           Name
    1       Tam
    2       Tid
    3     Sally
    4     Dicky
    5  Aardvark
Running this gives the following matches:
['Tid', 'Tid', 'Sally', 'Dicky', ' ']
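If you want the threshold as an actual parameter instead of a hard-coded 50, something like the sketch below should work. Note the assumptions: `best_matches` and `simple_ratio` are names I made up for illustration, and the default scorer uses `difflib.SequenceMatcher` from the standard library as a stand-in so the snippet runs without fuzzywuzzy. It is not the same measure as `fuzz.partial_ratio`, so scores will differ; pass `scorer=fuzz.partial_ratio` to get the behaviour of the code above.

```python
from difflib import SequenceMatcher


def simple_ratio(a, b):
    """Stand-in scorer (assumption): difflib similarity scaled to 0-100.

    This is NOT fuzz.partial_ratio; it compares the whole strings.
    """
    return round(100 * SequenceMatcher(None, a, b).ratio())


def best_matches(col1, col2, threshold=50, scorer=simple_ratio):
    """For each name in col1, return the best-scoring name in col2.

    Only candidates scoring strictly above `threshold` count;
    if there are none, a single space is used as a placeholder.
    """
    col2 = list(col2)  # allow pandas Series, lists, or other iterables
    overall = []
    for n in col1:
        # Score each candidate once, then filter on the threshold.
        scored = [(scorer(n, n2), n2) for n2 in col2]
        candidates = [pair for pair in scored if pair[0] > threshold]
        # max() on (score, name) tuples picks the highest score,
        # breaking ties by name, like sorting and taking the last item.
        overall.append(max(candidates)[1] if candidates else " ")
    return overall


names1 = ['Tim', 'Ted', 'Sally', 'Dick', 'Ethel']
names2 = ['Tam', 'Tid', 'Sally', 'Dicky', 'Aardvark']

print(best_matches(names1, names2))                # default threshold of 50
print(best_matches(names1, names2, threshold=90))  # much stricter
```

This also scores each pair only once rather than twice. With DataFrames you would call it as `best_matches(df1.Name, df2.Name, threshold=50, scorer=fuzz.partial_ratio)`.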
Hope this helps.