"Hugo Boss" "Hugo Bos", "Huggo Boss", "Hugo Boss Ltd".... soundex ( ), "LTD" ".
soundex . "Hugo Boss" "Hugo Bos" "Huggo Boss". "Hugo Boss Ltd" - LTD . , , .
, soundex , . , .
, , "", "LLC", "Corp", - . soundex, .
, ngrams, thomas , ngrams .
NYSIIS:
The algorithm, step by step:
1. Translate first characters of name: MAC → MCC, KN → N, K → C, PH, PF → FF, SCH → SSS
2. Translate last characters of name: EE → Y, IE → Y, DT, RT, RD, NT, ND → D
3. First character of key = first character of name.
4. Translate remaining characters by following rules, incrementing by one character each time:
    1. EV → AF else A, E, I, O, U → A
    2. Q → G, Z → S, M → N
    3. KN → N else K → C
    4. SCH → SSS, PH → FF
    5. H → If previous or next is non-vowel, previous.
    6. W → If previous is vowel, A.
    7. Add current to key if current is not same as the last key character.
5. If last character is S, remove it.
6. If last characters are AY, replace with Y.
7. If last character is A, remove it.
8. Append translated key to value from step 3 (removed first character)
9. If longer than 6 characters, truncate to first 6 characters. (only needed for true NYSIIS, some versions use the full key)
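The steps above can be sketched in plain Python. This is only an illustration under a few assumptions that the step list leaves open: "previous" means the previous character after translation, the end of the name counts as a non-vowel for the H rule, and spaces and punctuation are dropped before encoding. The names `nysiis`, `translate` and `max_len` are made up for this sketch, and its results may differ from library implementations on some inputs.

```python
VOWELS = set("AEIOU")

PREFIXES = (("MAC", "MCC"), ("KN", "N"), ("K", "C"),
            ("PH", "FF"), ("PF", "FF"), ("SCH", "SSS"))
SUFFIXES = (("EE", "Y"), ("IE", "Y"), ("DT", "D"), ("RT", "D"),
            ("RD", "D"), ("NT", "D"), ("ND", "D"))


def translate(s, i, prev):
    """Step 4: return (replacement, characters consumed) for position i."""
    ch = s[i]
    nxt = s[i + 1] if i + 1 < len(s) else ""  # "" is treated as a non-vowel
    if s.startswith("EV", i):
        return "AF", 2
    if ch in VOWELS:
        return "A", 1
    if ch in "QZM":
        return {"Q": "G", "Z": "S", "M": "N"}[ch], 1
    if s.startswith("KN", i):
        return "N", 2
    if ch == "K":
        return "C", 1
    if s.startswith("SCH", i):
        return "SSS", 3
    if s.startswith("PH", i):
        return "FF", 2
    if ch == "H" and (prev not in VOWELS or nxt not in VOWELS):
        return prev, 1
    if ch == "W" and prev in VOWELS:
        return "A", 1
    return ch, 1


def nysiis(name, max_len=None):
    """Return a NYSIIS-style key; pass max_len=6 for the truncated variant."""
    s = "".join(c for c in name.upper() if c.isalpha())
    if not s:
        return ""
    # Steps 1 and 2: rewrite the first matching prefix and suffix.
    for old, new in PREFIXES:
        if s.startswith(old):
            s = new + s[len(old):]
            break
    for old, new in SUFFIXES:
        if s.endswith(old):
            s = s[:-len(old)] + new
            break
    # Step 3: the key starts with the first character.
    key = prev = s[0]
    # Step 4: translate the rest, skipping repeats of the last key character.
    i = 1
    while i < len(s):
        repl, used = translate(s, i, prev)
        for c in repl:
            if c != key[-1]:
                key += c
        prev = repl[-1]
        i += used
    # Steps 5 to 7: clean up the end of the key.
    if len(key) > 1 and key.endswith("S"):
        key = key[:-1]
    if key.endswith("AY"):
        key = key[:-2] + "Y"
    if len(key) > 1 and key.endswith("A"):
        key = key[:-1]
    # Step 9: optional truncation.
    return key[:max_len] if max_len else key


for n in ("Hugo Boss", "Hugo Bos", "Huggo Boss", "Hugo Boss Ltd"):
    print("%-14s" % n, nysiis(n))
```

In this sketch the first three spellings produce the same key, while "Hugo Boss Ltd" does not, because the trailing "Ltd" is encoded as part of the name.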
Like Soundex, NYSIIS is available in Python through the fuzzy package:
import fuzzy  # third-party phonetic-matching package

names = ['Catherine', 'Katherine', 'Katarina',
         'Johnathan', 'Jonathan', 'John',
         'Teresa', 'Theresa',
         'Smith', 'Smyth',
         'Jessica',
         'Joshua',
         ]

# Print each name next to its NYSIIS key
for n in names:
    print('%-10s' % n, fuzzy.nysiis(n))
Output:
$ python show_nysiis.py
Catherine CATARAN
Katherine CATARAN
Katarina CATARAN
Johnathan JANATAN
Jonathan JANATAN
John JAN
Teresa TARAS
Theresa TARAS
Smith SNATH
Smyth SNATH
Jessica JASAC
Joshua JAS
Source: http://www.informit.com/articles/article.aspx?p=1848528
ngrams .
, - .
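As one possible illustration of the ngrams idea, the sketch below compares names by the overlap of their character trigrams (Jaccard similarity). The function names `ngrams` and `similarity`, the choice of trigrams, the space padding and the Jaccard measure are all assumptions made for this example, not something prescribed above; "Adidas" is included only as an arbitrary unrelated name.

```python
def ngrams(text, n=3):
    """Return the set of character n-grams of text, lowercased and space-padded."""
    s = " " + " ".join(text.lower().split()) + " "
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def similarity(a, b, n=3):
    """Jaccard similarity of the two n-gram sets, from 0.0 to 1.0."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)


for other in ("Hugo Bos", "Huggo Boss", "Hugo Boss Ltd", "Adidas"):
    print("%-14s %.2f" % (other, similarity("Hugo Boss", other)))
```

Unlike a phonetic key, this score degrades gradually, so a variant with an extra suffix such as "Ltd" still shares most of its trigrams with the base name. Where to put the match threshold would have to be tuned on real data.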