Say you have a large table containing a varchar column.
How would you match strings containing the word "preferred" in that column when the data is somewhat noisy and contains random spelling errors? For example:
['$2.10 Cumulative Convertible Preffered Stock, $25 par value', '5.95% Preferres Stock', 'Class A Preffered', 'Series A Peferred Shares', 'Series A Perferred Shares', 'Series A Prefered Stock', 'Series A Preffered Stock', 'Perfered', 'Preffered C']
The misspellings of "preferred" above show a family resemblance, but there is very little that they all have in common. Note that splitting each line into words and computing the Levenshtein distance for every word on every line would be overly expensive.
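For reference, here is a minimal sketch of the per-word Levenshtein baseline I would like to avoid. The function names (`levenshtein`, `contains_fuzzy`) and the default threshold of 2 are my own assumptions for illustration, not part of any library or of the database.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute) using two DP rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def contains_fuzzy(text: str, target: str, max_distance: int = 2) -> bool:
    """True if any alphabetic token in text is within max_distance of target.

    max_distance is an assumed threshold; it has to be tuned per target word.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return any(levenshtein(tok, target) <= max_distance for tok in tokens)


rows = ["Series A Prefered Stock", "Class A Preffered", "Perfered", "Common Stock"]
matches = [r for r in rows if contains_fuzzy(r, "preferred", max_distance=3)]
```

Every row costs roughly (number of tokens) x (token length) x (target length) operations, which is what makes this unattractive on a large table. The threshold is also awkward: 'Perfered' is three plain edits away from "preferred", so a threshold of 2 misses it, while a looser threshold risks matching unrelated words.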
UPDATE:
There are several more similar cases, for example with "restricted":
['Resticted Stock Plan', 'resticted securities', 'Ristricted Common Stock', 'Common stock (restrticted, subject to vesting)', 'Common Stock (Retricted)', 'Restircted Stock Award', 'Restriced Common Stock',]