Full-text search: find similar program names

I am looking for a full-text search algorithm that will find similar program names, such as "Mozilla Firefox" and "Firefox 3.5", or "Adobe Reader" and "Adobe Acrobat Reader v10". Levenshtein distance does not work well here, because the spelling of the words does not change; the names differ by whole words being added or dropped.

It should use sequential scanning (not indexing).

I need maximum accuracy and minimal errors. What would you recommend?

Thanks!


Pattern Comparison

I used the following approach to automatically match similar domain names.

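Split each name into bigrams (pairs of consecutive characters), then count how many bigrams two names have in common: the more they share, the more similar the names are.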

For example (lowercased): Mozilla Firefox => ['mo', 'oz', 'zi', 'il', 'll', 'la', 'a ', ' f', 'fi', 'ir', 're', 'ef', 'fo', 'ox']

Compared with:

  • 'Firefox 3.5' => 5,
  • 'Adobe Reader' => 0,
  • 'Adobe Acrobat Reader v10' => 1
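A minimal Python sketch of the bigram approach, assuming names are lowercased and each distinct bigram is counted once (the answer does not say how repeated bigrams were handled). `bigrams`, `shared_bigrams` and `rank` are illustrative names, not from the answer; `rank` does the sequential scan the question asks for. Under these assumptions the counts come out as 6 for "Firefox 3.5" and 1 for both Adobe names (each shares "re"), so they differ somewhat from the figures above.

```python
def bigrams(name: str) -> set[str]:
    """Return the distinct pairs of consecutive characters in the lowercased name."""
    s = name.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def shared_bigrams(a: str, b: str) -> int:
    """Count the distinct bigrams that two names have in common."""
    return len(bigrams(a) & bigrams(b))


def rank(query: str, candidates: list[str]) -> list[tuple[str, int]]:
    """Scan every candidate once and sort by shared-bigram count, best first.

    sorted() is stable, so candidates with equal scores keep their input order.
    """
    scored = [(c, shared_bigrams(query, c)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    names = ["Firefox 3.5", "Adobe Reader", "Adobe Acrobat Reader v10"]
    for name, score in rank("Mozilla Firefox", names):
        print(f"{name!r}: {score}")
```

Raw counts favor long names. Dividing by the total number of bigrams, for example with the Dice coefficient 2·|X ∩ Y| / (|X| + |Y|), is one common way to normalize; that is a substitution of my own, not part of the answer.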


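Another option is a compression-based measure. Let c(X) be the compressed size of a string X (under any general-purpose compressor); then compare two names with: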

d = c(A) + c(B) - c(A + B)

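where A + B is the concatenation of A and B. The more content A and B share, the better they compress together, so the larger d is, the more similar A and B are.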
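A sketch of this measure, assuming Python's standard `zlib` module as the compressor c (the answer does not name one; any compressor works). `c` and `compression_score` are illustrative names. For names this short, zlib's fixed 2-byte header and 4-byte checksum dominate, so the scores are coarse and only meaningful relative to each other.

```python
import zlib


def c(text: str) -> int:
    """Compressed size in bytes of a UTF-8 string, using zlib at level 9."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def compression_score(a: str, b: str) -> int:
    """d = c(A) + c(B) - c(A + B): bytes saved by compressing A and B together.

    Shared substrings let the compressor encode the second copy as a
    back-reference, so larger values mean more similar names.
    """
    return c(a) + c(b) - c(a + b)


if __name__ == "__main__":
    query = "Mozilla Firefox"
    for name in ["Firefox 3.5", "Adobe Reader", "Adobe Acrobat Reader v10"]:
        print(f"{name!r}: {compression_score(query, name)}")
```

Because d grows with similarity, it behaves as a similarity score; the normalized compression distance is a related variant that also divides by the sizes of the inputs.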


RDBMS

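SQL Server, SQLite and MySQL all provide built-in full-text search.
In MySQL, the default "natural language" mode returns a relevance score for each row, so matches can be ranked.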

MySQL:

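-- Assumes a FULLTEXT index on my_field, e.g.:
-- ALTER TABLE my_table ADD FULLTEXT(my_field);
SELECT
  t.*,
  MATCH(t.my_field) AGAINST('Mozilla Firefox') AS relevance
FROM
  my_table t
WHERE
  MATCH(t.my_field) AGAINST('Mozilla Firefox')
ORDER BY relevance DESC
LIMIT 100;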