MySQL function to check percent similarity between two texts

I need MySQL code that computes the percentage of similarity between text submitted through a form and several texts stored in a MySQL database.

I am looking for a MySQL stored procedure that works like the PHP function similar_text(). There is already a MySQL Levenshtein distance procedure, but that is not enough.

When a user submits text, the algorithm should return every record in the database whose similarity to the submitted text exceeds a given percentage (only one column is compared). For example, return all records with similarity > 40% to the text submitted by the user.

For example, given this table:

TABLE - Articles
id, article_body, article_title

The code should return all rows whose article_body has a similarity percentage > 40% (or another given value) with the text that the user sent.

2 answers

I would do it in the application.

Perhaps the result of the SOUNDEX function will help you:

SELECT SOUNDEX('Hello'), SOUNDEX('Hello world'), SOUNDEX('hellboy');
+------------------+------------------------+--------------------+
| SOUNDEX('Hello') | SOUNDEX('Hello world') | SOUNDEX('hellboy') |
+------------------+------------------------+--------------------+
| H400             | H4643                  | H410               |
+------------------+------------------------+--------------------+
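As a rough sketch of how SOUNDEX could be applied to the Articles table from the question: MySQL's SOUNDS LIKE operator is shorthand for comparing the SOUNDEX values of both sides. The variable @needle is only an illustrative name, and the comparison is run on article_title here as an assumption, since a phonetic key is more meaningful for short strings than for a whole article body. Note that this yields a match or no match, not a percentage.

```sql
-- Hypothetical search term; in practice this would come from the form.
SET @needle = 'Hello';

-- expr1 SOUNDS LIKE expr2 is equivalent to SOUNDEX(expr1) = SOUNDEX(expr2).
SELECT id, article_title
FROM Articles
WHERE article_title SOUNDS LIKE @needle;
```

Given the output above, a title of 'Hello world' (H4643) would not match 'Hello' (H400), because the two keys differ, so this only catches strings that sound alike as a whole.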

I think the algorithm should go like this:

  • first, calculate the length of the given word (using LENGTH).
  • then search for that word in the specific column (using INSTR or a similar function).
  • then calculate the length of each matching value and apply simple math.

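Example: the search word is "hell", the threshold is 50%, and the db contains 'Hello World' and 'Hellboy'.

length(hell) = 4
length(hello world) = 11
length(hellboy) = 7

hello world: (11-4)/11 = 63.63%
hellboy: (7-4)/7 = 42.85%

So only Hello World is returned.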
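The steps above could be sketched as a single query. This is only an illustration under a few assumptions: the names @needle and @threshold are made up for the example, the search runs on article_body as in the question, and CHAR_LENGTH is used in place of LENGTH so that multi-byte characters are counted as characters rather than bytes. The score follows the arithmetic in this answer literally, which is a much cruder measure than PHP's similar_text().

```sql
-- Hypothetical inputs: the submitted word and the minimum score in percent.
SET @needle = 'hell';
SET @threshold = 50;

SELECT id,
       article_body,
       -- (row length - word length) / row length, expressed in percent
       (CHAR_LENGTH(article_body) - CHAR_LENGTH(@needle))
           / CHAR_LENGTH(article_body) * 100 AS score
FROM Articles
-- INSTR returns 0 when the word does not occur in the column
WHERE INSTR(article_body, @needle) > 0
-- MySQL allows HAVING to filter on a select alias without GROUP BY
HAVING score > @threshold;
```

With rows containing 'Hello World' and 'Hellboy', the scores are 7/11 and 3/7 of 100, so only the first row passes a threshold of 50. Whether INSTR matches 'hell' against 'Hello' depends on the column's collation being case-insensitive.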

