RegEx for full-text spelling errors

Question

I have a MySQL table with the following columns:

 City      Country   Continent
 New York  States    North America
 New York  Germany   Europe (assuming there is one ;)
 Paris     France    Europe

If I want to find "New Yokr" (with a typo), this is easy with a stored MySQL function:

 $querylev = "select City, Country, Continent FROM table WHERE LEVENSHTEIN(City, 'New Yokr') < 3";

But when there are two cities named New York, a full-text search lets you enter "New York States" and get the desired result.

So the question is: can I search for "New Yokr Statse" and get the same results?

Is there any function combining levenshtein and fulltext to create a single solution, or should I create a new column in MySQL combining 3 columns?

I know there are other solutions such as Lucene or Sphinx (also soundex and metaphone, but they are not suitable for this), but I think they would be quite difficult for me to implement.

php regex full-text-search regex-group levenshtein-distance

Tronne Feb 17 '13 at 16:29

1 answer

Emma · Answer 1 · 2019-05-27T18:00:28+0000

This is a great question and a good example of how we can use character classes and whitespace boundaries in a regular expression to design queries and get the data we want.

Depending on the accuracy we need and the data we have in the database, we can develop custom expressions, such as this one for "New York State" with various typos:

 ([new]+\s+[york]+\s+[stae]+)

Here we have three character classes, which we can extend with other possible letters. Each class followed by + matches one or more of the listed letters, in any order:

 [new] [york] [stae]

We also added two \s+ separators as boundaries between the words to increase accuracy.
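To make the behaviour of these character classes concrete, here is a small sketch that builds the same kind of pattern from a list of words and tries it on a few inputs. The helper name `buildLoosePattern` is hypothetical (it is not part of any library), and it assumes plain a-z words that need no regex escaping. Note that a class such as [york]+ does not check letter order or count, so the pattern tolerates "Yokr" but also accepts strings that are not typos at all.

```javascript
// Hypothetical helper: turn each word into a character class of its letters,
// joined by \s+ so that the words must stay separated by whitespace.
function buildLoosePattern(words) {
  const classes = words.map((word) => {
    // Keep each distinct letter once; assumes plain a-z words (no escaping needed).
    const letters = [...new Set(word.toLowerCase())].join('');
    return `[${letters}]+`;
  });
  return new RegExp(`(${classes.join('\\s+')})`, 'i');
}

const pattern = buildLoosePattern(['new', 'york', 'state']);
console.log(pattern.source); // ([new]+\s+[york]+\s+[stae]+)

// Transposed letters still match.
console.log(pattern.test('a New Yokr Statse here')); // true
console.log(pattern.test('a New Yrok Staet here')); // true

// An unrelated row does not match.
console.log(pattern.test('a Paris France here')); // false

// Caveat: letter order and count are not checked, so this matches too.
console.log(pattern.test('wen kkk sss')); // true
```

If that last kind of false positive matters for your data, the character classes alone are too loose and the candidates need a second check, for example with an edit distance.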

Demo

This snippet shows how capture groups work:

 const regex = /([new]+\s+[york]+\s+[stae]+)/gmi;
 const str = 'Anything we wish to have before followed by a New York Statse then anything we wish to have after. Anything we wish to have before followed by a New Yokr State then anything we wish to have after. Anything we wish to have before followed by a New Yokr Stats then anything we wish to have after. Anything we wish to have before followed by a New York Statse then anything we wish to have after. ';
 let m;

 while ((m = regex.exec(str)) !== null) {
   // This is necessary to avoid infinite loops with zero-width matches
   if (m.index === regex.lastIndex) {
     regex.lastIndex++;
   }

   // The result can be accessed through the `m` variable.
   // Template literals need backticks for ${...} to be interpolated.
   m.forEach((match, groupIndex) => {
     console.log(`Found match, group ${groupIndex}: ${match}`);
   });
 }

PHP

 $re = '/([new]+\s+[york]+\s+[stae]+)/mi';
 $str = 'Anything we wish to have before followed by a New York Statse then anything we wish to have after. Anything we wish to have before followed by a New Yokr State then anything we wish to have after. Anything we wish to have before followed by a New Yokr Stats then anything we wish to have after. Anything we wish to have before followed by a New York Statse then anything we wish to have after. ';

 preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

 // Print the entire match result
 var_dump($matches);
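Since a character class cannot limit how many edits it tolerates, another option is to apply an edit distance word by word in application code. The following is a minimal sketch under stated assumptions, not a definitive implementation: `levenshtein` is a plain textbook dynamic-programming version, while `fuzzyMatchRow`, the sample `rows` and the threshold of 2 edits per word are all made up for illustration. It joins the columns of a row into one list of words and accepts the row when every query word is close to some word in it.

```javascript
// Textbook dynamic-programming Levenshtein distance between two strings.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: a row matches when every query word is within
// maxEdits of at least one word taken from any of the row's columns.
function fuzzyMatchRow(row, query, maxEdits = 2) {
  const rowWords = Object.values(row).join(' ').toLowerCase().split(/\s+/);
  const queryWords = query.toLowerCase().trim().split(/\s+/);
  return queryWords.every((q) =>
    rowWords.some((w) => levenshtein(q, w) <= maxEdits)
  );
}

// Sample data mirroring the table in the question (assumed shape).
const rows = [
  { city: 'New York', country: 'States', continent: 'North America' },
  { city: 'New York', country: 'Germany', continent: 'Europe' },
  { city: 'Paris', country: 'France', continent: 'Europe' },
];

// Only the New York / States row passes for this query.
const hits = rows.filter((row) => fuzzyMatchRow(row, 'New Yokr Statse'));
console.log(hits);
```

A fixed threshold of 2 is generous for short words such as "new", so in practice the allowed distance would probably need to scale with word length.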
