Algorithm for checking addresses for matches?

I am working on a survey program where people will receive promotional considerations the first time they complete a survey. In many scenarios, the only way to stop people from cheating the system and getting promotions they don't deserve is to check the street address strings against each other.

I looked at using Levenshtein distance to get a number measuring the similarity, and treating pairs that fall below a certain threshold as duplicates.
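A minimal sketch of that threshold check, in Python for brevity (PHP ships a built-in `levenshtein()` that would replace the helper here). The cutoff value and the name `looks_like_duplicate` are assumptions chosen for illustration, not part of the original setup:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


# Hypothetical cutoff; a real value would need tuning against real data.
DUPLICATE_THRESHOLD = 3


def looks_like_duplicate(addr1: str, addr2: str) -> bool:
    """True when the two lines are within the threshold, ignoring case."""
    return levenshtein(addr1.lower(), addr2.lower()) <= DUPLICATE_THRESHOLD
```

With this cutoff, "123 Main St" and "123 Main St." count as duplicates, while "S 5th St" and "South Fifth Street" do not.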

However, if someone was looking to game the system, they could easily write "S 5th St" instead of "South Fifth Street", and Levenshtein would treat those two lines as very different. So I thought about converting every line to a "standard address form", that is, "South" becomes "s", "Fifth" becomes "5th", and so on.
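One way such a conversion could look, sketched in Python (the same idea maps directly onto a PHP array lookup). The lookup table `STANDARD_FORMS` and the function `normalize_address` are hypothetical and deliberately tiny; a usable table would need far more entries:

```python
import re

# Hypothetical, illustrative lookup table: long form -> standard form.
STANDARD_FORMS = {
    "south": "s", "north": "n", "east": "e", "west": "w",
    "street": "st", "avenue": "ave", "road": "rd",
    "first": "1st", "second": "2nd", "third": "3rd",
    "fourth": "4th", "fifth": "5th",
}


def normalize_address(line: str) -> str:
    """Lowercase the line, drop punctuation, and standardize known words."""
    # Replace anything that is not a letter, digit or whitespace with a space,
    # so that "St." and "St" end up as the same token.
    cleaned = re.sub(r"[^a-z0-9\s]", " ", line.lower())
    tokens = cleaned.split()
    return " ".join(STANDARD_FORMS.get(tok, tok) for tok in tokens)


print(normalize_address("South Fifth Street"))  # s 5th st
print(normalize_address("S 5th St."))           # s 5th st
```

Once both lines are reduced to the same form, an exact comparison or a small edit-distance threshold can be applied to the normalized strings instead of the raw input.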

Then I started thinking this is hopeless, and too much effort to get it working right. Is it?

I'm working with PHP/MySQL, so I have the limitations inherent in that stack.

+5
3 answers

, , . , , , "" . ". 4- -", "-" , .

(, , ) , . ,

North → N.
East → E.
...
First → 1st
Second → 2nd
Third → 3rd
...
Avenue → Ave.

, , . , .

( ) . ( , .)

+3

Use the Google API (or another geocoding API) to convert the addresses to coordinates (lat/long).
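A minimal sketch of the comparison step once coordinates are available, in Python. No geocoding call is shown: the coordinates are assumed to come from whichever geocoder is used, and the names `haversine_m`, `same_location` and the 15-metre cutoff are hypothetical choices for illustration:

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/long points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Hypothetical cutoff: points closer than this are treated as the same address.
SAME_PLACE_METRES = 15


def same_location(p1: tuple, p2: tuple) -> bool:
    """p1 and p2 are (lat, lon) pairs as returned by the geocoder."""
    return haversine_m(*p1, *p2) <= SAME_PLACE_METRES
```

Comparing coordinates sidesteps spelling differences entirely, at the cost of depending on an external service and on how precisely it resolves apartments or units at the same street address.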

0

. questions .

  • Standardize common words:

    avenue → ave
    road → rd
    rd → rd

    first → 1
    1st → 1

SOUNDEX can catch misspellings of words that sound alike (for example, Schmitt and Schmitd).
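To illustrate why those two spellings collide, here is a sketch of the classic American Soundex rules in Python. PHP's `soundex()` and MySQL's `SOUNDEX()` are built in, so in practice there is no need to write this by hand, and their output may differ from this sketch in edge cases:

```python
def soundex(name: str) -> str:
    """Four-character American Soundex code; empty string if no letters."""
    codes = {}
    for group, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in group:
            codes[ch] = digit

    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0].upper()
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters that share a code
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        prev = digit  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]  # pad or truncate to four characters


print(soundex("Schmitt"))  # S530
print(soundex("Schmitd"))  # S530
```

Because the code keeps only the first letter and a few consonant classes, it is far too coarse to compare whole address lines; it is better suited to individual words such as street names.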


, ​​ Google, . , / . . .

0
