Heuristically check misrecognized characters in a string against a list of known strings

Background

I have a list of six-character codes in my (MySQL) database. They consist of digits and letters chosen at random. They are treated as case insensitive, although they are stored in upper case in the database. They may contain the digit 0, but never the letter O. I use these codes for one-time user authentication.

Problem

The codes were written by hand on cards, and unfortunately some handwritten letters and digits look alike to some people. That is why I did not use the letter O in the first place: it looks too much like a handwritten 0.

What I have done so far

I can check the user input against the stored code (case insensitively) and determine whether it is an exact match. If it is not, I silently replace O with 0 and try again.
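For reference, the current behaviour described above could be sketched roughly as follows. This is only an illustration: `matchCode` is a made-up name, and the `$knownCodes` array is an assumption standing in for the real database lookup.

```php
<?php
// Hypothetical helper: $knownCodes stands in for the database lookup.
function matchCode(string $input, array $knownCodes): ?string
{
    $candidate = strtoupper(trim($input));
    if (in_array($candidate, $knownCodes, true)) {
        return $candidate; // exact (case-insensitive) match
    }
    // Stored codes never contain the letter O, so an O in the input
    // can only be a misread 0.
    $candidate = str_replace('O', '0', $candidate);
    return in_array($candidate, $knownCodes, true) ? $candidate : null;
}
```

The O/0 swap is safe because only one of the two characters can ever appear in a stored code; the pairs listed further down do not have that property.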

Question

My question is: how can I do this for other letters and digits, such as the ones listed below, and still be relatively confident that I will not authenticate a user as someone they are not? In these cases, both characters may exist in a code. I looked at PHP's levenshtein() function ( http://php.net/manual/en/function.levenshtein.php ) as well as similar_text() ( http://php.net/manual/en/function.similar-text.php ), but neither is quite what I want, so I think I may have to roll my own (perhaps using them) to achieve this.

Similar characters:

S <=> 5
G <=> 6
I <=> 1



EDIT:


<?php

$inputs = [
        'ABCDEF', // No ambiguity, DB should return 0 or 1 match.
        'AAAAA1', // One ambiguous char, user could have meant `AAAAAI`
                  // instead so search DB for both.
        '156ISG', // Worst case. If the DB values overlap a lot, there
                  // wouldn't be much hope of "guessing" what the user
                  // actually meant.
];

foreach ($inputs as $input) {
    print_r(generatePossibleMatches($input));
}

//----------------------------------------
function generatePossibleMatches($input) {
    $input = strtoupper($input);
    // Each ambiguous character maps to every character it could stand for,
    // with the character as typed listed first.
    $ambiguous = [
        'I' => ['I', '1'], '1' => ['1', 'I'],
        'G' => ['G', '6'], '6' => ['6', 'G'],
        'S' => ['S', '5'], '5' => ['5', 'S'],
    ];
    // Build the variants position by position, so that inputs with several
    // ambiguous characters yield every combination, not only single swaps.
    $possibles = [''];
    foreach (str_split($input) as $char) {
        $choices = isset($ambiguous[$char]) ? $ambiguous[$char] : [$char];
        $next = [];
        foreach ($possibles as $prefix) {
            foreach ($choices as $choice) {
                $next[] = $prefix . $choice;
            }
        }
        $possibles = $next;
    }
    return $possibles;
}
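The comments in the code above say to search the database for every candidate. One way to keep that safe is to accept the input only when exactly one stored code fits. The sketch below illustrates the rule; `resolveCandidates` is a hypothetical helper, and the `$knownCodes` array is an assumption standing in for the real table.

```php
<?php
// Hypothetical helper; $knownCodes stands in for the codes table.
function resolveCandidates(array $candidates, array $knownCodes): ?string
{
    // Keep only the candidates that really exist among the stored codes.
    $hits = array_values(array_unique(array_intersect($candidates, $knownCodes)));
    // Authenticate only when exactly one stored code fits the input.
    return count($hits) === 1 ? $hits[0] : null;
}

// One stored code fits: the user most likely meant AAAAAI.
var_dump(resolveCandidates(['AAAAA1', 'AAAAAI'], ['AAAAAI', 'ZZZZZ9']));
// Both variants are stored: ambiguous, so nothing is returned.
var_dump(resolveCandidates(['AAAAA1', 'AAAAAI'], ['AAAAA1', 'AAAAAI']));
```

Against a real database the same check would presumably be a single query over the candidate list, with the same "exactly one row" rule applied to the result.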

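Convert each ambiguous character in the input into a regular expression character class, then match the result against the stored value. For example, "AIX" becomes "A[I1]X".

In code: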

$input = 'S1G6AB'; // given this
$store = '5I6GAB'; // need to match this

// convert each confusing character to a regular expression character class;
// escape everything else so user input cannot inject regex syntax
$map = ['S'=>'[S5]','5'=>'[S5]','1'=>'[1I]','I'=>'[1I]','G'=>'[6G]','6'=>'[6G]'];
$regex = implode('', array_map(function ($c) use ($map) {
    return (isset($map[$c]) ? $map[$c] : preg_quote($c, '/'));
}, str_split(strtoupper($input))));

// match the anchored regex against the whole stored value
echo (1 === preg_match('/^' . $regex . '$/', $store) ? 'Match' : 'No match');

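Note that this can still be ambiguous: if user X has the code "ABCDE1" and user Y has the code "ABCDEI", the same input matches both.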


@beporter

In MySQL, the same pattern can be used directly in the query:

SELECT COUNT(*) FROM Table WHERE token REGEXP '$regex'
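Because the pattern is derived from user input, it is probably safer to validate the input first and pass the pattern as a bound parameter instead of interpolating it into the SQL string. A minimal sketch, assuming a PDO connection to MySQL; the table name `codes` and the helpers `buildTokenPattern` and `countMatches` are hypothetical:

```php
<?php
// Assumptions: a PDO connection to MySQL and a table `codes` with a
// `token` column. Both names are placeholders, not from the question.
function buildTokenPattern(string $input): ?string
{
    $input = strtoupper($input);
    // Accept only six letters or digits, so that no regex metacharacters
    // can reach the pattern.
    if (preg_match('/\A[A-Z0-9]{6}\z/', $input) !== 1) {
        return null;
    }
    $map = ['S' => '[S5]', '5' => '[S5]', 'I' => '[1I]', '1' => '[1I]',
            'G' => '[6G]', '6' => '[6G]'];
    $pattern = '';
    foreach (str_split($input) as $c) {
        $pattern .= $map[$c] ?? $c;
    }
    // Anchor the pattern so it has to match the whole token.
    return '^' . $pattern . '$';
}

function countMatches(PDO $pdo, string $input): int
{
    $pattern = buildTokenPattern($input);
    if ($pattern === null) {
        return 0;
    }
    // The pattern is bound as a parameter, never concatenated into the SQL.
    $stmt = $pdo->prepare('SELECT COUNT(*) FROM codes WHERE token REGEXP ?');
    $stmt->execute([$pattern]);
    return (int) $stmt->fetchColumn();
}
```

A count of exactly 1 would then be the only result treated as a successful authentication.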

If the count is 2 or more, the input is ambiguous. What to do in that case is a UX decision.


Although the codes mix letters and digits, you can convert everything to binary (ASCII values) and compare the strings using the Hamming distance. If the distance is greater than some threshold, reject the input. Otherwise, you are essentially looking for a way of matching strings that accounts for your "misrecognized" characters. You are right: you will probably have to build it yourself.
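To make the idea concrete, a bit-level Hamming distance over ASCII values could be computed roughly like this. It is only a sketch; `hammingBits` is a made-up name, and any rejection threshold would have to be chosen by experiment.

```php
<?php
// Bit-level Hamming distance between two equal-length ASCII strings.
function hammingBits(string $a, string $b): int
{
    if (strlen($a) !== strlen($b)) {
        throw new InvalidArgumentException('Strings must have equal length.');
    }
    $bits = 0;
    for ($i = 0, $n = strlen($a); $i < $n; $i++) {
        $diff = ord($a[$i]) ^ ord($b[$i]); // bits that differ in this byte
        while ($diff > 0) {
            $bits += $diff & 1;
            $diff >>= 1;
        }
    }
    return $bits;
}

echo hammingBits('ABCDES', 'ABCDE5'), "\n"; // 4
echo hammingBits('ABCDEA', 'ABCDEC'), "\n"; // 1
```

Note that bit distance does not follow visual similarity: S and 5 differ in four bits, while A and C differ in only one. A plain threshold on this value would therefore treat A/C as closer than S/5, which is worth keeping in mind before relying on it.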
