How to find the shortest identifying text for each string, as in autocomplete?

I have a list of strings and I want to find the shortest unique way to identify each of them. This is a bit like autocomplete, except that the set is fixed, so every string always has a shortest identifying form.

For example:

PA  for Paddington
PE  for Penryn
PLO for Plymouth
PLP for Plympton
PO  for Portsmouth
Q   for Quebec

I have several thousand names (these are not cities, but program names).

I need a relatively short sequence that will be accepted for each name (for the list above, both the key and the value would be accepted).

Any methods / algorithms for this would be helpful.

I know that I will have to code it (using PHP), but as long as I understand the algorithm, I am happy.

I think I need to build a tree of the values as they stand, and then walk that tree one character at a time, ignoring sequences that offer only one option (for example, L and Y in Plymouth / Plympton).

So, starting with Q in Quebec, I would find that all the way through the tree, all subsequent letters are used only once, so Q is enough at this stage.
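The tree walk described above could be sketched roughly as follows. This is only a minimal illustration in C# (the language used in the answer below), not a definitive implementation: `BuildCodes` is a made-up helper name, the trie is stored as plain index-based lists, and the sketch assumes that no name is a duplicate of, or a complete prefix of, another name.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// BuildCodes is a hypothetical helper name, not taken from the question.
// Assumption: no name is a duplicate or a complete prefix of another name.
Dictionary<string, string> BuildCodes(IReadOnlyList<string> names)
{
    // Index-based trie: node 0 is the root. children[k] maps a letter to a
    // child node index; count[k] is how many names pass through node k.
    var children = new List<Dictionary<char, int>> { new Dictionary<char, int>() };
    var count = new List<int> { 0 };

    foreach (var name in names)
    {
        int node = 0;
        count[0]++;
        foreach (char ch in name)
        {
            if (!children[node].TryGetValue(ch, out int next))
            {
                next = children.Count;
                children.Add(new Dictionary<char, int>());
                count.Add(0);
                children[node][ch] = next;
            }
            count[next]++;
            node = next;
        }
    }

    var result = new Dictionary<string, string>();
    foreach (var name in names)
    {
        var code = new StringBuilder();
        int node = 0;
        foreach (char ch in name)
        {
            // Keep a letter only where the tree branches; a node with a
            // single child offers no choice, so its letter is skipped.
            if (children[node].Count > 1)
            {
                code.Append(char.ToUpperInvariant(ch));
            }
            node = children[node][ch];
            // Only this name passes through this node, so the code is complete.
            if (count[node] == 1)
            {
                break;
            }
        }
        result[name] = code.ToString();
    }
    return result;
}

var codes = BuildCodes(new[]
{
    "Paddington", "Penryn", "Plymouth", "Plympton", "Portsmouth", "Quebec"
});
foreach (var pair in codes)
{
    Console.WriteLine($"{pair.Value} for {pair.Key}");
}
```

For the six names in the question this yields PA, PE, PLO, PLP, PO and Q. If every name started with the same letter, that letter would be skipped as well, since the root would then have only one child.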

2 answers

You can start by building a hash table that maps each candidate prefix to the list of all names starting with that prefix. This can become a fairly large data structure, but because you can short-circuit as soon as you reach a unique prefix, you can keep it from growing unreasonably large. Here is an example in C#:

using System.Collections.Generic;
using System.Linq;
using System.Text;

var names = new[]{
"Paddington",
"Penryn",
"Plymouth",
"Plympton",
"Portsmouth",
"Quebec"};
// First, for any given prefix, find the group of names that
// start with it.
var groups = new Dictionary<string, List<string>>();
ILookup<string, string> newGroups;
List<string> namesToProcess = names.ToList();
int i = 0;
do
{
    // Stop looking at names once we're getting substrings too long for them.
    namesToProcess = namesToProcess.Where(n => n.Length >= i).ToList();
    newGroups = namesToProcess.ToLookup(n => n.Substring(0, i));
    foreach(var g in newGroups)
    {
        groups.Add(g.Key, g.ToList());
    }
    // stop looking at names once we find that they're the only ones
    // matching a given substring.
    namesToProcess = namesToProcess
        .Except(newGroups
            .Where(g => g.Count() == 1)
            .Select(g => g.Single()))
        .ToList();
    i++;
} while (newGroups.Any());

Once you have these groups, you can use them to build a code for each name:

// Now build the best code to use for each name
var codeNamePairs = names.ToDictionary(n => 
{
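    var sb = new StringBuilder();
    // Number of names matching the prefix examined so far; the empty
    // prefix matches every name.
    var previousCount = groups[""].Count;
    for(int j = 0; j < n.Length; j++)
    {
        var prefix = n.Substring(0, j+1);
        var withSamePrefix = groups[prefix];
        // Only add the next letter if it helps to narrow down
        // the possibilities. Compare against the previous prefix rather
        // than the code built so far: once a letter has been skipped, the
        // code is no longer a prefix of the name and is not a key in groups.
        if(withSamePrefix.Count != previousCount)
        {
            sb.Append(n[j]);
        }
        previousCount = withSamePrefix.Count;
        if(withSamePrefix.Count == 1)
        {
            // Once we reach a prefix that is unique to this name,
            // then we know we've built the code we want.
            break;
        }
    }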
    return sb.ToString();
});

This is not PHP, but the same approach should carry over.


. , :

Paddington
Penryn
Plymouth
Plympton
Portsmouth
Quebec

, , , , . :

Paddington P, , Pa, .

Penryn id, , Penryn - a P. : P, Pe. , Penryn

Plymouth, , Plymo.

Plympton id Plym, , , id.

.

, , , , PLO .
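The ids mentioned in this answer (for example Plymo for Plymouth) look like plain unique prefixes taken from a sorted list. One common way to compute such prefixes, which may or may not be exactly what this answer had in mind, is to sort the names and compare each one only with its two neighbours. The sketch below is an assumption-laden illustration in C#; `CommonPrefixLength` and `UniquePrefixes` are made-up helper names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Length of the prefix that two strings have in common.
int CommonPrefixLength(string a, string b)
{
    int k = 0;
    while (k < a.Length && k < b.Length && a[k] == b[k])
    {
        k++;
    }
    return k;
}

// UniquePrefixes is a hypothetical helper name. It maps each name to its
// shortest prefix that no other name in the list starts with.
Dictionary<string, string> UniquePrefixes(IEnumerable<string> names)
{
    // In ordinal sort order, the name sharing the longest prefix with a
    // given name is always one of its two immediate neighbours.
    var sorted = names.OrderBy(n => n, StringComparer.Ordinal).ToList();
    var result = new Dictionary<string, string>();
    for (int k = 0; k < sorted.Count; k++)
    {
        int shared = 0;
        if (k > 0)
        {
            shared = Math.Max(shared, CommonPrefixLength(sorted[k], sorted[k - 1]));
        }
        if (k + 1 < sorted.Count)
        {
            shared = Math.Max(shared, CommonPrefixLength(sorted[k], sorted[k + 1]));
        }
        // One letter beyond the longest shared prefix is enough. The length
        // is capped for a name that is a complete prefix of its neighbour,
        // in which case no unique prefix exists and the whole name is used.
        int length = Math.Min(shared + 1, sorted[k].Length);
        result[sorted[k]] = sorted[k].Substring(0, length);
    }
    return result;
}

var prefixes = UniquePrefixes(new[]
{
    "Paddington", "Penryn", "Plymouth", "Plympton", "Portsmouth", "Quebec"
});
foreach (var pair in prefixes)
{
    Console.WriteLine($"{pair.Value} for {pair.Key}");
}
```

For the sample list this gives Pa, Pe, Plymo, Plymp, Po and Q. Unlike the PLO / PLP codes in the question, these are true prefixes, so the letters that offer no choice (the Y and M in Plymouth) are kept.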

