500,000 street names - what data structure to use for quick searches?

So we have a lot of street names. They come to the file. Id probably caches them when loading the server into production. The search should be automatically completed, for example. you type "lang" and you get maybe 8 hits: langstr, langestr. Etc

+7
source share
2 answers

What you are looking for is some kind of concise representation of trie. You might want to look into squeezed attempts or DAWG as a starting point, as they provide excellent performance and very good use of space.

Hope this helps!

+11
source

Autofill is usually done in one of the following ways:

  • The trees . By indexing searchable text in a tree structure (prefix tree, suffix tree, dawg, etc.), you can perform a very fast search by storing memory. Tree traversal can be adapted for approximate matching.
  • Separation of folders . Dividing the text into tokens (ngrams), you can search for instances of templates using a simple hashing scheme.
  • Filtering . Find a set of potential matches, and then apply a consistent algorithm to test each candidate.

See completely , the Java autocomplete library that implements some of the latest concepts.

0
source

All Articles