500,000 street names - what data structure to use for quick searches?

Question

500,000 street names - what data structure to use for quick searches?

So we have a lot of street names. They come to the file. Id probably caches them when loading the server into production. The search should be automatically completed, for example. you type "lang" and you get maybe 8 hits: langstr, langestr. Etc

+7

java algorithm data-structures search autocomplete

Toskan Aug 28 '12 at 0:22

source share

2 answers

templatetypedef · Answer 1 · 2012-08-28T00:23:19+0000

What you are looking for is some kind of concise representation of trie. You might want to look into squeezed attempts or DAWG as a starting point, as they provide excellent performance and very good use of space.

Hope this helps!

Miguel fonseca · Answer 2 · 2015-06-23T14:50:33+0000

Autofill is usually done in one of the following ways:

The trees . By indexing searchable text in a tree structure (prefix tree, suffix tree, dawg, etc.), you can perform a very fast search by storing memory. Tree traversal can be adapted for approximate matching.
Separation of folders . Dividing the text into tokens (ngrams), you can search for instances of templates using a simple hashing scheme.
Filtering . Find a set of potential matches, and then apply a consistent algorithm to test each candidate.

See completely , the Java autocomplete library that implements some of the latest concepts.

500,000 street names - what data structure to use for quick searches?

More articles: