What is the efficiency of a space oriented acyclic graph of words (dawg)? and is there a javascript implementation?

I have a keyword dictionary that I want to make available for autocomplete / suggestions on the client side of the web application. The ajax turnover introduces too much latency, so it would be nice to keep the entire list of words on the client.

The list may contain hundreds of thousands of words, maybe a couple of million. I have done some research, and this suggests that the dawg structure will provide space and search efficiency, but I can not find real numbers in the world.

Also, feel free to offer other features to achieve the same functionality.

+4
source share
5 answers

I recently introduced a DAWG program for playing game programs. He uses a dictionary of 2.7 million words from the Polish language. The source text file is about 33 MB in size. The same list of words that is represented as DAWG in a binary file takes up only 5 MB. The actual size may vary, since it depends on the implementation, so the number of vertices is 154k and the number of edges 411k are more important figures.

However, this amount of data is too large to handle JavaScript, as described above. Attempting to process several MB of data will delay the JavaScript interpreter for several minutes, effectively hanging the entire browser.

+4
source

My mind shrinks on two facts: a couple of millions and JavaScript. JS is designed to shuffle small chunks of data, not megabytes. Just imagine how long users will have to wait for your page to load!

There must be a reason why AJAX turnover is so slow in your case. Google serves a billion AJAX requests every day, and their type is ahead fast (just try on www.google.com). Therefore, something must be broken in your setup. Find it and fix it.

+1
source

Your solution seems practical, but you might want to look, for example, at the implementation of jQuery autocomplete (s) to see how they handle the delay.

0
source

Several million words in memory (in JavaScript in a browser)? It sounds a lot, no matter what structure you decide to keep. Instead, you can consider other types of optimization, for example, loading subsets of your list of words based on the characters you type.

For example, if the user enters โ€œa,โ€ then you will begin to extract all the words starting with โ€œaโ€. You can then optimize your list of words by first returning more common words, so more likely ones will match instantly, while less common words may load a little slower.

0
source

Of my undemanding, DAWGs are good for storing and searching words, but not when you need to create hit lists. After you place the prefix, you will have to review all its children to recover words starting with this prefix.

I agree with others, you should consider server search.

0
source

All Articles