TreeSet Android Dictionary Accelerates Download

I have 300,000 words in my dictionary (actually saved in txt format (new line with delimiters) on the SD card of my Android device). I want to build a data structure that would require as little time as possible to insert words (String-s) from my txt file into this data structure. And this DS should be super fast to check for words in a dictionary (it's DS) or not. I tried several built-in DSs and the fastest IMO was TreeSet. Are there any other (not built-in) DS that would insert / create DS faster and equal TreeSet for search?

And one more thing - I can "help" TreeSet insert faster by rearranging my txt file (put the words in the correct order).

Hello

+4
source share
1 answer

Firstly, it is well done for experiments to find the best structure for your application. Often people will argue without testing various options for obtaining real performance data.

If you want to save build time, and your word file does not change very often, an obvious improvement in build speed is to cache the data structure. Regardless of the data structure you are using, create the structure once and then save the structure on the SD card (and not just save the lines). Standard java.util structures can be saved using Serialization .

If you need the fastest build time and your word list is sorted alphabetically, or maybe you can just save it in a String array. Build time will be very fast, and search time will be similar to TreeSet (using Arrays.binarySearch () ).

If you need a faster search, you can check out Perfect Hash ing or Trie s, but this is not in the standard Java libraries.

Trie will have a lot more memory than any of them, which can speed things up. ( Implementation Detection Information )

I am surprised that TreeSet is faster than HashSet in your experiments, which means that you can work in a situation where memory allocation is expensive. Did you remember to set the initial capacity when you allocated the HashSet? Do not forget to avoid costly adjustment, you need to set the initial capacity to at least the number of elements / 0.75 (load factor).

+5
source

All Articles