One structure that I saw to minimize space in the spelling dictionary was to encode each word as:
- number of characters (bytes), common with the last; and
- new finale.
So a list of words
HERE would encode as THIS sanctimonious 0,sanctimonious sanction 6,on sanguine 3,guine trivial 0,trivial
You save 7 bytes right there (19%), I suspect that saving will be like a dictionary of 20,000 words only because of the minimal distances between (common prefixes) of adjacent words.
To speed up the search, in memory there was a table with 26 inputs, in which there were initial offsets for words starting with a, b, c, ..., z. Words at these offsets always had 0 as the first byte, since they did not have common letters with the previous word.
This looks like something like trie, but without pointers, which would undoubtedly be expensive if every character in the tree has a 4-byte pointer associated with it.
Remember that it was from my CP / M days, where the memory was much less than now.
paxdiablo
source share