If we have N words, each with a maximum length of L , your algorithm will take O(N*L^3) (suppose the addition to trie is linear with the length of the adding word). However, the size of the resulting trie (the number of nodes) does not exceed O(N*L^2) , so it looks like you are wasting time and you can do better.
Indeed, you can, but you need to pull a few tricks from your sleeve. In addition, you will no longer need trie.
.substring() in constant time
In Java 7, each String had a backup char[] array, as well as an initial position and length. This allowed us to use the .substring() method in constant time since String is an immutable class. A new String object with the same char[] background array was created only with a different starting position and length.
You will need to slightly increase this number in order to support the addition at the end of the line, increasing the length. Always create a new string object, but leave an array of support.
- Reassign the hash in constant time after adding a single character
Again, let me use the Java hashCode() function for String :
int hash = 0; for (int i = 0; i < data.length; i++) { hash = 31 * hash + data[i]; }
Now, how will the hash change after adding one character at the end of a word? Just add its value (ASCII code) multiplied by 31^length . You can save credentials 31 in a separate table, other prime numbers can also be used.
- Save all substrings in one
HashMap
Using tricks 1 and 2, you can generate all the substrings in time O(N*L^2) , which is the total number of substrings. Always start with a line of length one and add one character at a time. Put all your lines in one HashMap to reduce duplicity.
(You can skip 2 and 3 and discard duplicates when sorting / after sorting, maybe it will be even faster.)
- Sort your substrings and you're good to go.
Well, when I got to point 4, I realized that my plan would not work, because when sorting you need to compare strings, and this can take O(L) . I came up with several attempts to solve this problem, including sorting the bucket, but no one will be faster than the original O(N*L^3)
I will just answer here if he inspires someone.
If you do not know the Aho-Corasic algorithm , pay attention to this, this may be useful for your problem.