You can use the Ukkonen algorithm to build a tree of suffixes in linear time:
https://en.wikipedia.org/wiki/Ukkonen%27s_algorithm
The number of substrings s is the number of string prefixes in trie that you can calculate simply in linear time. This is just the total number of characters in all nodes.
For example, your example creates a suffix tree, for example:
/\ ba | b bb
5 characters in the tree, so 5 substrings. Each unique line is a path from the root ending after another letter: abb, ab, a, bb, b. Thus, the number of lines is the number of letters in the tree.
More precisely:
- Each substring is a prefix of some string suffix;
- All suffixes are in trie;
- Thus, there is a 1-1 correspondence between substrings and paths through trie (by definition, trie); and
- There is a 1-1 correspondence between letters in a tree and non-empty paths, because:
- each separate non-empty path ends in a separate position after its last letter; and
- the path to the position after each letter is unique
NOTE for people who are wondering how to construct a tree containing the characters O (N ^ 2) in O (N) time:
Here's a trick for introducing a suffix tree. Instead of storing the actual strings in the nodes of the tree, you just store the pointers in the orignal string, so the node that contains "abb" does not have "abb", it has (0,3) - 2 integers per node, regardless how many lines there are in each node, and in the suffix tree there are O (N) nodes.
source share