Regular Expression Search Tree (glob)

Does anyone know how to adapt the search tree to handle limited regular expressions? The task, taking into account the file name, is to find all the nodes corresponding to this file name. Nodes may contain the usual globs file names (* and?). Obviously, since this is a search tree, speed makes sense.

EDIT: I have to add that the most important thing for speed is the average time to rule out a match. That is, in most cases, the match will not be performed.

Example: suppose the tree contains the following nodes:

foo, bar, foo *, * bar, foo? bar

A foo search will return nodes 1 and 3. A bar search will return nodes 2 and 4. A phob search will not return a single node. A fooxbar search will return node 5. A foobar search will return nodes 3 and 4.

+4
source share
1 answer

The aho-corasick search tree will match the score. Aho-Corasick is a very good article about this kind of Tries and the implementation used in Evolution to replace Etrie regex search

Edit: to perform a complete line match, you can add start and end binding states, if you scan multi-line data, you can add a new line to start and end. You can also remove the part where it adds a cross-reference for partial matching, starting another match, which also allows you to speed up the exception.

Another algorithm for checking whether a rowset belongs is CritBit . It does not have regex, but it is simple and complete string testing.

+9
source

All Articles