Data structure for representing patterns in rows

I am looking for a good data structure for representing form strings:

Domain:Key1=Value1,Key2=Value2... 
  • Each "Domain" may contain the following characters of the template - * ? ( * - 0 or more characters ? - 0 or 1 character)

  • Each "key" may contain the following characters of the pattern - * ? ( * - 0 or more characters ? - 0 or 1 character)

  • Each "value" may contain the following characters of the pattern - * ? ( * - 0 or more characters ? - 0 or 1 character)

Examples:

 JBoss:* *:* JBoss:type=ThreadPool,* JBoss:type=Thread*,* JB*:name=http1,type=ConnectionPool 

If you are familiar with the JMX ObjectName, then this is essentially an ObjectName template.

I’m looking for ways to easily save a rule that matches each pattern, and be able to quickly delete, update and add new rules.

I started by using the Trie prefix, but stuck with the characters of the pattern * ? .

+7
source share
3 answers

I think the easiest way to do this is to create an NFA , such as trie, which allows you to transition into multiple states. This, of course, adds to the complexity of having a different data structure that maps to multiple states based on the character set to match. For example, in your example:

 JBoss:* *:* JBoss:type=ThreadPool,* JBoss:type=Thread*,* JB*:name=http1,type=ConnectionPool 

Suppose you are trying to match a JBoxx:name=*

When you map a JBo substring, you will need a data structure to store the JBo and JB* and * states, as you currently have three branches. When x arrives, you can drop the JBo route from the moment it was JBo and use JB* and * . An easy implementation is to have a set of possible matching states at any given time and try the next character for each one. You will also need a way to resolve multiple matches (as in this case) - maybe something as simple as the longest match?

Everything seems to make sense when you think of trie as an NFA, not a well-accepted DFA format. Hope this helps.

+1
source

I believe you want to use rope

0
source

You can take a look at this other thread: Efficient data structure for searching words using wildcards

Or this site: Wildcard queries

The second site ends with "We can handle wildcard requests containing one * character using two B-trees, a normal B-tree, and a reverse B-tree."

It may be above you, but it may be useful to read.

Good luck.

0
source

All Articles