A pattern for continuous rule matching

I have a continuous stream of messages that are being analyzed. The analysis returns various attributes, such as author, subject, sentiment, word count, and the set of distinct words. Users of the system can define rules that should trigger a warning when they match. The rules must be stored in a SQL database. A rule is a conjunction of single criteria over the message analysis, e.g. word-count > 15 && topic = 'StackOverflow' && sentiment > 2.0 && word-set contains 'great'. The set of allowed rule criteria is fixed by the analysis output; matching runs after each message has been analyzed and will be implemented in Java.

Each message has to be checked against all the rules defined by all users in the system, which consumes a lot of computing power (there are currently 10+ messages per second, and there will be 10,000+ rules to check). Is there a common pattern to speed up the matching, perhaps so that the rules can be checked in parallel rather than one after the other? Is it possible to do this in pure SQL, and what would the schema look like for rules of different types?
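For concreteness, the rule model described above could be represented like this in Java, and the "check rules in parallel" idea becomes a parallel stream over the rule list. This is only an illustrative sketch; the class and field names are my own, not from the question:

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical message-analysis result; field names are illustrative only.
record Analysis(String author, String topic, double sentiment,
                int wordCount, Set<String> words) {}

// A rule is a conjunction of single criteria, each a predicate on the analysis.
record Rule(long id, List<Predicate<Analysis>> criteria) {
    boolean matches(Analysis a) {
        return criteria.stream().allMatch(c -> c.test(a));
    }
}

class Matcher {
    // Check one message against all rules, in parallel across the rule set.
    static List<Long> matchingRuleIds(Analysis a, List<Rule> rules) {
        return rules.parallelStream()
                    .filter(r -> r.matches(a))
                    .map(Rule::id)
                    .toList();
    }
}
```

The example rule from the question would then be four predicates: `a -> a.wordCount() > 15`, `a -> a.topic().equals("StackOverflow")`, `a -> a.sentiment() > 2.0`, and `a -> a.words().contains("great")`.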

+6
2 answers

Your concerns are likely to go beyond matching throughput. For example, you also need to support maintaining the rules.

But suppose the set of rules is static and the messages contain all the fields needed to evaluate every rule. Using SQL, the structure starts with a messages table. That table gets an insert trigger, and the insert trigger is responsible for the matching. What is the best way to do it?

At 10 messages per second, your processing will effectively run concurrently even if each individual match is single-threaded. I'm not sure how much effort you should spend parallelizing a single match: parallelism in databases usually happens within SQL statements, not between them.

There are all kinds of solutions. For example, you could encode the rules as code in one gigantic stored procedure. That would be a nightmare to maintain, might exceed stored-procedure length limits, and could be painfully slow.

Another crazy idea: keep one table per rule holding the messages that match it, with a constraint so that only matching rows can be inserted. Your process then looks like a zillion insert statements.

More seriously, you will get further with code such as:

 select * from rules where . . . 

The result set would contain the matching rules. The where clause might look something like this:

 select * from rules r where @wordcount > coalesce(r.wordcount, 0) and @topic = coalesce(r.topic, @topic) and . . . 

Thus, every possible comparison for all the rules is expressed in the where clause, and the rules are pre-processed to determine which clauses they need.

You can even do away with the external variables and reference the inserted rows directly in the query:

 select * from rules r cross join inserted i where i.wordcount > coalesce(r.wordcount, 0) and i.topic = coalesce(r.topic, i.topic) and . . . 

So yes, this is possible in SQL, and you can do the matching in parallel. You just have to do the work of getting your rules into a format the database can compare against.
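The NULL-means-"don't care" convention behind those coalesce() calls can equally be mirrored in application code if the matching runs in Java instead of in the trigger. A minimal sketch, assuming a rules row shaped like the query above (field names are my own illustration):

```java
// One row of a hypothetical rules table: a null criterion column means
// "this rule does not constrain that attribute", exactly like the
// coalesce() calls in the SQL above.
record RuleRow(long id, Integer minWordCount, String topic, Double minSentiment) {
    boolean matches(int wordCount, String msgTopic, double sentiment) {
        return (minWordCount == null || wordCount > minWordCount)
            && (topic == null || topic.equals(msgTopic))
            && (minSentiment == null || sentiment > minSentiment);
    }
}
```

Rules of different types then share one wide table with mostly-null criterion columns, which is one plausible answer to the schema question.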

+2

I solved a similar problem in C#, although I did not use SQL for the matching.

I stored the rules as serialized XML in a database for portability.

On application startup, or whenever the rules table changed (which invalidated the rules cache), I loaded all the rules from the database and deserialized them into the corresponding classes.

Then, as data arrived at each application server, I evaluated the rules against the incoming data and performed the corresponding action for each rule that passed. (At the time I performed the action in-process on the application server, but today I would drop it onto a queue instead.)

This has the advantage of spreading the computation across your application cluster rather than keeping all those loops on the database machine.
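A minimal sketch of that design, written in Java since the question targets Java (the class name, the generic message type, and the reload hook are my assumptions, and the database/XML deserialization is stubbed out):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Predicate;

// Hypothetical in-memory rules cache, refreshed on startup or whenever the
// rules table changes; per-message evaluation then never touches the database.
class RulesCache<M> {
    private final CopyOnWriteArrayList<Predicate<M>> rules = new CopyOnWriteArrayList<>();

    // In the real system this would load and deserialize rules from the database.
    void reload(List<Predicate<M>> freshRules) {
        rules.clear();
        rules.addAll(freshRules);
    }

    // Count the rules matching one message; each match would enqueue an action.
    long countMatches(M message) {
        return rules.stream().filter(r -> r.test(message)).count();
    }
}
```

CopyOnWriteArrayList lets rule reloads happen safely while message threads are iterating the cache, which suits the "rules change rarely, messages arrive constantly" workload.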

+1
source
