Multiple Pattern Matching Algorithm

You have several date templates P1 - Pn.

Some of them are simple, like P1 - all Mondays, P2 - all Tuesdays; others are more complex, such as P4 - all working days, etc.

For a custom date array (V1, V2) I need to create the shortest result string, as shown in the figure:

Multi pattern matching

For any array, we must create a string that will represent the dates in the array. The easiest way is to create a line, for example 1.5.2013, 2.5.2013, 3.5.2013 ... But the result line will be very long.

Using several predefined patterns, we can create a shorter result string.

For the result string, I use the following rules:

Single Date Format: DD.MM.YYYY (10 characters)
Enumeration (dates and patterns): comma and space (2 characters)
Date range: DD.MM.YYYY-DD.MM.YYYY (21 characters)
Patch Name Interval: Px-Py (5 characters)
Special words: except (6 characters)

Examples of result lines:

  • V1 using the P4 pattern:

    P4 except 05/01/2013 03/03/2013, 09/09/2013, 05/10/2013, 05/16/2013, 05/17/2013 (80 characters)

  • V1 using the Pn pattern:

    Pn May 6, 2013-08 May 2013, May 13, 2013-15 May 2013, May 20, 2013-24 May 2013, May 27, 2013- May 31, 2013 (94 characters)

  • V1 using the best matches:

    P1-P3 05/01/2013-19.05.2013, P4 05/20/2013-31.05.2013 (54 characters)

The main goal is to create the shortest result string. As far as I understand, we can achieve this by finding the best matching patterns / templates.

I'm currently trying to adapt the knapsack problem and the longest common subsequence problem, but I'm not sure if this is the right direction.

I would be grateful for any ideas.


updated

Thanks to Jan Dvorak for his short description of my problem:

The goal is to describe V using a predefined dictionary (P1..Pn and all intervals and single dates) where intersection, union and subtraction are allowed, and each operation and atom has a predetermined value (number of characters in the result line).


+7
source share
2 answers

After a long search time, we finally found a solution that is very close to what we want.

http://www.fpz.unizg.hr/traffic/index.php/PROMTT/article/view/1287

Thank you all for participating.

+1
source

This is just a suggestion, but if you need a really short string representing arrays of dates, you can solve this problem in a completely different way, so it is very simple and efficient.

Let 1 represent the day "seleceted" and let 0 represent the day "unselected", then you can build a binary number that represents custom arrays of dates per month, for example, for case V1, you can generate this binary code number:

V1 = 0000011100001110000111110011111 

So, the first 0 means that the date 1.5.2013 is not selected, the next 0 means that the date 2.5.2013 is not selected, etc. If you separate this number into 8 bit groups (dividing the binary number in bytes), you can create this byte array:

 V1(starting in May 1, 2013) = 00000111 - 00001110 - 00011111 - 00111110 (4 bytes) 

Using this method, you represent V1 using just 4 bytes, this is the only information you need if you know that V1 starts from the date of 1.5.2013, so you also need to save the start date so that you can represent the month and year using only 3 bytes, so, for example, a date in May 2013 can be represented as follows:

May = 5th month, so 5 in binary format is 101

2013 in binary format - 11111011101 Thus, using 3 bytes, you can represent May 2013 as follows:

 0000101 00000111 11011101 [ 5 ] [ 2013 ] 

So you can imagine V1 like this

 V1= 0000101 - 00000111 - 11011101 00000111 - 00001110 - 00011111 - 00111110 [Month] [ Year ] [ V1 custom array of dates ] 

Thus, V1 can be fully represented using just 7 bytes!

If you need a string instead of an array of bytes, you can convert this byte array to a Base64 String, so V1 can be represented as a string

 V1 in Base64 is Cg+6Dhw+Pg== (using just 12 characters!!) 

In case of V2:

 V2 = 0000101 - 00000111 - 11011101 11111111 - 11111111 - 11111111 - 11101110 [Month] [ Year ] [ V2 custom array of dates ] V2 in Base64 is Cg+7////bg== (using just 12 characters again!!) 

With this method, you know that a monthly custom array of date information can be represented in 7 bytes (or 12 characters if you use the base 64 String).

To save information about the custom array for the year, you just need: 3 bytes for the starting month and year plus 365/8 = 45.625 (rounded to 46 bytes), that is 49 bytes! for the whole year that the database 64 has a maximum length of 69 characters !!!

It is easy to use, easy to maintain in code, better than a complex pattern matching algorithm, this smell is a good solution for me. I hope this recommendation meets your requirements.

0
source

All Articles