Parallel string-matching algorithm for the first occurrence

Question

To be up front, this is homework. However, it is extremely open-ended, and we have had almost no guidance on how to even begin thinking about this problem (or about parallel algorithms in general). I would like pointers in the right direction, not a complete solution. Any reading that might help would be excellent too.

I am working on an efficient way to match the first occurrence of a pattern in a large amount of text using a parallel algorithm. The pattern is a simple character match, with no regular expressions involved. I have managed to come up with a possible way of finding all matches, but that requires me to then look through all the matches and pick the first one.

So the question is: would I have more success splitting the text between processes and scanning that way? Or would it be better to have some kind of synchronized search, where the j-th process looks for the j-th character of the pattern? If all processes then report a match, they would shift their positions along the pattern and move on, continuing until all characters have been matched, and then return the index of the first match.

What I have so far is extremely basic and most likely does not work. I will not be implementing this, but any pointers would be appreciated.

With p processors, a text of length t, and a pattern of length L, using at most L processors:

  for i = 0 to t - L:
     for j = 0 to p:
         processor j compares text[i + j] to pattern[j]
             On false match:
                 all processors terminate the current comparison, i++
             On true match by all processors:
                 Iterate p characters at a time until L characters have been compared
                 If all L comparisons return true:
                     return i (position of pattern)
                 Else:
                     i++

+7

string-matching language-agnostic algorithm parallel-processing

Xorlev Feb 22 '10 at 22:00

2 answers

Given a pattern of length L and a string of length N to be searched over P processors, I would simply split the string across the processors. Each processor takes a chunk of length N/P + L-1, with the last L-1 characters overlapping the chunk that belongs to the next processor. Each processor then runs Boyer-Moore (the two pre-processing tables would be shared). When each one finishes, it returns its result to the first processor, which maintains a table:

 Process  Index
 1        -1
 2        2
 3        23

After all processes have responded (or, with a bit of thought, you could arrange an early exit), you return the first match. On average this should be O(N/(L*P) + P).
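The splitting scheme above can be sketched roughly as follows. This is only an illustration under some assumptions: `first_match` and `find_in_chunk` are hypothetical names, Python's built-in `str.find` stands in for Boyer-Moore, and threads are used just to keep the sketch short (in standard CPython they will not give a real speed-up for CPU-bound work; separate processes would be needed for that).

```python
from concurrent.futures import ThreadPoolExecutor
from math import ceil


def find_in_chunk(text, pattern, start, end):
    """Absolute index of the first match that starts in text[start:end), else -1."""
    # The search window is extended by len(pattern) - 1 characters so that a
    # match straddling the boundary with the next chunk is still seen.
    # str.find is a stand-in for Boyer-Moore (an assumption of this sketch).
    return text.find(pattern, start, min(len(text), end + len(pattern) - 1))


def first_match(text, pattern, workers=4):
    """Split text into `workers` chunks, search each, return the smallest hit."""
    size = max(1, ceil(len(text) / workers))
    bounds = [(s, min(s + size, len(text))) for s in range(0, len(text), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One entry per chunk, in chunk order: the "Process / Index" table.
        table = list(pool.map(lambda b: find_in_chunk(text, pattern, *b), bounds))
    hits = [i for i in table if i != -1]
    return min(hits) if hits else -1
```

Because each worker reports an absolute index, the final step is just the minimum over the table; no early exit is attempted here.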

The approach in which the i-th processor matches the i-th character would require too much inter-process communication overhead.

EDIT: I realize you already have a solution and are trying to come up with a way that does not require finding all the matches. Well, I really don't think that is necessary. You can come up with some early-exit conditions, they are not that difficult, but I do not think they will improve your overall performance much (unless you have additional knowledge about the distribution of matches in the text).

+3

Il-bhima Feb 22 '10 at 22:18

Matthieu M. · Accepted Answer · 2010-02-26T13:25:32+0000

I am afraid that breaking up the string will not do.

Generally speaking, early escaping is difficult, so you would be better off breaking the text into chunks.

But let's first ask Herb Sutter to explain searching with parallel algorithms, on Dr Dobbs. The idea is to exploit the non-uniformity of the distribution to get an early return. Of course, Sutter is interested in any match, which is not the problem at hand, so let's adapt.

Here is my idea, let's say we have:

a text of length N
p processors
a heuristic: max, the maximum number of characters a chunk should contain, probably an order of magnitude greater than M, the pattern length.

Now you need to divide the text into k equal chunks, where k is minimal and size(chunk) is maximal, yet still below max.
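As a small worked sketch of that division: the fewest chunks that respect the limit is k = ceil(N / max), and the chunk size is then ceil(N / k). The helper name `plan_chunks` is hypothetical, and "below max" is read here as "at most max", which is an assumption.

```python
from math import ceil


def plan_chunks(n, max_size):
    """Split range(n) into the fewest near-equal chunks of at most max_size.

    Returns a list of (start, end) half-open ranges in text order. Any overlap
    of M - 1 characters between neighbours is left to the search step.
    """
    if n == 0:
        return []
    k = ceil(n / max_size)   # minimal number of chunks
    size = ceil(n / k)       # largest equal size that still fits under max_size
    return [(s, min(s + size, n)) for s in range(0, n, size)]
```

For example, `plan_chunks(100, 30)` gives four chunks of 25 characters rather than three of 30 plus a short remainder of 10.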

Then we have a classic Producer-Consumer pattern: the p processes are fed chunks of text, and each process looks for the pattern in the chunk it receives.

The early escape is done with a flag. You can either record the index of the chunk in which the pattern was found (and its position), or simply set a boolean and store the result in the processes themselves (in which case you have to go through all the processes once they have stopped). The point is that each time a chunk is requested, the producer checks the flag and stops feeding the processes if a match has been found (since the processes were given the chunks in order).

Here is an example with three processors:

 [ 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 ]
                       x       x

Chunks 6 and 8 both contain the pattern.

The producer first feeds 1, 2, and 3 to the processes, then each process advances at its own pace (which depends on how similar the searched text is to the pattern).

Say we find the pattern in 8 before we find it in 6. The process that was working on 7 then finishes and tries to get another chunk, but the producer stops it, since any later chunk would be irrelevant. Then the process working on 6 finishes with a result, and so we know that the first occurrence was in 6, and we have its position.
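A minimal sketch of this ordered hand-out with an early-exit flag might look like the following. The names (`first_match_ordered`, `next_chunk`, `best`) are hypothetical, `str.find` stands in for whatever single-chunk search is used, and threads are chosen only for brevity; the sketch shows the coordination logic, not a tuned implementation.

```python
import threading
from math import ceil


def first_match_ordered(text, pattern, workers=3, max_size=1024):
    """Index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if n == 0 or m == 0:
        return -1
    size = ceil(n / ceil(n / max_size))   # equal chunks, each at most max_size
    starts = iter(range(0, n, size))      # chunk starts, in text order
    lock = threading.Lock()
    best = [-1]                           # smallest match index so far (the flag)

    def next_chunk():
        # Producer: hand out chunks in order; once any match is known,
        # every chunk not yet handed out lies after it and is irrelevant.
        with lock:
            if best[0] != -1:
                return None
            return next(starts, None)

    def worker():
        while (start := next_chunk()) is not None:
            # Overlap by m - 1 so matches crossing a chunk boundary are seen.
            hit = text.find(pattern, start, min(n, start + size + m - 1))
            if hit != -1:
                with lock:
                    if best[0] == -1 or hit < best[0]:
                        best[0] = hit

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return best[0]
```

Workers that already hold an earlier chunk are allowed to finish, which is what lets a later hit in 6 replace an earlier-reported hit in 8.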

The basic idea is that you do not want to look at the whole text! This is wasteful!
