I am afraid that line breaking will not be executed.
Generally speaking, early acceleration is difficult, so you'd better break the text in pieces.
But let me ask Herb Sutter to explain the search using parallel algorithms first on Dr Dobbs . The idea is to use uneven distribution for early return. Of course, Sutter is interested in any match, which is not a problem, so let's adapt.
Here is my idea, let's say we have:
- Text length
N p processors- heuristic:
max - the maximum number of characters a fragment should contain, probably an order of magnitude greater than M the pattern length.
Now you need to divide the text into k equal pieces, where k minimal and size(chunk) maximum, but inferior to max .
Then we have a classic Producer-Consumer pattern: processes p are served with pieces of text, each process looks for a pattern in the resulting fragment.
An early escape is carried out using the flag. You can either set the index of the piece in which you found the template (and its position), or simply set the logical value and save the result in the processes themselves (in this case, you have to go through all the processes when they stop). The fact is that every time a piece is requested, the manufacturer checks the flag and stops submitting processes if a match is found (because the processes were provided with pieces in order).
Here is an example with three processors:
[ 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 ] xx
Pieces 6 and 8 contain lines.
The producer will first load 1, 2, and 3 into the processes, then each process will advance in its own rhythm (this depends on the similarity of the found text and template).
Say we find a pattern at 8 before we find it at 6 . Then the process that worked on 7 ends and tries to get another piece, the producer stops it β that would be inappropriate. Then the process running at 6 ends with the result and, therefore, we know that the first occurrence was at 6 , and we have its position.
The basic idea is that you do not want to look at the whole text! This is wasteful!