- Move through the file in chunks rather than line by line, where chunk boundaries are occurrences of a frequently occurring character or pattern, call it "X".
- Choose "X" so that it can never appear inside a match of your regular expression; that way no match is ever split across a chunk boundary.
- Run your regex against the current chunk, collect the matches, and move on to the next chunk.
Example:
This is a string with multiline numbers -2000 2223434 34356666 444564646 . These numbers can occur at 34345 567567 places, and on 67 87878 pages . The problem is to find a good way to extract these matches without memory hogging.
In this text, suppose the desired pattern is a numeric string, say `\d+` matching runs of digits. Instead of loading and processing the whole file, you can pick a delimiter for chunking, say the FULL STOP (".") in this case, read and process only up to the next occurrence of it, and then move on to the next chunk.
CHUNK # 1:
This is a string with multiline numbers -2000 2223434 34356666 444564646 .
CHUNK # 2:
These numbers can occur at 34345 567567 places, and on 67 87878 pages
etc.
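The delimiter-based chunking above can be sketched in Python as follows. This is a minimal sketch, not code from the original answer: the function name, buffer size, and delimiter default are my own choices.

```python
import re

def matches_by_chunk(path, pattern, delimiter=".", bufsize=4096):
    """Yield regex matches from a file without loading it whole.

    `delimiter` must be a character that can never appear inside a
    match of `pattern`, so no match is split across chunk borders.
    """
    regex = re.compile(pattern)
    buf = ""
    with open(path) as f:
        for piece in iter(lambda: f.read(bufsize), ""):
            buf += piece
            # Process every complete chunk currently in the buffer.
            while delimiter in buf:
                chunk, buf = buf.split(delimiter, 1)
                yield from regex.findall(chunk)
    # Text after the final delimiter still needs scanning.
    yield from regex.findall(buf)
```

Because matches are yielded one at a time, memory use stays bounded by the buffer size plus the longest chunk, regardless of file size.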
EDIT: Adding @Ranty's suggestion from the comments:
Or just read a few lines, say 20. When you find matches inside, discard the buffer up to the end of the last match and read another 20 lines. There is no need for the frequent "X" at all.
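@Ranty's line-based variant could be sketched like this. Names and the window size are illustrative; the sketch assumes a single match never spans a line boundary, which holds for `\d+` since it cannot match a newline.

```python
import re

def matches_by_lines(path, pattern, window=20):
    r"""Yield regex matches by reading `window` lines at a time.

    After each scan, everything up to the end of the last match is
    dropped; the remaining tail is kept in the buffer. Assumes a
    match never spans a line boundary (true for \d+, which cannot
    match a newline).
    """
    regex = re.compile(pattern)
    buf = ""
    with open(path) as f:
        while True:
            lines = [f.readline() for _ in range(window)]
            buf += "".join(lines)
            last_end = 0
            for m in regex.finditer(buf):
                yield m.group(0)
                last_end = m.end()
            buf = buf[last_end:]      # keep only the unmatched tail
            if lines[-1] == "":       # readline() returns "" at EOF
                break
```

This avoids picking a delimiter entirely, at the cost of the line-boundary assumption noted above.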