If the text you are processing consists of repeating lines and tokens, split the file into pieces, and for each fragment have one thread pre-parse it into tokens: keywords, punctuation, identifier strings, and values. String comparisons and searches can be quite expensive, and farming that work out to multiple worker threads can speed up the purely logical / semantic part of the code, since it no longer has to search and compare strings itself.
The pre-parsed chunks of data (with all the string comparison and tokenization already done) can then be handed to the part of the code that actually looks at the semantics and ordering of the token data.
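For illustration, here is a minimal C++ sketch of that kind of pipeline. The chunking, the Token/TokenKind types, and the trivial whitespace tokenizer are assumptions made for the example, not your actual grammar; the point is just that the string work happens in parallel worker tasks and the semantic stage only ever sees tokens:

    // Sketch only: workers pre-tokenize chunks so the semantic pass
    // never touches raw strings again.
    #include <cctype>
    #include <cstddef>
    #include <future>
    #include <string>
    #include <vector>

    enum class TokenKind { Keyword, Punctuation, Identifier, Value };

    struct Token {
        TokenKind kind;
        std::string text;
    };

    // Hypothetical tokenizer: classify whitespace-separated words in one chunk.
    static std::vector<Token> tokenizeChunk(const std::string& chunk) {
        std::vector<Token> tokens;
        std::size_t pos = 0;
        while (pos < chunk.size()) {
            std::size_t end = chunk.find(' ', pos);
            if (end == std::string::npos) end = chunk.size();
            std::string word = chunk.substr(pos, end - pos);
            if (!word.empty()) {
                TokenKind kind =
                    (word == "if" || word == "while") ? TokenKind::Keyword
                    : (word == ";" || word == ",")    ? TokenKind::Punctuation
                    : std::isdigit(static_cast<unsigned char>(word[0]))
                        ? TokenKind::Value
                        : TokenKind::Identifier;
                tokens.push_back({kind, std::move(word)});
            }
            pos = end + 1;
        }
        return tokens;
    }

    int main() {
        std::vector<std::string> chunks = {"if x ;", "while 42 y"};  // pretend file fragments

        // One async task per chunk does the expensive string work in parallel.
        std::vector<std::future<std::vector<Token>>> pending;
        for (const std::string& c : chunks)
            pending.push_back(std::async(std::launch::async, tokenizeChunk, c));

        // The semantic stage now only looks at token kinds and ordering.
        for (auto& f : pending) {
            for (const Token& t : f.get()) {
                // ... semantic / ordering logic on t.kind goes here ...
                (void)t;
            }
        }
    }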
In addition, you mentioned that your file has a large memory footprint. There are a few things you could do to reduce your memory budget.
Split the file into pieces and parse it piecewise. Only read in as many pieces as you are working on at once, plus a few more for read-ahead, so that you do not stall on disk I/O when you finish processing one fragment and move on to the next.
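A minimal sketch of that read-ahead scheme, assuming one reader thread, a bounded queue of a few in-flight pieces, and placeholder names for the file and chunk size:

    // Sketch only: a reader thread keeps up to kReadAhead chunks queued
    // so the processing loop rarely waits on the disk.
    #include <condition_variable>
    #include <cstddef>
    #include <deque>
    #include <fstream>
    #include <iostream>
    #include <mutex>
    #include <string>
    #include <thread>
    #include <vector>

    int main() {
        constexpr std::size_t kChunkSize = 1 << 20;  // 1 MiB pieces (illustrative)
        constexpr std::size_t kReadAhead = 4;        // pieces buffered ahead of processing

        std::deque<std::vector<char>> queue;
        std::mutex m;
        std::condition_variable cv;
        bool done = false;

        std::thread reader([&] {
            std::ifstream in("big_input.txt", std::ios::binary);  // hypothetical file
            while (in) {
                std::vector<char> chunk(kChunkSize);
                in.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));
                chunk.resize(static_cast<std::size_t>(in.gcount()));
                if (chunk.empty()) break;
                std::unique_lock<std::mutex> lock(m);
                cv.wait(lock, [&] { return queue.size() < kReadAhead; });
                queue.push_back(std::move(chunk));
                cv.notify_all();
            }
            std::lock_guard<std::mutex> lock(m);
            done = true;
            cv.notify_all();
        });

        // Processing loop: consumes one piece at a time; the reader refills behind it.
        for (;;) {
            std::vector<char> chunk;
            {
                std::unique_lock<std::mutex> lock(m);
                cv.wait(lock, [&] { return !queue.empty() || done; });
                if (queue.empty()) break;
                chunk = std::move(queue.front());
                queue.pop_front();
                cv.notify_all();
            }
            std::cout << "processing " << chunk.size() << " bytes\n";  // real work goes here
        }
        reader.join();
    }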
Alternatively, large files can be memory-mapped and loaded on demand. If you have more threads working on processing the file than you have CPUs (usually threads = 1.5-2x CPUs is a good number for demand-paging apps), the threads that stall on I/O for the memory-mapped file are halted automatically by the OS until the required memory is paged in, and the other threads continue processing.
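A minimal POSIX sketch of the memory-mapped variant; the file name, the byte-scanning "work", and the 2x thread count are illustrative assumptions only:

    // Sketch only: map the file once, let several threads walk disjoint ranges.
    // A thread that touches a page not yet in memory blocks in the page fault
    // while the other threads keep running.
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    #include <cstddef>
    #include <iostream>
    #include <thread>
    #include <vector>

    int main() {
        int fd = open("big_input.txt", O_RDONLY);  // hypothetical file
        if (fd < 0) return 1;
        struct stat st;
        if (fstat(fd, &st) != 0) return 1;
        std::size_t size = static_cast<std::size_t>(st.st_size);

        void* mapped = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (mapped == MAP_FAILED) return 1;
        const char* data = static_cast<const char*>(mapped);

        // Oversubscribe slightly: ~1.5-2x hardware threads, per the rule of thumb above.
        unsigned nThreads = std::thread::hardware_concurrency() * 2;
        if (nThreads == 0) nThreads = 4;

        std::vector<std::thread> workers;
        std::vector<std::size_t> counts(nThreads, 0);
        std::size_t stride = (size + nThreads - 1) / nThreads;

        for (unsigned i = 0; i < nThreads; ++i) {
            workers.emplace_back([&, i] {
                std::size_t begin = i * stride;
                std::size_t end = (begin + stride < size) ? begin + stride : size;
                // Touching data[] may fault pages in from disk; the OS parks this
                // thread until the page arrives, while the other threads keep going.
                for (std::size_t p = begin; p < end; ++p)
                    if (data[p] == '\n') ++counts[i];
            });
        }
        for (auto& t : workers) t.join();

        std::size_t total = 0;
        for (std::size_t c : counts) total += c;
        std::cout << total << " lines\n";

        munmap(mapped, size);
        close(fd);
    }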
Adisak