First, if I understand your point in the “How Can I Search Effectively” section: you cannot simply skip the first few megabytes of data when searching if the target string might be located in those first few megabytes.
As for loading the file into memory: if you do, make sure you have enough memory for the entire file. You will be disappointed if you start using your utility and find that the 2 GB file you want to process does not fit in the 1.5 GB of memory you have left.
For what follows, I am going to assume that you either load the file into memory or memory-map it.
You specifically said that this is a binary file, which means you cannot use the usual C or C++ string-search functions (such as strstr), because null bytes in the data will confuse them: they will stop early without finding a match. Instead, you can use memchr to find the next occurrence of the first byte of your target, then memcmp to compare the bytes that follow with the rest of the target. Keep repeating the memchr / memcmp pair across the buffer until you find the target or run out of data. This is not the most efficient approach, since there are better pattern-matching algorithms, but it should work.
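Here is a minimal sketch of that memchr / memcmp loop, assuming the whole file is already in a buffer. The function name find_bytes and the "not found" return value are my own choices for illustration, not anything from your code.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: returns the offset of the first occurrence of
// needle in hay, or static_cast<std::size_t>(-1) if it is not present.
std::size_t find_bytes(const unsigned char* hay, std::size_t hay_len,
                       const unsigned char* needle, std::size_t needle_len)
{
    const std::size_t npos = static_cast<std::size_t>(-1);
    if (needle_len == 0 || needle_len > hay_len) return npos;

    const unsigned char* p = hay;
    // Last position at which a full-length match could still start.
    const unsigned char* last = hay + (hay_len - needle_len);

    while (p <= last) {
        // Jump to the next byte equal to the first byte of the target.
        p = static_cast<const unsigned char*>(
            std::memchr(p, needle[0], static_cast<std::size_t>(last - p) + 1));
        if (p == nullptr) return npos;

        // Compare the whole target at this candidate position.
        if (std::memcmp(p, needle, needle_len) == 0)
            return static_cast<std::size_t>(p - hay);

        ++p; // no match here, keep scanning after this byte
    }
    return npos;
}
```

Because both functions take explicit lengths, null bytes inside the data or inside the target are treated like any other byte.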
To “delete” n bytes, you actually have to move all the data that follows those n bytes, copying it to its new location.
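For the in-memory case, a rough sketch of that move could look like the following. It assumes the file contents sit in a std::vector<unsigned char>; delete_bytes is a made-up name, and calling the vector's own erase would do the same job.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: removes n bytes starting at offset by sliding the
// tail of the buffer down over them. Returns false if the range is invalid.
bool delete_bytes(std::vector<unsigned char>& buf, std::size_t offset, std::size_t n)
{
    if (offset > buf.size() || n > buf.size() - offset) return false;
    if (n == 0) return true;

    // Number of bytes that follow the deleted range.
    const std::size_t tail = buf.size() - offset - n;
    if (tail > 0) {
        // memmove (not memcpy) because source and destination overlap.
        std::memmove(buf.data() + offset, buf.data() + offset + n, tail);
    }
    buf.resize(buf.size() - n); // drop the now-unused bytes at the end
    return true;
}
```

The cost is proportional to the amount of data after the deleted range, so deleting near the start of a large file moves almost the whole buffer.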
If you do copy the data from disk into memory, it will be faster to manipulate it there and then write it out to a new file. Otherwise, once you have found the position on disk where the deletion should start, open a new file for writing, read X bytes from the first file, where X is the position of the file pointer in the first file, and write them straight to the second file. Then seek in the first file to X + n and copy from there to the end of file1, appending to what you have already written to file2.
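If you go the file-to-file route, it might look roughly like this. This is only a sketch using standard streams: the function name copy_without_range, the 64 KB chunk size, and the choice to fail when the source is shorter than X bytes are all my assumptions.

```cpp
#include <algorithm>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: copies src to dst, leaving out the n bytes that
// start at offset x. Returns false on any I/O problem.
bool copy_without_range(const std::string& src, const std::string& dst,
                        std::streamoff x, std::streamoff n)
{
    std::ifstream in(src, std::ios::binary);
    std::ofstream out(dst, std::ios::binary | std::ios::trunc);
    if (!in || !out) return false;

    std::vector<char> chunk(64 * 1024);
    const std::streamoff chunk_len = static_cast<std::streamoff>(chunk.size());

    // Part 1: copy the first x bytes unchanged.
    std::streamoff remaining = x;
    while (remaining > 0) {
        const std::streamsize want =
            static_cast<std::streamsize>(std::min(remaining, chunk_len));
        in.read(chunk.data(), want);
        const std::streamsize got = in.gcount();
        if (got <= 0) return false; // source is shorter than x bytes
        out.write(chunk.data(), got);
        remaining -= got;
    }

    // Part 2: skip the n bytes being deleted, then copy up to end of file.
    in.seekg(x + n, std::ios::beg);
    if (!in) return false;
    while (in.read(chunk.data(), static_cast<std::streamsize>(chunk.size()))
           || in.gcount() > 0) {
        out.write(chunk.data(), in.gcount());
    }

    out.flush();
    return static_cast<bool>(out);
}
```

Copying in fixed-size chunks keeps memory use small regardless of file size, which is the main reason to pick this approach over loading everything into memory.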
Loduwijk