How to remove parts from a binary file in C++

I would like to remove parts of a binary file using C++. The file is about 5-10 MB.

What I would like to do:

  • Search for an ANSI string.
  • Once I have found this string, I would like to delete the next n bytes, for example the next 1 MB of data. I want to remove these bytes entirely rather than fill them with NULL, so the file becomes smaller.
  • I would like to save the modified data to a new binary file that is identical to the original except for the n bytes that I deleted.

Can you give me some tips / best practices on how to do this most effectively? Should I load the file into memory first?

How can I efficiently search for an ANSI string? I may have to get through a few megabytes of data before I find it. (I was told that I should ask about this in a separate question, so here it is: How to find the ANSI line in a binary?)

How can I delete n bytes and efficiently write the result to a new file?

OK, I do not need it to be super-efficient: the file will not be larger than 10 MB, and it's OK if it takes several seconds.

+1
c++ replace search binaryfiles ifstream
3 answers

There are several fast string search algorithms that work much better than testing every character. For example, when searching for "something" (nine characters), such an algorithm needs to test only every ninth character in the best case.

Here is an example that I wrote for an earlier question: "code review: finding </body> tag reverse search on a non-null-terminated char str"
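To illustrate the "test only every ninth character" idea, here is a minimal sketch of one such algorithm, Boyer-Moore-Horspool. It is not the code from the linked question; the names `horspool_find` and `kNotFound` are made up for this example, and it assumes the whole file is already in a byte buffer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical sentinel meaning "no match".
constexpr std::size_t kNotFound = static_cast<std::size_t>(-1);

// Boyer-Moore-Horspool search over raw bytes (embedded zero bytes are fine).
// Returns the offset of the first match, or kNotFound.
std::size_t horspool_find(const std::vector<unsigned char>& data,
                          const std::string& needle) {
    assert(!needle.empty());  // searching for nothing is treated as a caller error
    const std::size_t m = needle.size();
    const std::size_t n = data.size();
    if (m > n) return kNotFound;

    // shift[b] = how far the window may slide when the byte under the
    // window's last position is b. Bytes not in the needle allow a full
    // slide of m positions.
    std::array<std::size_t, 256> shift;
    shift.fill(m);
    for (std::size_t i = 0; i + 1 < m; ++i)
        shift[static_cast<unsigned char>(needle[i])] = m - 1 - i;

    std::size_t pos = 0;
    while (pos + m <= n) {
        const unsigned char last = data[pos + m - 1];
        if (last == static_cast<unsigned char>(needle[m - 1]) &&
            std::memcmp(data.data() + pos, needle.data(), m) == 0)
            return pos;
        pos += shift[last];
    }
    return kNotFound;
}
```

When the byte under the last window position does not occur in the needle at all, the window jumps by the full needle length, which is where the best case of one test per nine bytes for "something" comes from.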

+1

For a 5-10 MB file, I would look at writev() if your system supports it. Read the entire file into memory, since it is small enough. Scan for the bytes you want to delete. Pass writev() a list of iovecs (which just point into your buffer and give the lengths to write), and you can then write out all the modified contents with a single system call.
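A minimal sketch of that approach, assuming a POSIX system and that the file contents are already in `buf`. The function name `write_skipping` and its parameters are hypothetical. Note that writev() may in principle write fewer bytes than requested; this sketch simply reports that as a failure instead of retrying.

```cpp
#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <vector>

// Writes buf to out_path, leaving out the bytes [cut_begin, cut_begin + cut_len).
// buf is non-const only because iovec::iov_base is a plain void*.
bool write_skipping(const char* out_path, std::vector<char>& buf,
                    std::size_t cut_begin, std::size_t cut_len) {
    assert(cut_begin <= buf.size() && cut_len <= buf.size() - cut_begin);

    iovec parts[2];
    parts[0].iov_base = buf.data();                        // everything before the cut
    parts[0].iov_len  = cut_begin;
    parts[1].iov_base = buf.data() + cut_begin + cut_len;  // everything after the cut
    parts[1].iov_len  = buf.size() - cut_begin - cut_len;

    const int fd = open(out_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;

    const ssize_t written = writev(fd, parts, 2);
    bool ok = written >= 0 &&
              static_cast<std::size_t>(written) == parts[0].iov_len + parts[1].iov_len;
    if (close(fd) != 0) ok = false;
    return ok;
}
```

The buffer itself is never modified; the two iovecs just describe the pieces to keep.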

0

First, if I understand your point in the "How can I efficiently search" part correctly: you cannot simply skip a few megabytes of data during the search if the target string could be within those first few megabytes.

As for loading the file into memory: if you do, make sure that you have enough memory for the entire file. You will be disappointed if you start using your utility and find that the 2 GB file you want to process does not fit in the 1.5 GB of memory you have left.

For the rest of this answer, I am going to assume that you load the file into memory or memory-map it.

You specifically said that this is a binary file, which means that you cannot use the normal C or C++ string-matching functions, because null characters in the file data will confuse them (they will stop prematurely without a match). Instead, you can use memchr to find the first occurrence of the first byte of your target, and memcmp to compare the following bytes with the bytes of the target; keep applying memchr / memcmp pairs to scan the entire buffer until the target is found. This is not the most efficient way, since there are better pattern-matching algorithms, but I believe it is efficient enough.
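The memchr / memcmp loop described above can be sketched as follows. This is only one way to write it, and the function name `find_bytes` is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Returns a pointer to the first occurrence of target in data, or nullptr.
// Works on raw bytes, so embedded zero bytes do not end the search early.
const unsigned char* find_bytes(const unsigned char* data, std::size_t data_len,
                                const char* target, std::size_t target_len) {
    assert(target_len > 0);
    const unsigned char* p = data;
    const unsigned char* const end = data + data_len;

    while (static_cast<std::size_t>(end - p) >= target_len) {
        // Only scan the positions where a full match would still fit.
        const std::size_t span = static_cast<std::size_t>(end - p) - target_len + 1;
        p = static_cast<const unsigned char*>(
            std::memchr(p, static_cast<unsigned char>(target[0]), span));
        if (p == nullptr) return nullptr;                        // first byte never occurs again
        if (std::memcmp(p, target, target_len) == 0) return p;   // full match
        ++p;                                                     // false start, keep scanning
    }
    return nullptr;
}
```

The offset of the match is then `hit - data`, and the bytes to delete start at `hit + target_len`.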

To “delete” n bytes, you actually have to move all the data that follows those n bytes, copying it down to its new location.
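If the file is held in a std::vector, that move is what `erase` does internally. A small sketch, with `erase_bytes` as a hypothetical helper name:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Removes up to n bytes starting at offset; the tail is shifted down
// and the vector shrinks accordingly.
void erase_bytes(std::vector<unsigned char>& data, std::size_t offset, std::size_t n) {
    assert(offset <= data.size());
    n = std::min(n, data.size() - offset);  // clamp if fewer than n bytes remain
    const auto first = data.begin() + static_cast<std::ptrdiff_t>(offset);
    data.erase(first, first + static_cast<std::ptrdiff_t>(n));
}
```

For a 10 MB buffer, a single shift like this is cheap; it only becomes a concern if many separate ranges are erased one after another.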

If you do copy the data from disk into memory, it will be faster to manipulate it there and then write it out to a new file. Otherwise, once you have found the position in the file where the deletion should start, you can open a new file for writing, read X bytes from the first file, where X is that position (the file pointer offset in the first file), and write them directly to the second file. Then seek in the first file to X + n and do the same from there to the end of the first file, appending this to what you have already written to the second file.
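The file-to-file variant could look roughly like this with standard streams. The names `copy_up_to` and `copy_without_range` and the 64 KB chunk size are arbitrary choices for this sketch, and X is assumed to be known from an earlier search.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <limits>
#include <string>
#include <vector>

// Copies at most `limit` bytes from in to out; returns how many were copied.
std::size_t copy_up_to(std::istream& in, std::ostream& out, std::size_t limit) {
    std::vector<char> chunk(64 * 1024);
    std::size_t copied = 0;
    while (copied < limit) {
        const std::size_t want = std::min(limit - copied, chunk.size());
        in.read(chunk.data(), static_cast<std::streamsize>(want));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0) break;  // end of input
        out.write(chunk.data(), static_cast<std::streamsize>(got));
        copied += got;
    }
    return copied;
}

// Copies src to dst, leaving out the n bytes that start at offset x.
bool copy_without_range(const std::string& src, const std::string& dst,
                        std::size_t x, std::size_t n) {
    assert(src != dst);  // dst is truncated on open, so it must not be the source
    std::ifstream in(src, std::ios::binary);
    std::ofstream out(dst, std::ios::binary | std::ios::trunc);
    if (!in || !out) return false;

    if (copy_up_to(in, out, x) != x) return false;     // bytes [0, X)
    in.seekg(static_cast<std::streamoff>(x + n));      // jump to X + n
    if (!in) return false;
    copy_up_to(in, out, std::numeric_limits<std::size_t>::max());  // rest of the file
    out.flush();
    return static_cast<bool>(out);
}
```

Only one chunk is in memory at a time, so this also works for files that are too large to load completely.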

0