Finding a pattern in a large binary file using C or C++?

I have a ~700 MB binary file (non-textual data), and I would like to search it for a specific byte pattern that occurs at random locations throughout the file, for example 0x? 0x? 0x55 0x? 0x? 0x55 0x? 0x? 0x55 0x? 0x? 0x55 and so on, for about 50 bytes. The pattern I am looking for is a repeating sequence of two arbitrary bytes followed by 0x55.

In other words, I am searching for tables stored in the file that use 0x55 as a delimiter, so that I can save the data contained in the tables or otherwise manipulate it.

Would it be best to go through the file one byte at a time, look two bytes ahead to see whether that byte is 0x55, and, if so, keep looking ahead in the same way to confirm that there is a table at that position?
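To make the byte-at-a-time idea concrete, here is a minimal sketch of it over a buffer that is already in memory. The names `is_table_at` and `find_tables` and the `min_groups` parameter are hypothetical, and the sketch assumes a table is simply a run of 3-byte records (two arbitrary bytes, then 0x55); the real table format may need stricter checks.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: a "table" is `groups` consecutive 3-byte records, each made of
// two arbitrary bytes followed by the delimiter 0x55.
bool is_table_at(const std::uint8_t* data, std::size_t size,
                 std::size_t pos, std::size_t groups) {
    if (groups == 0 || pos > size || size - pos < groups * 3) return false;
    for (std::size_t g = 0; g < groups; ++g) {
        if (data[pos + g * 3 + 2] != 0x55) return false;  // delimiter missing
    }
    return true;
}

// Returns the start offset of every run of at least `min_groups` records.
// Once a run is found, the scan jumps past its end instead of re-reporting it.
std::vector<std::size_t> find_tables(const std::uint8_t* data, std::size_t size,
                                     std::size_t min_groups) {
    std::vector<std::size_t> hits;
    std::size_t pos = 0;
    while (pos < size) {
        if (is_table_at(data, size, pos, min_groups)) {
            hits.push_back(pos);
            std::size_t end = pos + min_groups * 3;
            // Extend the run while further complete records follow.
            while (end + 2 < size && data[end + 2] == 0x55) end += 3;
            pos = end;
        } else {
            ++pos;
        }
    }
    return hits;
}
```

With roughly 50 bytes per table, `min_groups` would be about 16. Note that an arbitrary data byte can itself be 0x55, so a reported offset may be misaligned by a record and is worth validating against the table contents.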

Load it all into memory? fseek? Buffer chunks, search one byte at a time?

What would be the best way to scan this large file and find the pattern using C or C++?

+5
3 answers

What ultimately worked for me was a hybrid of the Boyer-Moore-Horspool algorithm (proposed by Jerry Coffin) and my own algorithm based on the table structure and the stored data.
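For readers unfamiliar with Boyer-Moore-Horspool, here is a minimal sketch of the textbook algorithm for a fixed byte pattern. It is not the hybrid described above, whose details are not given here, and it does not handle the wildcard bytes in the table pattern; the function name `horspool_search` is hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Textbook Boyer-Moore-Horspool for an exact (wildcard-free) byte pattern.
// Returns the offset of every occurrence of `needle` in `hay`.
std::vector<std::size_t> horspool_search(const std::uint8_t* hay, std::size_t hay_len,
                                         const std::uint8_t* needle, std::size_t needle_len) {
    std::vector<std::size_t> hits;
    if (needle_len == 0 || hay_len < needle_len) return hits;

    // Bad-character table: how far the window may slide, keyed by the
    // haystack byte aligned with the last byte of the needle.
    std::array<std::size_t, 256> shift;
    shift.fill(needle_len);
    for (std::size_t i = 0; i + 1 < needle_len; ++i) {
        shift[needle[i]] = needle_len - 1 - i;
    }

    std::size_t pos = 0;
    while (pos + needle_len <= hay_len) {
        if (std::memcmp(hay + pos, needle, needle_len) == 0) hits.push_back(pos);
        pos += shift[hay[pos + needle_len - 1]];  // always advances by >= 1
    }
    return hits;
}
```

The speedup comes from the shift table: when the byte under the end of the window does not occur in the needle, the window jumps by the full needle length. C++17 also provides `std::boyer_moore_horspool_searcher` in `<functional>` for use with `std::search`.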

, BMH , . .

, , , 0x55, , .

, PHP, ++, MySQL . 5 , . , , , ( ) .

+1

. , , , , , . ++ Boost.Regex, , , .

+3

Load it all into memory? fseek? Buffer chunks, search one byte at a time?

, , , , . , , ( ), .

, , .

0