I want to find a specific sequence of bytes in a binary file using PHP. I write the sequence in hexadecimal so I don't have to type long runs of 0s and 1s. The sequence to search for is 0x4749524f. This is the working solution I have come up with so far:
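```php
$mysequence = "4749524f";
$f = fopen($filename, "rb") or die("Unable to open file!"); // "b" = binary mode
while (!feof($f)) {
    $seq = fread($f, 4);                  // read a 4-byte window
    if (bin2hex($seq) === $mysequence) {  // strict compare avoids PHP's numeric-string comparison
        echo "found!";
        break;
    } else if (!feof($f)) {
        fseek($f, -3, SEEK_CUR);          // slide the window forward by 1 byte
    }
}
fclose($f);
```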
The algorithm is simple:
1. Read 4 bytes.
2. Check whether they match the sequence.
3. If they match, the sequence is found. Stop.
4. If they don't match and I'm not at the end of the file, move the file pointer back 3 bytes and repeat from step 1.
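The same four steps can be expressed on an in-memory string, which separates the algorithm from the file I/O. This is only an illustrative sketch: `naiveFind()` is a hypothetical name, and it does exactly as much comparison work as the file-based loop, just without the per-byte `fread()`/`fseek()` calls.

```php
<?php
// Hypothetical in-memory version of the four steps: take a window of
// strlen($needle) bytes, compare it, and slide forward by one byte.
function naiveFind($haystack, $needle)
{
    $n    = strlen($needle);
    $last = strlen($haystack) - $n;
    for ($i = 0; $i <= $last; $i++) {
        // Steps 1 and 2: take the window starting at $i and compare it.
        if (substr($haystack, $i, $n) === $needle) {
            return $i;  // Step 3: found, stop and report the offset.
        }
        // Step 4: no match, so advance one byte (the "back 3 after reading 4").
    }
    return false;
}
```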
Why do I move back 3 bytes? Suppose the file contains:
0000 4749 524f 0000 01b0 0013
If I didn't move back 3 bytes, I would read 0000 4749 in the first iteration, 524f 0000 in the second and 01b0 0013 in the third. The sequence straddles the first two reads, so it would be skipped entirely.
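To see the overlap problem concretely, this sketch (variable names are illustrative) splits the example bytes into non-overlapping 4-byte chunks, which is what reading without seeking back amounts to, and compares that with `strpos()`, which tries every starting offset.

```php
<?php
// The example bytes from the question, as a binary string.
$data   = hex2bin('00004749524f000001b00013');
$needle = hex2bin('4749524f');

// Reading 4 bytes at a time without moving back = non-overlapping chunks:
// 00004749 | 524f0000 | 01b00013
$chunks = str_split($data, 4);
$foundWithStride4 = in_array($needle, $chunks, true);

// strpos() checks every starting offset, like seeking back 3 bytes each time.
$pos = strpos($data, $needle);

var_dump($foundWithStride4); // bool(false)
var_dump($pos);              // int(2)
```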
Problem: it is extremely slow. The application will have to handle files of up to 50 MB, and because the loop does one `fread()` and one `fseek()` for every byte of the file, the search takes forever.
Is there an optimized PHP function that does this? Or a faster (less naive) way to do it?
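One common way to speed this up, sketched here rather than taken from the question, is to read the file in large chunks and let PHP's built-in `strpos()` scan each chunk, carrying the last `strlen($needle) - 1` bytes into the next read so a match that spans two chunks is not missed. The function name `findInFile()` and the 1 MB default chunk size are illustrative assumptions.

```php
<?php
// Hypothetical helper: returns the byte offset of the first occurrence of
// $needle in the file, or false if it does not occur.
// $chunkSize (1 MB by default) is an assumed tuning value.
function findInFile($filename, $needle, $chunkSize = 1048576)
{
    $len = strlen($needle);
    if ($len === 0) {
        return false;
    }
    $f = fopen($filename, 'rb');
    if ($f === false) {
        throw new RuntimeException("Unable to open file: $filename");
    }
    $carry  = '';  // last $len - 1 bytes of the previous buffer
    $offset = 0;   // file offset where the current buffer starts
    while (!feof($f)) {
        $chunk = fread($f, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break;
        }
        $buffer = $carry . $chunk;
        $pos = strpos($buffer, $needle);  // one built-in call scans the whole buffer
        if ($pos !== false) {
            fclose($f);
            return $offset + $pos;
        }
        // Keep the tail so a match split across two reads is still found.
        // (Guard against substr($x, -0), which would return the whole string.)
        $keep    = min($len - 1, strlen($buffer));
        $carry   = $keep > 0 ? substr($buffer, -$keep) : '';
        $offset += strlen($buffer) - $keep;
    }
    fclose($f);
    return false;
}
```

Called as `findInFile($filename, hex2bin('4749524f'))`, it returns the byte offset of the first match or `false`. If the whole file fits comfortably within `memory_limit`, `strpos(file_get_contents($filename), $needle)` is an even simpler alternative with the same return convention.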