Regardless, it seems that every byte needs to be
- read from memory,
- changed in some way, and
- written back to memory.
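A minimal sketch of that per-byte read-modify-write loop, assuming (purely for illustration) that the transformation is XOR with a fixed key byte:

```python
def xor_bytes_naive(data: bytes, key: int = 0x5A) -> bytes:
    out = bytearray(len(data))
    for i, b in enumerate(data):   # read each byte
        out[i] = b ^ key           # change it
    return bytes(out)              # write it back (to a new buffer here)
```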
You can save a bit (no pun intended) of time by working with several bytes at a time, for example by XOR-ing 4- or even 8-byte integers, thereby dividing the loop-control overhead by roughly 4 or 8, but this improvement will probably not be a significant gain for the overall algorithm.
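A sketch of that chunked approach, again assuming a hypothetical XOR-with-a-key transform; the 64-bit words let one XOR cover 8 bytes per iteration:

```python
def xor_bytes_chunked(data: bytes, key: int = 0x5A) -> bytes:
    # Build a 64-bit key word so a single XOR covers 8 bytes at once.
    key64 = int.from_bytes(bytes([key]) * 8, "little")
    n8 = len(data) // 8 * 8
    out = bytearray(len(data))
    # Process 8 bytes per iteration, dividing the loop overhead by ~8.
    for i in range(0, n8, 8):
        chunk = int.from_bytes(data[i:i + 8], "little")
        out[i:i + 8] = (chunk ^ key64).to_bytes(8, "little")
    # Handle any remaining tail bytes one at a time.
    for i in range(n8, len(data)):
        out[i] = data[i] ^ key
    return bytes(out)
```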
Additional improvements can be found by replacing the "native" bit operations (XOR, shifts, rotations, etc.) of the CPU / language with lookups of previously calculated values in a table. Remember, however, that these native operations are usually heavily optimized, so you should be very careful when building equivalent operations on top of them, and accurately measure the relative performance of both approaches.
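A sketch of the table-lookup idea, assuming (as an example only) that the per-byte transform is a left rotation by 3 bits; any fixed byte-to-byte function works the same way:

```python
def rotl3(b: int) -> int:
    # Left-rotate an 8-bit value by 3 positions.
    return ((b << 3) | (b >> 5)) & 0xFF

# Precompute the 256 possible results once...
TABLE = bytes(rotl3(b) for b in range(256))

def transform_with_table(data: bytes) -> bytes:
    # ...then replace the bit operations in the hot loop with table lookups.
    # bytes.translate applies the 256-entry table in C, so the lookup loop
    # itself costs nothing at the Python level.
    return data.translate(TABLE)
```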
Edit: I just noticed the [Python] tag, as well as the link to numpy in another answer.
Beware... while what numpy offers for byte arrays is compelling, everything depends on the actual parameters of the problem. For example, significant time may be lost setting up and aligning the underlying arrays implied by numpy's bitwise functions. See this Stack Overflow question, which seems quite relevant: although it focuses on the XOR operation, it provides quite a few useful hints both on improving loops and on profiling in general.
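For completeness, a sketch of the numpy route (again using the hypothetical XOR-with-a-key transform); measure it against your real buffer sizes before committing, since array setup costs can dominate for small inputs:

```python
import numpy as np

def xor_bytes_numpy(data: bytes, key: int = 0x5A) -> bytes:
    arr = np.frombuffer(data, dtype=np.uint8)  # zero-copy view of the bytes
    return (arr ^ key).tobytes()               # vectorized XOR over the whole array
```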