Well, a lot depends on your system and data. There are only so many assumptions we can make. Which processor are you using? Does it have to be straight C code? How wide are the processor's registers? What is the cache structure of the processor? And so on.
It also depends on how different your data is. If the first byte of each buffer is unlikely to be the same, the speed of the function matters very little, since logically it will never reach the rest of the function. If the first n-1 bytes are usually the same, then it becomes much more important.
All in all, you are unlikely to see much difference, regardless of how you run the test.
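To make the point about data concrete, here is a minimal sketch that counts how many bytes a plain byte-by-byte comparison has to read before it can return. `bytes_examined` is a name made up for this illustration, not a library function, and it models only the simplest possible comparison loop.

```c
#include <stddef.h>

/* Hypothetical helper: number of bytes a plain byte-by-byte comparison
   reads from each buffer before it can return. */
size_t bytes_examined(const void* lhs, const void* rhs, size_t n)
{
    const unsigned char* a = lhs;
    const unsigned char* b = rhs;
    size_t i;

    for (i = 0; i < n; i++) {
        if (a[i] != b[i])
            return i + 1; /* the differing byte itself was read too */
    }
    return n; /* buffers are equal: every byte was read */
}
```

With buffers that differ in the first byte this returns 1 no matter how large n is, so a wider inner loop buys nothing. With buffers that share their first n-1 bytes it returns n, and that is the case where comparing several bytes at a time can start to matter.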
Anyway, here is a small implementation of my own. It may or may not be faster than yours (or, given that I just made it up as I went along, it may or may not work ;)):
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    int memoryCompare(const void* lhs, const void* rhs, size_t n)
    {
        const uint8_t* a = lhs;
        const uint8_t* b = rhs;

        // Test the first few bytes until lhs is 32-bit aligned.
        while (n > 0 && ((uintptr_t)a & 0x3) != 0) {
            if (*a != *b)
                return *a < *b ? -1 : 1;
            n--; a++; b++;
        }

        // Test the next set of 32-bit integers. memcpy keeps the loads
        // valid even when rhs is not aligned the same way as lhs.
        while (n >= sizeof(uint32_t)) {
            uint32_t wa, wb;
            memcpy(&wa, a, sizeof wa);
            memcpy(&wb, b, sizeof wb);
            if (wa != wb)
                break; // the byte loop below finds the differing byte
            n -= sizeof wa; a += sizeof wa; b += sizeof wb;
        }

        // Now do the final bytes (or the bytes of a differing word).
        while (n > 0) {
            if (*a != *b)
                return *a < *b ? -1 : 1;
            n--; a++; b++;
        }
        return 0;
    }
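Since a hand-rolled comparison like this is easy to get subtly wrong, a quick sanity check against `memcmp` is worth having. The following is only a sketch under my own assumptions: `compare_fn`, `sign_of`, `count_disagreements`, `byte_compare` and `always_equal` are names invented for this illustration, and the check only covers small buffers where both pointers share the same offset.

```c
#include <stddef.h>
#include <string.h>

/* Signature shared by memcmp and any drop-in replacement. */
typedef int (*compare_fn)(const void*, const void*, size_t);

/* Reduce a comparison result to -1, 0 or 1 so only the sign is compared. */
static int sign_of(int v)
{
    return (v > 0) - (v < 0);
}

/* Hypothetical checker: compares `candidate` with memcmp over every offset
   and length inside two small buffers. One differing byte is moved through
   each position, followed by a second byte that differs in the opposite
   direction, so a comparison that orders whole words by value on a
   little-endian machine would disagree with memcmp.
   Returns the number of disagreements. */
int count_disagreements(compare_fn candidate)
{
    unsigned char a[24], b[24];
    int failures = 0;
    size_t diff_at, start, len;

    for (diff_at = 0; diff_at <= sizeof a; diff_at++) {
        memset(a, 0x55, sizeof a);
        memset(b, 0x55, sizeof b);
        if (diff_at < sizeof a)
            b[diff_at] = 0xAA;     /* a < b at this byte */
        if (diff_at + 1 < sizeof a)
            b[diff_at + 1] = 0x00; /* a > b at the next byte */
        /* diff_at == sizeof a leaves the two buffers identical */

        for (start = 0; start < sizeof a; start++) {
            for (len = 0; start + len <= sizeof a; len++) {
                const unsigned char* pa = a + start;
                const unsigned char* pb = b + start;
                /* Check both argument orders. */
                if (sign_of(candidate(pa, pb, len)) != sign_of(memcmp(pa, pb, len)))
                    failures++;
                if (sign_of(candidate(pb, pa, len)) != sign_of(memcmp(pb, pa, len)))
                    failures++;
            }
        }
    }
    return failures;
}

/* Reference candidate: the obvious byte-by-byte comparison. */
int byte_compare(const void* lhs, const void* rhs, size_t n)
{
    const unsigned char* a = lhs;
    const unsigned char* b = rhs;
    size_t i;

    for (i = 0; i < n; i++) {
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    }
    return 0;
}

/* Deliberately wrong candidate, to show that the checker reports failures. */
int always_equal(const void* lhs, const void* rhs, size_t n)
{
    (void)lhs; (void)rhs; (void)n;
    return 0;
}
```

`count_disagreements(byte_compare)` should come back as 0, while `count_disagreements(always_equal)` comes back greater than 0. Any function with the same signature, `memoryCompare` included, can be passed in the same way before you bother timing it.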
Goz