Assuming something like:
    void mask_bytes(unsigned char* dest, unsigned char* src, unsigned char* mask, unsigned int len)
    {
        unsigned int i;
        for(i=0; i<len; i++)
        {
            dest[i] = src[i] & mask[i];
        }
    }
I can go faster on a machine that permits unaligned access (e.g. x86) by writing something like:
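    #include <stdint.h>

    void mask_bytes(unsigned char* dest, unsigned char* src, unsigned char* mask, unsigned int len)
    {
        unsigned int i;
        unsigned int wordlen = len >> 2;
        /* Word-at-a-time pass: can fault on targets that require aligned access. */
        for(i=0; i<wordlen; i++)
        {
            ((uint32_t*)dest)[i] = ((uint32_t*)src)[i] & ((uint32_t*)mask)[i];
        }
        /* Handle the 0 to 3 bytes left over after the last full word. */
        for(i=wordlen<<2; i<len; i++)
        {
            dest[i] = src[i] & mask[i];
        }
    }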
However, it needs to build on several architectures, so I would like to do something like:
    void mask_bytes(unsigned char* dest, unsigned char* src, unsigned char* mask, unsigned int len)
    {
        unsigned int i;
        unsigned int wordlen = len >> 2;
    #if defined(__ALIGNED2__) || defined(__ALIGNED4__) || defined(__ALIGNED8__)
But I cannot find good information about compiler-defined macros (like my hypothetical __ALIGNED4__ above) that indicate alignment requirements, or any clever way to use the preprocessor to determine the alignment of the target architecture. I could just test defined(__SVR4) && defined(__sun), but I would prefer something that will Just Work™ on other architectures that require aligned memory access.
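For comparison, here is a minimal sketch of one way to avoid needing such a macro at all. It assumes it is acceptable to move each word through a local variable with memcpy, which has no alignment requirement. The name mask_bytes_portable, the const qualifiers, and the size_t length are illustrative choices of mine, not part of the code above. Optimizing compilers commonly reduce these fixed-size memcpy calls to plain loads and stores where the target allows it, but that is not guaranteed and is worth checking in the generated code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical variant of mask_bytes: same result as the byte loop,
   but it works on 4-byte chunks without casting the byte pointers. */
void mask_bytes_portable(unsigned char *dest, const unsigned char *src,
                         const unsigned char *mask, size_t len)
{
    size_t i = 0;

    /* Copy each chunk into local uint32_t variables. memcpy does not
       require the source or destination to be aligned, so this is safe
       on strict-alignment targets as well as on x86. */
    for (; i + sizeof(uint32_t) <= len; i += sizeof(uint32_t)) {
        uint32_t s, m;
        memcpy(&s, src + i, sizeof s);
        memcpy(&m, mask + i, sizeof m);
        s &= m;
        memcpy(dest + i, &s, sizeof s);
    }

    /* Handle the 0 to 3 bytes left over after the last full chunk. */
    for (; i < len; i++) {
        dest[i] = src[i] & mask[i];
    }
}
```

Because bitwise AND acts on each byte independently, the result does not depend on the byte order of the target, so the function can be called on buffers at any offset, for example mask_bytes_portable(buf + 1, src + 1, mask + 1, n).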
nolandda