Is there a standard macro for detecting architectures that require aligned memory access?

Assuming something like:

    void mask_bytes(unsigned char* dest, unsigned char* src, unsigned char* mask, unsigned int len)
    {
        unsigned int i;
        for(i=0; i<len; i++)
        {
            dest[i] = src[i] & mask[i];
        }
    }

I can go faster on a machine that permits unaligned access (e.g. x86) by writing something like:

    void mask_bytes(unsigned char* dest, unsigned char* src, unsigned char* mask, unsigned int len)
    {
        unsigned int i;
        unsigned int wordlen = len >> 2;
        for(i=0; i<wordlen; i++)
        {
            // this raises SIGBUS on SPARC and other archs that require aligned access.
            ((uint32_t*)dest)[i] = ((uint32_t*)src)[i] & ((uint32_t*)mask)[i];
        }
        for(i=wordlen<<2; i<len; i++){
            dest[i] = src[i] & mask[i];
        }
    }

However, it needs to build on several architectures, so I would like to do something like:

    void mask_bytes(unsigned char* dest, unsigned char* src, unsigned char* mask, unsigned int len)
    {
        unsigned int i;
        unsigned int wordlen = len >> 2;

    #if defined(__ALIGNED2__) || defined(__ALIGNED4__) || defined(__ALIGNED8__)
        // go slow
        for(i=0; i<len; i++)
        {
            dest[i] = src[i] & mask[i];
        }
    #else
        // go fast
        for(i=0; i<wordlen; i++)
        {
            // the following line will raise SIGBUS on SPARC and other archs that require aligned access.
            ((uint32_t*)dest)[i] = ((uint32_t*)src)[i] & ((uint32_t*)mask)[i];
        }
        for(i=wordlen<<2; i<len; i++){
            dest[i] = src[i] & mask[i];
        }
    #endif
    }

But I cannot find good information about compiler-defined macros (like my hypothetical __ALIGNED4__ above) that indicate alignment requirements, or about any clever ways to use the preprocessor to determine the alignment rules of the target architecture. I could just test defined(__SVR4) && defined(__sun), but I would prefer something that will Just Work™ on other architectures that require aligned memory access.

3 answers

While x86 handles unaligned accesses silently, this is hardly optimal for performance. It is usually best to assume a certain alignment and perform the fix-ups yourself:

    unsigned int const alignment = 8; /* or 16, or sizeof(long) */

    void memcpy(char *dst, char const *src, unsigned int size)
    {
        if((((intptr_t)dst) % alignment) != (((intptr_t)src) % alignment)) {
            /* no common alignment, copy as bytes or shift around */
        } else {
            if(((intptr_t)dst) % alignment) {
                /* copy bytes at the beginning */
            }
            /* copy words in the middle */
            if(((intptr_t)dst + size) % alignment) {
                /* copy bytes at the end */
            }
        }
    }
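
Applied to the question's function, the fix-up scheme above might look roughly like the sketch below. The names mask_bytes_peeled and word_offset are made up for illustration, and the uint32_t casts assume the buffers may legally be accessed as uint32_t (the usual strict-aliasing caveat), so treat it as a starting point rather than a finished implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte offset of p within a uint32_t-sized word. */
static size_t word_offset(const void *p)
{
    return (size_t)((uintptr_t)p % sizeof(uint32_t));
}

/* Sketch only: ANDs src with mask into dest, using word accesses when all
 * three pointers can be brought to word alignment together. The casts
 * assume the buffers may be accessed as uint32_t (strict-aliasing caveat). */
void mask_bytes_peeled(unsigned char *dest, const unsigned char *src,
                       const unsigned char *mask, size_t len)
{
    size_t i = 0;

    if (word_offset(dest) == word_offset(src) &&
        word_offset(dest) == word_offset(mask)) {
        /* Head: byte steps until dest (and so src and mask) is aligned. */
        while (i < len && word_offset(dest + i) != 0) {
            dest[i] = src[i] & mask[i];
            i++;
        }
        /* Middle: aligned word steps. */
        while (len - i >= sizeof(uint32_t)) {
            *(uint32_t *)(dest + i) =
                *(const uint32_t *)(src + i) & *(const uint32_t *)(mask + i);
            i += sizeof(uint32_t);
        }
    }

    /* Tail, or the whole buffer when the offsets differ: byte steps. */
    for (; i < len; i++) {
        dest[i] = src[i] & mask[i];
    }
}
```

When the three pointers do not share the same offset, this simply falls back to the byte loop, which is the "copy as bytes" branch of the outline.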

Also read up on SIMD instructions.
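
As one possible illustration of the SIMD route, here is a minimal sketch using the SSE2 intrinsics from <emmintrin.h>, 16 bytes per step, with a plain byte loop as the fallback. The function name mask_bytes_simd is made up, and the __SSE2__ check assumes a GCC- or Clang-style compiler; other compilers signal SSE2 support differently.

```c
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Sketch only: ANDs src with mask into dest, 16 bytes at a time where
 * SSE2 is available, one byte at a time otherwise. */
void mask_bytes_simd(unsigned char *dest, const unsigned char *src,
                     const unsigned char *mask, size_t len)
{
    size_t i = 0;

#if defined(__SSE2__)
    /* The loadu/storeu variants accept pointers of any alignment. */
    for (; len - i >= 16; i += 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(src + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(mask + i));
        _mm_storeu_si128((__m128i *)(dest + i), _mm_and_si128(a, b));
    }
#endif

    /* Remaining bytes, or everything when SSE2 is not available. */
    for (; i < len; i++) {
        dest[i] = src[i] & mask[i];
    }
}
```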


A standard approach would be to have a configure script that runs a test program to check for alignment problems. If the test program does not crash, the configure script defines a macro in the generated config header that enables the faster implementation. The safer implementation is the default.

    void mask_bytes(unsigned char* dest, unsigned char* src, unsigned char* mask, unsigned int len)
    {
        unsigned int i;
        unsigned int wordlen = len >> 2;

    #if defined(UNALIGNED)
        // go fast
        for(i=0; i<wordlen; i++)
        {
            // the following line will raise SIGBUS on SPARC and other archs that require aligned access.
            ((uint32_t*)dest)[i] = ((uint32_t*)src)[i] & ((uint32_t*)mask)[i];
        }
        for(i=wordlen<<2; i<len; i++){
            dest[i] = src[i] & mask[i];
        }
    #else
        // go slow
        for(i=0; i<len; i++)
        {
            dest[i] = src[i] & mask[i];
        }
    #endif
    }
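
If there is no configure step at all, a rough fallback is to derive the same UNALIGNED macro from compiler-predefined macros for targets believed to tolerate unaligned word access. The whitelist below is an assumption and certainly not exhaustive: it covers the common x86 macros from GCC, Clang and MSVC plus the ARM ACLE macro __ARM_FEATURE_UNALIGNED. The helper uses_word_path is made up purely to show which branch gets selected.

```c
/* Hypothetical config fallback: define UNALIGNED only on targets believed
 * to tolerate unaligned word access. The list is an assumption, not a
 * complete survey; anything unlisted gets the safe byte loop. */
#if !defined(UNALIGNED) && \
    (defined(__i386__) || defined(__x86_64__) || \
     defined(_M_IX86) || defined(_M_X64) || \
     defined(__ARM_FEATURE_UNALIGNED))
#define UNALIGNED 1
#endif

/* Hypothetical helper: reports which path mask_bytes would compile to. */
static int uses_word_path(void)
{
#if defined(UNALIGNED)
    return 1;   /* go fast */
#else
    return 0;   /* go slow */
#endif
}
```

A whitelist errs on the safe side: an unknown architecture is merely slower, whereas a blacklist of strict-alignment targets would crash on any target it forgot.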

(It seems strange to me that you have src and mask when the two really commute. I renamed mask_bytes to memand. But anyway...)

Another option is to provide different functions that take advantage of the types in C. For example:

    #include <stddef.h>

    /* len counts chars */
    void memand_bytes(char *dest, char *src1, char *src2, size_t len)
    {
        size_t i;   /* size_t, so the index cannot overflow before reaching len */
        for (i = 0; i < len; i++)
            dest[i] = src1[i] & src2[i];
    }

    /* len counts ints, not bytes */
    void memand_ints(int *dest, int *src1, int *src2, size_t len)
    {
        size_t i;
        for (i = 0; i < len; i++)
            dest[i] = src1[i] & src2[i];
    }

This approach lets the programmer decide: code that already holds int-aligned data calls memand_ints, and everything else calls memand_bytes.
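
To make the caller's side of that concrete, here is a small sketch that repeats the two functions so it compiles on its own. The two wrapper functions and their names are invented for illustration; the point is only that the argument types already tell you which variant is safe.

```c
#include <stddef.h>

void memand_bytes(char *dest, char *src1, char *src2, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        dest[i] = src1[i] & src2[i];
}

void memand_ints(int *dest, int *src1, int *src2, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        dest[i] = src1[i] & src2[i];
}

/* Hypothetical caller: data declared as int is int-aligned by definition,
 * so the word-sized variant is safe. count is a number of ints. */
void and_tables(int *out, int *a, int *b, size_t count)
{
    memand_ints(out, a, b, count);
}

/* Hypothetical caller: a payload starting at an arbitrary byte offset
 * inside a larger buffer may be misaligned, so it uses the byte variant. */
void and_payload(char *out, char *packet, char *mask, size_t offset, size_t len)
{
    memand_bytes(out, packet + offset, mask + offset, len);
}
```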
