Speed of misaligned data access

As far as I know, the processor works best with data aligned on a boundary equal to the size of that data. For example, if each int is 4 bytes in size, the address of each int must be a multiple of 4 to make the CPU happy. The same applies to 2-byte short and 8-byte double values. For this reason, operator new and malloc return addresses suitably aligned for any fundamental type, in practice at least a multiple of 8, and therefore also a multiple of 4 and 2.

In my program, some time-critical algorithms that process large byte arrays work 4 bytes at a time: they read each group of 4 adjacent bytes as an unsigned int, which makes the arithmetic much faster. However, the address passed to these algorithms is not guaranteed to be a multiple of 4, because sometimes only part of the byte array needs to be processed.

As far as I know, Intel processors handle misaligned data correctly, but at the cost of speed. If misaligned access really is slower, the algorithms in my program will need to be redesigned. This leads to two questions, the first of which refers to the following code:

    // the address of array0 is a multiple of 4:
    unsigned char* array0 = new unsigned char[4];
    array0[0] = 0x00;
    array0[1] = 0x11;
    array0[2] = 0x22;
    array0[3] = 0x33;

    // the address of array1 is a multiple of 4 too:
    unsigned char* array1 = new unsigned char[5];
    array1[0] = 0x00;
    array1[1] = 0x00;
    array1[2] = 0x11;
    array1[3] = 0x22;
    array1[4] = 0x33;

    // OP1: the address of the 1st operand is a multiple of 4,
    // which is optimal for an unsigned int:
    unsigned int anUInt0 = *((unsigned int*)array0) + 1234;

    // OP2: the address of the 1st operand is not a multiple of 4:
    unsigned int anUInt1 = *((unsigned int*)(array1 + 1)) + 1234;

So the questions are:

  • How much slower is OP2 than OP1 on x86, x86-64 and Itanium processors (ignoring the cost of the loop and the address increment)?

  • When writing portable cross-platform code, on which processors should I worry about misaligned data access? (I already know about RISC.)

Answer

There are too many processors on the market to give a general answer. The only thing that can be said with certainty is that some processors cannot perform misaligned accesses at all. This may not matter to you if your program is only meant to run in a homogeneous environment, such as Windows.

On a modern high-performance processor, the speed of a misaligned access may depend more on whether it crosses a cache-line boundary than on its address alignment. On today's x86 processors, the cache line size is 64 bytes.

The Wikipedia article on data structure alignment gives some general guidance: http://en.wikipedia.org/wiki/Data_structure_alignment
