Before attempting any optimization, profile the code. Look at the generated code (for example with gcc -S or objdump -d) and optimize only once you understand what is actually happening.
And as already indicated, the best optimization is often not to do the thing faster, but to make a higher-level change that eliminates the need to do it at all.
But...
Most of the changes you might want to make here are likely things the compiler already does trivially (to the compiler, the shift is the same as the multiplication). Some may actually prevent the compiler from optimizing: changing an add to an or constrains it, since there are more ways to add numbers, and only you know that in this case the result will be the same.
Pointer arithmetic may help, but the compiler is not stupid: it should already generate decent code for indexing into the array, so check that you have not actually made things worse by introducing an extra variable.
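To make that concrete, here is a sketch under the assumptions of the question (an array of 8 bytes that are each 0 or 1, converted most significant digit first; the function names are mine):

    /* Baseline: plain indexing. The compiler already strength-reduces
       the multiply by 2 into a shift or an lea on its own. */
    int to_decimal(const unsigned char array[8])
    {
        int decimal = 0;
        for (int i = 0; i < 8; i++)
            decimal = decimal * 2 + array[i];
        return decimal;
    }

    /* Hand-tweaked: shift-and-or plus a pointer walk. The or is
       equivalent to the add only because each element is 0 or 1,
       which the compiler cannot know, and in practice this is often
       no faster than the indexed version above. */
    int to_decimal_tweaked(const unsigned char array[8])
    {
        int decimal = 0;
        for (const unsigned char *p = array; p < array + 8; p++)
            decimal = (decimal << 1) | *p;
        return decimal;
    }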
In this case, the number of loop iterations is fixed and small, so unrolling probably makes sense.
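A fully unrolled sketch, using the same assumed interface as above:

    /* Fully unrolled: the iteration count is fixed at eight, so the
       loop control disappears entirely. Many compilers will unroll
       the loop above themselves at higher optimization levels. */
    int to_decimal_unrolled(const unsigned char array[8])
    {
        return (array[0] << 7) | (array[1] << 6) | (array[2] << 5)
             | (array[3] << 4) | (array[4] << 3) | (array[5] << 2)
             | (array[6] << 1) |  array[7];
    }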
Beyond that, it depends on how dependent you are willing to be on your target architecture. If you want portability, it is hard(er) to optimize.
For example, the following gives the best code here:
    /* Assumes a little-endian target and eight bytes that are each 0 or 1. */
    unsigned int x0 = *(unsigned int *)array;
    unsigned int x1 = *(unsigned int *)(array + 4);
    int decimal = ((x0 * 0x8040201) >> 20) + ((x1 * 0x8040201) >> 24);
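The trick is that 0x8040201 has bits set at positions 0, 9, 18 and 27, so the multiplication places each 0-or-1 byte nine bit positions above the previous one, collecting the four flags in adjacent bits 24 to 27 (with array[0] on top). The shift by 20 leaves those four bits in the high nibble of the result, the shift by 24 puts the other four in the low nibble, and the addition merges them into one 8-bit value.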
I could probably also whip up a 64-bit version that does all 8 bits in one go instead of 4 at a time.
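A sketch of what that might look like (my construction, under the same assumptions: little-endian target, exactly eight bytes that are each 0 or 1):

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical 64-bit variant: one multiply gathers all eight
       flag bytes. 0x8040201008040201 has bits at positions 0, 9, 18,
       ..., 63, so the flags line up in the top byte of the product. */
    int to_decimal_64(const unsigned char array[8])
    {
        uint64_t x;
        memcpy(&x, array, 8);   /* a single load; avoids the aliasing cast */
        return (int)((x * 0x8040201008040201ULL) >> 56);
    }

Using memcpy also sidesteps the strict-aliasing problem that the pointer casts in the 32-bit snippet above have; compilers turn it into a plain 64-bit load.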
But this is very definitely not portable code. I might use it locally if I knew what machine I was working on and just wanted to crunch some numbers quickly, but I probably won't put it in production code. Certainly not without documenting what it does, and not without an accompanying unit test that verifies it really works.
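Such a test is cheap here, because the input space is tiny; a sketch that checks the trick against the straightforward loop for every possible input, reusing the hypothetical functions from above:

    #include <assert.h>

    int main(void)
    {
        unsigned char array[8];
        for (int v = 0; v < 256; v++) {
            for (int i = 0; i < 8; i++)
                array[i] = (v >> (7 - i)) & 1;   /* array[0] holds the MSB */
            assert(to_decimal(array) == v);
            assert(to_decimal_64(array) == v);   /* fails loudly on a
                                                    big-endian machine */
        }
        return 0;
    }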