Efficient way to calculate three-dimensional indices from a 1D array view

I have 3D data that is stored in a 1D array. I compute 1D indices as follows:

index = i + j * WIDTH + k * WIDTH * HEIGHT 

Then I need to recover the original indices i, j, k from index. The obvious way to do this is something like this:

    k = index / (WIDTH * HEIGHT)
    j = (index % (WIDTH * HEIGHT)) / WIDTH
    i = index - j * WIDTH - k * WIDTH * HEIGHT

But I wonder if there is a more efficient way to do this? Ideally one without the modulo ...

The context of this question: I have a CUDA kernel where I access the data and compute the indices i, j, k (index corresponds to the unique thread ID). So maybe there is a CUDA-specific way to do this? I think this is a fairly common problem, but I could not find a better way to do it ...

Thanks for your ideas!

+4
3 answers

What you have is fine; if you want to avoid the modulo (since it is very expensive on GPUs), you can do for j what you did for i:

 j = (index - (k*WIDTH*HEIGHT))/WIDTH 

If you want the logic to be a little clearer and not need the original index , you can do

    k = index/(WIDTH*HEIGHT);
    index -= k*WIDTH*HEIGHT;
    j = index/WIDTH;
    index -= j*WIDTH;
    i = index/1;

which then extends quite easily to an arbitrary number of dimensions. You can try tweaking the above, for example by precomputing WIDTH*HEIGHT, but I would just turn on optimization and trust the compiler to do this for you.
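As a rough illustration of how the divide-and-subtract pattern might generalize to any number of dimensions, here is a minimal sketch in plain C. The names make_strides and unflatten are hypothetical helpers, not from the answer, and the sketch assumes dims[0] is the fastest-varying extent (WIDTH) and that every extent is at least 1.

```c
#include <stddef.h>

/* Hypothetical helper: strides[d] is the product of all faster-varying
   extents, so for dims = {WIDTH, HEIGHT, DEPTH} it yields
   {1, WIDTH, WIDTH*HEIGHT}. Assumes every extent is >= 1. */
void make_strides(const size_t *dims, size_t ndim, size_t *strides)
{
    size_t s = 1;
    for (size_t d = 0; d < ndim; ++d) {
        strides[d] = s;
        s *= dims[d];
    }
}

/* Hypothetical helper: recover the coordinates from a flat index,
   slowest dimension first, using one division and one
   multiply-subtract per dimension instead of a modulo. */
void unflatten(size_t index, const size_t *strides, size_t ndim, size_t *coords)
{
    for (size_t d = ndim; d-- > 0; ) {
        coords[d] = index / strides[d];
        index -= coords[d] * strides[d];
    }
}
```

With dims = {4, 3, 2}, the flat index 21 = 1 + 2*4 + 1*12 comes back as coords = {1, 2, 1}. In a CUDA build the same functions could presumably be marked `__host__ __device__`, with the strides computed once on the host.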

The suggestions to round up to a power of 2 are correct in the sense that this will speed up the index calculation, but at a rather significant price. In the (not so bad) case WIDTH=HEIGHT=100, padding to WIDTH=HEIGHT=128 increases the memory requirements of your 3D array by roughly 64% (128*128 elements per slice instead of 100*100), and memory on the GPU is usually already tight; making the array dimensions powers of two can also lead to bank conflicts, depending on your access patterns.
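To make the memory cost concrete, the following sketch computes the per-slice overhead of padding WIDTH and HEIGHT up to the next power of two. The function names are hypothetical, and next_pow2 assumes its argument is at least 1 and small enough that the result fits in a size_t.

```c
#include <stddef.h>

/* Hypothetical helper: smallest power of two >= n.
   Assumes n >= 1 and that the result does not overflow size_t. */
size_t next_pow2(size_t n)
{
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Extra elements per WIDTH x HEIGHT slice after padding both extents
   to a power of two, as a percentage of the unpadded slice. */
double padding_overhead_percent(size_t width, size_t height)
{
    double padded = (double)next_pow2(width) * (double)next_pow2(height);
    double original = (double)width * (double)height;
    return 100.0 * (padded - original) / original;
}
```

For WIDTH=HEIGHT=100 this gives 128*128 = 16384 elements instead of 10000, an overhead of 63.84%. Padding the third dimension as well is not needed for the shift-and-mask scheme, since k is only ever shifted, never masked.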

+6

Try rounding your dimensions up to the next power of two. Then you can use bit shifts and masks instead of multiplications, divisions and modulo.

    index = i | (j | k << HEIGHT_BITS) << WIDTH_BITS;
    k = index >> (WIDTH_BITS + HEIGHT_BITS);
    j = (index >> WIDTH_BITS) & ((1 << HEIGHT_BITS) - 1);
    i = index & ((1 << WIDTH_BITS) - 1);
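The shift-and-mask scheme can be written out as a pair of functions, sketched here in plain C. The values WIDTH_BITS = 7 and HEIGHT_BITS = 7 (that is, WIDTH = HEIGHT = 128) are assumed purely for illustration, and pack_index and unpack_index are hypothetical names. The sketch also assumes i < WIDTH, j < HEIGHT, and that the packed index fits in 32 bits.

```c
#include <stdint.h>

/* Assumed example sizes: WIDTH = 1 << WIDTH_BITS, HEIGHT = 1 << HEIGHT_BITS. */
enum { WIDTH_BITS = 7, HEIGHT_BITS = 7 };

/* Hypothetical helper: i takes the lowest WIDTH_BITS bits, j the next
   HEIGHT_BITS bits, and k everything above them. */
uint32_t pack_index(uint32_t i, uint32_t j, uint32_t k)
{
    return i | ((j | (k << HEIGHT_BITS)) << WIDTH_BITS);
}

/* Hypothetical helper: inverse of pack_index, using only shifts and masks. */
void unpack_index(uint32_t index, uint32_t *i, uint32_t *j, uint32_t *k)
{
    *k = index >> (WIDTH_BITS + HEIGHT_BITS);
    *j = (index >> WIDTH_BITS) & ((1u << HEIGHT_BITS) - 1u);
    *i = index & ((1u << WIDTH_BITS) - 1u);
}
```

With these sizes, pack_index(5, 9, 3) yields 5 + 9*128 + 3*16384 = 50309, and unpack_index returns (5, 9, 3) again. Because the bit fields do not overlap, the result equals i + j*WIDTH + k*WIDTH*HEIGHT for the padded WIDTH and HEIGHT.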
+5

This works only when the dimensions are powers of 2: use a bitmask. For example, if the first index can take 4 distinct values, it occupies the lowest 2 bits of the index.

+1