Calculation of neighboring indices for a diagonally flattened matrix

I have a two-dimensional matrix stored in a flat buffer along the diagonals. For example, a 4x4 matrix will have its indices scattered like this:

0 2 5 9 1 4 8 12 3 7 11 14 6 10 13 15 

Using this view, the most efficient way to calculate the index of a neighboring element, taking into account the original index and the X / Y offset? For instance:

 // return the index of a neighbor given an offset int getNGonalNeighbor(const size_t index, const int x_offset, const int y_offset){ //... } // for the array above: getNGonalNeighbor(15,-1,-1); // should return 11 getNGonalNeighbor(15, 0,-1); // should return 14 getNGonalNeighbor(15,-1, 0); // should return 13 getNGonalNeighbor(11,-2,-1); // should return 1 

We assume that overflow never occurs and there is no wrapping.

I have a solution involving a lot of triangular number and triangular root calculations. It also contains many branches, which I would prefer to replace with algebra, if possible (this will work on GPUs where the divergent control flow is expensive). My solution works, but very long lasting. I feel that there should be a much simpler and less computational intensive way to do this.

Perhaps this would help me if someone could put the name in this particular problem / view.

I can publish my complete solution if someone is interested, but, as I said, it is very long and relatively complicated for such a simple task. In short, my solution is:

  • translate the original index into a large triangular matrix to avoid access to two triangles (for example, 13 becomes 17)

For a 4x4 matrix, this will be:

 0 2 5 9 14 20 27 1 4 8 13 19 26 3 7 12 18 25 6 11 17 24 10 16 23 15 22 21 
  • compute the neighbor diagonal index in this representation using the Manhattan offset distance and the triangular root of the index.
  • calculate the position of a neighbor in this diagonal using the offset
  • translate back to the original view by deleting the fill.

For some reason, this is the easiest solution I could come up with.

Edit:

with a cycle to accumulate bias:

I understand that, given the properties of triangle numbers, it would be easier to divide the matrix into two triangles (call 0 to 9 'upper triangle' and 10-15 'lower triangle') and get a loop with a test inside to accumulate the offset, adding one in the upper triangle and subtracting one in the bottom (if that makes sense). But for my cycles, decisions should be avoided at all costs, especially for cycles with an unbalanced number of shutdowns (again, very bad for GPUs).

So, I'm more looking for an algebraic solution, rather than an algorithmic one .

Building a lookup table:

Again, due to the use of the GPU, it is preferable to avoid creating a lookup table and have random access to it (very expensive). An algebraic solution is preferred.

Matrix Properties:

  • The size of the matrix is โ€‹โ€‹known.
  • So far I am considering only a square matrix, but a solution for rectangular ones would also be nice.
  • since the function name in my example assumes that extending the solution for N-dimensional volumes (hence, smoothing the N-gon) will also be a big plus.
+4
source share
2 answers

I already had elements that could solve it somewhere else in my code. Since the BLUEPIXY solution hinted, I use the pivot / gather operations that I already implemented to transform the layout.

This solution basically restores the original index (x,y) given element in the matrix, applies the index offset, and converts the result back to the transformed layout. It breaks the square into 2 triangles and adjusts the calculation depending on which triangle it belongs to.

This is almost a complete algebraic transformation: it does not use a loop and does not search for tables, has a small amount of memory and a small branching. Perhaps the code can be optimized.

Here is a draft code:

 #include <stdio.h> #include <math.h> // size of the matrix #define SIZE 4 // triangle number of X #define TRIG(X) (((X) * ((X) + 1)) >> 1) // triangle root of X #define TRIROOT(X) ((int)(sqrt(8*(X)+1)-1)>>1); // return the index of a neighbor given an offset int getNGonalNeighbor(const size_t index, const int x_offset, const int y_offset){ // compute largest upper triangle index const size_t upper_triangle = TRIG(SIZE); // position of the actual element of index unsigned int x = 0,y = 0; // adjust the index depending of upper/lower triangle. const size_t adjusted_index = index < upper_triangle ? index : SIZE * SIZE - index - 1; // compute triangular root const size_t triroot = TRIROOT(adjusted_index); const size_t trig = TRIG(triroot); const size_t offset = adjusted_index - trig; // upper triangle if(index < upper_triangle){ x = offset; y = triroot-offset; } // lower triangle else { x = SIZE - offset - 1; y = SIZE - (trig + triroot + 1 - adjusted_index); } // adjust the offset x += x_offset; y += y_offset; // manhattan distance const size_t man_dist = x+y; // calculate index using triangular number return TRIG(man_dist) + (man_dist >= SIZE ? x - (man_dist - SIZE + 1) : x) - (man_dist > SIZE ? 2* TRIG(man_dist - SIZE) : 0); } int main(){ printf("%d\n", getNGonalNeighbor(15,-1,-1)); // should return 11 printf("%d\n", getNGonalNeighbor(15, 0,-1)); // should return 14 printf("%d\n", getNGonalNeighbor(15,-1, 0)); // should return 13 printf("%d\n", getNGonalNeighbor(11,-2,-1)); // should return 1 } 

And the conclusion is really:

 11 14 13 1 

If you think this solution looks complicated and inefficient, I remind you that the goal here is a GPU, where the calculation is practically worth nothing compared to memory access, and all index calculations are calculated at the same time using massively parallel architectures.

+1
source

Table search

 #include <stdio.h> #define SIZE 16 #define SIDE 4 //sqrt(SIZE) int table[SIZE]; int rtable[100];// {x,y| x<=99, y<=99 } void setup(){ int i, x, y, xy, index;//xy = x + y x=y=xy=0; for(i=0;i<SIZE;++i){ table[i]= index= x*10 + y; rtable[x*10+y]=i; x = x + 1; y = y - 1;//right up if(y < 0 || x >= SIDE){ ++xy; x = 0; y = xy;; while(y>=SIDE){ ++x; --y; } } } } int getNGonalNeighbor(int index, int offsetX, int offsetY){ int x,y; x=table[index] / 10 + offsetX; y=table[index] % 10 + offsetY; if(x < 0 || x >= SIDE || y < 0 || y >= SIDE) return -1; //ERROR return rtable[ x*10+y ]; } int main() { int i; setup(); printf("%d\n", getNGonalNeighbor(15,-1,-1)); printf("%d\n", getNGonalNeighbor(15, 0,-1)); printf("%d\n", getNGonalNeighbor(15,-1, 0)); printf("%d\n", getNGonalNeighbor(11,-2,-1)); printf("%d\n", getNGonalNeighbor(0, -1,-1)); return 0; } 

Do not use the tabular version.

 #include <stdio.h> #define SIZE 16 #define SIDE 4 void num2xy(int index, int *offsetX, int *offsetY){ int i, x, y, xy;//xy = x + y x=y=xy=0; for(i=0;i<SIZE;++i){ if(i == index){ *offsetX = x; *offsetY = y; return; } x = x + 1; y = y - 1;//right up if(y < 0 || x >= SIDE){ ++xy; x = 0; y = xy;; while(y>=SIDE){ ++x; --y; } } } } int xy2num(int offsetX, int offsetY){ int i, x, y, xy, index;//xy = x + y x=y=xy=0; for(i=0;i<SIZE;++i){ if(offsetX == x && offsetY == y) return i; x = x + 1; y = y - 1;//right up if(y < 0 || x >= SIDE){ ++xy; x = 0; y = xy;; while(y>=SIDE){ ++x; --y; } } } return -1; } int getNGonalNeighbor(int index, int offsetX, int offsetY){ int x,y; num2xy(index, &x, &y); return xy2num(x + offsetX, y + offsetY); } int main() { printf("%d\n", getNGonalNeighbor(15,-1,-1)); printf("%d\n", getNGonalNeighbor(15, 0,-1)); printf("%d\n", getNGonalNeighbor(15,-1, 0)); printf("%d\n", getNGonalNeighbor(11,-2,-1)); printf("%d\n", getNGonalNeighbor(0, -1,-1)); return 0; } 
+1
source

All Articles