Struct of arrays, arrays of structs, and memory usage patterns

I have been reading about SoA (struct of arrays) and wanted to try implementing it in the system I am building.

I am writing some simple C structs to run a few tests, but I'm a bit confused: I now have 3 different structs for vec3. I will show them below and then go into more detail about the question.

    #include <stddef.h>
    #include <stdlib.h>

    struct vec3   { size_t x, y, z; };
    struct vec3_a { size_t pos[3]; };
    struct vec3_b { size_t* x; size_t* y; size_t* z; };

    struct vec3 vec3(size_t x, size_t y, size_t z)
    {
        struct vec3 v;
        v.x = x;
        v.y = y;
        v.z = z;
        return v;
    }

    struct vec3_a vec3_a(size_t x, size_t y, size_t z)
    {
        struct vec3_a v;
        v.pos[0] = x;
        v.pos[1] = y;
        v.pos[2] = z;
        return v;
    }

    struct vec3_b vec3_b(size_t x, size_t y, size_t z)
    {
        struct vec3_b v;
        /* one separate heap allocation per component */
        v.x = (size_t*)malloc(sizeof(size_t));
        v.y = (size_t*)malloc(sizeof(size_t));
        v.z = (size_t*)malloc(sizeof(size_t));
        *(v.x) = x;
        *(v.y) = y;
        *(v.z) = z;
        return v;
    }

Here is how the three kinds of vec3 are declared:

    struct vec3   v  = vec3(10, 20, 30);
    struct vec3_a va = vec3_a(10, 20, 30);
    struct vec3_b vb = vec3_b(10, 20, 30);

When I print the sizes, values, and addresses with printf, I get these values:

    size of vec3   : 24 bytes
    size of vec3a  : 24 bytes
    size of vec3b  : 24 bytes
    size of size_t : 8 bytes
    size of int    : 4 bytes
    size of 16 int : 64 bytes

    vec3  x:10, y:20, z:30
    vec3  x:0x7fff57f8e788, y:0x7fff57f8e790, z:0x7fff57f8e798
    vec3a x:10, y:20, z:30
    vec3a x:0x7fff57f8e768, y:0x7fff57f8e770, z:0x7fff57f8e778
    vec3b x:10, y:20, z:30
    vec3b x:0x7fbe514026a0, y:0x7fbe51402678, z:0x7fbe51402690

The last thing I did was create an array of 10 struct vec3_b and print the addresses of the members, which gave these values:

    struct vec3_b vb3[10];
    for(int i = 0; i < 10; i++) {
        vb3[i] = vec3_b(i, i*2, i*4);
    }

    index:0 vec3b x:0x7fbe514031f0, y:0x7fbe51403208, z:0x7fbe51403420
    index:1 vec3b x:0x7fbe51403420, y:0x7fbe51403438, z:0x7fbe51403590
    index:2 vec3b x:0x7fbe51403590, y:0x7fbe514035a8, z:0x7fbe514035c0
    index:3 vec3b x:0x7fbe514035c0, y:0x7fbe514035d8, z:0x7fbe514035f0
    index:4 vec3b x:0x7fbe514035f0, y:0x7fbe51403608, z:0x7fbe51403680
    index:5 vec3b x:0x7fbe51403680, y:0x7fbe51403698, z:0x7fbe514036b0
    index:6 vec3b x:0x7fbe514036b0, y:0x7fbe514036c8, z:0x7fbe514036e0
    index:7 vec3b x:0x7fbe514036e0, y:0x7fbe514036f8, z:0x7fbe51403710
    index:8 vec3b x:0x7fbe51403710, y:0x7fbe51403728, z:0x7fbe51403740
    index:9 vec3b x:0x7fbe51403740, y:0x7fbe51403758, z:0x7fbe51403770

Questions:

  • Is my implementation of struct vec3_b the correct way to set up a struct of arrays?

  • Since the vec3_b struct is 24 bytes in size, could I fit two of them, plus 12 extra bytes, in one cache line of a current CPU?

  • If my vec3_b is the right way to set up SoA, I am puzzled by the addresses I get when I put 10 vec3_b together in an array.

Looking at the hexadecimal values and their decimal representations, I do not see any pattern, which makes me think that my setup is incorrect.

       |                x                 |                y                 |                z
    ---|----------------------------------|----------------------------------|---------------------------------
     0 | 0x7fbe514031f0 : 140455383675376 | 0x7fbe51403208 : 140455383675400 | 0x7fbe51403420 : 140455383675936
     1 | 0x7fbe51403420 : 140455383675936 | 0x7fbe51403438 : 140455383675960 | 0x7fbe51403590 : 140455383676304
     2 | 0x7fbe51403590 : 140455383676304 | 0x7fbe514035a8 : 140455383676328 | 0x7fbe514035c0 : 140455383676352
2 answers
  • I can't think of a case where vec3_b would be a good idea.

  • Note that you also need to find room for the 24 bytes of data that the pointers point to, and that data probably won't be contiguous with the struct itself, so you have probably just halved your effective cache size compared with vec3 or vec3_a. Each malloc() has a minimum allocation size; on a 64-bit machine, that is usually at least 16 bytes. Thus the three separate allocations for the three values in a vec3_b require at least another 48 bytes for the pointed-to data (plus 24 for the struct itself). That does not fit in a single cache line, and it is not even guaranteed to be laid out so that it fits in two cache lines.

  • N/A: the question is based on a false premise.
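To put rough numbers on the second point, here is a minimal sketch of the arithmetic. The 64-byte cache line and the 16-byte minimum malloc() footprint are assumptions taken from the discussion above, not values measured on your machine, and the helper names (vec3_footprint, vec3_b_footprint, min_cache_lines) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed figures, not measured: a 64-byte cache line and a 16-byte
   minimum footprint per malloc() call. */
enum { CACHE_LINE = 64, MIN_ALLOC = 16 };

struct vec3   { size_t x, y, z; };
struct vec3_b { size_t *x, *y, *z; };

/* Bytes that must be in cache to read one vec3: only the struct itself. */
size_t vec3_footprint(void)
{
    return sizeof(struct vec3);
}

/* Bytes that must be in cache to read one vec3_b: the struct of three
   pointers plus three separate heap blocks of at least MIN_ALLOC each. */
size_t vec3_b_footprint(void)
{
    return sizeof(struct vec3_b) + 3 * (size_t)MIN_ALLOC;
}

/* Lower bound on the cache lines needed for `bytes` of data, assuming the
   data is perfectly packed and aligned (scattered heap blocks need more). */
size_t min_cache_lines(size_t bytes)
{
    assert(bytes <= (size_t)-1 - (CACHE_LINE - 1)); /* guard the round-up */
    return (bytes + CACHE_LINE - 1) / CACHE_LINE;
}
```

Under those assumptions, and with 8-byte size_t and pointers, vec3_footprint() is 24 and vec3_b_footprint() is 72, so min_cache_lines() gives 1 for a vec3 and at least 2 for a vec3_b. Real allocators may round up further or place the three blocks far apart.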


1 and 3: No, your vec3_b is not a struct of arrays.

What you have is several structs, each of which holds 64-bit pointers to 64 bits of data.

With struct-of-arrays, you create ONE struct that holds several variable-size arrays.

Thus, the value of the 10th x will be mystruct.x[9], not mystruct[9].x[0].
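As a minimal sketch of that layout (the names vec3_soa, vec3_soa_init, vec3_soa_set and vec3_soa_free are hypothetical, not from the question), one struct owns three arrays of n elements each:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* ONE struct holding three arrays: all x values are contiguous, all y
   values are contiguous, all z values are contiguous. */
struct vec3_soa {
    size_t  n;   /* number of elements in each array */
    size_t *x;
    size_t *y;
    size_t *z;
};

/* Allocate the three arrays. Returns 0 on success, -1 on failure.
   Assumes n > 0. */
int vec3_soa_init(struct vec3_soa *s, size_t n)
{
    assert(n > 0);
    s->n = n;
    s->x = malloc(n * sizeof *s->x);
    s->y = malloc(n * sizeof *s->y);
    s->z = malloc(n * sizeof *s->z);
    if (!s->x || !s->y || !s->z) {
        free(s->x);
        free(s->y);
        free(s->z);
        s->x = s->y = s->z = NULL;
        s->n = 0;
        return -1;
    }
    return 0;
}

/* Store the i-th vector: one write into each of the three arrays. */
void vec3_soa_set(struct vec3_soa *s, size_t i, size_t x, size_t y, size_t z)
{
    assert(i < s->n);
    s->x[i] = x;
    s->y[i] = y;
    s->z[i] = z;
}

/* Release the three arrays. */
void vec3_soa_free(struct vec3_soa *s)
{
    free(s->x);
    free(s->y);
    free(s->z);
    s->x = s->y = s->z = NULL;
    s->n = 0;
}
```

After vec3_soa_init(&s, 10) and filling it with vec3_soa_set(&s, i, i, i*2, i*4), the 10th x is s.x[9], and &s.x[i+1] is always exactly one size_t past &s.x[i], which is the regular address pattern the question was looking for.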

The key point is to store all the x values contiguously, so you can load multiple x values at once with movdqu / _mm_loadu_si128. If you are working with SIMD, choose the smallest element width that supports the range of values you need. Using 64-bit elements will halve your throughput compared with 32-bit elements: your code processes 128 bits at a time, and that is twice as many elements if they are half as wide.
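Here is a small sketch of what that buys you, assuming the x values are stored as a contiguous array of 32-bit integers. The function name sum_x is made up for illustration; the SSE2 path is used only when the compiler defines __SSE2__, and a plain scalar loop handles the tail (and the whole array on other targets).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Sum n contiguous 32-bit x values. The result wraps modulo 2^32. */
uint32_t sum_x(const uint32_t *x, size_t n)
{
    assert(x != NULL || n == 0);

    size_t   i = 0;
    uint32_t total = 0;

#if defined(__SSE2__)
    /* Four running 32-bit sums, one per lane of the 128-bit register. */
    __m128i acc = _mm_setzero_si128();
    for (; i + 4 <= n; i += 4) {
        /* Unaligned 128-bit load (movdqu): four x values at once. */
        __m128i v = _mm_loadu_si128((const __m128i *)(x + i));
        acc = _mm_add_epi32(acc, v);
    }
    /* Spill the four lanes and add them together. */
    uint32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif

    /* Scalar loop for the leftover elements. */
    for (; i < n; i++) {
        total += x[i];
    }
    return total;
}
```

With 64-bit elements the same 128-bit load would bring in only two values per iteration instead of four, which is where the halved throughput comes from. This only works because the x values sit next to each other; with the vec3_b layout from the question, each x lives in its own heap block and cannot be loaded this way.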
