I have some code that works pretty well, but I would like it to work better. The main problem I am facing is that it must have a nested loop. The external one is for iterations (which should happen sequentially), and the internal one is for each point particle under consideration. I know that I can do little with the outside, but I wonder if there is a way to optimize something like:
void collide(particle particles[], box boxes[], double boxShiftX, double boxShiftY) { int i; double nX; double nY; int boxnum; for(i=0;i<PART_COUNT;i++) { boxnum = ((((int)(particles[i].sX+boxShiftX))/BOX_SIZE)%BWIDTH+ BWIDTH*((((int)(particles[i].sY+boxShiftY))/BOX_SIZE)%BHEIGHT));
I looked at SIMD, although I can’t learn much about it, and I’m not quite sure that the processing needed to correctly extract and pack the data will be worth half the number of instructions, since it seems that they can be used simultaneously only two doubles.
I tried splitting it into multiple threads using shm and pthread_barrier (to synchronize the different steps, one of which is the code above), but it just made it slower.
My current code is pretty fast; it is on the order of one second for every 10M particles * iterations, and from what I can say from gprof, 30% of my time is spent only on this function (5000 calls, PART_COUNT = 8192 particles take 1.8 seconds). I'm not worried about small, constant things, it's just that 512K particles * 50K iterations * 1000 experiments took more than a week for the last time.
I think my question is is there any way to deal with these long vectors that are more efficient than just sorting through them. I feel it should be, but I can’t find it.
optimization c simd
zebediah49
source share