OpenGL Compute Shader Invocations

I have a question about the new compute shaders. I am currently working on a particle system. I store all my particles in a shader storage buffer so that I can access them in a compute shader. Then I dispatch a one-dimensional set of work groups.

    #define WORK_GROUP_SIZE 128

    _shaderManager->useProgram("computeProg");
    glDispatchCompute((_numParticles/WORK_GROUP_SIZE), 1, 1);
    glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);

My compute shader:

    #version 430

    struct particle{
        vec4 currentPos;
        vec4 oldPos;
    };

    layout(std430, binding=0) buffer particles{
        struct particle p[];
    };

    layout (local_size_x = 128, local_size_y = 1, local_size_z = 1) in;

    void main(){
        uint gid = gl_GlobalInvocationID.x;
        p[gid].currentPos.x += 100;
    }

But for some reason not all particles are affected. I am doing it the same way as in this example, but it does not work: http://education.siggraph.org/media/conference/S2012_Materials/ComputeShader_6pp.pdf

Edit:

After calling glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT), I continue with the draw call:

    _shaderManager->useProgram("shaderProg");
    glBindBuffer(GL_ARRAY_BUFFER, shaderStorageBufferID);
    glVertexPointer(4, GL_FLOAT, sizeof(glm::vec4), (void*)0);
    glEnableClientState(GL_VERTEX_ARRAY);
    glDrawArrays(GL_POINTS, 0, _numParticles);
    glDisableClientState(GL_VERTEX_ARRAY);

So, which bit would be appropriate to use in this case?

+7
2 answers

I solved the problem. The problem was the number of work groups that I dispatched. numParticles / WORK_GROUP_SIZE is rounded down, since both operands are integers. As a result, too few work groups were dispatched whenever the number of particles was not a multiple of the work group size.

With 1000 particles, only 1000/128 = 7 work groups are dispatched. Each work group has a size of 128. This means that I get 7 * 128 = 896 threads, and thus 104 particles will not move at all. Since numParticles % 128 can range from 0 to 127, I just dispatched one more work group:

 glDispatchCompute((_numParticles/WORK_GROUP_SIZE)+1, 1, 1); 

And every particle moves from now on. :)
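The "+1" above also dispatches one extra, fully unused work group when the particle count is already a multiple of 128. A minimal sketch of a rounding-up division that avoids this is shown below; `workGroupCount` is a hypothetical helper name, not something from the original code or from OpenGL.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the original post or of OpenGL):
// number of work groups needed so that groupSize invocations per group
// cover numParticles items. Rounds up instead of truncating, and adds
// no extra group when numParticles is an exact multiple of groupSize.
// Assumes groupSize > 0.
constexpr std::uint32_t workGroupCount(std::uint32_t numParticles,
                                       std::uint32_t groupSize) {
    return numParticles / groupSize +
           (numParticles % groupSize != 0 ? 1u : 0u);
}

// Compile-time checks using the numbers from the answer.
static_assert(workGroupCount(1000, 128) == 8, "7 full groups plus one partial group");
static_assert(workGroupCount(1024, 128) == 8, "exact multiple needs no extra group");
static_assert(workGroupCount(0, 128) == 0, "nothing to dispatch");
```

The dispatch would then presumably read glDispatchCompute(workGroupCount(_numParticles, WORK_GROUP_SIZE), 1, 1). Note that with either variant the last work group can contain invocations whose index is past the end of the particle array (indices 1000 to 1023 in the example), so the shader likely needs a guard such as `if (gid >= numParticles) return;`, where `numParticles` is an assumed uniform that the original shader does not declare.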

+1

You have the barrier backwards. This is a common mistake.

The bits that you pass to the barrier describe how you are going to use the written data, not how the data was written. GL_SHADER_STORAGE_BARRIER_BIT would be appropriate if you had some process that wrote to a buffer object via image load/store (or storage buffers / atomic counters) and then used a storage buffer to read that buffer object's data.

Since you are reading the buffer as a vertex attribute array buffer, you should use the cleverly named GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT.

+7