Essentially, what you are implementing is a spinlock, only instead of a single lock variable you have a whole texture of locks.
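To make the pattern concrete, here is a minimal CPU analogue of the per-pixel locking idea, sketched in Python. The `LockTexture` class and its `compare_swap` method are illustrative stand-ins for an image of locks and an atomic compare-and-swap (in GLSL you would use `imageAtomicCompSwap`); none of these names come from the question being answered. On a CPU this terminates, because the OS scheduler preempts threads:

```python
import threading

class LockTexture:
    """A 2D array of per-pixel locks; compare_swap emulates an atomic CAS."""
    def __init__(self, width, height):
        self.locks = [[0] * width for _ in range(height)]
        self._guard = threading.Lock()  # makes the CAS atomic on the CPU

    def compare_swap(self, x, y, expected, new):
        # Returns the old value; the swap happens only if old == expected.
        with self._guard:
            old = self.locks[y][x]
            if old == expected:
                self.locks[y][x] = new
            return old

WIDTH, HEIGHT = 4, 4
locks = LockTexture(WIDTH, HEIGHT)
framebuffer = [[0] * WIDTH for _ in range(HEIGHT)]

def shade_pixel(x, y, iterations):
    for _ in range(iterations):
        # Spin until we acquire this pixel's lock.
        while locks.compare_swap(x, y, 0, 1) != 0:
            pass
        framebuffer[y][x] += 1          # critical section: read-modify-write
        locks.compare_swap(x, y, 1, 0)  # release the lock

# Two "invocations" contend for the same pixel. OS preemption guarantees
# forward progress here, which is exactly what a GPU does not guarantee.
threads = [threading.Thread(target=shade_pixel, args=(2, 1, 1000)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(framebuffer[1][2])  # 2000
```

The rest of this answer explains why the same spin loop can hang forever when the threads are fragment shader invocations instead of OS threads.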
Logically, what you are doing makes sense. But as far as OpenGL is concerned, it will not actually work.
The OpenGL shader execution model states that invocations execute in an order that is largely undefined relative to one another. But spinlocks only work if there is a forward-progress guarantee between threads. In essence, spinlocks require that a spinning thread cannot prevent the execution hardware from scheduling the thread it is waiting on.
OpenGL provides no such guarantee. This means that one invocation can lock a pixel and then stop executing (for whatever reason) while another invocation spins on that lock. The spinning invocation never stops executing, and the invocation holding the lock never resumes.
How can this happen on real hardware? Well, say you have a fragment shader invocation group executing on some fragments of a triangle. They all lock their pixels. But then they diverge due to a conditional branch inside the locked region. Divergence of execution can mean that some of those invocations get transferred to a different execution unit. If none is available at that moment, they effectively halt until one becomes available.
Now, suppose some other fragment shader invocation group comes along and is assigned an execution unit before the divergent group. If that group tries to spin on locks held by the divergent group's pixels, it essentially starves the divergent group of execution time, waiting for an event that will never happen.
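This starvation scenario can be modeled with a single "execution unit" and non-preemptive scheduling: the lock holder is parked by divergence, and the spinning group occupies the unit without ever yielding it. A deliberately simplified Python sketch (the function names and the spin bound are illustrative assumptions, not taken from any real driver):

```python
pixel_lock = 0  # 0 = free, 1 = held

def divergent_group_locks():
    """The divergent group acquires its pixel lock, then stalls waiting
    for a free execution unit; in this model it never runs again."""
    global pixel_lock
    pixel_lock = 1

def spinning_group(max_spins):
    """Runs non-preemptively on the only execution unit, spinning on the
    lock. Bounded here so the simulation halts; a real spin is unbounded."""
    spins = 0
    while pixel_lock != 0 and spins < max_spins:
        spins += 1  # while it spins, the lock holder can never be scheduled
    return pixel_lock == 0  # did it ever acquire the lock?

divergent_group_locks()
acquired = spinning_group(100_000)
print(acquired)  # False: the spinner waited on a group that cannot resume
```

The only reason this simulation terminates is the artificial `max_spins` bound; a real spinlock would loop forever in the same situation.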
Obviously, real GPUs have more than one execution unit, but you can imagine that with a large number of invocation groups it is quite possible for this scenario to cause problems from time to time.
Nicol Bolas