Take a look at OpenCL subgroups. They define functions such as sub_group_all() and sub_group_any(), along with some other interesting ones.
Subgroups are a relatively new feature, and I'm not sure who supports them. The Intel GPU implementation (actually an extension) has several more interesting shuffle functions for swapping data between lanes (in the register file), as well as explicit block reads and writes. I'm sure AMD also supports subgroups, but I'm not sure about NVIDIA.
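To make the semantics of sub_group_all() and sub_group_any() concrete, here is a rough host-side sketch in plain C. It is not OpenCL kernel code: the names emulated_sub_group_all and emulated_sub_group_any are hypothetical, and the array simply stands in for the predicate value each work-item (lane) of one subgroup would pass. In a real kernel each work-item would call sub_group_all(predicate) or sub_group_any(predicate) directly, assuming the device exposes subgroups.

```c
#include <stddef.h>

/* Host-side emulation only: predicates[i] is the value that lane i
 * of a single subgroup would pass to the built-in function. */

/* Mirrors sub_group_all(): non-zero only if every lane's predicate is non-zero. */
int emulated_sub_group_all(const int *predicates, size_t lanes)
{
    for (size_t i = 0; i < lanes; ++i) {
        if (predicates[i] == 0) {
            return 0;
        }
    }
    return 1;
}

/* Mirrors sub_group_any(): non-zero if at least one lane's predicate is non-zero. */
int emulated_sub_group_any(const int *predicates, size_t lanes)
{
    for (size_t i = 0; i < lanes; ++i) {
        if (predicates[i] != 0) {
            return 1;
        }
    }
    return 0;
}
```

On a device, every lane in the subgroup gets the same result back, so the whole subgroup can take one branch together without going through local memory.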