You can write lock-free (and even wait-free!) implementations of anything you like. Modern hardware primitives such as CMPXCHG (compare-and-swap on x86) are sufficient to be universally usable. But writing and verifying such algorithms is not one of the easiest tasks. On top of that, much faster algorithms may exist: lock-free algorithms are just a very small subset of algorithms in general.
As far as I remember, Dmitry Vyukov once wrote a lock-free MPMC (multi-producer/multi-consumer) channel implementation for Go, but the patch was abandoned because of some problems with Go's select statement. Supporting that statement efficiently seems to be really hard.
The main goal of Go's channel type, however, is to provide a high-level concurrency primitive that is easy to use for a wide range of problems. Even developers who are not experts in concurrent programming should be able to write correct programs that can be easily reviewed and maintained in large software projects. If you are interested in squeezing out every last bit of performance, you will have to write a specialized queue implementation that suits your needs.