As the author of spymemcached, I am a little biased, but I would recommend it for the following reasons:
Designed from the ground up to be non-blocking everywhere.
When you request data, issue a set, etc., there is a single tiny concurrent queue insertion, and you get back a Future to block on for the results (with some convenience methods for common cases like get).
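As a sketch of what that non-blocking pattern looks like in practice (the key name and timeout are arbitrary, and this assumes a memcached server running on localhost:11211):

```java
import java.net.InetSocketAddress;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

import net.spy.memcached.MemcachedClient;

public class AsyncGetExample {
    public static void main(String[] args) throws Exception {
        // Assumes a memcached server running on localhost:11211.
        MemcachedClient client =
            new MemcachedClient(new InetSocketAddress("localhost", 11211));

        // set() queues the operation and returns immediately; the I/O
        // happens on the client's own thread.
        client.set("greeting", 3600, "hello");

        // asyncGet() hands back a Future; nothing blocks yet.
        Future<Object> f = client.asyncGet("greeting");

        // Block only at the point where the value is actually needed
        // (the 5-second timeout here is an arbitrary choice).
        Object value = f.get(5, TimeUnit.SECONDS);
        System.out.println(value);

        client.shutdown();
    }
}
```

The convenience method `client.get("greeting")` does the same thing as the asyncGet-then-block sequence above in one call.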
Optimized aggressively
You can read more on my optimizations page, but I optimize the application as a whole.
I still do well in micro-benchmarks, but comparing against another client means settling on unrealistic usage patterns (for example, waiting for a response to every single operation, or structuring the code around blocking in a way that prevents the packet optimizations from kicking in).
Tested obsessively
I maintain a fairly strict test suite with coverage reports on each release.
Bugs still slip through, but they are usually pretty minor, and the client keeps getting better. :)
Well documented
The examples page provides a quick introduction, but the javadoc goes into great detail.
Provides high-level abstractions
I have a Map interface to the cache, as well as a functional CAS abstraction. Both the binary and text protocols support the incr-with-default mechanism (provided natively by the binary protocol, but rather tricky to emulate over text).
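A minimal sketch of incr-with-default (the key name is made up, and this assumes a memcached server on localhost:11211): if the counter does not exist yet, it is seeded with the default value; otherwise it is incremented.

```java
import java.net.InetSocketAddress;

import net.spy.memcached.MemcachedClient;

public class IncrDefaultExample {
    public static void main(String[] args) throws Exception {
        // Assumes a memcached server running on localhost:11211.
        MemcachedClient client =
            new MemcachedClient(new InetSocketAddress("localhost", 11211));

        // Start from a clean slate for this demonstration.
        client.delete("hits");

        // incr(key, by, default): the first call finds no key, so it
        // stores the default (1); the second call increments by 1.
        long first = client.incr("hits", 1, 1);   // seeded with the default
        long second = client.incr("hits", 1, 1);  // incremented from there
        System.out.println(first + " then " + second);

        client.shutdown();
    }
}
```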
Keeps up
I do a lot of work on the server itself, so I keep up with protocol changes.
I did the first server-side implementations of the binary protocol (in both a test server and memcached itself), and this was the first production-ready client to support it, with first-class support.
I also have support for multiple hash algorithms and node distribution algorithms, all well tested on every build. You can do ketama consistent hashing, or a derivative of it using FNV-1 (or even plain Java string hashing) if you want better performance.
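Selecting the locator and hash algorithm is a matter of client configuration; a sketch (the server addresses are placeholders):

```java
import net.spy.memcached.AddrUtil;
import net.spy.memcached.ConnectionFactoryBuilder;
import net.spy.memcached.DefaultHashAlgorithm;
import net.spy.memcached.MemcachedClient;

public class KetamaConfigExample {
    public static void main(String[] args) throws Exception {
        // The addresses below are placeholders for a real server pool.
        MemcachedClient client = new MemcachedClient(
            new ConnectionFactoryBuilder()
                // Consistent (ketama-style) node location, so adding or
                // removing a node remaps only a fraction of the keys.
                .setLocatorType(ConnectionFactoryBuilder.Locator.CONSISTENT)
                // KETAMA_HASH here; FNV1_64_HASH or NATIVE_HASH (plain Java
                // string hashing) are faster alternatives.
                .setHashAlg(DefaultHashAlgorithm.KETAMA_HASH)
                .build(),
            AddrUtil.getAddresses("localhost:11211 localhost:11212"));

        client.shutdown();
    }
}
```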