A small but important improvement to @greggo's solution:
maxu( a, b ) == a
has a drawback: you need to back up "a" before computing the maximum, which costs an extra operation, something like this:
    movq   mmc, mma
    pmaxu  mma, mmb
    pcmpeq mma, mmc
minu( a, b ) == b
gives exactly the same result, but leaves "b" untouched, so it can be used directly in the equality check without a backup:
    pminu  mma, mmb
    pcmpeq mma, mmb
The gain is significant: only 2 operations instead of 3.
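To check that the two forms really agree, here is a minimal sketch in C with SSE2 intrinsics. It assumes 8-bit unsigned lanes (the post does not name a lane width), and the helper names `cmpge_bytes_min` and `cmpge_bytes_max` are made up for this illustration. Both compute the unsigned test a >= b per byte, giving 0xFF where it holds and 0x00 where it does not.

```c
#include <emmintrin.h>  /* SSE2: _mm_min_epu8, _mm_max_epu8, _mm_cmpeq_epi8 */
#include <stdint.h>

/* Hypothetical helper: unsigned a >= b per byte via minu(a, b) == b.
 * Assumes 8-bit lanes; each output byte is 0xFF if a[i] >= b[i], else 0x00. */
static void cmpge_bytes_min(const uint8_t a[16], const uint8_t b[16],
                            uint8_t out[16])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i lo = _mm_min_epu8(va, vb);            /* pminub  */
    __m128i m  = _mm_cmpeq_epi8(lo, vb);          /* pcmpeqb */
    _mm_storeu_si128((__m128i *)out, m);
}

/* Hypothetical helper: the same test via maxu(a, b) == a.
 * In hand-written two-operand assembly this form needs a copy of a first. */
static void cmpge_bytes_max(const uint8_t a[16], const uint8_t b[16],
                            uint8_t out[16])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i hi = _mm_max_epu8(va, vb);            /* pmaxub  */
    __m128i m  = _mm_cmpeq_epi8(hi, va);          /* pcmpeqb */
    _mm_storeu_si128((__m128i *)out, m);
}
```

With intrinsics the compiler chooses the registers and any copies itself, so the saved instruction is not visible at the source level. The sketch only shows that both expressions produce the same mask.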