The Bash manual says:
When used with [[, the < and > operators sort lexicographically using the current locale. The test command sorts using ASCII ordering.
This boils down to using strcoll(3) or strcmp(3), respectively.
Use the following program (strcoll_strcmp.c) to verify this:
    #include <stdio.h>
    #include <string.h>
    #include <locale.h>

    int main(int argc, char **argv)
    {
        /* Pick up the locale from the environment (LC_ALL, LC_COLLATE, LANG). */
        setlocale(LC_ALL, "");

        if (argc != 3) {
            fprintf(stderr, "Usage: %s str1 str2\n", argv[0]);
            return 1;
        }

        /* strcoll() honours the locale's collation; strcmp() compares bytes. */
        printf("strcoll('%s', '%s'): %d\n",
               argv[1], argv[2], strcoll(argv[1], argv[2]));
        printf("strcmp('%s', '%s'): %d\n",
               argv[1], argv[2], strcmp(argv[1], argv[2]));

        return 0;
    }
Please note the difference:
    $ LC_ALL=C ./strcoll_strcmp ' a' '0a'
    strcoll(' a', '0a'): -16
    strcmp(' a', '0a'): -16

    $ LC_ALL=en_US.UTF-8 ./strcoll_strcmp ' a' '0a'
    strcoll(' a', '0a'): 10
    strcmp(' a', '0a'): -16
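The same difference can be checked directly from the shell, without the C program. The following is only a minimal sketch: the function name `compare` and the output format are made up for illustration, and it assumes a reasonably recent Bash running outside POSIX mode, where `[[` follows the locale and `test` compares bytes as the manual describes.

```shell
#!/usr/bin/env bash
# compare is a hypothetical helper, not part of Bash.
# It evaluates "a < b" once with [[ ]] and once with test ([ ]).
compare() {
    local a=$1 b=$2 r

    # [[ ... < ... ]] collates according to the current locale.
    if [[ $a < $b ]]; then r=true; else r=false; fi
    printf "[[ ]]: '%s' < '%s' is %s\n" "$a" "$b" "$r"

    # test / [ ... \< ... ] is documented to use ASCII ordering.
    # The backslash stops the shell from treating < as a redirection.
    if [ "$a" \< "$b" ]; then r=true; else r=false; fi
    printf "test: '%s' < '%s' is %s\n" "$a" "$b" "$r"
}

compare ' a' '0a'
```

Saved under a name of your choice and run with LC_ALL=C, both lines should report true, since a space sorts before '0' in ASCII. Run with LC_ALL=en_US.UTF-8 (assuming that locale is installed), the `[[ ]]` line would be expected to report false, matching the positive strcoll result above, while the `test` line stays true.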
Why exactly they compare this way, I'm not sure. It must be due to some English lexicographic sorting rules. I think the exact rules are described in ISO 14651 (Method for comparing character strings and description of the common template tailorable ordering) and the accompanying template table. Glibc contains this data in its source tree under libc/localedata/locales.
spbnick