In a C program, I want to sort a list of valid UTF-8 encoded strings in Unicode code point order. No collation, no linguistic or locale knowledge.
I need a comparison function. It's easy enough to write one that iterates over the Unicode characters. (I'm using GLib, so I'd step through each string with g_utf8_next_char and compare the return values of g_utf8_get_char.)
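For concreteness, here is a minimal sketch of that code point comparison. It is written in plain C with a hand-rolled decoder so that it stands alone. The names `utf8_decode_next` and `utf8_codepoint_cmp` are hypothetical, and the code assumes both inputs are valid, NUL-terminated UTF-8, as stated in the question. With GLib, the decode step would presumably be g_utf8_get_char followed by g_utf8_next_char instead.

```c
#include <stdint.h>

/* Hypothetical helper: decode one code point from VALID UTF-8 at *s and
 * advance *s past it. No validation is done; invalid input is not handled. */
static uint32_t utf8_decode_next(const unsigned char **s)
{
    const unsigned char *p = *s;
    uint32_t cp;
    int extra; /* number of continuation bytes that follow the lead byte */

    if (p[0] < 0x80)      { cp = p[0];        extra = 0; } /* 0xxxxxxx */
    else if (p[0] < 0xE0) { cp = p[0] & 0x1F; extra = 1; } /* 110xxxxx */
    else if (p[0] < 0xF0) { cp = p[0] & 0x0F; extra = 2; } /* 1110xxxx */
    else                  { cp = p[0] & 0x07; extra = 3; } /* 11110xxx */

    p++;
    while (extra-- > 0)
        cp = (cp << 6) | (uint32_t)(*p++ & 0x3F); /* 10xxxxxx */

    *s = p;
    return cp;
}

/* Compare two valid UTF-8 strings by code point value.
 * Returns -1, 0, or 1; a string that is a proper prefix sorts first. */
int utf8_codepoint_cmp(const char *a, const char *b)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;

    while (*pa != 0 && *pb != 0) {
        uint32_t ca = utf8_decode_next(&pa);
        uint32_t cb = utf8_decode_next(&pb);
        if (ca != cb)
            return ca < cb ? -1 : 1;
    }
    /* At least one string has ended; the one with bytes left is greater. */
    return (*pa != 0) - (*pb != 0);
}
```

To use it with qsort on an array of `char *`, it would need a thin wrapper that dereferences the two `const void *` arguments.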
But what I'm wondering, out of curiosity and perhaps for simplicity and efficiency, is this: would a plain byte-by-byte strcmp (or g_strcmp0) do the same job? I think it should, since UTF-8 encodes the most significant bits first, and a code point that needs N + 1 bytes will have a larger lead byte than a code point that needs N bytes.
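A quick way to probe this hunch is to take a few strings listed in ascending code point order, one for each encoded length plus the 3-byte/4-byte boundary, and check that strcmp also sees them as ascending. This is only a spot check on hand-picked samples, not a proof. The names `samples` and `samples_ascending_bytewise` are hypothetical. It relies on the C standard's rule that strcmp compares bytes as unsigned char.

```c
#include <stddef.h>
#include <string.h>

/* Sample strings in ascending code point order, written as raw UTF-8 bytes. */
static const char *const samples[] = {
    "z",                /* U+007A,  1 byte  */
    "\xC3\xA9",         /* U+00E9,  2 bytes */
    "\xE2\x82\xAC",     /* U+20AC,  3 bytes */
    "\xEF\xBF\xBF",     /* U+FFFF,  3 bytes */
    "\xF0\x90\x80\x80", /* U+10000, 4 bytes */
};

/* Returns 1 if strcmp orders every adjacent pair of samples as strictly
 * ascending, i.e. byte order agrees with code point order here; else 0. */
int samples_ascending_bytewise(void)
{
    size_t n = sizeof samples / sizeof samples[0];
    size_t i;

    for (i = 0; i + 1 < n; i++) {
        if (strcmp(samples[i], samples[i + 1]) >= 0)
            return 0;
    }
    return 1;
}
```

The lead bytes of these samples are 0x7A, 0xC3, 0xE2, 0xEF and 0xF0, which is the "larger lead byte" pattern described above.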
But maybe I missed something? Thanks in advance.
c unicode utf-8 glib
skagedal