Print UTF-8 in glib
Why can't UTF-8 characters be printed using glib functions?
Source:
    #include "glib.h"
    #include <stdio.h>

    int main()
    {
        /* same UTF-8 string, once through glib and once through stdio */
        g_print("марко\n");
        fprintf(stdout, "марко\n");
    }
Build it as follows:
    gcc main.c -o main $(pkg-config glib-2.0 --cflags --libs)
You can see that glib cannot print UTF-8 and fprintf can:
    [marko@marko-work utf8test]$ ./main
    ?????
    марко
The fprintf functions assume that every string you print with them is already correctly encoded for the current encoding of your terminal. g_print() does not assume that; it converts the encoding if it thinks that is necessary. Of course, this is a bad idea if the encoding was already correct, because the conversion will most likely destroy it. What is the locale setting of your terminal?
On most systems, you can set the correct locale through environment variables, or you can set it programmatically using the setlocale function. Locale names are system-specific (not part of the POSIX standard), but the following will work on most systems:
    #include <locale.h>
    ...
    setlocale(LC_ALL, "en_US.utf8");
Instead of LC_ALL, you can also set the locale only for certain categories (with LC_ALL, "en_US" also gives you English number and date formatting, and you may not want numbers and dates formatted that way). To quote from the setlocale man page:
LC_ALL Set the entire locale generically.
LC_COLLATE Set a locale for string collation routines. This controls alphabetic ordering in strcoll() and strxfrm().
LC_CTYPE Set a locale for the ctype(3) and multibyte(3) functions. This controls recognition of upper and lower case, alphabetic or non-alphabetic characters, and so on.
LC_MESSAGES Set a locale for message catalogs, see the catopen(3) function.
LC_MONETARY Set a locale for formatting monetary values; this affects the localeconv() function.
LC_NUMERIC Set a locale for formatting numbers. This controls the formatting of decimal points in input and output of floating point numbers in functions such as printf() and scanf(), as well as values returned by localeconv().
LC_TIME Set a locale for formatting dates and times using the strftime() function.
The only locale values that are always available on all systems are "C", "POSIX", and "".
Only three locales are defined by default: the empty string "" (which denotes the native environment) and the "C" and "POSIX" locales (which denote the C-language environment). A locale argument of NULL causes setlocale() to return the current locale. By default, C programs start in the "C" locale. The only function in the library that sets the locale is setlocale(); the locale is never changed as a side effect of some other routine.
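Because locale names differ between systems, it can help to try several spellings and check what setlocale() returns. The following is only a minimal sketch of that idea; the helper name set_first_available_locale is made up for illustration, and which names exist on a given machine is an assumption you would have to verify there.

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: tries each candidate locale name in order for the
   given category and returns the first name setlocale() accepted, or NULL
   if none of them was available. */
const char *set_first_available_locale(int category,
                                       const char *const *candidates,
                                       size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        /* setlocale() returns NULL and leaves the locale unchanged when
           it cannot honor the requested name. */
        if (setlocale(category, candidates[i]) != NULL)
            return candidates[i];
    }
    return NULL;
}
```

A caller could pass a list such as "en_US.utf8", "en_US.UTF-8", "C" with LC_CTYPE; since "C" is always available, the call then always ends with some locale selected.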
The string that g_print() passes on to glibc is not necessarily UTF-8 encoded, because g_print() converts it to the character encoding specified by the locale.
You need to initialize the locale encoding by calling setlocale when your program starts:
    setlocale(LC_CTYPE, "");
This is usually done for you if you use an initialization function such as gtk_init(..) or similar.
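To see which encoding the locale actually selects after that call, you can ask the C library directly. This is a rough sketch using the POSIX function nl_langinfo(CODESET), not anything glib-specific; the helper names are invented for illustration, and the exact codeset string (for example "UTF-8" versus "utf8") varies by system.

```c
#include <langinfo.h>  /* nl_langinfo(), POSIX rather than ISO C */
#include <locale.h>
#include <string.h>

/* Hypothetical helper: adopts the environment's LC_CTYPE and returns the
   codeset name the C library reports afterwards, e.g. "UTF-8". */
const char *native_codeset(void)
{
    /* If the environment names a locale that is not installed, setlocale()
       returns NULL and the previous locale stays in effect. */
    (void)setlocale(LC_CTYPE, "");
    return nl_langinfo(CODESET);
}

/* Hypothetical helper: returns 1 if the reported codeset looks like UTF-8,
   0 otherwise. The accepted spellings are an assumption. */
int native_codeset_is_utf8(void)
{
    const char *codeset = native_codeset();
    return strcmp(codeset, "UTF-8") == 0 || strcmp(codeset, "utf8") == 0;
}
```

If native_codeset_is_utf8() returns 0 on a terminal that really is UTF-8, the environment variables (LANG, LC_ALL, LC_CTYPE) are the first thing to check.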
It is generally not recommended to use anything other than ASCII inside source files. To handle text in different languages, you should use a tool such as gettext. If that doesn't matter to you, you should at least save the string in your code as UTF-8.
Try printing this (it is the hexadecimal representation of your string):
    char hex_marco[]={0xD0, 0xBC, 0xD0, 0xB0, 0xD1, 0x80, 0xD0, 0xBA, 0xD0, 0xBE, 0};
This works for me with printf (I can't test it with glib here):
    #include <stdio.h>

    /* UTF-8 bytes of the string, terminated by a 0 byte */
    char hex_marco[]={0xD0, 0xBC, 0xD0, 0xB0, 0xD1, 0x80, 0xD0, 0xBA, 0xD0, 0xBE, 0};

    int main(void)
    {
        printf("%s\n", hex_marco);
        return 0;
    }
Redirect the output to a file and view it as UTF-8.
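If you want to double-check that those bytes form the string you expect, counting characters is a quick sanity test. The sketch below counts UTF-8 code points by skipping continuation bytes; utf8_length is a made-up helper name, and it assumes the input is valid UTF-8 rather than validating it.

```c
#include <stddef.h>

/* Hypothetical helper: counts code points in a NUL-terminated UTF-8 string.
   Continuation bytes have the bit pattern 10xxxxxx and are not counted.
   Assumes the input is well-formed UTF-8. */
size_t utf8_length(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For the ten bytes in hex_marco this gives five characters, which is consistent with the five question marks in the output shown in the question.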
Hope this helps.