How can I display Unicode strings during debugging on Linux?

I have been working for several years as a C++ developer, using MS Visual Studio as my working platform. Since I privately prefer Linux, I recently took the opportunity to move my work environment to Linux as well. Having optimized my Windows environment for several years, it of course turns out that some things are missing or do not work as expected. Thus, I have some questions for which I could not yet find useful answers.

Let's start with the following problem; other questions will probably follow later. It is something I have already run into several times, whenever I had to debug platform-specific bugs on non-Windows platforms.

Simply put: how can I display Unicode strings (UCS-2 encoded) when debugging on Linux?

Now some details that I have figured out so far. Our lib internally uses a Unicode-based string class that encodes each character as a 16-bit Unicode value (we do not support multi-word encodings, so we can only use the UCS-2 encodable subset of UTF-16, but this covers nearly all scripts in use anyway). This already creates one problem: most platforms (for example, Linux/Unix) treat wchar_t as 4 bytes wide, whereas it is only 2 bytes on Windows, so I can't simply cast the internal string buffer to (wchar_t *), and I'm not sure this would really help any debugger anyway.
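To illustrate the size mismatch described above, here is a minimal sketch of a widening copy from 16-bit code units into a wchar_t buffer. The function name `ucs2_to_wchar` and its signature are hypothetical and not part of the lib discussed here; the sketch assumes the input contains no surrogate pairs, as stated in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper (assumption, not from the original lib):
 * copies NUL-terminated 16-bit UCS-2 code units into a wchar_t buffer,
 * widening each unit. Works whether wchar_t is 2 or 4 bytes wide.
 * Returns the number of characters copied, excluding the terminator. */
size_t ucs2_to_wchar(const uint16_t *src, wchar_t *dst, size_t dst_len)
{
    size_t i = 0;

    assert(src != NULL && dst != NULL && dst_len > 0);
    /* Leave room for the terminating L'\0'. */
    while (src[i] != 0 && i + 1 < dst_len) {
        dst[i] = (wchar_t)src[i];
        i++;
    }
    dst[i] = L'\0';
    return i;
}
```

Because every UCS-2 code unit is also a valid code point, the widened buffer should be an ordinary wide string on platforms with a 4-byte wchar_t.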

For gdb, I figured out that I can call functions from the debugged code to print debug messages. Thus, I added a special function to our library that can arbitrarily transform the string data and write it to a new buffer. I am currently transcoding our internal buffer to UTF-8, since I expect that to be the most likely to work.
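A helper of the kind described above might look roughly like the following sketch. The name `dbg_ucs2_to_utf8`, the static buffer and its size are assumptions made for illustration; the actual function in the lib is not shown in the question. The input is assumed to be NUL-terminated UCS-2 without surrogate pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical debug helper (assumption, not the lib's real function):
 * transcodes a NUL-terminated UCS-2 string to UTF-8 in a static buffer,
 * so that it can be called from the debugger. Output is truncated if it
 * does not fit. Not thread-safe, which is acceptable for debugging. */
const char *dbg_ucs2_to_utf8(const uint16_t *src)
{
    static char buf[1024];
    size_t o = 0;

    assert(src != NULL);
    /* Continue only while 3 bytes plus the terminator still fit. */
    while (*src != 0 && o + 4 <= sizeof buf) {
        uint16_t c = *src++;
        if (c < 0x80) {                 /* 1 byte: 0xxxxxxx */
            buf[o++] = (char)c;
        } else if (c < 0x800) {         /* 2 bytes: 110xxxxx 10xxxxxx */
            buf[o++] = (char)(0xC0 | (c >> 6));
            buf[o++] = (char)(0x80 | (c & 0x3F));
        } else {                        /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
            buf[o++] = (char)(0xE0 | (c >> 12));
            buf[o++] = (char)(0x80 | ((c >> 6) & 0x3F));
            buf[o++] = (char)(0x80 | (c & 0x3F));
        }
    }
    buf[o] = '\0';
    return buf;
}
```

From gdb, such a function could then be invoked with something like `call dbg_ucs2_to_utf8(str)`, where `str` stands for whatever expression yields the 16-bit buffer. Whether the result is rendered correctly still depends on the debugger's and terminal's charset settings.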

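So far this solves the problem only partially: if the string is Latin, I now get readable output (whereas the Latin data cannot be printed directly when it is 16-bit encoded), but I also have to deal with other scripts (e.g. CJK (a.k.a. Hansi/Kanji), ...), and I have to debug data in such scripts specifically. In these cases I only see the ISO characters corresponding to the individual bytes that make up a utf8 char, which makes debugging CJK data even more cryptic than correctly displayed strings would be.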

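In general, gdb lets you set the host and target encodings, so it should be possible to send a correctly encoded utf8 data stream to the console.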

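Of course, I would prefer to use an IDE for debugging. I am currently trying Eclipse with CDT, and I have also tested kdbg. In both I have so far only obtained incorrectly decoded utf8 data. On the other hand, I once debugged a Java project in Eclipse on Windows (not using our lib), and the strings were displayed correctly, so at least in some situations Eclipse can display Unicode correctly.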

, , ( ) linux (.. gdb QStrings, , ), , , linux, , , , Unicode Linux , .

, , Unicode, (, QString) / IDE .


Linux Unicode. , UTF16 Linux . , , Windows, Linux.

Unicode, UTF-32 ( wchar_t) wprintf wcout, , , UTF-8, . UTF-16 , int16_t, , , .

, , UTF-16 UTF-8 , . , UTF16 UTF32, Unicode ? . GDB , script.


script "wchar.gdb", , , ( ), ist . script , gdb.

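define wchar_print
    # Opening quote
    echo "
    set $i = 0
    # Walk the string until the terminating NUL
    while (1)
            # Keep only the low byte of each wide character
            set $c = (char)(($arg0)[$i++])
            if ($c == '\0')
                    loop_break
            end
            printf "%c", $c
    end
    # Closing quote and newline
    echo "\n
end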


document wchar_print
wchar_print <wstr>
Print ASCII part of <wstr>, which is a wide character string of type wchar_t*.
end

First of all, are you using X? Or the console?

The console? VGA text mode can only display 256/512 glyphs. (512 iirc)


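gdb can display 16-bit strings: instead of the usual wchar_t (32 bits), for the ICU (Unicode library) type UChar (16 bits) you can compile with gcc -fshort-wchar, which turns wchar_t (L"abc", L'd') into an unsigned short (16 bits). Note that the wchar_t functions in glibc will then no longer work. With a short wchar_t, gdb can display wchar_t (16-bit) strings. Example gdb session: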

short-wchar.c:
#include <stdio.h>
#include <wchar.h>
wchar_t wchr;
int main(void) { printf("sizeof(L'a') = %zu\n", sizeof(L'a')); return 0; }

gcc -g -fshort-wchar short-wchar.c -o short-wchar
# terminal session encoding utf-8 assumed
gdb short-wchar
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
    (gdb) show charset
    The host character set is "auto; currently UTF-8".
    The target character set is "auto; currently UTF-8".
    The target wide character set is "auto; currently UTF-32".
    (gdb) set target-wide-charset UTF-16
    (gdb) p L"Škoda"
    $1 = L"Škoda"
    (gdb) p (wchar_t*) (some UChar string)
    ....

A 16-bit wchar_t matches the 16-bit character types used by ICU, OCI (Oracle Call Interface) and Java char.
