First of all,
perl -le'print chr(0x263A);'
is a bug. Perl even tells you as much:
Wide character in print at -e line 1.
That does not qualify as "working". So although they differ in how they fail to give you what you want, neither of the following gives you what you want:
perl -le'print chr(0x263A);'
perl -le'print chr(0x00C0);'
To correctly output the UTF-8 encoding of those Unicode code points, you need to tell Perl to encode them using UTF-8.
$ perl -le'use open ":std", ":encoding(UTF-8)"; print chr(0x263A);'
☺
$ perl -le'use open ":std", ":encoding(UTF-8)"; print chr(0x00C0);'
À
Now, on to the why.
A file handle can only transmit bytes, so unless you tell it otherwise, a Perl file handle expects bytes. This means the string you give to print cannot contain anything but bytes; in other words, it cannot contain characters above 255. The output is exactly what you provide:
$ perl -e'print map chr, 0x00, 0x65, 0xC0, 0xF0' | od -t x1
0000000 00 65 c0 f0
0000004
This is useful. It is different from what you want, but that does not make it wrong. If you want something different, you just need to tell Perl what you want.
By adding an :encoding layer, the handle now expects a string of Unicode characters or, as I call it, "text". The layer tells Perl how to convert the text into bytes.
$ perl -e'
   use open ":std", ":encoding(UTF-8)";
   print map chr, 0x00, 0x65, 0xC0, 0xF0, 0x263a;
' | od -t x1
0000000 00 65 c3 80 c3 b0 e2 98 ba
0000011
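To make the text-to-bytes conversion visible, here is a minimal sketch that performs it explicitly with the core Encode module instead of an :encoding layer. The variable names are only illustrative, and it assumes nothing beyond a stock Perl installation.

```perl
use strict;
use warnings;
use Encode qw( encode );

# A string of Unicode code points ("text"), built with chr.
my $text = join '', map { chr($_) } 0x00, 0x65, 0xC0, 0xF0, 0x263A;

# Convert the text into UTF-8 bytes by hand.
my $bytes = encode('UTF-8', $text);

# Every element of $bytes is now in the range 0..255,
# so the string is safe to print to a handle without an :encoding layer.
my $hex = join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;
print "$hex\n";
```

The hex dump should be the same byte sequence that od reported for the layered handle above: five characters go in, nine bytes come out.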
You are right that chr does not know or care about Unicode. Like length, substr, ord and reverse, chr implements a basic string function, not a Unicode function. That does not mean it cannot be used to work with text strings. As you have seen, the problem was not with chr, but with what you did with the string after you built it.
A character is an element of a string, and a character is a number. That means a string is just a sequence of numbers. Whether you treat those numbers as Unicode code points (text), packed IP addresses, or temperature measurements is entirely up to you and the functions to which you pass the strings.
Here are a few examples of operators that assign meaning to the strings they receive as operands:
- m// expects a string of Unicode code points.
- connect expects a sequence of bytes representing a sockaddr_in structure.
- print with a handle without :encoding expects a sequence of bytes.
- print with a handle with :encoding expects a sequence of Unicode code points.
- etc.
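As a small illustration of "a string is just a sequence of numbers", the sketch below builds one string with chr and then reads it back in two ways. The variable names and the choice of numbers are made up for the example.

```perl
use strict;
use warnings;

# One string, built purely from numbers.
my $s = join '', map { chr($_) } (192, 168, 0, 1);

# Read it back as plain numbers: ord knows nothing about their meaning.
my @numbers = map { ord($_) } split //, $s;

# Treat the same four elements as a packed IPv4 address.
my $as_ip = join '.', unpack('C4', $s);

print "elements: @numbers\n";
print "as an IP address: $as_ip\n";
```

Nothing in $s records which interpretation is intended; that is decided by whichever function receives the string.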
So how do I convert a number into a one-character string whose character corresponds to that number, so that, for example, real_chr(0xC0) eq 'À' is true?
chr(0xC0) eq 'À' does hold. Did you remember to tell Perl that your source code is encoded using UTF-8, by means of use utf8;? If you did not, Perl actually sees a two-character string on the RHS.
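The effect of use utf8; can be checked directly. The following is a sketch that assumes the script file itself is saved as UTF-8; the variable names are illustrative.

```perl
use strict;
use warnings;

my ($with, $without);
{
    use utf8;        # string literals in this block are decoded from UTF-8
    $with = 'À';
}
{
    no utf8;         # string literals in this block are taken as raw bytes
    $without = 'À';
}

# With the pragma the literal is one character (0xC0);
# without it, the literal is the two bytes of its UTF-8 encoding.
printf "with use utf8:    length %d, eq chr(0xC0): %s\n",
    length($with), (chr(0xC0) eq $with ? 'yes' : 'no');
printf "without use utf8: length %d, eq chr(0xC0): %s\n",
    length($without), (chr(0xC0) eq $without ? 'yes' : 'no');
```

Only ASCII is printed here, so no output layer is needed to run the comparison.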
Regarding the question you added:
There are problems with the encoding pragma. I recommend against using it. Instead, use
use open ':std', ':encoding(UTF-8)';
That will fix one of the problems. The other problem you are facing is with
chr(0x00C0) =~ /\w/
This is a known bug that is intentionally left unfixed for backward-compatibility reasons. That is, unless you request a more recent version of the language as follows:
use 5.014;
A workaround that works as far back as 5.8:
my $x = chr(0x00C0);
utf8::upgrade($x);
$x =~ /\w/
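To see the workaround in action, here is a minimal sketch comparing the same character before and after utf8::upgrade. It assumes the script does not request a newer language version (no use 5.014; or similar) and does not enable locale; the variable names are illustrative.

```perl
use strict;
use warnings;

my $native = chr(0x00C0);       # stored internally as a single byte

my $upgraded = chr(0x00C0);
utf8::upgrade($upgraded);       # same character, different internal storage

# The two strings are equal as far as eq is concerned...
my $same = $native eq $upgraded ? 1 : 0;

# ...yet, under the old semantics, \w treats them differently.
my $native_matches   = $native   =~ /\w/ ? 1 : 0;
my $upgraded_matches = $upgraded =~ /\w/ ? 1 : 0;

print "equal: $same\n";
print "native matches \\w: $native_matches\n";
print "upgraded matches \\w: $upgraded_matches\n";
```

The upgraded copy should match \w while still comparing equal to the original, which is why the workaround does not change the meaning of the string.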