Perl JSON::XS incorrectly encodes UTF-8?

This simple code segment shows a problem I am encountering with JSON::XS encoding in Perl:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use JSON::XS;
    use utf8;

    binmode STDOUT, ":encoding(utf8)";

    my (%data);
    $data{code} = "Gewürztraminer";
    print "data{code} = " . $data{code} . "\n";

    my $json_text = encode_json \%data;
    print $json_text . "\n";

This produces the following output:

    johnnyb@boogie:~/Projects/repos> ./jsontest.pl
    data{code} = Gewürztraminer
    {"code":"GewÃ¼rztraminer"}

Now, if I comment out the binmode line above, I get:

 johnnyb@boogie :~/Projects/repos > ./jsontest.pl data{code} = Gew rztraminer {"code":"Gewürztraminer"} 

What's going on here? Note that I am trying to fix this behavior in a Perl CGI script where binmode cannot be used, yet I always get the "Ã¼" characters, as shown above, in the JSON stream. How do I debug this? What am I missing?

2 answers

encode_json (short for JSON::XS->new->utf8->encode) already encodes its result as UTF-8. You then encode it a second time by printing it to STDOUT, to which you added an encoding layer. Effectively, you are doing encode_utf8(encode_utf8($uncoded_json)).
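The double encoding described above can be reproduced without JSON at all. The following is a minimal sketch using only the core Encode module; the variable names and the hex_dump helper are made up for illustration. It prints hex dumps rather than the raw strings so that the result does not depend on the terminal's encoding.

```perl
use strict;
use warnings;
use utf8;                          # the source below contains a literal ü
use Encode qw( encode_utf8 );

# Hypothetical helper: render each element of a string as two hex digits.
sub hex_dump {
    my ($string) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $string;
}

my $chars = "ü";                   # one character, U+00FC
my $once  = encode_utf8($chars);   # two octets
my $twice = encode_utf8($once);    # each octet is re-encoded as if it were a character

printf "once:  %s\n", hex_dump($once);    # prints "once:  C3 BC"
printf "twice: %s\n", hex_dump($twice);   # prints "twice: C3 83 C2 BC"
```

The four octets in the second line are what a UTF-8 terminal displays as "Ã¼".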

Solution 1

    use open ':std', ':encoding(utf8)';  # Defaults
    binmode STDOUT;                      # Override defaults
    print encode_json(\%data);

Solution 2

    use open ':std', ':encoding(utf8)';   # Defaults
    print JSON::XS->new->encode(\%data);  # Or to_json from JSON.pm

Solution 3

The following works with any encoding on STDOUT, because it uses \u escapes for non-ASCII characters:

    print JSON::XS->new->ascii->encode(\%data);
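As a rough illustration of the ascii option, the sketch below encodes the question's data and then decodes it again. The %data hash mirrors the question; the variable names $json and $back are only for this example.

```perl
use strict;
use warnings;
use utf8;            # the source below contains a literal ü
use JSON::XS;

my %data = ( code => "Gewürztraminer" );

# ->ascii escapes every non-ASCII character, so the result contains only
# ASCII and passes through any output layer unchanged.
my $json = JSON::XS->new->ascii->encode(\%data);
print $json, "\n";

# Decoding turns the \u escape back into the original character.
my $back = JSON::XS->new->decode($json);
print "round trip ok\n" if $back->{code} eq $data{code};
```

The trade-off is a slightly longer payload, since each escaped character takes six bytes instead of two.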

In the comments, you mention this is actually a CGI script.

    #!/usr/bin/perl

    use strict;
    use warnings;

    use utf8;                      # Encoding of source code.
    use open ':encoding(UTF-8)';   # Default encoding of file handles.
    BEGIN {
        binmode STDIN;                       # Usually does nothing on non-Windows.
        binmode STDOUT;                      # Usually does nothing on non-Windows.
        binmode STDERR, ':encoding(UTF-8)';  # For text sent to the log file.
    }

    use CGI      qw( -utf8 );
    use JSON::XS qw( encode_json );          # Import encode_json explicitly.

    {
        my $cgi = CGI->new();
        my $data = { code => "Gewürztraminer" };
        print $cgi->header('application/json');
        print encode_json($data);
    }

JSON::XS's encode_json encodes its output into octets. The result is the external, UTF-8-encoded representation of the string, not a Unicode character string. See perlunicode for more details. In short, the contents of $json_text are ready to be written to an I/O handle in binary mode.

If you create the scalar $data{code} under use utf8;, you have a scalar containing a Unicode character string. (It happens to be stored internally as UTF-8, but that is an implementation detail you should not rely on. The use utf8; pragma means only that the source code is encoded as UTF-8, nothing else.)

If you want to print both scalars to a handle that encodes its output as UTF-8, you need to convert $json_text back into a Unicode character string first:

    use strict;
    use warnings;
    use JSON::XS;
    use utf8;

    binmode STDOUT, ":encoding(utf8)";

    my (%data);
    $data{code} = "Gewürztraminer";
    print "data{code} = " . $data{code} . "\n";

    my $json_text = encode_json \%data;
    utf8::decode($json_text);
    print $json_text . "\n";

Or, as it is intended to be used, write the encoded string to an I/O handle in binary mode:

    my $json_text = encode_json \%data;
    binmode STDOUT;
    print $json_text . "\n";

Try

    print((utf8::is_utf8($json_text) ? "UTF8" : "OCTETS") . "\n");

to see what's inside.
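Another way to see the difference between octets and characters is to compare string lengths. This is only a sketch: the names $octets, $chars, and $decoded are made up for the example, and it relies on ü being one character but two UTF-8 octets.

```perl
use strict;
use warnings;
use utf8;            # the source below contains a literal ü
use JSON::XS;

my %data = ( code => "Gewürztraminer" );

my $octets = JSON::XS->new->utf8->encode(\%data);  # like encode_json: UTF-8 octets
my $chars  = JSON::XS->new->encode(\%data);        # no ->utf8: a character string

# The octet string is one element longer, because ü takes two octets.
printf "octets: %d, characters: %d\n", length($octets), length($chars);

# Decoding a copy of the octets in place yields the same character string.
my $decoded = $octets;
utf8::decode($decoded);
print "decoded octets match the character string\n" if $decoded eq $chars;
```

Print $octets to a handle in binary mode and $chars to a handle with an encoding layer; mixing the two is what produces the doubled encoding.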
