Perl: convert string to utf-8 for json decoding

I scrape a site and collect information from its JSON. The results are stored in a hash. But some pages give me "invalid UTF-8 character in JSON string". I noticed that the last letter in "café" causes the error. I think this is due to a mix of character encodings. So now I'm looking for a way to convert everything to UTF-8 (hopefully that's the right approach). I tried utf8::all, but it just doesn't work (maybe I didn't use it correctly). I'm a noob. Please help, thanks.
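A minimal sketch that reproduces this kind of failure. It assumes the page text has already been decoded into a Perl character string before it reaches decode_json, and the sample data is hypothetical. It uses JSON::PP (the pure-Perl JSON module that ships with Perl) instead of JSON so it runs without extra installs; JSON's decode_json should reject this input in the same way.

```perl
use strict;
use warnings;
use utf8;                      # the literal below is a character string
use JSON::PP qw(decode_json);  # core module, stands in for JSON here

# Hypothetical page text that was already decoded into Perl characters.
my $page_text = qq({ "name" : "café" });

# decode_json expects UTF-8 *bytes*, so the single character "é" is
# treated as the lone byte 0xE9, which is not valid UTF-8, and it dies.
my $ok    = eval { decode_json($page_text); 1 };
my $error = $ok ? '' : $@;

if ($ok) {
    print "decoded without error\n";
}
else {
    print "decode_json died: $error";
}
```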


UPDATE

Well, after reading the article "Know the difference between character strings and UTF-8 strings" by brian d foy, I solved the problem with this code:

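    use strict;
    use warnings;
    use utf8;                                  # source is UTF-8, so literals are characters
    use Encode qw(encode_utf8);
    use JSON;

    my $json_data = qq( { "cat" : "Büster" } );
    $json_data = encode_utf8( $json_data );    # characters -> UTF-8 bytes
    my $perl_hash = decode_json( $json_data ); # decode_json expects UTF-8 bytes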

Hope this helps someone else.

ANSWER

decode_json expects JSON to be encoded using UTF-8.
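To illustrate, here is a small sketch where the input really is UTF-8 bytes: there is no use utf8, and "ü" is written as its two UTF-8 bytes. It uses JSON::PP (core) in place of JSON, assuming the two behave the same for decode_json.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# No "use utf8", and "ü" is spelled as its UTF-8 bytes 0xC3 0xBC,
# so $json_bytes is a byte string, which is what decode_json wants.
my $json_bytes = qq({ "cat" : "B\xC3\xBCster" });

my $data = decode_json($json_bytes);

# The decoded value is a character string: "ü" is now one character (U+00FC).
printf "%d characters\n", length $data->{cat};   # 6
```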

Since your source file is encoded using UTF-8, you had Perl decode it by using use utf8; (as you should). This means that your string contains Unicode characters, not the UTF-8 bytes that represent those characters.
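A quick way to see the difference between the character string and its UTF-8 bytes (a sketch using the core Encode module):

```perl
use strict;
use warnings;
use utf8;                          # decode this file's literals as UTF-8
use Encode qw(encode_utf8);

my $chars = "Büster";              # character string: "ü" is one character
my $bytes = encode_utf8($chars);   # UTF-8 bytes: "ü" becomes 0xC3 0xBC

printf "%d characters, %d bytes\n", length $chars, length $bytes;   # 6 characters, 7 bytes
```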

As you have shown, you can encode the string before passing it to decode_json.

    use utf8;
    use Encode qw( encode_utf8 );
    use JSON qw( decode_json );

    my $data_json = qq( { "cat" : "Büster" } );
    my $data = decode_json(encode_utf8($data_json));

But you can also just tell JSON that the string is already decoded, by turning off its UTF-8 handling with utf8(0).

    use utf8;
    use JSON qw( );

    my $data_json = qq( { "cat" : "Büster" } );
    my $data = JSON->new->utf8(0)->decode($data_json);
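A sketch checking that both approaches give the same result for this input. It uses JSON::PP (core) instead of JSON, assuming JSON::PP->new->utf8(0) behaves like JSON->new->utf8(0) here. The point is that the decoder's UTF-8 setting has to match what the string actually holds: bytes for decode_json, characters for utf8(0).

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode_utf8);
use JSON::PP qw(decode_json);

my $data_json = qq( { "cat" : "Büster" } );   # character string

# Approach 1: turn the characters back into UTF-8 bytes for decode_json.
my $via_bytes = decode_json(encode_utf8($data_json));

# Approach 2: tell the decoder the input is already characters.
my $via_chars = JSON::PP->new->utf8(0)->decode($data_json);

print "same result: ", ($via_bytes->{cat} eq $via_chars->{cat} ? "yes" : "no"), "\n";
```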

Source: https://habr.com/ru/post/1413823/

