Using 'use utf8;' gives me "Wide character in print"

If I run the following Perl program:

perl -e 'use utf8; print "鸡\n";' 

I get this warning:

 Wide character in print at -e line 1. 

If I run this Perl program:

 perl -e 'print "鸡\n";' 

I do not get a warning.

I thought that use utf8 was required in order to use UTF-8 characters in a Perl script. Why is this not working, and how can I fix it? I am using Perl 5.16.2. I get the same problem when the code is in a file rather than a one-liner on the command line.

+60
perl unicode utf-8
Mar 04 '13 at 20:29
6 answers

Without use utf8, Perl interprets your string as a sequence of single-byte characters. There are four bytes in your string:

 $ perl -E 'say join ":", map { ord } split //, "鸡\n";'
 233:184:161:10

The first three bytes make up your character; the last is the newline.

The print call sends these four characters to STDOUT. Your console then works out how to display them. If your console is set up to use UTF-8, it interprets the first three bytes as a single character, and that is what is displayed.

If we add the utf8 pragma, things change. In this case, Perl interprets your string as just two characters.

 $ perl -Mutf8 -E 'say join ":", map { ord } split //, "鸡\n";'
 40481:10

By default, Perl's IO layer assumes that it is working with single-byte characters. So when you try to print a multibyte character, Perl thinks that something is wrong and gives you a warning. As always, you can get more explanation for this warning by adding use diagnostics. It will say this:

(S utf8) Perl met a wide character (>255) when it wasn't expecting one. This warning is by default on for I/O (like print). The easiest way to quiet this warning is simply to add the :utf8 layer to the output, e.g. binmode STDOUT, ':utf8'. Another way to turn off the warning is to add no warnings 'utf8'; but that is often closer to cheating. In general, you are supposed to explicitly mark the filehandle with an encoding, see open and perlfunc/binmode.

As others have pointed out, you need to tell Perl to expect multibyte output. There are many ways to do this (see the Perl Unicode Tutorial for some examples). One of the simplest is the -CS command-line flag, which tells the three standard filehandles (STDIN, STDOUT, and STDERR) to deal with UTF-8.

 $ perl -Mutf8 -e 'print "鸡\n";'
 Wide character in print at -e line 1.
 鸡

versus

 $ perl -Mutf8 -CS -e 'print "鸡\n";' 

Unicode is a large and complex area. As you saw, many simple programs seem to do the right thing, but for the wrong reasons. When you start correcting part of a program, the situation will often get worse until you fix the whole program.

+85
Mar 05 '13 at 10:56

All use utf8; does is tell Perl that the source code is encoded using UTF-8. You also need to tell Perl how to encode the text it outputs:

 use open ':std', ':encoding(UTF-8)'; 
+58
Mar 04 '13 at 20:34

You can get close to "just do utf8 everywhere" using the CPAN module utf8::all .

 perl -Mutf8::all -e 'print "鸡\n";' 

When print gets something it cannot print (a character above 255 when no :encoding layer is present), it assumes that you want it encoded as UTF-8. It does so after warning you about the problem.

+11
Mar 04 '13 at 21:25

Encode all standard output as UTF-8:

 binmode STDOUT, ":utf8"; 
+11
Feb 17 '14 at 21:18

You can use this:

 perl -CS filename

This also gets rid of the warning.

+3
Apr 09 '15 at 10:40

With Spanish text, you can also run into this error if you start using:

 use utf8; 

while your editor saves the file in a different encoding. What you see in the editor is then not what Perl reads. To resolve this error, change the editor's encoding to Unicode/UTF-8.

+1
May 23 '15 at 2:15 pm