This is due to a bug known as "The Unicode Bug". In effect, the following is being done to your string:
    use Encode qw( encode_utf8 is_utf8 );
    my $bytes = is_utf8($str) ? encode_utf8($str) : $str;
is_utf8 checks which of the two string storage formats is used by the scalar. This is an internal implementation detail that you should never have to worry about, except when you run into this bug.
Your program works because encode always returns a string for which is_utf8 returns false, and a string literal under use utf8; always produces a string for which is_utf8 returns true if the literal contains non-ASCII characters.
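To make the storage-format point concrete, here is a minimal sketch showing that two strings can hold exactly the same characters while is_utf8 reports different things for them. It uses the built-in utf8::upgrade and utf8::downgrade functions to force each format; the variable names are only for illustration.

```perl
use strict;
use warnings;

# Two copies of the same one-character string (U+00E6, "æ").
my $down = "\x{E6}";
my $up   = "\x{E6}";

utf8::downgrade($down);  # force the one-byte-per-character storage format
utf8::upgrade($up);      # force the UTF-8 storage format

my $same_chars = $down eq $up           ? 1 : 0;  # compares characters, not storage
my $down_flag  = utf8::is_utf8($down)   ? 1 : 0;
my $up_flag    = utf8::is_utf8($up)     ? 1 : 0;

printf "equal=%d down_flag=%d up_flag=%d\n", $same_chars, $down_flag, $up_flag;
```

The two strings compare equal, yet only the upgraded one has the flag set, which is why code that branches on is_utf8 can treat them differently.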
If you don't encode as you should, you will sometimes get the wrong result. For example, if you had used "\x{E6}2" instead of 'æ2', you would have gotten a different file name, even though the two strings have the same length and the same characters.
    $ dir
    total 0

    $ perl -wE'
       use utf8;
       $fu="æ";
       $fd="\x{E6}";
       say sprintf "%vX", $_ for $fu, $fd;
       say $fu eq $fd ? "eq" : "ne";
       system("touch", $_) for "u".$fu, "d".$fd
    '
    E6
    E6
    eq

    $ dir
    total 0
    -rw------- 1 ikegami ikegami 0 Jul 12 12:18 uæ
    -rw------- 1 ikegami ikegami 0 Jul 12 12:18 d?
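One way to avoid depending on the storage format is to encode file names explicitly before handing them to the operating system. The following is a minimal sketch, assuming the file system expects UTF-8 names; the helper name fs_name is hypothetical and not part of any module.

```perl
use strict;
use warnings;
use Encode qw( encode_utf8 );

# Hypothetical helper: turn a string of characters into the bytes
# to pass to the OS. Assumes the file system wants UTF-8 names.
sub fs_name { return encode_utf8($_[0]) }

# The same character (U+00E6) in each of the two storage formats.
my $fd = "\x{E6}";
utf8::downgrade($fd);
my $fu = "\x{E6}";
utf8::upgrade($fu);

my $name_d = fs_name($fd);
my $name_u = fs_name($fu);

# After explicit encoding, both strings yield identical bytes.
my $names_match = $name_d eq $name_u ? 1 : 0;

printf "%vX %vX match=%d\n", $name_d, $name_u, $names_match;
# One could then call, for example: system("touch", $name_u);
```

Because both names are encoded before use, the result no longer depends on which storage format the original scalar happened to have.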