Creating Unicode File Names

I am looking for some guidelines for creating file names that contain Unicode characters. Consider:

    use open qw( :std :utf8 );
    use strict;
    use utf8;
    use warnings;

    use Data::Dump;
    use Encode qw(encode);

    my $utf8_file_name1 = encode('UTF-8', 'æ1', Encode::FB_CROAK | Encode::LEAVE_SRC);
    my $utf8_file_name2 = 'æ2';

    dd $utf8_file_name1;
    dd $utf8_file_name2;

    qx{touch $utf8_file_name1};
    qx{touch $utf8_file_name2};

    print (qx{ls æ*});

Output:

    "\xC3\xA61"
    "\xE62"
    æ1
    æ2

Why doesn't it matter whether I encode the file name as UTF-8 or not? (Either way, the file name ends up as valid UTF-8.)

1 answer

This happens because of a bug known as "The Unicode Bug". In effect, something equivalent to the following is applied to the string before it is handed to the system:

    use Encode qw( encode_utf8 is_utf8 );

    my $bytes = is_utf8($str) ? encode_utf8($str) : $str;

is_utf8 checks which of the two string storage formats the scalar uses. This is an internal implementation detail that you would never have to worry about if it weren't for The Unicode Bug.
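To make the "two storage formats" point concrete, here is a minimal sketch. It assumes nothing beyond core Perl; the variable names are made up for illustration. It stores the same one-character string both ways and shows that Perl still considers the two strings equal.

```perl
use strict;
use warnings;

my $down = "\x{E6}";      # the character æ, stored in the downgraded format
my $up   = $down;
utf8::upgrade($up);       # same character, now stored in the upgraded format

# utf8::is_utf8 reports the storage format, not the content.
my $down_fmt = utf8::is_utf8($down) ? 'upgraded' : 'downgraded';
my $up_fmt   = utf8::is_utf8($up)   ? 'upgraded' : 'downgraded';
my $cmp      = $down eq $up         ? 'eq'       : 'ne';

print "$down_fmt\n";   # downgraded
print "$up_fmt\n";     # upgraded
print "$cmp\n";        # eq
```

The strings compare equal and have the same length; only the internal representation differs, which is exactly what the bug makes visible when the string reaches the operating system.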

Your program works because encode always returns a string for which is_utf8 returns false, while under use utf8; a string literal that contains non-ASCII characters always produces a string for which is_utf8 returns true.
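The two cases from the question can be inspected directly. This is a small sketch, not part of the original program; the variable names $encoded and $decoded are chosen here for illustration.

```perl
use strict;
use warnings;
use utf8;                       # string literals in this file are decoded from UTF-8
use Encode qw( encode );

my $encoded = encode('UTF-8', 'æ1');   # bytes C3 A6 31
my $decoded = 'æ2';                    # characters E6 32

# Report the internal storage flag and the length of each string.
for my $pair ( [ encoded => $encoded ], [ decoded => $decoded ] ) {
    my ($label, $str) = @$pair;
    printf "%s: is_utf8=%d length=%d\n",
        $label, utf8::is_utf8($str) ? 1 : 0, length $str;
}
# encoded: is_utf8=0 length=3
# decoded: is_utf8=1 length=2
```

So the encoded string is passed through unchanged, and the decoded one is silently encoded on the way out, which is why both files end up with valid UTF-8 names.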

If you don't encode as you should, you will sometimes get the wrong result. For example, if you had used "\x{E6}2" instead of 'æ2', you would have gotten a different file name, even though the two strings have the same length and contain the same characters.

    $ dir
    total 0

    $ perl -wE'
       use utf8;
       $fu="æ";
       $fd="\x{E6}";
       say sprintf "%vX", $_ for $fu, $fd;
       say $fu eq $fd ? "eq" : "ne";
       system("touch", $_) for "u".$fu, "d".$fd
    '
    E6
    E6
    eq

    $ dir
    total 0
    -rw------- 1 ikegami ikegami 0 Jul 12 12:18 uæ
    -rw------- 1 ikegami ikegami 0 Jul 12 12:18 d?
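One way to avoid depending on the storage format is to encode every file name explicitly before it leaves the program. The following is a sketch only: fs_name is a hypothetical helper name, and it assumes the file system expects UTF-8 encoded names.

```perl
use strict;
use warnings;
use utf8;
use Encode qw( encode );

# Hypothetical helper: convert a decoded (character) string into the bytes
# to hand to the OS. Assumes the file system uses UTF-8 names.
sub fs_name {
    my ($name) = @_;
    return encode('UTF-8', $name, Encode::FB_CROAK | Encode::LEAVE_SRC);
}

my $fu = "æ";         # stored upgraded because of `use utf8`
my $fd = "\x{E6}";    # same character, stored downgraded

# Both names encode to the same two bytes regardless of internal storage.
for my $name ($fu, $fd) {
    my $hex = join ' ', map { sprintf '%02X', ord($_) } split //, fs_name($name);
    print "$hex\n";   # C3 A6
}
```

With such a helper, a call like system("touch", fs_name("d".$fd)) would pass the bytes C3 A6 for æ instead of the single byte E6.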