Perl precompiled regex - utf8

When I do this:

use strict; use warnings;
my $regex = qr/[[:upper:]]/;
my $line = MyModule::get_my_line_from_external_source(); #file, db, etc...
print "upper here\n" if( $line =~ $regex );

How does perl know when it should match only ASCII uppercase and when it should match UTF-8 uppercase? This is a precompiled regex, so perl should already know what uppercase means. Does it depend on the locale settings? If so, how do I match UTF-8 uppercase in the "C" locale with a precompiled regular expression?

Update, based on tchrist's comments:

use strict; use warnings; use Encode;
my $regex = qr/[[:upper:]]/;

my $line = XXX::line();
print "$line: upper1 ", ($line =~ $regex) ? "YES" : "NO", "\n";

my $uline = Encode::decode_utf8($line);
print "$uline: upper2 ", ($uline =~ $regex) ? "YES" : "NO", "\n";

package XXX;
sub line { return "alpha-Ω"; } #returning octets - not utf8 chars

Output:

alpha-Ω: upper1 NO
alpha-Ω: upper2 YES

This means that the precompiled regular expression is not "hard precompiled" but "soft precompiled": perl interprets [[:upper:]] according to the UTF8 flag of the string being matched.


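Perl 5.14 lets you control this explicitly.

As of 5.14, a pattern can carry one of the new character-set modifiers: /u, /l, /d, /a, or /aa. You can also make one of them the default for every pattern in a lexical scope:

use re "/u";

or, combined with other default flags:

use re "/msu";

The modifier is then compiled into the pattern itself.

You can see this when a compiled pattern is printed under 5.14: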

% perl -le 'print qr/foo/'
(?^:foo)
% perl -E 'say qr/foo/'
(?^u:foo)
% perl -E 'say qr/foo/l'
(?^l:foo)

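In the first case there is no character-set modifier, so the default rules apply. In the second, -E enables the unicode_strings feature, so the pattern is compiled with /u and always uses Unicode semantics. In the third, /l selects locale semantics.

Under the default rules, which can also be requested explicitly with "/d", the result depends on the data: the pattern uses Unicode semantics only if the pattern or the string being matched is UTF8. That is the difference you observed between $line and $uline. Compiling the pattern with /u removes that dependence, but the input still has to be decoded into characters first.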
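The effect of the modifiers can be checked with a short sketch like the one below. It assumes Perl 5.14 or later and that neither the unicode_strings feature nor "use locale" is in scope, so an unmodified pattern gets the default rules. The helper verdict() and the sample strings are made up for illustration and are not part of the code above.

```perl
use strict;
use warnings;
# Needs perl 5.14+. Deliberately no "use 5.014;" here: that would load the
# feature bundle (including unicode_strings) and change the default rules.

# U+00C0 (LATIN CAPITAL LETTER A WITH GRAVE) fits in one byte, so perl can
# hold it either as a native byte string or as an upgraded UTF8 string.
my $native   = "\xC0";
my $upgraded = "\xC0";
utf8::upgrade($upgraded);      # same character, UTF8 flag now on

my $omega = "\x{3A9}";         # GREEK CAPITAL LETTER OMEGA, always UTF8

my %re = (
    d => qr/[[:upper:]]/,      # no modifier: default rules
    u => qr/[[:upper:]]/u,     # always Unicode rules
    a => qr/[[:upper:]]/a,     # POSIX classes restricted to ASCII
);

# Hypothetical helper: report whether a string matches a compiled pattern.
sub verdict {
    my ($pattern, $string) = @_;
    return $string =~ $pattern ? "YES" : "NO";
}

for my $flag (qw(d u a)) {
    printf "/%s native=%-3s upgraded=%-3s omega=%s\n",
        $flag,
        verdict($re{$flag}, $native),
        verdict($re{$flag}, $upgraded),
        verdict($re{$flag}, $omega);
}
```

With the default rules, only the "native" column should come out as NO, because that string is not UTF8. With /u all three columns should be YES, and with /a all three should be NO, since none of the characters is an ASCII uppercase letter.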
