How to combine multiple Unicode properties in a Perl regex?

I have this script:

use 5.014;
use warnings;
use utf8;
binmode STDOUT, ':utf8';

my $str = "XYZ ΦΨΩ zyz φψω";

my @greek = ($str =~ /\p{Greek}/g);
say "Greek: @greek";

my @upper = ($str =~ /\p{Upper}/g);
say "Upper: @upper";

#my @upper_greek = ($str =~ /\p{Upper+Greek}/);  #wrong.
#say "Upper+Greek: @upper_greek";

Is it possible to combine multiple Unicode properties? For example, how can I select only the characters that are both Upper and Greek, and get the following output:

 Greek: Φ Ψ Ω φ ψ ω
 Upper: X Y Z Φ Ψ Ω
 Upper+Greek: Φ Ψ Ω #<-- how to get this?
2 answers

Neither of the following works, because each one matches Greek OR Upper:

 /(?:\p{Greek}|\p{Upper})/ # Greek OR Upper 

or

 /[\p{Greek}\p{Upper}]/ # Greek OR Upper 

One way to achieve AND in a regular expression is to use lookarounds.

 /\p{Greek}(?<=\p{Upper})/ # Greek AND Upper 
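As a minimal sketch of the lookaround approach, the script below applies it to the sample string from the question. The variable names `@upper_greek` and `@via_lookahead` are illustrative only, and the lookahead variant is an equivalent form added here for comparison, not something the answer itself shows.

```perl
use strict;
use warnings;
use utf8;                                # the source contains literal Greek text
binmode STDOUT, ':encoding(UTF-8)';

my $str = "XYZ ΦΨΩ zyz φψω";

# Consume one Greek character, then look back at that same character
# and require it to be uppercase as well.
my @upper_greek = $str =~ /\p{Greek}(?<=\p{Upper})/g;

# Equivalent form (assumption: shown for comparison only): check for
# uppercase first with a lookahead, then consume the Greek character.
my @via_lookahead = $str =~ /(?=\p{Upper})\p{Greek}/g;

print "Upper+Greek: @upper_greek\n";     # prints: Upper+Greek: Φ Ψ Ω
print "Lookahead:   @via_lookahead\n";   # prints: Lookahead:   Φ Ψ Ω
```

Both patterns test the same single character twice, once per property, so the order of the two tests does not change the result.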

Another way to get AND is to negate an OR. De Morgan's laws tell us

 NOT( Greek AND Upper ) ⇔ NOT(Greek) OR NOT(Upper) 

So

 Greek AND Upper ⇔ NOT( NOT(Greek) OR NOT(Upper) ) 

It gives us

 /[^\P{Greek}\P{Upper}]/ # Greek AND Upper 
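The negated class can be tried on the question's sample string as follows. This is only a sketch: `@upper_greek` and `@not_both` are illustrative names, and the second match is included merely to show what the inner OR of negations would match on its own.

```perl
use strict;
use warnings;
use utf8;                                # the source contains literal Greek text
binmode STDOUT, ':encoding(UTF-8)';

my $str = "XYZ ΦΨΩ zyz φψω";

# \P{...} (capital P) is the complement of \p{...}.
# The class excludes everything that is non-Greek or non-Upper,
# leaving only characters that are both Greek and Upper.
my @upper_greek = $str =~ /[^\P{Greek}\P{Upper}]/g;

# The un-negated class: NOT(Greek) OR NOT(Upper).
my @not_both = $str =~ /[\P{Greek}\P{Upper}]/g;

print "Upper+Greek: @upper_greek\n";     # prints: Upper+Greek: Φ Ψ Ω
print "Others: ", scalar(@not_both), "\n";
```

Every character of the string lands in exactly one of the two lists, which is the De Morgan equivalence stated above.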

Starting with 5.18, there is also an experimental feature that you can use, extended bracketed character classes:

 no warnings qw( experimental::regex_sets );
 /(?[ \p{Greek} & \p{Upper} ])/ # Greek AND Upper
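A runnable sketch of the set-operation syntax, assuming Perl 5.18 or later. The names `@upper_greek` and `@lower_greek` are illustrative, and the `-` (set difference) line is an extra added here to show a second operator. To the best of my knowledge the feature stopped being experimental in 5.36, where the `no warnings` line is harmless but no longer needed.

```perl
use 5.018;                               # also enables strict and say
use warnings;
no warnings qw( experimental::regex_sets );
use utf8;                                # the source contains literal Greek text
binmode STDOUT, ':encoding(UTF-8)';

my $str = "XYZ ΦΨΩ zyz φψω";

# & is set intersection: characters that are Greek AND Upper.
my @upper_greek = $str =~ /(?[ \p{Greek} & \p{Upper} ])/g;

# - is set difference: Greek characters that are NOT Upper.
my @lower_greek = $str =~ /(?[ \p{Greek} - \p{Upper} ])/g;

say "Upper+Greek: @upper_greek";         # prints: Upper+Greek: Φ Ψ Ω
say "Greek-Upper: @lower_greek";         # prints: Greek-Upper: φ ψ ω
```

Whitespace inside `(?[ ... ])` is ignored, so the operands can be spaced out for readability.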

A user-defined property also works in 5.14.0:

sub InUpperGreek {
    return <<'END'
+utf8::Greek
&utf8::Upper
END
}

my @upper_greek = ($str =~ /\p{InUpperGreek}/g);
say "Upper Greek: @upper_greek";

Not sure if this is easier. :) For more information on how this works, see the section on user-defined character properties in the perlunicode documentation.
