What does this small Perl regex mean?

I am translating code from Perl and I have come to the following line:

$text =~ s/([?!\.][\ ]*[\'\"\)\]\p{IsPf}]+) +([\'\"\(\[\¿\¡\p{IsPi}]*[\ ]*[\p{IsUpper}])/$1\n$2/g; 

My question is: what do \p{IsPf} and \p{IsPi} match? I tried searching the Internet but didn't find anything...

+7
3 answers

Let me ask RegexBuddy: these are Unicode character properties.


More documentation is available on Unicode character properties and Unicode scripts.

+10

\p{..} matches characters by Unicode character properties: http://perldoc.perl.org/perlunicode.html#Unicode-Character-Properties

In particular, \p{IsPf} matches characters with the "final punctuation" property (general category Pf), and \p{IsPi} matches characters with the "initial punctuation" property (general category Pi). These are essentially closing and opening quotation marks.

The point of the substitution, apparently, is to break sentences onto separate lines: it matches the end of one sentence and the beginning of the next, taking into account that a sentence can begin and end with various kinds of punctuation.
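If the target of the translation has no \p{...} support, the same test can be made through the Unicode general category. A minimal sketch in Python, which is only an assumption here since the question does not name the target language; the helper names is_initial_punct and is_final_punct are made up for illustration:

```python
import unicodedata

# Hypothetical helpers: the two-letter general category "Pi" corresponds to
# what \p{IsPi} matches, and "Pf" to what \p{IsPf} matches.
def is_initial_punct(ch: str) -> bool:
    return unicodedata.category(ch) == "Pi"

def is_final_punct(ch: str) -> bool:
    return unicodedata.category(ch) == "Pf"

samples = [
    "\u00ab",  # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK, category Pi
    "\u201c",  # LEFT DOUBLE QUOTATION MARK, category Pi
    "\u00bb",  # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK, category Pf
    "\u201d",  # RIGHT DOUBLE QUOTATION MARK, category Pf
    '"',       # ASCII QUOTATION MARK, category Po (neither Pi nor Pf)
    "(",       # LEFT PARENTHESIS, category Ps (neither Pi nor Pf)
]

for ch in samples:
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} "
          f"initial={is_initial_punct(ch)} final={is_final_punct(ch)}")
```

Note that the plain ASCII quotes and brackets fall into neither category, which is presumably why the original regex lists them explicitly next to \p{IsPf} and \p{IsPi}.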
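To make the behaviour concrete, here is a rough Python translation of the substitution. It is a sketch under assumptions, not a verified port: Python is only a guess at the target language, the standard re module has no \p{...} support, so the character classes are built from unicodedata, and \p{IsUpper} is approximated by general category Lu, which may differ from Perl's definition for a few characters. The names chars_in_category and split_sentences are made up for illustration.

```python
import re
import sys
import unicodedata

def chars_in_category(category: str) -> str:
    """Return every code point whose general category equals `category`."""
    return "".join(
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == category
    )

# Stand-ins for \p{IsPf}, \p{IsPi} and (approximately) \p{IsUpper}.
PF = re.escape(chars_in_category("Pf"))
PI = re.escape(chars_in_category("Pi"))
UPPER = re.escape(chars_in_category("Lu"))  # assumption: Lu approximates IsUpper

# One or more closing quotes/brackets after the sentence-final mark.
closing = "[" + re.escape("'\")]") + PF + "]+"
# Zero or more opening quotes/brackets before the next sentence.
opening = "[" + re.escape("'\"([¿¡") + PI + "]*"
upper = "[" + UPPER + "]"

SENTENCE_BREAK = re.compile(f"([?!.] *{closing}) +({opening} *{upper})")

def split_sentences(text: str) -> str:
    # Same effect as the Perl replacement $1\n$2: the spaces between the two
    # groups are replaced by a single newline.
    return SENTENCE_BREAK.sub(r"\1\n\2", text)

print(split_sentences('He said "Stop!" Then he left.'))
print(split_sentences("«¿Qué?» «Nada.»"))
```

This particular rule only fires when at least one closing quote or bracket follows the ?, ! or ., so a plain "Is it done? Yes." is left untouched by it. If a third-party dependency is acceptable, the regex package on PyPI accepts \p{Pi} and \p{Pf} directly, which avoids building the classes by hand.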

+11

As a bit of additional information, unichars from Unicode::Tussle can be used to display the corresponding characters.

    $ unichars -au '\p{IsPi}' | cat
     «  U+000AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
     ‘  U+02018 LEFT SINGLE QUOTATION MARK
     ‛  U+0201B SINGLE HIGH-REVERSED-9 QUOTATION MARK
     “  U+0201C LEFT DOUBLE QUOTATION MARK
     ‟  U+0201F DOUBLE HIGH-REVERSED-9 QUOTATION MARK
     ‹  U+02039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK
     ⸂  U+02E02 LEFT SUBSTITUTION BRACKET
     ⸄  U+02E04 LEFT DOTTED SUBSTITUTION BRACKET
     ⸉  U+02E09 LEFT TRANSPOSITION BRACKET
     ⸌  U+02E0C LEFT RAISED OMISSION BRACKET
     ⸜  U+02E1C LEFT LOW PARAPHRASE BRACKET
     ⸠  U+02E20 LEFT VERTICAL BAR WITH QUILL

    $ unichars -au '\p{IsPf}' | cat
     »  U+000BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
     ’  U+02019 RIGHT SINGLE QUOTATION MARK
     ”  U+0201D RIGHT DOUBLE QUOTATION MARK
     ›  U+0203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
     ⸃  U+02E03 RIGHT SUBSTITUTION BRACKET
     ⸅  U+02E05 RIGHT DOTTED SUBSTITUTION BRACKET
     ⸊  U+02E0A RIGHT TRANSPOSITION BRACKET
     ⸍  U+02E0D RIGHT RAISED OMISSION BRACKET
     ⸝  U+02E1D RIGHT LOW PARAPHRASE BRACKET
     ⸡  U+02E21 RIGHT VERTICAL BAR WITH QUILL

+3