In Java, the character is classified as \pS (a symbol), so it is not matched by \pP, which covers punctuation characters.
I discuss this problem, and list the types of all ASCII punctuation and symbol characters, in this answer.
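As a rough way to see the \pS versus \pP split for yourself, the sketch below prints which of the two properties each ASCII non-alphanumeric character matches. The class name `AsciiCategories` and its helper methods are made up for illustration, and `^` and `$` in the usage note are only sample symbols, not necessarily the character discussed above.

```java
import java.util.regex.Pattern;

public class AsciiCategories {
    // \pS matches any Unicode symbol category, \pP any punctuation category.
    private static final Pattern SYMBOL = Pattern.compile("\\pS");
    private static final Pattern PUNCT = Pattern.compile("\\pP");

    // Hypothetical helper: true if c is in a Unicode symbol category.
    static boolean isSymbol(char c) {
        return SYMBOL.matcher(String.valueOf(c)).matches();
    }

    // Hypothetical helper: true if c is in a Unicode punctuation category.
    static boolean isPunctuation(char c) {
        return PUNCT.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        // Walk the printable ASCII range and skip letters and digits.
        for (char c = 0x21; c <= 0x7E; c++) {
            if (Character.isLetterOrDigit(c)) {
                continue;
            }
            String kind = isSymbol(c) ? "\\pS" : isPunctuation(c) ? "\\pP" : "other";
            System.out.println(c + " -> " + kind);
        }
    }
}
```

For example, `isSymbol('^')` and `isSymbol('$')` should both be true while `isPunctuation('^')` is false, which is why a pattern built only on \pP misses those characters.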
Patterns like [\p{Alnum}\s] only work on legacy data from the 1960s. To work with Java's native character set, you need something more along these lines:
identifier_charclass = "[\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}[\\p{InEnclosedAlphanumerics}&&\\p{So}]]";
whitespace_charclass = "[\\u000A\\u000B\\u000C\\u000D\\u0020\\u0085\\u00A0\\u1680\\u180E\\u2000\\u2001\\u2002\\u2003\\u2004\\u2005\\u2006\\u2007\\u2008\\u2009\\u200A\\u2028\\u2029\\u202F\\u205F\\u3000]";
ident_or_white = "[" + identifier_charclass + whitespace_charclass + "]";
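Here is a minimal sketch of how those three strings might be wired into a compiled pattern. The class name `IdentOrWhite`, the constant names, and the `isIdentOrWhite` helper are my own for illustration; the character classes themselves are copied unchanged from above.

```java
import java.util.regex.Pattern;

public class IdentOrWhite {
    // Letters, marks, decimal and letter numbers, connector punctuation,
    // plus "other symbols" from the Enclosed Alphanumerics block.
    static final String IDENTIFIER_CHARCLASS =
        "[\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}[\\p{InEnclosedAlphanumerics}&&\\p{So}]]";

    // Explicit list of whitespace code points, passed to the regex engine as escapes.
    static final String WHITESPACE_CHARCLASS =
        "[\\u000A\\u000B\\u000C\\u000D\\u0020\\u0085\\u00A0\\u1680\\u180E"
        + "\\u2000\\u2001\\u2002\\u2003\\u2004\\u2005\\u2006\\u2007\\u2008\\u2009\\u200A"
        + "\\u2028\\u2029\\u202F\\u205F\\u3000]";

    // Java treats nested classes inside [...] as a union.
    static final String IDENT_OR_WHITE =
        "[" + IDENTIFIER_CHARCLASS + WHITESPACE_CHARCLASS + "]";

    private static final Pattern ONLY_IDENT_OR_WHITE =
        Pattern.compile(IDENT_OR_WHITE + "+");

    // Hypothetical helper: true if s is non-empty and made up solely of
    // identifier or whitespace characters.
    static boolean isIdentOrWhite(String s) {
        return ONLY_IDENT_OR_WHITE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isIdentOrWhite("na\u00EFve caf\u00E9")); // accented letters and a space
        System.out.println(isIdentOrWhite("snake_case 42"));        // connector punctuation and digits
        System.out.println(isIdentOrWhite("a^b"));                  // contains a symbol
    }
}
```

Compiling the pattern once and reusing it avoids re-parsing that long class on every call.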
I'm sorry that Java makes it so hard to work with modern data, but at least it's possible.
Just don't ask about boundaries or grapheme clusters. For that, see my other posts.