Updated answer.
It would be unreasonable to exclude only code points U+0080 to U+FFFF,
given that the Unicode range extends to U+10FFFF.
Many assigned characters lie beyond the 16-bit BMP range.
I'll show how to exclude the range you asked for in either a
UTF-16 or a UTF-8/32 engine, since you may or may not control which one you get.
UTF-16
# UTF-16 regex; equivalent to the UTF-8/32 regex (?![\x{80}-\x{FFFF}])[$\w]
(?!
  (?:
      [\x{80}-\x{D7FF}\x{E000}-\x{FFFF}]
    | [\x{D800}-\x{DBFF}] (?! [\x{DC00}-\x{DFFF}] )
    | [\x{DC00}-\x{DFFF}] (?<! [\x{D800}-\x{DBFF}] [\S\s] )
  )
)
[$\w]
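To see why the UTF-16 form has to treat D800-DBFF and DC00-DFFF separately, it can help to look at the code units a UTF-16 engine actually sees. The following is only an illustrative Python sketch; the helper names utf16_units and classify are made up for this example and are not part of any regex engine.

```python
import struct

def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints (hypothetical helper)."""
    # surrogatepass lets lone surrogates through instead of raising an error
    raw = text.encode("utf-16-be", errors="surrogatepass")
    return [unit for (unit,) in struct.iter_unpack(">H", raw)]

def classify(unit):
    """Name the range of a single code unit, mirroring the ranges in the regex."""
    if 0xD800 <= unit <= 0xDBFF:
        return "high surrogate"
    if 0xDC00 <= unit <= 0xDFFF:
        return "low surrogate"
    if unit >= 0x80:
        return "BMP non-ASCII"
    return "ASCII"

# U+1D400 lies beyond the BMP, so it is stored as a surrogate pair
for unit in utf16_units("a\u00e9\U0001D400"):
    print(f"{unit:04X} {classify(unit)}")
# prints:
# 0061 ASCII
# 00E9 BMP non-ASCII
# D835 high surrogate
# DC00 low surrogate
```

A valid pair (high followed by low) encodes a code point above U+FFFF, so the regex lets it through; only unpaired surrogates fall inside the excluded U+0080 to U+FFFF range.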
UTF-8/32
# UTF-8/32 regex
(?! [\x{80}-\x{FFFF}] ) [$\w]
Finally, the simplest version, with the range extended to U+10FFFF:
# UTF-8/32 regex
(?! [\x{80}-\x{10FFFF}] ) [$\w]
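As a rough check of the difference between the two UTF-8/32 ranges, here is a small Python sketch. It assumes Python's re module, which works on code points (like a UTF-32 engine) but writes the escapes as \uXXXX and \UXXXXXXXX instead of \x{...}. The names BMP_ONLY, FULL_RANGE and sample are just for illustration.

```python
import re

# Same idea as the regexes above, in Python's escape syntax
BMP_ONLY = re.compile(r"(?![\u0080-\uFFFF])[$\w]")
FULL_RANGE = re.compile(r"(?![\u0080-\U0010FFFF])[$\w]")

# U+00E9 is a BMP letter; U+1D400 (MATHEMATICAL BOLD CAPITAL A) is a
# letter beyond the BMP that \w still matches
sample = "a$\u00e9_\U0001D400 9"

print(BMP_ONLY.findall(sample))    # ['a', '$', '_', '\U0001D400', '9'] (the last letter shown as its glyph)
print(FULL_RANGE.findall(sample))  # ['a', '$', '_', '9']
```

The first pattern still lets the U+1D400 letter through, which is why stopping the exclusion at U+FFFF is usually not what you want.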
user557597