Are you looking for specific characters, or for all characters outside the BMP?
If the former, you can use a StringBuilder to build a string containing code points from the higher planes, and the regular expression will work as expected:
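    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Build a string containing U+10300, a code point outside the BMP.
    String test = new StringBuilder()
            .append("test").appendCodePoint(0x10300).append("test").toString();
    // The pattern is that same supplementary character as a literal.
    Pattern regex = Pattern.compile(new StringBuilder().appendCodePoint(0x10300).toString());
    Matcher matcher = regex.matcher(test);
    if (matcher.find()) {
        System.out.println(matcher.start());
    }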
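One detail that may help when reading the result: matcher.start() is an index in UTF-16 chars, not in code points, so a supplementary character occupies two positions in the string. The following is a minimal sketch rather than part of the original answer. The class name SupplementaryMatch and the helper indexOfCodePoint are made up for illustration, and Pattern.quote is an addition so the helper also behaves for code points that happen to be regex metacharacters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class SupplementaryMatch {

    // Returns the char (UTF-16 code unit) index of the first occurrence
    // of the given code point in input, or -1 if it does not occur.
    static int indexOfCodePoint(String input, int codePoint) {
        // Character.toChars yields one char for BMP code points
        // and a surrogate pair for supplementary ones.
        String literal = new String(Character.toChars(codePoint));
        Matcher matcher = Pattern.compile(Pattern.quote(literal)).matcher(input);
        return matcher.find() ? matcher.start() : -1;
    }

    public static void main(String[] args) {
        String test = new StringBuilder()
                .append("test").appendCodePoint(0x10300).append("test").toString();
        System.out.println(indexOfCodePoint(test, 0x10300)); // 4
        System.out.println(test.length());                   // 10: U+10300 takes two chars
    }
}
```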
If you want to remove all non-BMP characters from a string, I would use StringBuilder directly, not a regular expression:
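    StringBuilder sb = new StringBuilder(test.length());
    for (int ii = 0; ii < test.length(); ) {
        int codePoint = test.codePointAt(ii);
        if (codePoint > 0xFFFF) {
            // Supplementary code point: skip both chars of the surrogate pair.
            ii += Character.charCount(codePoint);
        } else {
            // BMP code point: keep it and advance by one char.
            sb.appendCodePoint(codePoint);
            ii++;
        }
    }
    String result = sb.toString();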
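If you are on Java 8 or later (an assumption, since the loop above works on older versions too), the same filtering can be sketched with a code point stream. This is only an alternative spelling of the loop, not a different technique. The class name NonBmpFilter and the method stripNonBmp are hypothetical names chosen for this example.

```java
// Hypothetical helper class, for illustration only. Requires Java 8+.
final class NonBmpFilter {

    // Returns a copy of input with every code point above U+FFFF removed.
    static String stripNonBmp(String input) {
        return input.codePoints()
                // Keep only code points that fit in a single char.
                .filter(Character::isBmpCodePoint)
                // Append each surviving code point to a StringBuilder.
                .collect(StringBuilder::new,
                         StringBuilder::appendCodePoint,
                         StringBuilder::append)
                .toString();
    }
}
```

With the test string from the first snippet, NonBmpFilter.stripNonBmp(test) should give "testtest". As with the loop, an unpaired surrogate is treated as an ordinary BMP value and is left in place.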
Anon, Oct 27 '10 at 17:10