As usual, no one cares about UTF-16 surrogate pairs. To see how often they show up in practice, look at "What are the most common non-BMP Unicode characters in action?". Not even the authors of org.apache.commons / commons-lang3 take them into account.
This example shows the difference between naive truncation and truncation that handles surrogate pairs correctly:
    public class SurrogateDemo {
        public static void main(String[] args) {
            // String containing FACE WITH TEARS OF JOY (U+1F602), stored as a surrogate pair
            String s = "abcdafghi\uD83D\uDE02cdefg";
            int maxWidth = 10;
            System.out.println(s);

            // Naive: ignores UTF-16 surrogate pairs and may cut a pair in half
            System.out.println(s.substring(0, Math.min(s.length(), maxWidth)));

            // Correct: back up one char if the cut would land on a low surrogate
            if (s.length() > maxWidth) {
                int correctedMaxWidth = (maxWidth > 0 && Character.isLowSurrogate(s.charAt(maxWidth)))
                        ? maxWidth - 1 : maxWidth;
                System.out.println(s.substring(0, correctedMaxWidth));
            }
        }
    }
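The same check can be wrapped in a reusable method. The sketch below is only an illustration: the class name `SafeTruncate` and both method names are hypothetical and not part of commons-lang3 or the JDK. `truncate` limits the result to `maxWidth` UTF-16 code units, as in the example above, while `truncateCodePoints` counts whole code points instead, using the standard `String.codePointCount` and `String.offsetByCodePoints` methods.

```java
public class SafeTruncate {

    // Hypothetical helper: keep at most maxWidth UTF-16 code units,
    // backing up one unit if the cut would split a surrogate pair.
    public static String truncate(String s, int maxWidth) {
        if (maxWidth <= 0) {
            return "";
        }
        if (s.length() <= maxWidth) {
            return s;
        }
        int end = maxWidth;
        if (Character.isHighSurrogate(s.charAt(end - 1))
                && Character.isLowSurrogate(s.charAt(end))) {
            end--;
        }
        return s.substring(0, end);
    }

    // Hypothetical helper: keep at most maxCodePoints code points,
    // so a surrogate pair counts as one character.
    public static String truncateCodePoints(String s, int maxCodePoints) {
        if (maxCodePoints <= 0) {
            return "";
        }
        int total = s.codePointCount(0, s.length());
        if (total <= maxCodePoints) {
            return s;
        }
        return s.substring(0, s.offsetByCodePoints(0, maxCodePoints));
    }

    public static void main(String[] args) {
        String s = "abcdafghi\uD83D\uDE02cdefg";
        System.out.println(truncate(s, 10));           // stops before the emoji
        System.out.println(truncateCodePoints(s, 10)); // includes the whole emoji
    }
}
```

Which variant fits depends on what the limit means: a storage limit in chars calls for the first, a limit in user-visible characters is closer to the second (although combining sequences would still need extra handling).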
sibnick Aug 26 '15 at 10:12