Although I agree with Jon Skeet in principle, sometimes I don't have the option of using an external XML library. And I believe the XML libraries included with standard Java don't provide a pair of simple functions to escape/unescape a single value (an attribute or tag content, not a full document).
So, based on various answers I saw here and elsewhere, here is the solution I came up with (none of them worked as a simple copy/paste):
```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public final class XmlEscaping {
    // ESCAPE_CHARS.charAt(n) is replaced by ESCAPE_STRINGS.get(n)
    public static final String ESCAPE_CHARS = "<>&\"'";
    public static final List<String> ESCAPE_STRINGS = Collections.unmodifiableList(Arrays.asList(
            "&lt;", "&gt;", "&amp;", "&quot;", "&apos;"));

    private static final String UNICODE_LOW = "" + ((char) 0x20);  // space
    private static final String UNICODE_HIGH = "" + ((char) 0x7f);

    // should only be used for the content of an attribute or tag
    public static String toEscaped(String content) {
        String result = content;
        if ((content != null) && (content.length() > 0)) {
            boolean modified = false;
            StringBuilder stringBuilder = new StringBuilder(content.length());
            for (int i = 0, count = content.length(); i < count; ++i) {
                String character = content.substring(i, i + 1);
                int pos = ESCAPE_CHARS.indexOf(character);
                if (pos > -1) {
                    // one of the five XML special characters
                    stringBuilder.append(ESCAPE_STRINGS.get(pos));
                    modified = true;
                } else {
                    if ((character.compareTo(UNICODE_LOW) > -1)
                            && (character.compareTo(UNICODE_HIGH) < 1)) {
                        // printable ASCII: copy as-is
                        stringBuilder.append(character);
                    } else {
                        // everything else becomes a decimal character reference
                        stringBuilder.append("&#" + ((int) character.charAt(0)) + ";");
                        modified = true;
                    }
                }
            }
            if (modified) {
                result = stringBuilder.toString();
            }
        }
        return result;
    }
}
```
The code above does several things:
- avoids char-based logic until absolutely necessary, which improves Unicode compatibility
- tries to be as efficient as possible, given that the second `if` condition (the pass-through branch for printable ASCII) is probably the most frequently taken path
- is a pure function, i.e. thread-safe
- is GC-friendly: it builds a new String from the StringBuilder only if something actually changed; otherwise the original string is returned
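One caveat: the substring loop still walks the string one UTF-16 `char` at a time, so a character outside the Basic Multilingual Plane (an emoji, for example) comes out as two numeric references to its surrogate halves, which XML parsers reject as illegal characters. A minimal sketch of a variant that iterates by code point instead, assuming the same five entities and the same printable-ASCII pass-through range; the class and method names `CodePointXmlEscaper.escape` are hypothetical, not part of the code above:

```java
public final class CodePointXmlEscaper {

    // Hypothetical variant of toEscaped: same entity table and ASCII range,
    // but emits one character reference per Unicode code point.
    public static String escape(String content) {
        if (content == null || content.isEmpty()) {
            return content;
        }
        StringBuilder sb = new StringBuilder(content.length());
        boolean modified = false;
        for (int i = 0; i < content.length(); ) {
            int cp = content.codePointAt(i);   // combines a surrogate pair into one code point
            i += Character.charCount(cp);      // advance by 1 or 2 chars
            switch (cp) {
                case '<':  sb.append("&lt;");   modified = true; break;
                case '>':  sb.append("&gt;");   modified = true; break;
                case '&':  sb.append("&amp;");  modified = true; break;
                case '"':  sb.append("&quot;"); modified = true; break;
                case '\'': sb.append("&apos;"); modified = true; break;
                default:
                    if (cp >= 0x20 && cp <= 0x7f) {
                        sb.appendCodePoint(cp);  // printable ASCII: copy as-is
                    } else {
                        // e.g. U+1F600 becomes &#128512; rather than two surrogate references
                        sb.append("&#").append(cp).append(';');
                        modified = true;
                    }
            }
        }
        // like toEscaped, return the original instance when nothing changed
        return modified ? sb.toString() : content;
    }

    public static void main(String[] args) {
        System.out.println(escape("a<b & \"c\""));        // a&lt;b &amp; &quot;c&quot;
        System.out.println(escape("smile \uD83D\uDE00")); // smile &#128512;
    }
}
```

Like the original, this does not filter out control characters (such as U+0001) that XML 1.0 forbids even as character references.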
At some point, I will write the inverse of this function, toUnescaped(). I just don't have time to do it today. When I do, I will update this answer with the code. :)
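A minimal sketch of what such an inverse could look like, assuming the input was produced by an escaper like the one above: it decodes the five named entities plus decimal `&#N;` and hexadecimal `&#xN;` references, and leaves any other `&` sequence untouched. The class name `XmlUnescaper` and its helpers are hypothetical illustrations, not the author's promised implementation:

```java
public final class XmlUnescaper {

    // Hypothetical inverse of toEscaped.
    public static String toUnescaped(String content) {
        if (content == null || content.indexOf('&') < 0) {
            return content; // nothing to decode: return the original instance
        }
        StringBuilder sb = new StringBuilder(content.length());
        int i = 0;
        while (i < content.length()) {
            char c = content.charAt(i);
            int semi = (c == '&') ? content.indexOf(';', i + 1) : -1;
            String replacement = null;
            if (semi > 0) {
                String name = content.substring(i + 1, semi);
                replacement = name.startsWith("#")
                        ? decodeNumeric(name.substring(1))
                        : namedEntity(name);
            }
            if (replacement == null) {
                sb.append(c); // ordinary char, or unknown/malformed reference kept literally
                i++;
            } else {
                sb.append(replacement);
                i = semi + 1; // skip past the ';'
            }
        }
        return sb.toString();
    }

    // The five predefined XML entities; null for anything else.
    private static String namedEntity(String name) {
        switch (name) {
            case "lt":   return "<";
            case "gt":   return ">";
            case "amp":  return "&";
            case "quot": return "\"";
            case "apos": return "'";
            default:     return null;
        }
    }

    // Parses "65" or "x41"; null if malformed or outside U+0000..U+10FFFF.
    private static String decodeNumeric(String ref) {
        boolean hex = ref.startsWith("x") || ref.startsWith("X");
        String digits = hex ? ref.substring(1) : ref;
        int radix = hex ? 16 : 10;
        if (digits.isEmpty() || digits.length() > 7) {
            return null; // 7 digits always fits in an int for both radixes
        }
        for (int k = 0; k < digits.length(); k++) {
            if (Character.digit(digits.charAt(k), radix) < 0) {
                return null;
            }
        }
        int cp = Integer.parseInt(digits, radix);
        return Character.isValidCodePoint(cp) ? new String(Character.toChars(cp)) : null;
    }

    public static void main(String[] args) {
        System.out.println(toUnescaped("a&lt;b &amp; &quot;c&quot;")); // a<b & "c"
        System.out.println(toUnescaped("AT&T"));                      // AT&T
    }
}
```

The sketch is lenient by design: a stray `&` that does not start a recognized reference is copied through rather than rejected.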
chaotic3quilibrium Dec 19 '13 at 23:09