Here's the actual cfscript code that cleans up our XML. There are two methods: one that replaces the high (non-ASCII) international characters, and one that replaces only the low ASCII control character (0x1A, SUB) that was invalidating our XML. If you find more offending characters, just extend the filtering rules.
<cfscript>
// Replace every non-ASCII character: curly quotes and the ellipsis become
// plain ASCII, everything else becomes a numeric character reference.
function cleanHighAscii(text){
    var buffer = createObject("java", "java.lang.StringBuffer").init();
    var pattern = createObject("java", "java.util.regex.Pattern").compile(javaCast("string", "[^\x00-\x7F]"));
    var matcher = pattern.matcher(javaCast("string", text));
    var value = "";
    var asciiValue = 0;
    while (matcher.find()){
        value = matcher.group();
        asciiValue = asc(value);
        if ((asciiValue == 8220) || (asciiValue == 8221))
            value = """";                  // curly double quotes
        else if ((asciiValue == 8216) || (asciiValue == 8217))
            value = "'";                   // curly single quotes
        else if (asciiValue == 8230)
            value = "...";                 // horizontal ellipsis
        else
            value = "&###asciiValue#;";    // "##" is an escaped "#", giving e.g. &#233;
        matcher.appendReplacement(buffer, javaCast("string", value));
    }
    matcher.appendTail(buffer);
    return buffer.toString();
}

// Replace the SUB control character (0x1A) with &#26;
function removeSubAscii(text){
    return rereplaceNoCase(text, "\x1A", "&##26;", "all");
}

// Run both filters in sequence.
function XMLSafe(text){
    text = cleanHighAscii(text);
    text = removeSubAscii(text);
    return text;
}
</cfscript>
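As a rough illustration of how the functions above might be called, here is a minimal sketch. The variable names `raw` and `safe` and the sample text are made up for the example; it assumes `XMLSafe()` from the block above is already defined on the page.

```cfml
<cfscript>
// Hypothetical input: curly double quotes around "Café", then an ellipsis.
// chr() is used so the example does not depend on the template's file encoding.
raw = chr(8220) & "Caf" & chr(233) & chr(8221) & " " & chr(8230);

// Run both filters defined above.
safe = XMLSafe(raw);

// htmlEditFormat() makes the browser show the entities literally
// instead of rendering them.
writeOutput(htmlEditFormat(safe));
</cfscript>
```

With that input the result should be `"Caf&#233;" ...`: the curly quotes and the ellipsis are swapped for plain ASCII, and the é falls through to the numeric character reference branch.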
Another possibility, for CF10 users, is the function encodeForXML():
https://learn.adobe.com/wiki/display/coldfusionen/EncodeForXML
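A minimal sketch of that call, assuming ColdFusion 10 or later. The variable names and the sample value are hypothetical, and the exact entity format of the encoded result is left to the function.

```cfml
<cfscript>
// Hypothetical value, e.g. from a form field or a database column.
title = "Fish & Chips <special>";

// Encode the value before embedding it in the XML string.
xml = "<item><title>" & encodeForXML(title) & "</title></item>";

// Escape once more for display so the markup is visible in the browser.
writeOutput(htmlEditFormat(xml));
</cfscript>
```

Encoding each value at the point where it is placed into the XML keeps the surrounding markup untouched, which is the main difference from filtering the whole document afterwards.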
Either use the ESAPI that comes with CF10 directly, or add the ESAPI JARs from the OWASP website to your older CF version https://www.owasp.org/index.php/ESAPI_Overview :
var esapi = createObject("java", "org.owasp.esapi.ESAPI");
var esapiEncoder = esapi.encoder();
return esapiEncoder.encodeForXML(text);