Normalize string in ColdFusion

I am trying to normalize a string in ColdFusion.

I want to use the java.text.Normalizer Java class for this, since CF does not have any similar functions as far as I know.

Here is my current code:

    <cfset normalizer = createObject( "java", "java.text.Normalizer" ) />
    <cfset string = "äéöè" />
    <cfset string = normalizer.normalize(string, createObject( "java", "java.text.Normalizer$Form" ).NFD) />
    <cfset string = ReReplace(string, "\\p{InCombiningDiacriticalMarks}+", "") />
    <cfoutput>#string#</cfoutput>

Any ideas why it always outputs äéöè and not a normalized string?

+4
3 answers

In ColdFusion, unlike Java, you do not need to escape backslashes in string literals. Your current regex therefore only matches text that starts with a literal backslash, so no replacement happens.

Other than that, your code is perfectly correct, and you can see that the string length is 8, not 4, at output time. This is the effect of calling normalize.

However, remember that the result is still an equivalent representation of the original string (each letter is now a base letter followed by a combining mark), so it is not surprising that you cannot see the difference visually. This is Unicode rendering working correctly.
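The length change described above can be checked with the same `java.text.Normalizer` calls written in plain Java. This is only a sketch: the class and method names (`NfdLengthDemo`, `decompose`, `stripMarks`) are made up for illustration, and the regex is written with Java's doubled backslash, which the CFML version does not need.

```java
import java.text.Normalizer;

final class NfdLengthDemo {
    // Decompose to NFD: each accented letter becomes a base letter
    // followed by a separate combining mark.
    static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // Remove the combining marks that NFD leaves behind.
    static String stripMarks(String s) {
        return decompose(s).replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // a-umlaut, e-acute, o-umlaut, e-grave in precomposed form
        String input = "\u00e4\u00e9\u00f6\u00e8";
        System.out.println(input.length());            // 4
        System.out.println(decompose(input).length()); // 8
        System.out.println(stripMarks(input));         // aeoe
    }
}
```

The decomposed string still displays as the same four accented letters; only after the combining marks are removed does the visible text change.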

+8

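Your "\\p" should just be "\p". ColdFusion does not treat the backslash as an escape character in string literals, so ReReplace() passes both backslashes to the regex engine, and your "\\p" is interpreted on the Java side the way "\\\\p" would be in Java source.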

One liner:

    <cfscript>
    // Plain assignment here; use "var k" (or local.k) when this runs inside a function.
    k = "mike café";
    k = createObject( 'java', 'java.text.Normalizer' )
        .normalize( k, createObject( 'java', 'java.text.Normalizer$Form' ).valueOf('NFD') )
        .replaceAll('\p{InCombiningDiacriticalMarks}+','')
        .replaceAll('[^\p{ASCII}]+','');
    // k is now "mike cafe"
    </cfscript>

http://www.cfquickdocs.com/cf9/#rereplace
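The final `replaceAll('[^\p{ASCII}]+','')` step in the one-liner deletes whatever is still non-ASCII after decomposition. The plain-Java sketch below uses the same three calls to show what that means for letters that have no canonical decomposition. The class name `AsciiFoldDemo`, the method name `fold`, and the second sample string are assumptions added for illustration.

```java
import java.text.Normalizer;

final class AsciiFoldDemo {
    // NFD, then drop combining marks, then drop anything still outside ASCII.
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "")
                .replaceAll("[^\\p{ASCII}]+", "");
    }

    public static void main(String[] args) {
        // "mike caf" + e-acute
        System.out.println(fold("mike caf\u00e9")); // mike cafe

        // L-with-stroke and sharp s have no canonical decomposition,
        // so they are removed rather than mapped to "L" or "ss".
        System.out.println(fold("\u0141\u00f3d\u017a stra\u00dfe")); // odz strae
    }
}
```

So the approach works well for accented Latin letters, while characters that cannot be split into a base letter plus a mark simply disappear.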

+4

I recommend using a Java library such as Junidecode. https://github.com/gcardone/junidecode

It converts UTF-8 and UTF-16 strings to 7-bit ASCII. Examples:

  • mike café = mike cafe
  • ℡ = TEL
  • 北亰 = Bei Jing
  • Mr. まさゆき たけだ = Mr. masayuki takeda
  • ⠏⠗⠑⠍⠊⠑⠗ = premier
  • ราชอาณาจักรไทย = raach'aanaacchakraithy
  • Ελληνικά = Ellenika
  • Москва = Moskva
  • Հայաստան = Hayastan
  • ℰ𝒳𝒜ℳ𝓟ℒℰ = EXAMPLE

I shared a full demo based on ColdFusion (which requires a Junidecode JAR file): https://gamesover2600.tumblr.com/post/182608667654/coldfusion-unicode-junidecode-demo

Here is the function code:

    <cfscript>
    function JUnidecode(inputString){
        var JUnidecodeLib = "";
        var response = "";
        var temp = {};
        temp.encoder = createObject("java", "java.nio.charset.Charset").forName("utf-8").newEncoder();
        temp.isUTF = temp.encoder.canEncode(arguments.inputString);
        if (temp.isUTF){
            /* NFKC: UTF Compatibility Decomposition, followed by Canonical Composition */
            temp.normalizer = createObject( "java", "java.text.Normalizer" );
            temp.normalizerForm = createObject( "java", "java.text.Normalizer$Form" );
            arguments.inputString = temp.normalizer.normalize( javaCast( "string", arguments.inputString ), temp.normalizerForm.NFKC );
        }
        try {
            JUnidecodeLib = createObject("java", "net.gcardone.junidecode.Junidecode");
            response = JUnidecodeLib.unidecode( javaCast("string", arguments.inputString) );
        } catch (any e) {
            response = "ERROR: JUnidecode is not installed";
        }
        // Remove any literal "[?]" sequences from the result before returning it.
        return trim(response.replaceAll("\[\?\]", ""));
    }

    function isDiff(compareArr, val, pos){
        return (pos GT arrayLen(compareArr) OR compareArr[pos] neq val);
    }
    </cfscript>
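The function above runs the input through NFKC before handing it to Junidecode. As a rough illustration of what that first step does on its own, here is a plain-Java sketch that uses only `java.text.Normalizer` and no Junidecode JAR. The class name `NfkcDemo` and its members are hypothetical.

```java
import java.text.Normalizer;

final class NfkcDemo {
    // The script-letter string "EXAMPLE" from the list above, built from code points
    // because several of the letters lie outside the Basic Multilingual Plane.
    static final String SCRIPT_EXAMPLE = new String(
            new int[] {0x2130, 0x1D4B3, 0x1D49C, 0x2133, 0x1D4DF, 0x2112, 0x2130}, 0, 7);

    // Compatibility decomposition followed by canonical composition.
    static String nfkc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        System.out.println(nfkc("\u2121"));       // TEL
        System.out.println(nfkc(SCRIPT_EXAMPLE)); // EXAMPLE

        // NFKC recomposes accents instead of removing them:
        // "cafe" + combining acute becomes "caf" + precomposed e-acute.
        System.out.println(nfkc("cafe\u0301").length()); // 4
    }
}
```

NFKC alone already folds compatibility characters such as the telephone sign and the script letters, but it leaves accented letters non-ASCII, which is why the function still passes the result on to the transliteration library.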
0