Convert UTF-16 to UTF-8 in JavaScript

Question

I have Base64-encoded data that is in UTF-16. I am trying to decode the data, but most libraries only support UTF-8. I believe I need to strip out the null bytes, but I'm not sure how to do this.

I am currently using David Chambers' polyfill for Base64, but I have also tried other libraries such as phpjs.org, none of which support UTF-16.

One thing to note: in Chrome, the atob method works without problems; in Firefox, I get the results described here; and in IE, only the first character is returned.

Any help is appreciated.

javascript base64 utf-8 utf-16

Don p Jan 29 '13 at 21:11

1 answer

Esailija · Accepted Answer · 2013-01-30T10:31:41+0000

You want to decode UTF-16, not convert it to UTF-8. Decoding means that the result is a string of abstract characters. Of course, strings have an internal encoding (UTF-16 or UCS-2 in JavaScript), but that is an implementation detail.

The point of strings is that you don't need to worry about encodings, only about manipulating characters as they are. That way you can write string methods that don't need to decode their input at all. Of course, there are plenty of edge cases where this falls apart.

You cannot decode UTF-16 by simply removing the zero bytes. That works for the first 256 Unicode code points, but you will get garbage as soon as any of the other ~110,000 Unicode characters appears. Even the most common non-ASCII characters, such as the em dash or smart quotes, will not come out right.
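To make the failure concrete, here is a minimal sketch that compares zero stripping with pairing the bytes properly. The helper names stripZeros and pairBytesLE are made up for illustration, and the input is assumed to be well-formed UTF-16LE held in a binary string (one character per byte), which is the form atob returns.

```javascript
// Hypothetical helper: the naive approach of dropping every zero byte.
function stripZeros(binaryStr) {
  return binaryStr.replace(/\x00/g, "");
}

// Hypothetical helper: combine each low/high byte pair into one code unit.
// Assumes well-formed UTF-16LE input with an even number of bytes.
function pairBytesLE(binaryStr) {
  var out = "";
  for (var i = 0; i + 1 < binaryStr.length; i += 2) {
    out += String.fromCharCode(
      binaryStr.charCodeAt(i) | (binaryStr.charCodeAt(i + 1) << 8)
    );
  }
  return out;
}

// "a", then an em dash (U+2014), then "b", as UTF-16LE bytes.
var bytes = "a\x00\x14\x20b\x00";

var naive = stripZeros(bytes);   // "a\x14 b": the dash became two junk characters
var paired = pairBytesLE(bytes); // "a\u2014b"
```

The em dash is stored as the bytes 0x14 0x20, neither of which is zero, so stripping zeros leaves both behind as a control character and a space.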

Also, looking at your example, it looks like UTF-16LE.

    // Braindead decoder that assumes fully valid input
    function decodeUTF16LE(binaryStr) {
        var cp = [];
        for (var i = 0; i < binaryStr.length; i += 2) {
            // Low byte first, then the high byte shifted into place
            cp.push(
                binaryStr.charCodeAt(i) |
                (binaryStr.charCodeAt(i + 1) << 8)
            );
        }
        return String.fromCharCode.apply(String, cp);
    }

    // In Chrome and Firefox, atob is a native method available for base64 decoding
    var base64decode = atob;

    var base64 = "VABlAHMAdABpAG4AZwA";
    var binaryStr = base64decode(base64);
    var result = decodeUTF16LE(binaryStr); // "Testing"

Now you can even use smart quotes:

    var base64 = "HCBoAGUAbABsAG8AHSA=";
    var binaryStr = base64decode(base64);
    var result = decodeUTF16LE(binaryStr); // "“hello”"
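As an alternative to a hand-written loop, runtimes that provide the standard TextDecoder API can do the UTF-16LE decoding natively. This is only a sketch, not part of the original answer: the function name decodeBase64UTF16LE is hypothetical, and it assumes that both atob and TextDecoder exist as globals, which is not the case in older browsers.

```javascript
// Hypothetical helper, assuming global atob and TextDecoder are available.
function decodeBase64UTF16LE(base64) {
  // atob yields a binary string: one character per decoded byte.
  var binaryStr = atob(base64);

  // Copy those bytes into a typed array, which is what TextDecoder expects.
  var bytes = new Uint8Array(binaryStr.length);
  for (var i = 0; i < binaryStr.length; i++) {
    bytes[i] = binaryStr.charCodeAt(i);
  }

  // Let the platform decode the bytes as little-endian UTF-16.
  return new TextDecoder("utf-16le").decode(bytes);
}

var greeting = decodeBase64UTF16LE("HCBoAGUAbABsAG8AHSA="); // "\u201Chello\u201D"
```

Because this does not pass every code unit as a separate argument to String.fromCharCode.apply, it also avoids the engine limits on argument count that very large inputs can hit.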
