JavaScript hexadecimal to binary using UTF-8

I have data stored in an SQLite database as BINARY(16), the value of which is produced by running PHP's hex2bin on a 32-character hexadecimal string.

As an example, the string 434e405b823445c09cb6c359fb1b7918 returns CN@ [4EÀ¶ÃYûy.

The data stored in this database must be processed by JavaScript, and for this I used the following function (adapted from Andris's answer here):

 // Convert hexadecimal to binary string
 String.prototype.hex2bin = function () {
   // Define the variables
   var i = 0, l = this.length - 1, bytes = []
   // Iterate over the nibbles and convert to binary string
   for (i; i < l; i += 2) {
     bytes.push(parseInt(this.substr(i, 2), 16))
   }
   // Return the binary string
   return String.fromCharCode.apply(String, bytes)
 }

This works as expected, returning CN@ [4EÀ¶ÃYûy from 434e405b823445c09cb6c359fb1b7918 .

However, the problem is that when I directly access the data returned by the PHP hex2bin function, I get the string CN@ [ 4E Y y , and not CN@ [4EÀ¶ÃYûy . This makes it impossible for me to work between the two (for context, the JavaScript powers a stand-alone iPad application that works with data received from a PHP web application), since I need to be able to use JavaScript to generate a 32-character hexadecimal string, convert it into a binary string, and have the result work with the PHP function hex2bin (and SQLite HEX).

The problem, I believe, is that JavaScript uses UTF-16, while the binary string is stored as utf8_unicode_ci. My initial thought was that I needed to convert the string to UTF-8. A Google search led me here, and a search on StackOverflow led me to bobince's answer here, both of which recommend using unescape(encodeURIComponent(str)). However, this does not return what I need ( CN@ [ 4E Y y ):

 // CN@ [Â4EöÃYûy
 unescape(encodeURIComponent('434e405b823445c09cb6c359fb1b7918'.hex2bin()))

So my question is:

How can I use JavaScript to convert a hexadecimal string to a UTF-8 binary string?

3 answers

Your application should not handle binary strings at any point. The INSERT is the last possible point, and that is where you convert to binary. The SELECT is the earliest possible point, and that is where you convert back to hex, so the application only ever works with hexadecimal strings.

When inserting, you can use blob literals in place of UNHEX:

 INSERT INTO table (id) VALUES (X'434e405b823445c09cb6c359fb1b7918') 

When selecting, you can use HEX:

 SELECT HEX(id) FROM table 
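
On the JavaScript side, this approach might look like the sketch below. It assumes every ID is exactly 32 hexadecimal digits; the helper names `isHexId` and `toBlobLiteral` are made up for illustration and do not come from any library.

```javascript
// Hypothetical helpers; the names are illustrative, not part of any library.

// True if s is exactly 32 hexadecimal digits (16 bytes).
function isHexId(s) {
  return typeof s === 'string' && /^[0-9a-fA-F]{32}$/.test(s);
}

// Build an SQLite blob literal such as X'434E...' from a validated hex ID.
function toBlobLiteral(hex) {
  if (!isHexId(hex)) {
    throw new Error('Expected a 32-character hexadecimal string');
  }
  return "X'" + hex.toUpperCase() + "'";
}

console.log(toBlobLiteral('434e405b823445c09cb6c359fb1b7918'));
```

SQLite's HEX returns upper-case digits, so it is probably safest to normalise case on one side before comparing IDs in the application.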

Given a hex-encoded UTF-8 string, `hex`,

 hex.replace(/../g, '%$&') 

will produce a URI-encoded UTF-8 string.

decodeURIComponent converts URI-encoded UTF-8 sequences into JavaScript's UTF-16 strings, so

 decodeURIComponent(hex.replace(/../g, '%$&')) 

should decode a properly hex-encoded UTF-8 string.

You can see that it works by applying it to the example from the hex2bin documentation.

 alert(decodeURIComponent('6578616d706c65206865782064617461'.replace(/../g, '%$&'))); // alerts "example hex data" 
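
When the bytes are not valid UTF-8, decodeURIComponent throws a URIError instead of returning a string. A minimal sketch of a wrapper that turns that into a null result follows; the name `hexToUtf8OrNull` is hypothetical.

```javascript
// Hypothetical wrapper: decode hex-encoded UTF-8, or return null if the
// bytes are not valid UTF-8 (decodeURIComponent throws URIError in that case).
function hexToUtf8OrNull(hex) {
  try {
    return decodeURIComponent(hex.replace(/../g, '%$&'));
  } catch (e) {
    if (e instanceof URIError) {
      return null;
    }
    throw e; // anything else is unexpected, so let it propagate
  }
}

console.log(hexToUtf8OrNull('6578616d706c65206865782064617461')); // "example hex data"
console.log(hexToUtf8OrNull('c3a9')); // "é" (two UTF-8 bytes, one character)
console.log(hexToUtf8OrNull('434e405b823445c09cb6c359fb1b7918')); // null
```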

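The string you provided is not valid UTF-8. In particular,

 434e405b823445c09cb6c359fb1b7918
         ^

82 is a continuation byte (10xxxxxx), so it must belong to a sequence started by a lead byte with at least its two high bits set. The byte before it, 5b (01011011), is a plain ASCII byte, so 82 has no lead byte to continue.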

RFC 2279 explains:

The table below shows the format of these different types of octets. The letter x indicates the bits available for encoding bits of the UCS-4 character value.

 UCS-4 range (hex.)    UTF-8 octet sequence (binary)
 0000 0000-0000 007F   0xxxxxxx
 0000 0080-0000 07FF   110xxxxx 10xxxxxx
 0000 0800-0000 FFFF   1110xxxx 10xxxxxx 10xxxxxx
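
To locate the offending byte mechanically, the octet patterns above can be turned into a small structural check. This is only a sketch: `firstInvalidUtf8Offset` is a made-up name, it also accepts the four-byte lead 11110xxx (not shown in the excerpt above), and it does not reject overlong encodings or other finer points of the specification.

```javascript
// Hypothetical helper: return the byte offset of the first structurally
// invalid UTF-8 byte in a hex string, or -1 if none is found.
// Only lead/continuation structure is checked, not overlong forms.
// For a truncated sequence, the offset of the missing byte is returned.
function firstInvalidUtf8Offset(hex) {
  const bytes = (hex.match(/../g) || []).map(function (h) {
    return parseInt(h, 16);
  });
  let i = 0;
  while (i < bytes.length) {
    const b = bytes[i];
    let extra; // number of continuation bytes expected after this lead byte
    if ((b & 0x80) === 0x00) extra = 0;       // 0xxxxxxx
    else if ((b & 0xe0) === 0xc0) extra = 1;  // 110xxxxx
    else if ((b & 0xf0) === 0xe0) extra = 2;  // 1110xxxx
    else if ((b & 0xf8) === 0xf0) extra = 3;  // 11110xxx
    else return i;                            // stray 10xxxxxx or invalid lead
    for (let k = 1; k <= extra; k++) {
      // Each continuation byte must exist and look like 10xxxxxx.
      if (i + k >= bytes.length || (bytes[i + k] & 0xc0) !== 0x80) {
        return i + k;
      }
    }
    i += extra + 1;
  }
  return -1;
}

console.log(firstInvalidUtf8Offset('434e405b823445c09cb6c359fb1b7918')); // 4 (the byte 82)
console.log(firstInvalidUtf8Offset('6578616d706c65206865782064617461')); // -1
```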

Expanding on Mike's answer, here is some code for encoding and decoding.

Note that the escape() and unescape() functions are deprecated. If you need polyfills for them, you can check out a more complete UTF-8 encoding example found here: http://jsfiddle.net/47zwb41o

 // UTF-8 to hex
 var utf8ToHex = function( s ){
   s = unescape( encodeURIComponent( s ) );
   var chr, i = 0, l = s.length, out = '';
   for( ; i < l; i++ ){
     chr = s.charCodeAt( i ).toString( 16 );
     out += ( chr.length % 2 == 0 ) ? chr : '0' + chr;
   }
   return out;
 };

 // Hex to UTF-8
 var hexToUtf8 = function( s ){
   return decodeURIComponent( s.replace( /../g, '%$&' ) );
 };
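
Where TextEncoder and TextDecoder are available as globals, the same conversions can probably be written without escape()/unescape() at all. The sketch below assumes such an environment; the function names `encodeUtf8Hex` and `decodeUtf8Hex` are hypothetical.

```javascript
// Sketch of the same conversions without escape()/unescape().
// Assumes TextEncoder and TextDecoder exist as globals.

// UTF-8 to hex
function encodeUtf8Hex(s) {
  const bytes = new TextEncoder().encode(s); // Uint8Array of UTF-8 bytes
  let out = '';
  for (const b of bytes) {
    out += b.toString(16).padStart(2, '0'); // always two hex digits per byte
  }
  return out;
}

// Hex to UTF-8; with fatal set, malformed input throws instead of being
// silently replaced with U+FFFD.
function decodeUtf8Hex(hex) {
  const pairs = hex.match(/../g) || [];
  const bytes = Uint8Array.from(pairs, function (h) {
    return parseInt(h, 16);
  });
  return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
}

console.log(encodeUtf8Hex('é'));    // "c3a9"
console.log(decodeUtf8Hex('c3a9')); // "é"
```

Like decodeURIComponent, this rejects byte sequences that are not valid UTF-8, so it is suited to text rather than arbitrary binary IDs.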