Bitwise string operations in JavaScript

In JavaScript, the following character-by-character bitwise operation test prints 0 all 676 times:

    var s = 'abcdefghijklmnopqrstuvwxyz';
    var i, j;
    for (i = 0; i < s.length; i++) {
      for (j = 0; j < s.length; j++) {
        console.log(s[i] | s[j]);
      }
    }

If JS used the actual binary representation of the strings, I would expect some non-zero values here.

Similarly, when testing bitwise operations between strings and integers, the following prints 255 for every | and 0 for every &. (255 was chosen because it is 11111111 in binary.)

    var s = 'abcdefghijklmnopqrstuvwxyz';
    var i;
    for (i = 0; i < s.length; i++) {
      console.log(s[i] | 255);
    }
    for (i = 0; i < s.length; i++) {
      console.log(s[i] & 255);
    }

What is JavaScript doing here? It seems to coerce any string to false before bitwise operations.

Notes

If you try this in Python, it raises an error:

    >>> s = 'abcdefghijklmnopqrstuvwxyz'
    >>> [c1 | c2 for c2 in s for c1 in s]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unsupported operand type(s) for |: 'str' and 'str'

But things like this work in PHP.

+7
javascript binary
3 answers

In JavaScript, when a string is used with a bitwise operator, it is first converted to a number. The following are the relevant parts of the ECMAScript specification that explain how this works.

Bitwise Operators:

The production A : A @ B, where @ is one of the bitwise operators in the productions above, is evaluated as follows:

  • Let lref be the result of evaluating A.
  • Let lval be GetValue(lref).
  • Let rref be the result of evaluating B.
  • Let rval be GetValue(rref).
  • Let lnum be ToInt32(lval).
  • Let rnum be ToInt32(rval).
  • Return the result of applying the bitwise operator @ to lnum and rnum. The result is a signed 32-bit integer.

ToInt32:

The abstract operation ToInt32 converts its argument to one of 2^32 integer values in the range -2^31 through 2^31 - 1, inclusive. This abstract operation works as follows:

  • Let number be the result of calling ToNumber on the input argument.
  • If number is NaN, +0, -0, +∞ or -∞, return +0.
  • Let posInt be sign(number) * floor(abs(number)).
  • Let int32bit be posInt modulo 2^32; that is, a finite integer value k of Number type with positive sign and less than 2^32 in magnitude such that the mathematical difference of posInt and k is mathematically an integer multiple of 2^32.
  • If int32bit is greater than or equal to 2^31, return int32bit - 2^32, otherwise return int32bit.
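The ToInt32 steps above can be sketched in plain JavaScript. This is only an illustration: the function name toInt32 is hypothetical, and Number() stands in for the internal ToNumber operation. Real engines perform this conversion internally.

```javascript
// Hypothetical helper that mirrors the quoted ToInt32 steps.
function toInt32(value) {
  const number = Number(value); // stands in for ToNumber
  if (!Number.isFinite(number) || number === 0) {
    return 0; // NaN, +0, -0, +Infinity and -Infinity all map to +0
  }
  // sign(number) * floor(abs(number)): truncate toward zero
  const posInt = Math.sign(number) * Math.floor(Math.abs(number));
  const two32 = 2 ** 32;
  // Non-negative remainder modulo 2^32
  const int32bit = ((posInt % two32) + two32) % two32;
  // Values of 2^31 and above wrap around to the negative range
  return int32bit >= 2 ** 31 ? int32bit - two32 : int32bit;
}

console.log(toInt32('a'));     // 0
console.log(toInt32('7'));     // 7
console.log(toInt32(2 ** 31)); // -2147483648
console.log(toInt32(-1.9));    // -1
```

For ordinary numbers and strings, the result should match what the built-in expression value | 0 produces.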

The internal ToNumber function returns NaN for any string that cannot be parsed as a number, and ToInt32(NaN) gives 0. Thus, in the code example, every bitwise operation with letters as operands is evaluated as 0 | 0, which explains why only 0 is printed.

Note that something like '7' | '8' is evaluated as 7 | 8, because in this case the strings used as operands can be successfully converted to numbers.
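A short check of the cases described above, using only built-in behavior (the variable names are made up for illustration):

```javascript
// Non-numeric strings become NaN under ToNumber, and ToInt32(NaN) is 0.
const letters = 'a' | 'b'; // evaluated as 0 | 0
// Numeric strings are converted to numbers first.
const digits = '7' | '8';  // evaluated as 7 | 8
// The same coercion explains the question's second test.
const orWith255 = 'a' | 255;  // evaluated as 0 | 255
const andWith255 = 'a' & 255; // evaluated as 0 & 255

console.log(Number('a'));  // NaN
console.log(letters);      // 0
console.log(digits);       // 15
console.log(orWith255);    // 255
console.log(andWith255);   // 0
```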

As for why the behavior in Python is different: Python does no implicit type conversion here, so an error is expected for any type that does not implement the bitwise operators (via __or__, __and__, etc.), and strings do not implement them.

Perl does something completely different: bitwise operators are implemented for strings, and Perl essentially applies the bitwise operator to the corresponding bytes of each string.

If you want to use JavaScript and get the same result as Perl, you first need to convert the characters to their numeric codes using str.charCodeAt, apply the bitwise operator to the resulting integers, and then use String.fromCodePoint to convert the resulting numeric values back into characters.
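A minimal sketch of that approach follows. The helper name bitwiseStrings and the choice to stop at the end of the shorter string are assumptions made for this example. It does not try to reproduce Perl's exact rules for operands of different lengths, and it works on UTF-16 code units rather than bytes.

```javascript
// Hypothetical helper: applies `op` to the character codes at each position,
// stopping at the end of the shorter string (an assumption of this sketch).
function bitwiseStrings(a, b, op) {
  const length = Math.min(a.length, b.length);
  let result = '';
  for (let i = 0; i < length; i++) {
    const code = op(a.charCodeAt(i), b.charCodeAt(i));
    result += String.fromCodePoint(code);
  }
  return result;
}

const bitOr = (x, y) => x | y;
const bitAnd = (x, y) => x & y;

// 0x41 | 0x20 = 0x61 and 0x42 | 0x20 = 0x62
console.log(bitwiseStrings('AB', '  ', bitOr));  // "ab"
// 0x61 & 0x5F = 0x41 and 0x62 & 0x5F = 0x42
console.log(bitwiseStrings('ab', '__', bitAnd)); // "AB"
```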

+8

I would be surprised if JavaScript's bitwise operations worked on non-numeric strings at all and produced something meaningful. My guess is that, since every bitwise operator in JavaScript converts its operands to 32-bit integers, it simply turns all non-numeric strings into 0.

I would use ...

 "a".charCodeAt(0) & 0xFF 

This yields 97, the ASCII code for "a", which is the expected result when masking with a byte that has all bits set.

Keep in mind that just because something works well in other languages does not mean it works the same way in JavaScript. We are talking about a language conceived and implemented in a very short period of time.
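Applied to the loops from the question, the same idea might look like this. It is a sketch only, and the array names are made up for illustration:

```javascript
const s = 'abcdefghijklmnopqrstuvwxyz';
const orResults = [];
const andResults = [];

for (let i = 0; i < s.length; i++) {
  const code = s.charCodeAt(i); // numeric code of the character, e.g. 97 for "a"
  orResults.push(code | 255);   // all low 8 bits set
  andResults.push(code & 255);  // low 8 bits of the code
}

console.log(orResults[0]);   // 255
console.log(andResults[0]);  // 97  ("a")
console.log(andResults[25]); // 122 ("z")
```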

+4

JavaScript uses type coercion, which lets it automatically parse strings as numbers when you try to perform a numeric operation on them. The parsed value is either 0 or, more likely, NaN. This clearly will not give you the information you are trying to get.

I think you are looking for charCodeAt, which lets you get the numeric Unicode value of a character in a string, and possibly also fromCodePoint, which converts a numeric value back into a character.

+3