Check if a char is equal to any of several other characters, with minimal branching

I am writing some performance-sensitive C# code that deals with character comparisons. I recently discovered a trick for checking whether a char is equal to one of two others without branching, provided the difference between them is a power of 2.

For example, say you want to check whether a char is U+0020 (space) or U+00A0 (non-breaking space). Since the difference between them is 0x80, you can do this:

public static bool Is20OrA0(char c) => (c | 0x80) == 0xA0;

as opposed to this naive implementation, which would add an extra branch if the character was not a space:

public static bool Is20OrA0(char c) => c == 0x20 || c == 0xA0;

The first version works because the difference between the two characters is a power of 2 and, in this case, they differ in exactly one bit. So when you OR that bit into a character and compare against a fixed result, there are exactly 2^1 different characters that could have produced that result.
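The one-bit condition can be checked mechanically. The following is a minimal sketch, not part of the original question; the names `OneBitPair`, `DiffersInOneBit` and `EqualsEither` are hypothetical. It also shows that what matters is the XOR of the two characters having a single bit set, which is a stricter condition than their difference being a power of 2 (U+000F and U+0010 differ by 1 but in five bit positions).

```csharp
public static class OneBitPair
{
    // True when a and b differ in exactly one bit position,
    // i.e. a ^ b is a non-zero power of 2.
    public static bool DiffersInOneBit(char a, char b)
    {
        int x = a ^ b;
        return x != 0 && (x & (x - 1)) == 0;
    }

    // Single-comparison test for "c == a || c == b".
    // Only valid when DiffersInOneBit(a, b) is true.
    public static bool EqualsEither(char c, char a, char b)
    {
        int bit = a ^ b;               // the one bit that differs
        return (c | bit) == (a | bit); // force that bit on, then compare once
    }
}
```

For the space / non-breaking space pair, `EqualsEither(c, '\u0020', '\u00A0')` computes the same thing as `(c | 0x80) == 0xA0`. Whether the JIT actually emits branch-free machine code for either form is not guaranteed and would need to be checked in the generated assembly.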

Anyway, my question is: can this trick somehow be extended to characters whose difference is not a power of 2? For example, if I had the characters # and 0 (which have a difference of 13, by the way), is there any bit-twiddling hack I could use to check whether a char is equal to either of them, without branching?

Thank you for your help.

edit: The .NET Framework appears to use a similar trick in char.IsLetter. Since a - A == 97 - 65 == 32, it ORs the char with 0x20 (rather than calling ToUpper).
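The case-folding idea in the edit can be sketched as follows. This is an illustration only, not the framework's actual source; `AsciiLetters` and `IsAsciiLetter` are hypothetical names. It relies on the fact that ORing with 0x20 maps 'A'..'Z' onto 'a'..'z' and leaves 'a'..'z' unchanged, so one range check covers both cases.

```csharp
public static class AsciiLetters
{
    // Folds case with a single OR, then does one range check
    // instead of two ('A'..'Z' and 'a'..'z').
    public static bool IsAsciiLetter(char c)
    {
        int lower = c | 0x20; // 'A' (65) becomes 'a' (97); 'a' stays 'a'
        return lower >= 'a' && lower <= 'z';
    }
}
```

Characters just outside the letter ranges, such as '@' and '[', map to '`' and '{' after the OR, so they are still rejected.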

+4


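In your example with # and 0 (values 35 and 48), the difference is 13. The largest power of 2 below 13 is 8, which is 0.615384615 of 13. Multiplying that by 256 and rounding up, to give an 8.8 fixed-point value, yields 158.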

Here are the values 35 and 48, together with their neighbors, multiplied by 158:

34 * 158 = 5372 = 0001 0100 1111 1100
35 * 158 = 5530 = 0001 0101 1001 1010
36 * 158 = 5688 = 0001 0110 0011 1000

47 * 158 = 7426 = 0001 1101 0000 0010
48 * 158 = 7584 = 0001 1101 1010 0000
49 * 158 = 7742 = 0001 1110 0011 1110

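The lower 7 bits can be ignored, since they are not needed to tell the neighboring values apart. Once they are ignored, 5530 and 7584 differ only in bit 11, so you can mask that bit out in place of the OR. Mask with 1111 0111 1000 0000 (63360) and compare against 0001 0101 1000 0000 (5504):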

public static bool Is23Or30(char c) => ((c * 158) & 63360) == 5504;
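Because the multiply-and-mask expression is approximate by construction, it is worth checking exhaustively over the input range you care about. Below is a minimal sketch of such a check; the class `MultiplyMaskCheck` and the helpers `Is23Or30Naive` and `Mismatches` are hypothetical names added for illustration.

```csharp
using System.Collections.Generic;

public static class MultiplyMaskCheck
{
    // The multiply-and-mask expression from the answer.
    public static bool Is23Or30(char c) => ((c * 158) & 63360) == 5504;

    // Straightforward reference implementation.
    public static bool Is23Or30Naive(char c) => c == '#' || c == '0';

    // Returns every value in [0, limit) where the two versions disagree.
    public static List<int> Mismatches(int limit)
    {
        var result = new List<int>();
        for (int i = 0; i < limit; i++)
        {
            char c = (char)i;
            if (Is23Or30(c) != Is23Or30Naive(c))
            {
                result.Add(i);
            }
        }
        return result;
    }
}
```

`Mismatches(256)` comes back empty, so the expression is exact for 8-bit input. Over the full 16-bit char range it is not: the mask only looks at the low 16 bits of the product, so the product wraps and, for example, `(char)450` is a false positive (450 * 158 = 71100, whose low 16 bits are 5564). An extra range check or a guarantee about the input is needed if such characters can occur.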


+2

The trick generalizes to a set of 2^N characters, provided they differ only in N bits. For example, with 0x01, 0x03, 0x81, 0x83, N = 2, and you can use (c | 0x82) == 0x83.
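The four-character example can be written out and compared against the obvious implementation. This is a small sketch with hypothetical names (`FourWayMatch`, `IsOneOfFour`, `IsOneOfFourNaive`); the constants are the ones given in the answer.

```csharp
public static class FourWayMatch
{
    // Bits 1 and 7 (mask 0x82) are the only bits allowed to vary,
    // so forcing them on collapses all four targets onto 0x83.
    public static bool IsOneOfFour(char c) => (c | 0x82) == 0x83;

    // Reference implementation with up to four comparisons.
    public static bool IsOneOfFourNaive(char c) =>
        c == 0x01 || c == 0x03 || c == 0x81 || c == 0x83;
}
```

The two methods agree for every 16-bit char value, because any character with a bit set outside 0x83 fails the comparison and the remaining bit (bit 0) must be set.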


+1

You could also do something like this:

if ((x - c0 | c0 - x) & (x - c1 | c1 - x) & ... & (x - cn | cn - x) & 0x80) {
  // x is not equal to any ci
}

If x is not equal to a specific c, either x - c or c - x will be negative, so (x - c) | (c - x) will have bit 7 set. This should work with both signed and unsigned chars. If you AND these terms together for all the c values, the result will have bit 7 set only if it is set for every c (i.e., x is not equal to any of them).
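The snippet above is written for 8-bit chars. As a rough C# adaptation (an assumption on my part, not something the answer states), the same idea can use the sign bit of a 32-bit int instead of bit 7, since C# promotes char operands to int and `d | -d` is negative exactly when `d` is non-zero. The names `SignBitMatch`, `IsNoneOf` and `IsAnyOf` are hypothetical.

```csharp
public static class SignBitMatch
{
    // True when x differs from both c0 and c1.
    public static bool IsNoneOf(char x, char c0, char c1)
    {
        int d0 = x - c0; // zero only when x == c0
        int d1 = x - c1; // zero only when x == c1

        // (d | -d) has its sign bit set for every non-zero d.
        // ANDing the terms keeps the sign bit only if all differences are non-zero.
        return ((d0 | -d0) & (d1 | -d1)) < 0;
    }

    // True when x equals c0 or c1.
    public static bool IsAnyOf(char x, char c0, char c1) => !IsNoneOf(x, c0, c1);
}
```

For the characters from the question, `SignBitMatch.IsAnyOf(c, '#', '0')` gives the same result as `c == '#' || c == '0'` with no false positives. More targets can be added by ANDing in one more `(d | -d)` term per character, at the cost of a few extra arithmetic instructions each.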

+1