Fast strlen with bit operations

I found this code

int strlen_my(const char *s) { int len = 0; for(;;) { unsigned x = *(unsigned*)s; if((x & 0xFF) == 0) return len; if((x & 0xFF00) == 0) return len + 1; if((x & 0xFF0000) == 0) return len + 2; if((x & 0xFF000000) == 0) return len + 3; s += 4, len += 4; } } 

I am very interested to know how this works. Can anyone explain how this works?

+6
source share
6 answers

Bitwise And with one it will extract a bit pattern from the other operand. Value 10101 & 11111 = 10101 . If the result of this bitwise AND is 0, then we know that we know that the other operand is 0. The result is 0 when ANDing of one byte with 0xFF (ones) will indicate NULL bytes.

The code itself checks every byte of the char array in four-byte partitions. NOTE. This code is not portable; on another machine or compiler, unsigned int can be more than 4 bytes. It is probably best to use the uint32_t data type to provide 32-bit unsigned integers.

The first thing to note is that on a machine with small bytes, the bytes that make up the character array will be read in the unsigned data type in reverse order; that is, if the four bytes at the current address are the bitmap corresponding to abcd , then the unsigned variable will contain the bit pattern corresponding to dcba .

The second is that the constant of a hexadecimal number in C results in an int-size number with the specified bytes at the low end of the bitmap. The value 0xFF is actually 0x000000FF when compiling with 4-byte ints. 0xFF00 0x0000FF00 . And so on.

Thus, the program basically searches for the NULL character in four possible positions. If there is no NULL character in the current section, it moves to the next four-byte slot.

Take the char abcdef array for example. In C, string constants will always have null terminators at the end, so there is a 0x00 byte at the end of this line.

It will work as follows:

Read "abcd" in unsigned int x:

 x: 0x64636261 [ASCII representations for "dcba"] 

Check each byte for a null terminator:

  0x64636261 & 0x000000FF 0x00000061 != 0, 0x64636261 & 0x0000FF00 0x00006200 != 0, 

And check out the two other positions; There are no null terminators in this 4-byte section, so move on to the next section.

Read "ef" in unsigned int x:

 x: 0xBF006665 [ASCII representations for "fe"] 

Pay attention to the bytes 0xBF; this is minus the length of the string, so we read garbage from the runtime stack. It could be anything. On a machine that does not allow unrelated accesses, this will crash if the memory after the line is not aligned by 1 byte. If only one character remained in the string, we would read two additional bytes, so the alignment of the memory adjacent to the char array would have to be aligned by 2 bytes.

Check each byte for a null terminator:

  0xBF006665 & 0x000000FF 0x00000065 != 0, 0xBF006665 & 0x0000FF00 0x00006600 != 0, 0xBF006665 & 0x00FF0000 0x00000000 == 0 !!! 

So we return len + 2 ; len was 4 since we incremented it once by 4, so we return 6, which is really the length of the string.

+3
source

It handles undefined behavior (low hits, 75% chance of accessing outside the array) for very dubious acceleration (possibly even slower). And does not conform to the standard, because it returns int instead of size_t . Even if unrelated calls are allowed on the platform, they can be much slower than agreed calls.

It also does not work on systems with large numbers, or if unsigned not 32 bits. Not to mention multiple mask and conditional operations.

That said:

It tests 4 8-bit bytes at a time, loading unsigned (which is not even guaranteed to have more than 16 bits). As soon as any of the bytes contains the '\0' -terminator, it returns the sum of the current length plus the position of this byte. Otherwise, it increases the current length by the number of bytes checked in parallel (4), and gets the next unsigned .

My advice: a bad optimization example plus too many uncertainties / traps. Most likely, not even faster - just project it onto the standard version:

 size_t strlen(restrict const char *s) { size_t l = 0; while ( *s++ ) l++; return l; } 

There may be a way to use special vector instructions, but if you cannot prove that this is a critical function, you should leave it in the compiler - some of them can expand / speed up such cycles much better.

+3
source

The code "works", trying to read 4 bytes at a time, assuming the string is laid out and available as an int array. The code reads the first int , and then each byte in turn, checking to see if it is a null character. Theoretically, code that works with int will work faster than 4 separate char operations.

But there are problems:

Alignment is a problem: for example. *(unsigned*)s may crash.

Endian - a problem with if((x & 0xFF) == 0) may not receive a byte at s

s += 4 is a problem since sizeof(int) may differ from 4.

Array types can exceed int range, it is better to use size_t .


An attempt to eliminate these difficulties.

 #include <stddef.h> #include <stdio.h> static inline aligned_as_int(const char *s) { max_align_t mat; // C11 uintptr_t i = (uintptr_t) s; return i % sizeof mat == 0; } size_t strlen_my(const char *s) { size_t len = 0; // align while (!aligned_as_int(s)) { if (*s == 0) return len; s++; len++; } for (;;) { unsigned x = *(unsigned*) s; #if UINT_MAX >> CHAR_BIT == UCHAR_MAX if(!(x & 0xFF) || !(x & 0xFF00)) break; s += 2, len += 2; #elif UINT_MAX >> CHAR_BIT*3 == UCHAR_MAX if (!(x & 0xFF) || !(x & 0xFF00) || !(x & 0xFF0000) || !(x & 0xFF000000)) break; s += 4, len += 4; #elif UINT_MAX >> CHAR_BIT*7 == UCHAR_MAX if ( !(x & 0xFF) || !(x & 0xFF00) || !(x & 0xFF0000) || !(x & 0xFF000000) || !(x & 0xFF00000000) || !(x & 0xFF0000000000) || !(x & 0xFF000000000000) || !(x & 0xFF00000000000000)) break; s += 8, len += 8; #else #error TBD code #endif } while (*s++) { len++; } return len; } 
+3
source

All sentences there are slower than simple strlen ().

The reason is that they do not reduce the number of comparisons, and only one relates to alignment.

Check the strlen () sentence from Torbjorn Granlund ( tege@sics.se ) and Dan Sahlin ( dan@sics.se ) online. If you are on a 64-bit platform, it really helps to speed things up.

+2
source

Determines if any bits are set on a specific byte on a machine with small bytes. Since we check only one byte (since all nibbles, 0 or 0xF, are doubled), and this is usually the last byte position (since the machine is not significant, and the byte pattern for numbers is therefore canceled), we can immediately find out which byte contains NUL.

+1
source

The loop takes 4 bytes of char array for each iteration. Four if statements are used to determine if a line has ended, using a bitmask with an AND operator to read the state of the ith element of the selected substring.

+1
source

All Articles