Faster strlen?

Question

Faster strlen?

Typical strlen() goes from the first character until it finds \0 . This requires you to go through each character. In the sense of the algorithm, its O (N).

Is there a faster way to do this when input is vaguely defined. For example: the length will be less than 50 or the length will be about 200 characters.

I thought about the search blocks and all, but did not get any optimization.

+8

c string algorithm

Jack Nov 21 '09 at 7:08

source share

8 answers

Of course. Keep track of the length while writing to a string.

+23

Bob aman Nov 21 '09 at 7:12

source share

Obviously, if your string has a known minimum length, you can start searching at that position.

Other than this, you cannot do anything; if you try to do something smart and find the byte \0 , you still need to check every byte between the beginning of the line and the dot to make sure that there was no \0 before.

This is not to say that strlen cannot be optimized. It can be pipelined, and it can be made to handle word sizes or vector pieces with each comparison. On most architectures, some combinations of these and other approaches will lead to significant acceleration with a constant coefficient compared to a naive byte comparison cycle. Of course, on most mature platforms, strlen already implemented using these methods.

+9

Stephen canon Nov 21 '09 at 7:14

source share

The short answer is no.

Longer answer: do you really think that if there was a faster way to check the string length for C strings of barebone strings, then, as is commonly used as a C string library, would not be already included?

Without any additional string knowledge, you need to check each character. If you want to store this additional information, you can create a struct that will save the length as a field in the structure (in addition to the actual character array / pointer for the string), in which case you could do a constant search time along the length, but with each changing the line you will have to update this field.

+6

Amber Nov 21 '09 at 7:12

source share

Jack

strlen works by looking for the ending '\ 0', here is an implementation taken from OpenBSD:

 size_t strlen(const char *str) { const char *s; for (s = str; *s; ++s) ; return (s - str); }

Now think that you know that the length is about 200 characters, as you said. Say you start at 200 and iterate up and down for "\ 0". You found it at around 204, what does this mean? What is a string with a length of 204 characters? NOT! It could end before that with another '\ 0', and everything you did looked out of bounds.

+4

Eli bendersky Nov 21 '09 at 7:23

source share

You can try using the vector. Not sure if the compiler can execute it, but I did it manually (using the built-in functions). But this can only help you for long lines.

Use stl strings, more secure and the std :: string class contains its length.

+3

Elalfer Nov 21 '09 at 7:14

source share

Get a Core i7 processor.

Core i7 comes with SSE 4.2 instruction set. Intel has added four additional vector instructions to speed up strlen tasks and related tasks.

Here are some interesting thoughts on the new instructions:

http://smallcode.weblogs.us/oldblog/2007/11/

+3

Nils pipenbrinck Nov 21 '09 at 11:55

source share

Here I have attached asm code from glibc 2.29. I deleted the fragment for the ARM processor. I checked it, it is really fast, exceeded all my expectations. It is easy to do the alignment, then compare 4 bytes.

 ENTRY(strlen) bic r1, r0, $3 @ addr of word containing first byte ldr r2, [r1], $4 @ get the first word ands r3, r0, $3 @ how many bytes are duff? rsb r0, r3, $0 @ get - that number into counter. beq Laligned @ skip into main check routine if no more orr r2, r2, $0x000000ff @ set this byte to non-zero subs r3, r3, $1 @ any more to do? orrgt r2, r2, $0x0000ff00 @ if so, set this byte subs r3, r3, $1 @ more? orrgt r2, r2, $0x00ff0000 @ then set. Laligned: @ here, we have a word in r2. Does it tst r2, $0x000000ff @ contain any zeroes? tstne r2, $0x0000ff00 @ tstne r2, $0x00ff0000 @ tstne r2, $0xff000000 @ addne r0, r0, $4 @ if not, the string is 4 bytes longer ldrne r2, [r1], $4 @ and we continue to the next word bne Laligned @ Llastword: @ drop through to here once we find a tst r2, $0x000000ff @ word that has a zero byte in it addne r0, r0, $1 @ tstne r2, $0x0000ff00 @ and add up to 3 bytes on to it addne r0, r0, $1 @ tstne r2, $0x00ff0000 @ (if first three all non-zero, 4th addne r0, r0, $1 @ must be zero) DO_RET(lr)

END (StrLen)

0

DAG May 22 '19 at 5:28

source share

Pascal cuoq · Accepted Answer · 2009-11-21T09:39:29+0000

In fact, the glibc strlen implementation is an interesting example of a vectorization approach. This is characteristic in that it does not use vector instructions, but finds a way to use only ordinary instructions for 32 or 64 bits of words from the buffer.

Faster strlen?

More articles: