String (const char *, size_t) in int?

What is the fastest way to convert the string represented by (const char *, size_t) to int?

The line does not end with zero. Both of these methods include string copying (and much more) that I would like to avoid.

And yes, this function is called several million times per second .: P

int to_int0(const char* c, size_t sz) { return atoi(std::string(c, sz).c_str()); } int to_int1(const char* c, size_t sz) { return boost::lexical_cast<int>(std::string(c, sz)); } 
0
c ++
Mar 08 2018-12-12T00:
source share
6 answers

Fastest:

 int to_int(char const *s, size_t count) { int result = 0; size_t i = 0 ; if ( s[0] == '+' || s[0] == '-' ) ++i; while(i < count) { if ( s[i] >= '0' && s[i] <= '9' ) { //see Jerry comments for explanation why I do this int value = (s[0] == '-') ? ('0' - s[i] ) : (s[i]-'0'); result = result * 10 + value; } else throw std::invalid_argument("invalid input string"); i++; } return result; } 

Since in the above code the comparison (s[0] == '-') is performed at each iteration, we can avoid this by calculating result as a negative number in the loop, and then return result if s[0] really '-' otherwise return -result (which makes it a positive number, as it should be):

 int to_int(char const *s, size_t count) { size_t i = 0 ; if ( s[0] == '+' || s[0] == '-' ) ++i; int result = 0; while(i < count) { if ( s[i] >= '0' && s[i] <= '9' ) { result = result * 10 - (s[i] - '0'); //assume negative number } else throw std::invalid_argument("invalid input string"); i++; } return s[0] == '-' ? result : -result; //-result is positive! } 

This is an improvement!




In C ++ 11, you can use any function from the std::stoi . There is also a std::to_string family.

+2
Mar 08 2018-12-12T00:
source share

Given a counted string like this one, you can get a little speed by doing the conversion yourself. Depending on how reliable the code should be, this can be quite difficult. For the moment, let’s assume the simplest case is that we are sure that the string is valid, containing only digits (no negative numbers at the moment), and the number it represents is always in the range of int. In this case:

 int to_int2(char const *c, size_t sz) { int retval = 0; for (size_t i=0; i<sz; i++) retval *= 10; retval += c[i] -'0'; } return retval; } 

From there, you can characterize as complex as you want - handling a passing / trailing space, '-' (but doing it right for the most negative number in 2 additions is not always trivial [edit: see Nawaz answer for one solution to this]), grouping numbers etc.

+3
Mar 08 2018-12-12T00:
source share

Another slow version for uint32:

 void str2uint_aux(unsigned& number, unsigned& overflowCtrl, const char*& ch) { unsigned digit = *ch - '0'; ++ch; number = number * 10 + digit; unsigned overflow = (digit + (256 - 10)) >> 8; // if digit < 10 then overflow == 0 overflowCtrl += overflow; } unsigned str2uint(const char* s, size_t n) { unsigned number = 0; unsigned overflowCtrl = 0; // for VC++10 the Duff device is faster than loop switch (n) { default: throw std::invalid_argument(__FUNCTION__ " : `n' too big"); case 10: str2uint_aux(number, overflowCtrl, s); case 9: str2uint_aux(number, overflowCtrl, s); case 8: str2uint_aux(number, overflowCtrl, s); case 7: str2uint_aux(number, overflowCtrl, s); case 6: str2uint_aux(number, overflowCtrl, s); case 5: str2uint_aux(number, overflowCtrl, s); case 4: str2uint_aux(number, overflowCtrl, s); case 3: str2uint_aux(number, overflowCtrl, s); case 2: str2uint_aux(number, overflowCtrl, s); case 1: str2uint_aux(number, overflowCtrl, s); } // here we can check that all chars were digits if (overflowCtrl != 0) throw std::invalid_argument(__FUNCTION__ " : `s' is not a number"); return number; } 

Why is it slow? Because it processes characters one by one. If we had a guarantee that we can access bytes up to s+16 , we can use vectorization for *ch - '0' and digit + 246 .
Like in this code:

  uint32_t digitsPack = *(uint32_t*)s - '0000'; overflowCtrl |= digitsPack | (digitsPack + 0x06060606); // if one byte is not in range [0;10), high nibble will be non-zero number = number * 10 + (digitsPack >> 24) & 0xFF; number = number * 10 + (digitsPack >> 16) & 0xFF; number = number * 10 + (digitsPack >> 8) & 0xFF; number = number * 10 + digitsPack & 0xFF; s += 4; 



A small update to check the range:
the first fragment has an excess shift (or mov ) at each iteration, so it should be

 unsigned digit = *s - '0'; overflowCtrl |= (digit + 256 - 10); ... if (overflowCtrl >> 8 != 0) throw ... 
+3
Mar 08 '12 at 18:20
source share
 llvm::StringRef s(c,sz); int n; s.getAsInteger(10,n); return n; 

http://llvm.org/docs/doxygen/html/classllvm_1_1StringRef.html

+1
Mar 08 2018-12-12T00:
source share

If you often run this function, I bet you are parsing the same number many times. My suggestion is for the BCD to encode the string into a static char buffer (you know it won’t be very long since atoi can handle + -2G) when there are less than X digits (X = 8 for 32 bit lookup, X = 16 for a 64-bit search), then put the cache in the hash map.

When you finish with the first version, you may find some good optimizations, such as skipping the BCD encoding completely and just using the X characters in the string (when the length of the string is <= X) to search the hash table. If the string is longer, you return to atoi .

Edit : ... or roll back instead of atoi to Jerry Coffin's solution, which is as fast as they are.

0
Mar 08 '12 at 16:00
source share

You will have to either write your own routine or use a third-party library if you are dead to avoid copying strings.

You probably don't want to write atoi from scratch (you can still make a mistake here), so I would advise you to grab the existing atoi from the public domain or the BSD license code and modify it. For example, you can get an existing atoi from the FreeBSD cvs tree .

0
Mar 08 2018-12-12T00:
source share



All Articles