ANSI C: maximum number of characters needed to print a decimal int

I would like to know if there is an easy way to determine the maximum number of characters needed to print a decimal int.

I know that <limits.h> contains definitions such as INT_MAX that give the maximum value an int can hold, but that is not what I want.

I would like to do something like:

    #include <stdio.h>
    #include <stdlib.h>

    int get_int( void )
    {
        char draft[ MAX_CHAR_OF_A_DECIMAL_INT ];

        fgets( draft, sizeof( draft ), stdin );
        return strtol( draft, NULL, 10 );
    }

But how can I find the value of MAX_CHAR_OF_A_DECIMAL_INT in a portable, low-overhead way?

Thanks!

+8
Tags: c, string, type-conversion, int
6 answers

I don't know whether this trick can do what you want in plain ANSI C, but in C++ you can easily use template metaprogramming:

    #include <iostream>
    #include <limits>
    #include <climits>

    template< typename T, unsigned long N = INT_MAX >
    class MaxLen
    {
    public:
        enum { StringLen = MaxLen< T, N / 10 >::StringLen + 1 };
    };

    template< typename T >
    class MaxLen< T, 0 >
    {
    public:
        enum { StringLen = 1 };
    };

And you can call it from your pure C code by adding a wrapper C++ function, for example:

 extern "C" int int_str_max( ) { return MaxLen< int >::StringLen; } 

This has ZERO runtime overhead, and the exact length is computed.
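
From the C side, a minimal sketch of how the wrapper might be called (the prototype below is my assumption, matching the definition above; compile the C++ file separately and link the two objects together):

    #include <stdio.h>

    /* implemented in the C++ translation unit shown above */
    extern int int_str_max( );

    int main( void )
    {
        printf( "a decimal int needs at most %d characters\n", int_str_max( ) );
        return 0;
    }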


You can test the templates above with something like:

    int main( )
    {
        std::cout << "Max: " << std::numeric_limits< short >::max( ) << std::endl;
        std::cout << "Digits: " << std::numeric_limits< short >::digits10 << std::endl;
        std::cout << "A \"short\" is " << sizeof( short ) << " bytes." << std::endl
                  << "A string large enough to fit any \"short\" is "
                  << MaxLen< short, SHRT_MAX >::StringLen << " bytes wide." << std::endl;

        std::cout << "Max: " << std::numeric_limits< int >::max( ) << std::endl;
        std::cout << "Digits: " << std::numeric_limits< int >::digits10 << std::endl;
        std::cout << "An \"int\" is " << sizeof( int ) << " bytes." << std::endl
                  << "A string large enough to fit any \"int\" is "
                  << MaxLen< int >::StringLen << " bytes wide." << std::endl;

        std::cout << "Max: " << std::numeric_limits< long >::max( ) << std::endl;
        std::cout << "Digits: " << std::numeric_limits< long >::digits10 << std::endl;
        std::cout << "A \"long\" is " << sizeof( long ) << " bytes." << std::endl
                  << "A string large enough to fit any \"long\" is "
                  << MaxLen< long, LONG_MAX >::StringLen << " bytes wide." << std::endl;

        return 0;
    }

Output:

    Max: 32767
    Digits: 4
    A "short" is 2 bytes.
    A string large enough to fit any "short" is 6 bytes wide.
    Max: 2147483647
    Digits: 9
    An "int" is 4 bytes.
    A string large enough to fit any "int" is 11 bytes wide.
    Max: 9223372036854775807
    Digits: 18
    A "long" is 8 bytes.
    A string large enough to fit any "long" is 20 bytes wide.
  • Pay attention to the slightly different values from std::numeric_limits< T >::digits10 and MaxLen< T, N >::StringLen: the former does not count the most significant digit if it cannot reach "9". Of course, you can just use digits10 and add two, if you don't mind wasting one byte in some cases.

EDIT:

Some may find it weird to include <climits>. If you can count on C++11, you don't need it, and you gain additional simplicity:

    #include <iostream>
    #include <limits>

    template< typename T, unsigned long N = std::numeric_limits< T >::max( ) >
    class MaxLen
    {
    public:
        enum { StringLen = MaxLen< T, N / 10 >::StringLen + 1 };
    };

    template< typename T >
    class MaxLen< T, 0 >
    {
    public:
        enum { StringLen = 1 };
    };

Now you can use

 MaxLen< short >::StringLen 

instead of

 MaxLen< short, SHRT_MAX >::StringLen 

Nice, isn't it?

+2

If you assume CHAR_BIT is 8 (required by POSIX, so it's a safe assumption for any code targeting POSIX systems, as well as any other mainstream system such as Windows), the cheap safe formula is 3*sizeof(int)+2. If not, you can make it 3*sizeof(int)*CHAR_BIT/8+2, or a slightly simpler variant.

If you're wondering why this works: sizeof(int) is essentially the logarithm of INT_MAX (roughly, log base 2^CHAR_BIT), and converting between logarithms of different bases (e.g. to base 10) is just a multiplication. In particular, 3 is an integer approximation/upper bound of the base-10 logarithm of 256.

The +2 accounts for the possible sign and the terminating NUL.
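
As a sketch, the formula drops straight into the question's get_int(); the macro name is the question's placeholder, and the extra byte for fgets()'s trailing newline is my assumption:

    #include <limits.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* 3 digits per 8-bit byte, plus sign, plus terminating NUL */
    #define MAX_CHAR_OF_A_DECIMAL_INT (3 * sizeof(int) * CHAR_BIT / 8 + 2)

    int get_int( void )
    {
        char draft[ MAX_CHAR_OF_A_DECIMAL_INT + 1 ];  /* +1 spare for the '\n' */

        fgets( draft, sizeof( draft ), stdin );
        return strtol( draft, NULL, 10 );
    }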

+9

The easiest canonical, and arguably most portable, way is to ask snprintf() how much space is needed:

    char sbuf[2];
    int ndigits;

    ndigits = snprintf(sbuf, (size_t) 1, "%lld", (long long) INT_MIN);

A slightly less portable version might use intmax_t and %jd:

    ndigits = snprintf(sbuf, (size_t) 1, "%jd", (intmax_t) INT_MIN);

You may consider this too expensive to do at runtime, but it works for any value, not just the MIN/MAX values of any integer type.
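
As an aside (not part of this answer's code), since C99 you can pass a null pointer with a size of zero, which avoids the scratch buffer entirely; a minimal sketch:

    #include <limits.h>
    #include <stdio.h>

    int main( void )
    {
        /* C99: with a null pointer and size 0, snprintf() writes nothing and
         * returns the number of characters needed, excluding the NUL */
        int ndigits = snprintf( NULL, 0, "%lld", (long long) INT_MIN );

        printf( "INT_MIN takes %d characters plus the NUL\n", ndigits );
        return 0;
    }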

Of course, you could also just directly calculate the number of digits needed to express a given integer in base-10 notation with a simple recursive function:

    unsigned int
    numCharsB10(intmax_t n)
    {
        if (n < 0)
            return numCharsB10((n == INTMAX_MIN) ? INTMAX_MAX : -n) + 1;
        if (n < 10)
            return 1;

        return 1 + numCharsB10(n / 10);
    }

but this, of course, also costs CPU time at runtime, even when inlined, though perhaps a little less than snprintf().
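
A hypothetical usage sketch (mine, not the answer's): size a heap buffer for one particular value, with numCharsB10() repeated so the fragment stands alone:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* repeated from above so this sketch compiles on its own */
    unsigned int numCharsB10(intmax_t n)
    {
        if (n < 0)
            return numCharsB10((n == INTMAX_MIN) ? INTMAX_MAX : -n) + 1;
        if (n < 10)
            return 1;
        return 1 + numCharsB10(n / 10);
    }

    int main(void)
    {
        intmax_t v = -1234567;
        char *s = malloc(numCharsB10(v) + 1);   /* + 1 for the NUL */

        if (s != NULL) {
            sprintf(s, "%jd", v);
            printf("%s\n", s);
            free(s);
        }
        return 0;
    }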

@R..'s answer above, though more or less wrong, is on the right track. Here is the correct derivation of some very well and widely tested, portable macros that implement the calculation at compile time using sizeof(), starting with a slight correction of @R..'s initial wording:

First, we can easily see (or show) that sizeof(int) is the log base 2 of UINT_MAX divided by the number of bits represented by one unit of sizeof() (8, a.k.a. CHAR_BIT):

    sizeof(int) == log2(UINT_MAX) / 8

because UINT_MAX, of course, is just 2^(sizeof(int) * 8), and log2(x) is the inverse of 2^x.

We can use the identity logb(x) = log(x) / log(b) (where log() is the natural logarithm) to find logarithms in other bases. For example, you can compute the log base 2 of x with:

    log2(x) = log(x) / log(2)

and:

    log10(x) = log(x) / log(10)

So, we can conclude that:

    log10(v) = log2(v) / log2(10)

Now what we want in the end is the log base 10 of UINT_MAX; since log2(10) is about 3, and since we know from above what log2() is in terms of sizeof(), we can say that log10(UINT_MAX) is approximately:

    log10(2^(sizeof(int) * 8)) ~= (sizeof(int) * 8) / 3

This is not perfect, especially since what we are really trying to find is the ceiling value, but with some minor adjustment to account for the integer rounding of log2(10) to 3, we can get what we need by first adding one to the log2 term, then subtracting 1 from the result for any larger-sized integer, giving this "good-enough" expression:

    #if 0
    /* e.g. for a 4-byte int: ((32 + 1) / 3) - 1 == 10 digits, which is
     * exactly the length of UINT_MAX (4294967295) */
    #define __MAX_B10STRLEN_FOR_UNSIGNED_TYPE(t) \
            ((((sizeof(t) * CHAR_BIT) + 1) / 3) - ((sizeof(t) > 2) ? 1 : 0))
    #endif

Even better, we can multiply our first log2() term by 1/log2(10) (multiplying by the reciprocal of the divisor is the same as dividing by the divisor), and this makes it possible to find a better integer approximation. I recently (re?)discovered this suggestion while reading Sean Anderson's bithacks page: http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10

To do this with integer math to the best possible approximation, we need to find the ideal ratio representing our reciprocal. This can be found by searching for the smallest fractional part of multiplying our desired value of 1/log2(10) by successive powers of 2, within some reasonable range of powers of 2, for example with the following little AWK script:

    awk 'BEGIN {
        minf = 1.0
    }
    END {
        for (i = 1; i <= 31; i++) {
            a = 1.0 / (log(10) / log(2)) * 2^i
            if (a > (2^32 / 32))
                break;
            n = int(a)
            f = a - (n * 1.0)
            if (f < minf) {
                minf = f
                minn = n
                bits = i
            }
            # printf("a=%f, n=%d, f=%f, i=%d\n", a, n, f, i)
        }
        printf("%d + %f / %d, bits=%d\n", minn, minf, 2^bits, bits)
    }' < /dev/null

    1233 + 0.018862 / 4096, bits=12

So we can get a good integer approximation of multiplying our log2(v) value by 1/log2(10) by multiplying it by 1233, followed by a right shift of 12 (2^12 being, of course, 4096):

    log10(UINT_MAX) ~= (((sizeof(int) * 8) + 1) * 1233) >> 12

and, combined with adding one to do the equivalent of finding the ceiling value, this gets rid of the need to fiddle with odd values:

    #define __MAX_B10STRLEN_FOR_UNSIGNED_TYPE(t) \
            (((((sizeof(t) * CHAR_BIT)) * 1233) >> 12) + 1)

    /*
     * for signed types we need room for the sign, except for int64_t
     */
    #define __MAX_B10STRLEN_FOR_SIGNED_TYPE(t) \
            (__MAX_B10STRLEN_FOR_UNSIGNED_TYPE(t) + ((sizeof(t) == 8) ? 0 : 1))

    /*
     * NOTE: this gives a warning (for unsigned types of int and larger) saying
     * "comparison of unsigned expression < 0 is always false", and of course it
     * is, but that's exactly what we want to know (whether type 't' is unsigned)!
     */
    #define __MAX_B10STRLEN_FOR_INT_TYPE(t) \
            (((t) -1 < 0) ? __MAX_B10STRLEN_FOR_SIGNED_TYPE(t) \
                          : __MAX_B10STRLEN_FOR_UNSIGNED_TYPE(t))

Normally the compiler will evaluate at compile time the expression my __MAX_B10STRLEN_FOR_INT_TYPE() macro expands to. Of course, my macro always calculates the maximum space required by a given type of integer, not the exact space needed by a particular integer value.
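
A hedged usage sketch (the test harness is mine; the macros are repeated from above so it stands alone):

    #include <limits.h>
    #include <stdio.h>

    #define __MAX_B10STRLEN_FOR_UNSIGNED_TYPE(t) \
            (((((sizeof(t) * CHAR_BIT)) * 1233) >> 12) + 1)
    #define __MAX_B10STRLEN_FOR_SIGNED_TYPE(t) \
            (__MAX_B10STRLEN_FOR_UNSIGNED_TYPE(t) + ((sizeof(t) == 8) ? 0 : 1))
    #define __MAX_B10STRLEN_FOR_INT_TYPE(t) \
            (((t) -1 < 0) ? __MAX_B10STRLEN_FOR_SIGNED_TYPE(t) \
                          : __MAX_B10STRLEN_FOR_UNSIGNED_TYPE(t))

    int main(void)
    {
        /* + 1 for the terminating NUL, which the macro does not count */
        char buf[__MAX_B10STRLEN_FOR_INT_TYPE(int) + 1];

        snprintf(buf, sizeof buf, "%d", INT_MIN);
        printf("\"%s\" fits in %zu bytes\n", buf, sizeof buf);
        return 0;
    }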

+2

After the accepted answer (2+ years later):

The following fraction 10/33 exactly meets the needs for non-padded int8_t, int16_t, int32_t and int128_t. Only 1 char extra for int64_t. Exact or 1 over for all integer sizes up to int362_t. Beyond that, it may be more than 1 over.

    #include <limits.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define MAX_CHAR_LEN_DECIMAL_INTEGER(type)  (10*sizeof(type)*CHAR_BIT/33 + 2)
    #define MAX_CHAR_SIZE_DECIMAL_INTEGER(type) (10*sizeof(type)*CHAR_BIT/33 + 3)

    int get_int( void )
    {
        // + 1 for the \n of fgets()
        char draft[MAX_CHAR_SIZE_DECIMAL_INTEGER(long) + 1];  //**

        fgets(draft, sizeof draft, stdin);
        return strtol(draft, NULL, 10);
    }

** fgets() generally works best with an extra char for the trailing '\n'.

Similar to @R..'s answer, but with a better fraction.
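
A quick sanity-check sketch (mine, not the answer's): compare the macro's allowance against what the worst-case int actually needs:

    #include <limits.h>
    #include <stdio.h>

    #define MAX_CHAR_LEN_DECIMAL_INTEGER(type) (10*sizeof(type)*CHAR_BIT/33 + 2)

    int main(void)
    {
        char buf[64];
        int need = snprintf(buf, sizeof buf, "%d", INT_MIN);

        printf("INT_MIN \"%s\" needs %d chars; the macro allows %zu\n",
               buf, need, MAX_CHAR_LEN_DECIMAL_INTEGER(int));
        return 0;
    }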


I recommend using a generous 2x buffer when reading user input; sometimes the user types extra spaces, leading zeros, etc.

    char draft[2 * (MAX_CHAR_SIZE_DECIMAL_INTEGER(long) + 1)];
    fgets(draft, sizeof draft, stdin);
+2

Here's the C version:

    #include <limits.h>

    #define xstr(s) str(s)
    #define str(s) #s

    /* sizeof the stringified INT_MAX counts the digits plus the terminating
     * NUL -- note it leaves no room for the '-' of INT_MIN */
    #define INT_STR_MAX sizeof(xstr(INT_MAX))

    char buffer[INT_STR_MAX];

Then:

    $ gcc -E -o str.cpp str.c
    $ grep buffer str.cpp
    char buffer[sizeof("2147483647")];
    $ gcc -S -o str.S str.c
    $ grep buffer str.S
            .comm   buffer,11,1
+1

You can calculate the number of digits using the base-10 logarithm. On my system, calculating the ceiling of the base-2 logarithm from the bit representation of the number did not give a significant speedup. The floor of log base 10, plus 1, gives the number of digits; I add 2 more to account for the NUL character and the sign.

    #include <limits.h>
    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        printf("%d %d\n", INT_MAX, (int) floor(log10(INT_MAX)) + 3);
        return 0;
    }

Also note that an int can be 2 or 4 bytes (2 only on older systems), so you could calculate the upper bound once and use it in your program.

0
