Performance of unsigned integers

Is there any gain / loss in performance when using unsigned integers over signed integers?

If so, does this also apply to short and long?

+55
c++ c integer int unsigned
Jan 17 '11
11 answers

Division by powers of 2 is faster with unsigned int , because it can be optimized into a single shift instruction. With signed int , more machine instructions are usually needed, since division rounds toward zero, but a right shift rounds down. Example:

 int foo(int x, unsigned y) { x /= 8; y /= 8; return x + y; } 

Here is the relevant part of x (signed division):

 movl    8(%ebp), %eax
 leal    7(%eax), %edx
 testl   %eax, %eax
 cmovs   %edx, %eax
 sarl    $3, %eax

And here is the corresponding part for y (unsigned division):

 movl    12(%ebp), %edx
 shrl    $3, %edx
+83
Jan 17 '11

In C++ (and C), signed integer overflow is undefined, while unsigned integer overflow is defined to wrap around. Note that, for example, in gcc you can use the -fwrapv flag to make signed overflow defined (as wrapping).

Undefined signed integer overflow allows the compiler to assume that overflow never occurs, which can open up optimization opportunities. See this blog post for a discussion.

+31
Jan 17 '11

This depends on the exact implementation. In most cases there will be no difference. If you really care, try all the variants you are considering and measure performance.

+17
Jan 17 '11 at 10:53

unsigned gives the same or better performance than signed . Some examples:

  • Division by a constant that is a power of 2 (see also the answer from FredOverflow )
  • Division by a constant (for example, my compiler implements division by 13 using 2 asm instructions for unsigned and 6 instructions for signed)
  • Checking whether a number is odd (I have no idea why my MS Visual Studio compiler implements it with 4 instructions for signed numbers; gcc does it with 1 instruction, just as in the unsigned case)

short usually gives the same or worse performance than int (assuming sizeof(short) < sizeof(int) ). The degradation occurs when you assign the result of an arithmetic operation (which is usually int , never short ) to a variable of type short , which is stored in a processor register (which also has the width of an int ). The conversions from short to int and back take time and are annoying.

Note: some DSPs have fast multiplication instructions for the signed short type; in this particular case short is faster than int .

As for the difference between int and long , I can only guess (I am not familiar with 64-bit architectures). Of course, if int and long have the same size (as on 32-bit platforms), their performance is also the same.




A very important addition, which several people have pointed out:

What really matters for most applications is the memory footprint and the bandwidth used. You should use the smallest sufficient integer type ( short , maybe even signed/unsigned char ) for large arrays.

This will give better performance, but the gain is non-linear (i.e. not 2 or 4 times) and somewhat unpredictable: it depends on the cache size and the ratio of computation to memory transfers in your application.

+12
Jan 18 '11

It pretty much depends on the particular processor.

Most processors have instructions for both signed and unsigned arithmetic, so the difference comes down to which instructions the compiler uses.

Whether one of the two is faster depends entirely on the processor, and most likely the difference is minimal, if it exists at all.

+8
Jan 17 '11 at 10:53

The performance difference between unsigned and signed integers is actually more general than the accepted answer suggests. Dividing an unsigned integer by any constant can be made faster than dividing a signed integer by a constant, regardless of whether the constant is a power of two. See http://ridiculousfish.com/blog/posts/labor-of-division-episode-iii.html

At the end of his post, he includes the following section:

The natural question is whether the same optimization could improve signed division; unfortunately, it appears not, for two reasons:

The increment of the dividend must become an increase in its magnitude, i.e. increment if n > 0, decrement if n < 0. This introduces an additional expense.

The penalty for an uncooperative divisor is only about half as large in signed division, leaving a smaller window for improvement.

Thus it appears that the round-down algorithm could be made to work in signed division, but it will underperform the standard round-up algorithm.

+3
04 Oct '14 at 15:47

In short: don't worry about it up front. But do worry about it afterwards.

If you want performance, you need to use the compiler's optimization options , and these can work against common sense. Keep in mind that different compilers can compile the same code differently, and they offer different kinds of optimization. If we are talking about g++ , and about maxing out its optimization level with -Ofast , or at least the -O3 flag, then in my experience it can compile the long type into code that performs even better than any unsigned type, or even than plain int .

This is from my own experience, and I recommend that you first write the complete program and worry about such things only afterwards, when you have the real code at hand and can compile it with optimizations to try out and pick the types that actually work best. This is also good general advice for optimizing code for performance: write it quickly first, then compile with optimization, and tweak things to see what works best. You should also try compiling your program with different compilers and pick the one that emits the most efficient machine code.

An optimized multi-threaded linear algebra program can easily show a >10x performance difference between optimized and non-optimized builds. So it matters.

The output of the optimizer often contradicts intuition. For example, I had a case where the difference between a[x]+=b and a[x]=b changed the program's execution time by almost 2x. And no, a[x]=b was not the faster one.

Here, for example, is what NVidia states about programming its GPUs:

Note: As was already the recommended best practice, signed arithmetic should be preferred over unsigned arithmetic wherever possible for best throughput on SMM. The C language standard places more restrictions on overflow behavior for unsigned math, limiting compiler optimization opportunities.

+2
Aug 10 '16 at 8:01

Traditionally, int is the native integer type of the target hardware platform. Any other integer type may incur a performance penalty.

EDIT:

In modern systems, the situation is slightly different:

  • int can be 32-bit even on 64-bit systems, for compatibility reasons. I believe this is the case on Windows systems.

  • Modern compilers can implicitly use int when performing calculations for shorter types in some cases.

0
Jan 17 '11 at 10:52

IIRC, on x86 signed / unsigned shouldn't make any difference. Short / long, on the other hand, is a different story, since the amount of data that must be moved to / from RAM is larger for long (other reasons may include conversion operations, such as extending short to long).

0
Jan 17 '11

An unsigned integer is advantageous in that you store and process it as a plain bit pattern (I mean, as pure unsigned data), so multiplication and division become easier (faster) with bit-shift operations

0
04 Oct '14 at 16:15

Not only division by powers of 2 is faster with an unsigned type; division by any other value is also faster with an unsigned type. If you look at the Agner Fog instruction tables , you will see that unsigned divisions have similar or better performance than the signed versions

For example, with AMD K7

 ╔═════════════╤══════════╤═════╤═════════╤═══════════════════════╗
 ║ Instruction │ Operands │ Ops │ Latency │ Reciprocal throughput ║
 ╠═════════════╪══════════╪═════╪═════════╪═══════════════════════╣
 ║ DIV         │ r8/m8    │ 32  │ 24      │ 23                    ║
 ║ DIV         │ r16/m16  │ 47  │ 24      │ 23                    ║
 ║ DIV         │ r32/m32  │ 79  │ 40      │ 40                    ║
 ║ IDIV        │ r8       │ 41  │ 17      │ 17                    ║
 ║ IDIV        │ r16      │ 56  │ 25      │ 25                    ║
 ║ IDIV        │ r32      │ 88  │ 41      │ 41                    ║
 ║ IDIV        │ m8       │ 42  │ 17      │ 17                    ║
 ║ IDIV        │ m16      │ 57  │ 25      │ 25                    ║
 ║ IDIV        │ m32      │ 89  │ 41      │ 41                    ║
 ╚═════════════╧══════════╧═════╧═════════╧═══════════════════════╝

The same goes for the Intel Pentium.

 ╔═════════════╤══════════╤══════════════╗
 ║ Instruction │ Operands │ Clock cycles ║
 ╠═════════════╪══════════╪══════════════╣
 ║ DIV         │ r8/m8    │ 17           ║
 ║ DIV         │ r16/m16  │ 25           ║
 ║ DIV         │ r32/m32  │ 41           ║
 ║ IDIV        │ r8/m8    │ 22           ║
 ║ IDIV        │ r16/m16  │ 30           ║
 ║ IDIV        │ r32/m32  │ 46           ║
 ╚═════════════╧══════════╧══════════════╝

Of course, these are quite ancient, and newer architectures with more transistors may close the gap, but the basic point holds: as a rule, you need more micro-ops, more logic and more latency to perform a signed division

0
Jun 09 '17 at 16:40


