Why do UInt16 arrays seem to add faster than int arrays?

In C#, adding two UInt16[] arrays seems to be faster than adding two int[] arrays. This makes no sense to me: I assumed the arrays would be word-aligned, so int[] should require less work from the CPU, shouldn't it?

I ran the test code below and got the following results:

    Int for 1000 took 9896625613 tick (4227 msec)
    UInt16 for 1000 took 6297688551 tick (2689 msec)

The test code does the following:

  • Creates two arrays, a and b, once.
  • Fills them with random data, once.
  • Starts a stopwatch.
  • Adds a and b element by element, 1000 times.
  • Stops the stopwatch.
  • Prints how much time has passed.

This is done once for int[] a, b and once for UInt16[] a, b. Every time I run the code, the UInt16 test takes 30-50% less time than the int test. Can you explain why?

Here is the code if you want to try it yourself:

    using System;
    using System.Diagnostics;

    public static class Program
    {
        public static UInt16[] GenerateRandomDataUInt16(int length)
        {
            UInt16[] noise = new UInt16[length];
            Random random = new Random((int)DateTime.Now.Ticks);
            for (int i = 0; i < length; ++i)
            {
                noise[i] = (UInt16)random.Next();
            }
            return noise;
        }

        public static int[] GenerateRandomDataInt(int length)
        {
            int[] noise = new int[length];
            Random random = new Random((int)DateTime.Now.Ticks);
            for (int i = 0; i < length; ++i)
            {
                noise[i] = (int)random.Next();
            }
            return noise;
        }

        // Allocates a new result array on every call, then adds element by element.
        public static int[] AddInt(int[] a, int[] b)
        {
            int len = a.Length;
            int[] result = new int[len];
            for (int i = 0; i < len; ++i)
            {
                result[i] = (int)(a[i] + b[i]);
            }
            return result;
        }

        public static UInt16[] AddUInt16(UInt16[] a, UInt16[] b)
        {
            int len = a.Length;
            UInt16[] result = new UInt16[len];
            for (int i = 0; i < len; ++i)
            {
                result[i] = (ushort)(a[i] + b[i]);
            }
            return result;
        }

        public static void Main()
        {
            int count = 1000;
            int len = 128 * 6000;

            // int[] test
            int[] aInt = GenerateRandomDataInt(len);
            int[] bInt = GenerateRandomDataInt(len);
            Stopwatch s = new Stopwatch();
            s.Start();
            for (int i = 0; i < count; ++i)
            {
                int[] resultInt = AddInt(aInt, bInt);
            }
            s.Stop();
            Console.WriteLine("Int for " + count + " took " + s.ElapsedTicks + " tick (" + s.ElapsedMilliseconds + " msec)");

            // UInt16[] test
            UInt16[] aUInt16 = GenerateRandomDataUInt16(len);
            UInt16[] bUInt16 = GenerateRandomDataUInt16(len);
            s = new Stopwatch();
            s.Start();
            for (int i = 0; i < count; ++i)
            {
                UInt16[] resultUInt16 = AddUInt16(aUInt16, bUInt16);
            }
            s.Stop();
            Console.WriteLine("UInt16 for " + count + " took " + s.ElapsedTicks + " tick (" + s.ElapsedMilliseconds + " msec)");
        }
    }
Tags: arrays, c#, clr
5 answers

What you are seeing is a leaky abstraction. A UInt16 takes half the memory of an int (16 versus 32 bits).

This means that a UInt16 array occupies half the memory of an int array of the same length, so more of it fits in the processor cache, where it can be accessed very quickly.

If you run this code on a processor with a larger cache, the difference will probably be smaller.

Also try with much larger arrays.
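To put rough numbers on this, here is a back-of-the-envelope sketch (C# 9 top-level statements). It assumes each call to AddInt or AddUInt16 reads both input arrays once and writes the result array once, and it ignores the zeroing of the freshly allocated result array; `BytesTouched` is a hypothetical helper, not a framework API. For the question's 768,000-element arrays this gives 9,216,000 bytes per int call versus 4,608,000 bytes per UInt16 call.

```csharp
using System;

// Hypothetical helper: bytes one AddInt/AddUInt16 call reads and writes,
// i.e. two input arrays read plus one result array written.
static long BytesTouched(int length, int elementSize) => 3L * length * elementSize;

int len = 128 * 6000; // array length used in the question (768,000 elements)

long intBytes = BytesTouched(len, sizeof(int));       // 4 bytes per element
long ushortBytes = BytesTouched(len, sizeof(ushort)); // 2 bytes per element

Console.WriteLine($"int:    {intBytes:N0} bytes per call");
Console.WriteLine($"UInt16: {ushortBytes:N0} bytes per call");

// Footprint for a range of array sizes, to compare against your CPU's cache sizes.
foreach (int n in new[] { 10_000, 100_000, 1_000_000, 10_000_000 })
{
    Console.WriteLine($"{n,12:N0} elements: int {BytesTouched(n, sizeof(int)):N0} B, " +
                      $"UInt16 {BytesTouched(n, sizeof(ushort)):N0} B");
}
```

Whether either working set fits in cache depends on the specific processor, so comparing these figures with your L2/L3 sizes gives a rough idea of which array lengths should show the largest gap.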


Arrays are word-aligned, but there is no reason why each entry in an array should be word-aligned: the elements of a UInt16[] are packed two bytes apart.
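A small sketch illustrating the packing (C# 9 top-level statements; the array length of 8 is arbitrary). `Buffer.ByteLength` reports 32 bytes for an `int[8]` and 16 bytes for a `ushort[8]`, and reinterpreting the `ushort` data with `MemoryMarshal.Cast` shows that the eight elements fit into four 32-bit words, with no padding per element.

```csharp
using System;
using System.Runtime.InteropServices;

// Two arrays with the same (arbitrary) number of elements.
var ints = new int[8];
var shorts = new ushort[8];

// Buffer.ByteLength returns the number of bytes in a primitive array.
int intBytes = Buffer.ByteLength(ints);     // 8 * 4 = 32
int shortBytes = Buffer.ByteLength(shorts); // 8 * 2 = 16: no padding per element

// Reinterpret the ushort data as 32-bit words: each uint covers two elements.
int wordsCovered = MemoryMarshal.Cast<ushort, uint>(shorts.AsSpan()).Length; // 4

Console.WriteLine($"int[8]: {intBytes} bytes, ushort[8]: {shortBytes} bytes, " +
                  $"ushort[8] spans {wordsCovered} 32-bit words");
```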


Just a SWAG: the smaller memory footprint of the UInt16 arrays improves memory behavior (GC, cache, who knows what else). Since there don't seem to be many allocations, I would guess that the cache is the major factor.

Also, be careful: benchmarking can be tricky. Your timings probably include some JIT compilation, which could skew the results. Try swapping the order of the int test and the UInt16 test and see whether the timings follow the order or not.

Jon Skeet has (or had) a simple benchmarking framework that he wrote a while back to account for these effects. I don't know whether it is still available (or even applicable); perhaps he will comment.
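A minimal sketch of the order-swapping experiment, assuming .NET 5 or later (C# 9 top-level statements). `Measure` is a hypothetical helper that makes one untimed warm-up call, so the first JIT compilation is not measured, and then times the remaining calls; the fixed seed and the iteration count of 100 (instead of the question's 1000) are arbitrary choices. For serious measurements, a dedicated harness such as BenchmarkDotNet handles warm-up, repetition and statistics for you.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: one untimed warm-up call (so the first JIT compilation is
// not measured), then `iterations` timed calls. Under tiered compilation a single
// warm-up call may not be enough to reach fully optimized code.
static TimeSpan Measure(Action action, int iterations)
{
    action();
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < iterations; ++i)
    {
        action();
    }
    sw.Stop();
    return sw.Elapsed;
}

// Same element-wise additions as in the question (unchecked by default).
static int[] AddInt(int[] a, int[] b)
{
    var result = new int[a.Length];
    for (int i = 0; i < a.Length; ++i) result[i] = a[i] + b[i];
    return result;
}

static ushort[] AddUInt16(ushort[] a, ushort[] b)
{
    var result = new ushort[a.Length];
    for (int i = 0; i < a.Length; ++i) result[i] = (ushort)(a[i] + b[i]);
    return result;
}

const int count = 100;    // arbitrary; the question uses 1000
int len = 128 * 6000;     // array length from the question
var rng = new Random(42); // fixed seed, an arbitrary choice
int[] aInt = new int[len], bInt = new int[len];
ushort[] aU = new ushort[len], bU = new ushort[len];
for (int i = 0; i < len; ++i)
{
    aInt[i] = rng.Next(); bInt[i] = rng.Next();
    aU[i] = (ushort)rng.Next(); bU[i] = (ushort)rng.Next();
}

// Run both tests in both orders and compare the timings.
foreach (bool intFirst in new[] { true, false })
{
    TimeSpan tInt, tU;
    if (intFirst)
    {
        tInt = Measure(() => AddInt(aInt, bInt), count);
        tU = Measure(() => AddUInt16(aU, bU), count);
    }
    else
    {
        tU = Measure(() => AddUInt16(aU, bU), count);
        tInt = Measure(() => AddInt(aInt, bInt), count);
    }
    string order = intFirst ? "int first   " : "UInt16 first";
    Console.WriteLine($"{order}: int {tInt.TotalMilliseconds:F0} ms, UInt16 {tU.TotalMilliseconds:F0} ms");
}
```

If the ratio between the two timings flips along with the order, the original measurement was dominated by warm-up effects rather than by the element type.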


A couple of factors:

1) You are also timing the allocation of the result array. It would be interesting to know how long the addition alone takes, compared with creating the result array that gets returned.

2) It would be interesting to see what IL is generated. Since your code is VERY simple (iterate and add), the compiler might optimize it, perhaps by packing several uint16 values into a larger register and performing several additions per instruction.
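A sketch touching both points, assuming .NET Core 2.0 or later for `System.Numerics.Vector<T>` and C# 9 top-level statements. `AddUInt16Into` (a hypothetical name) writes into a caller-supplied buffer, so the result array can be allocated once outside the timed loop (point 1), and it adds `Vector<ushort>.Count` elements per vector operation (point 2). Because a vector of fixed width holds twice as many 16-bit lanes as 32-bit lanes, `Vector<ushort>.Count` is twice `Vector<int>.Count`. Note that this is explicit SIMD: the C# compiler emits plain IL for the original loop, so any automatic packing of this kind would have to happen in the JIT, which is what inspecting the generated machine code would reveal.

```csharp
using System;
using System.Numerics;

// Hypothetical helper: adds a and b element-wise into a caller-supplied buffer,
// so the result array can be allocated once, outside any timing loop.
static void AddUInt16Into(ushort[] a, ushort[] b, ushort[] result)
{
    if (b.Length != a.Length || result.Length < a.Length)
        throw new ArgumentException("b must match a, and result must be at least as long as a");

    int width = Vector<ushort>.Count;
    int i = 0;
    // SIMD part: `width` elements per vector addition; integer lanes wrap on overflow.
    for (; i <= a.Length - width; i += width)
    {
        var sum = new Vector<ushort>(a, i) + new Vector<ushort>(b, i);
        sum.CopyTo(result, i);
    }
    // Scalar tail for the elements that do not fill a whole vector.
    for (; i < a.Length; ++i)
    {
        result[i] = (ushort)(a[i] + b[i]);
    }
}

Console.WriteLine($"Hardware accelerated: {Vector.IsHardwareAccelerated}");
Console.WriteLine($"Vector<int>.Count = {Vector<int>.Count}, " +
                  $"Vector<ushort>.Count = {Vector<ushort>.Count}");

ushort[] x = { 1, 2, 3, 65535, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 };
ushort[] y = new ushort[x.Length];
Array.Fill(y, (ushort)1);
ushort[] dest = new ushort[x.Length]; // allocated once, reusable across calls
AddUInt16Into(x, y, dest);
Console.WriteLine(string.Join(", ", dest)); // 65535 + 1 wraps to 0
```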


I am not a .NET expert, but I would test two things:

  • Transferring a larger array (N elements of type int) takes longer than transferring an array of N ushort elements. This can be tested with different array sizes and coding styles (see my comment on the question). The numbers from your tests match this theory :).
  • Adding two ushort variables can be implemented as adding two ints with an int result, without an overflow check. And I believe that handling any kind of exception in code (including an overflow exception) is time-consuming. This can be checked in the .NET documentation.
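A small sketch of the C# arithmetic rules involved (C# 9 top-level statements; the values are arbitrary). `ushort + ushort` is evaluated as `int + int`, and casting back to `ushort` keeps only the low 16 bits. In the default unchecked context (no `checked` keyword and the `CheckForOverflowUnderflow` compiler option left off), `int + int` also wraps silently; only `checked` arithmetic throws `OverflowException`. This suggests that, with default settings, neither AddInt nor AddUInt16 performs overflow checks.

```csharp
using System;

ushort a = 65535, b = 1;

// ushort + ushort: both operands are promoted, so the sum is an int (65536).
var promoted = a + b;
Console.WriteLine($"{promoted.GetType().Name} {promoted}");

// Casting back to ushort keeps only the low 16 bits: 65536 becomes 0.
ushort truncated = (ushort)(a + b);

// int + int also wraps silently in an unchecked context: MaxValue + 1 == MinValue.
int big = int.MaxValue, one = 1;
int wrapped = unchecked(big + one);

// Only checked arithmetic throws OverflowException.
bool threw = false;
try
{
    _ = checked(big + one);
}
catch (OverflowException)
{
    threw = true;
}

Console.WriteLine($"truncated={truncated}, wrapped={wrapped}, checked threw={threw}");
```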
