Unsafe pointer iteration and bitmap - why is UInt64 faster?

Question

I have been doing some unsafe bitmap operations and found that incrementing the pointer fewer times can lead to big performance improvements. I'm not sure why: even though each loop iteration then does more bitwise operations, it is still faster to do fewer iterations over the pointer.

So, for example, instead of iterating over 32-bit pixels one at a time with a UInt32, iterate over two pixels at a time with a UInt64 and do twice the operations in one cycle.

The following code does this by reading two pixels and modifying them (of course, it will fail on images with an odd width, but it is just for testing).

    private void removeBlueWithTwoPixelIteration()
    {
        // think of a big image with data
        Bitmap bmp = new Bitmap(15000, 15000, System.Drawing.Imaging.PixelFormat.Format32bppArgb);
        TimeSpan startTime, endTime;

        unsafe
        {
            UInt64 doublePixel;
            UInt32 pixel1;
            UInt32 pixel2;
            const int readSize = sizeof(UInt64);
            const UInt64 rightHalf = UInt32.MaxValue;

            PerformanceCounter pf = new PerformanceCounter("System", "System Up Time");
            pf.NextValue();

            BitmapData bd = bmp.LockBits(new Rectangle(0, 0, bmp.Width, bmp.Height), System.Drawing.Imaging.ImageLockMode.ReadWrite, bmp.PixelFormat);
            byte* image = (byte*)bd.Scan0.ToPointer();

            startTime = TimeSpan.FromSeconds(pf.NextValue());

            for (byte* line = image; line < image + bd.Stride * bd.Height; line += bd.Stride)
            {
                for (var pointer = line; pointer < line + bd.Stride; pointer += readSize)
                {
                    doublePixel = *((UInt64*)pointer);
                    pixel1 = (UInt32)(doublePixel >> (readSize * 8 / 2)) >> 8; // high half: lose last 8 bits (blue)
                    pixel2 = (UInt32)(doublePixel & rightHalf) >> 8;           // low half: lose last 8 bits (blue)

                    // On little-endian hardware the low half is the pixel at the lower address,
                    // so it goes back first; otherwise the two pixels would be swapped.
                    *((UInt32*)pointer) = pixel2 << 8;     // put back, shifted so ARG return to their original positions
                    *((UInt32*)pointer + 1) = pixel1 << 8; // put back, shifted so ARG return to their original positions
                }
            }

            endTime = TimeSpan.FromSeconds(pf.NextValue());
            bmp.UnlockBits(bd);
            bmp.Dispose();
        }

        MessageBox.Show((endTime - startTime).TotalMilliseconds.ToString());
    }

The following code processes the image pixel by pixel and is about 70% slower than the previous one:

    private void removeBlueWithSinglePixelIteration()
    {
        // think of a big image with data
        Bitmap bmp = new Bitmap(15000, 15000, System.Drawing.Imaging.PixelFormat.Format32bppArgb);
        TimeSpan startTime, endTime;

        unsafe
        {
            UInt32 singlePixel;
            const int readSize = sizeof(UInt32);

            PerformanceCounter pf = new PerformanceCounter("System", "System Up Time");
            pf.NextValue();

            BitmapData bd = bmp.LockBits(new Rectangle(0, 0, bmp.Width, bmp.Height), System.Drawing.Imaging.ImageLockMode.ReadWrite, bmp.PixelFormat);
            byte* image = (byte*)bd.Scan0.ToPointer();

            startTime = TimeSpan.FromSeconds(pf.NextValue());

            for (byte* line = image; line < image + bd.Stride * bd.Height; line += bd.Stride)
            {
                for (var pointer = line; pointer < line + bd.Stride; pointer += readSize)
                {
                    singlePixel = *((UInt32*)pointer) >> 8; // lose B
                    *((UInt32*)pointer) = singlePixel << 8; // shift ARG back
                }
            }

            endTime = TimeSpan.FromSeconds(pf.NextValue());
            bmp.UnlockBits(bd);
            bmp.Dispose();
        }

        MessageBox.Show((endTime - startTime).TotalMilliseconds.ToString());
    }

Can someone clarify why incrementing a pointer is a more expensive operation than doing a few bitwise operations?

I am using .NET Framework 4.

Could something similar apply to C++?

N.B. The ratio between the two methods is the same for 32-bit and 64-bit builds; however, both methods run about 20% slower as 64-bit than as 32-bit?

EDIT: As suggested by Porges and arul, this could be due to the reduced number of memory reads and the lower branching overhead.

EDIT2:

After some testing, it seems that reading from memory fewer times is the answer:

With this code, assuming that the image width is divisible by 5, you get a 400% speedup:

    [StructLayout(LayoutKind.Sequential, Pack = 1)]
    struct PixelContainer
    {
        public UInt32 pixel1;
        public UInt32 pixel2;
        public UInt32 pixel3;
        public UInt32 pixel4;
        public UInt32 pixel5;
    }

Then use this:

    int readSize = sizeof(PixelContainer);

    // .....

    for (var pointer = line; pointer < line + bd.Stride; pointer += readSize)
    {
        multiPixel = *((PixelContainer*)pointer);

        multiPixel.pixel1 &= 0xFFFFFF00u;
        multiPixel.pixel2 &= 0xFFFFFF00u;
        multiPixel.pixel3 &= 0xFFFFFF00u;
        multiPixel.pixel4 &= 0xFFFFFF00u;
        multiPixel.pixel5 &= 0xFFFFFF00u;

        *((PixelContainer*)pointer) = multiPixel;
    }

+4

c# .net image-processing image-manipulation unsafe-pointers

Marino Šimić Apr 28 '11 at 2:40

2 answers

It is not the pointer increment that is slower, but the memory read. With 32-bit units, you perform twice as many reads.

You should find it faster still if you write once, rather than twice, in the 64-bit version.
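To illustrate the single-write suggestion, here is a minimal sketch in safe C#. It assumes the two pixels have already been read as one 64-bit value; the names `PackedPixels`, `ClearBlueBothHalves` and `ClearBlue` are made up for this illustration and do not come from the question.

```csharp
public static class PackedPixels
{
    // Hypothetical helper: clears the blue byte of two ARGB pixels packed
    // into one 64-bit value. It uses the same shifts as the question, but
    // recombines the halves so the result can be stored with one write.
    public static ulong ClearBlueBothHalves(ulong doublePixel)
    {
        uint high = (uint)(doublePixel >> 32) >> 8;          // drop the low 8 bits (blue)
        uint low = (uint)(doublePixel & uint.MaxValue) >> 8; // drop the low 8 bits (blue)
        return ((ulong)(high << 8) << 32) | (low << 8);
    }

    // One read and one write per pair of pixels.
    public static void ClearBlue(ulong[] packed)
    {
        for (int i = 0; i < packed.Length; i++)
        {
            packed[i] = ClearBlueBothHalves(packed[i]);
        }
    }
}
```

In the question's pointer loop, the equivalent change would be to replace the two `UInt32` stores with a single store through `(UInt64*)pointer`. Whether that is measurably faster would have to be checked on the target machine.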

+2

porges Apr 28 '11 at 3:02

arul · Accepted Answer · 2011-04-28T03:11:05+0000

This is a technique known as loop unrolling. The main performance benefit should come from the reduced branching overhead.
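As a rough illustration of loop unrolling, the sketch below clears the blue byte of a managed `uint[]` buffer twice: once with one pixel per iteration, and once unrolled by two. The class and method names are hypothetical, and a managed array stands in for the locked bitmap memory, so this shows only the shape of the transformation, not its timing.

```csharp
public static class UnrollDemo
{
    private const uint KeepArg = 0xFFFFFF00u; // keeps A, R, G; clears B

    // Rolled: one pixel and one loop-condition check per iteration.
    public static void ClearBlueRolled(uint[] pixels)
    {
        for (int i = 0; i < pixels.Length; i++)
        {
            pixels[i] &= KeepArg;
        }
    }

    // Unrolled by two: same work, about half as many loop-condition checks.
    public static void ClearBlueUnrolled(uint[] pixels)
    {
        int i = 0;
        int lastPair = pixels.Length - 1;
        for (; i < lastPair; i += 2)
        {
            pixels[i] &= KeepArg;
            pixels[i + 1] &= KeepArg;
        }
        if (i < pixels.Length) // odd element count: handle the leftover pixel
        {
            pixels[i] &= KeepArg;
        }
    }
}
```

Both methods should leave the buffer in the same state; only the number of branch evaluations differs.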

As a side note, you can speed it up a bit using a bitmask:

    *((UInt64*)pointer) &= 0xFFFFFF00FFFFFF00ul;
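A quick way to convince yourself that the mask does the same job as the question's shift pair is a small check like the one below. It is only a sketch: `MaskCheck` and its methods are hypothetical names, and `Agree` packs the two pixels the way a little-endian 64-bit read would (first pixel in the low half).

```csharp
public static class MaskCheck
{
    // Shift pair from the question: drop the low 8 bits, then restore positions.
    public static uint ClearBlueByShift(uint pixel)
    {
        return (pixel >> 8) << 8;
    }

    // Mask version: keep the upper 24 bits of each 32-bit pixel.
    public static ulong ClearBlueByMask(ulong doublePixel)
    {
        return doublePixel & 0xFFFFFF00FFFFFF00ul;
    }

    // True when masking the packed value matches shifting each half separately.
    public static bool Agree(uint first, uint second)
    {
        ulong packed = ((ulong)second << 32) | first;
        ulong expected = ((ulong)ClearBlueByShift(second) << 32) | ClearBlueByShift(first);
        return ClearBlueByMask(packed) == expected;
    }
}
```

For example, `MaskCheck.Agree(0xAABBCCDDu, 0x11223344u)` compares the masked 64-bit value against the two individually shifted pixels.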
