I have been doing some unsafe bitmap operations and found that incrementing the pointer fewer times can lead to big performance improvements. I am not sure why: even though you do many more bitwise operations inside the loop, it is still faster to do fewer iterations over the pointer.
For example, instead of iterating over 32-bit pixels one at a time with a UInt32, iterate over two pixels at a time with a UInt64 and do twice the operations in one iteration.
The following code does this by reading two pixels and modifying them (of course it will fail on images with an odd width, but this is just for testing).

    // Requires: using System; using System.Diagnostics; using System.Drawing;
    //           using System.Drawing.Imaging; using System.Windows.Forms;
    private void removeBlueWithTwoPixelIteration()
    {
        // think of a big image with data
        Bitmap bmp = new Bitmap(15000, 15000, System.Drawing.Imaging.PixelFormat.Format32bppArgb);
        TimeSpan startTime, endTime;

        unsafe
        {
            UInt64 doublePixel;
            UInt32 pixel1;
            UInt32 pixel2;

            const int readSize = sizeof(UInt64);
            const UInt64 rightHalf = UInt32.MaxValue;

            PerformanceCounter pf = new PerformanceCounter("System", "System Up Time");
            pf.NextValue();

            BitmapData bd = bmp.LockBits(new Rectangle(0, 0, bmp.Width, bmp.Height),
                System.Drawing.Imaging.ImageLockMode.ReadWrite, bmp.PixelFormat);
            byte* image = (byte*)bd.Scan0.ToPointer();

            startTime = TimeSpan.FromSeconds(pf.NextValue());

            for (byte* line = image; line < image + bd.Stride * bd.Height; line += bd.Stride)
            {
                for (var pointer = line; pointer < line + bd.Stride; pointer += readSize)
                {
                    doublePixel = *((UInt64*)pointer);

                    // On a little-endian machine the low half is the first pixel in memory,
                    // so it is taken as pixel1 to keep the two pixels in their original order.
                    pixel1 = (UInt32)(doublePixel & rightHalf) >> 8; // lose last 8 bits (Blue color)
                    pixel2 = (UInt32)(doublePixel >> (readSize * 8 / 2)) >> 8; // lose last 8 bits (Blue color)

                    *((UInt32*)pointer) = pixel1 << 8; // put back, shifted so ARG return to their original positions
                    *((UInt32*)pointer + 1) = pixel2 << 8; // put back, shifted so ARG return to their original positions
                }
            }

            endTime = TimeSpan.FromSeconds(pf.NextValue());

            bmp.UnlockBits(bd);
            bmp.Dispose();
        }

        MessageBox.Show((endTime - startTime).TotalMilliseconds.ToString());
    }

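As a side note on the two-pixel idea, the split into two halves is not strictly needed to clear blue: a single AND on the 64-bit value should give the same result. The following is only a minimal sketch on plain `ulong` values rather than on bitmap memory. The class and method names are hypothetical, and it assumes the blue channel sits in the low byte of each 32-bit pixel, as in the code above.

```csharp
using System;

static class TwoPixelMask
{
    // Keeps A, R and G of both packed pixels and zeroes the low (blue) byte of each.
    const ulong KeepArgOfBoth = 0xFFFFFF00FFFFFF00UL;

    // Hypothetical helper: clears blue in two 32bpp ARGB pixels packed into one 64-bit value.
    public static ulong ClearBlueInPair(ulong twoPixels)
    {
        return twoPixels & KeepArgOfBoth;
    }

    // Reference version: split into halves, shift blue out and back, repack in the same order.
    public static ulong ClearBlueInPairByShifting(ulong twoPixels)
    {
        uint high = (uint)(twoPixels >> 32) >> 8 << 8;
        uint low = (uint)(twoPixels & uint.MaxValue) >> 8 << 8;
        return ((ulong)high << 32) | low;
    }
}
```

In the unsafe loop this would correspond to something like `*((UInt64*)pointer) &= 0xFFFFFF00FFFFFF00UL;`, which also sidesteps any question about which half is which pixel.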
The following code processes the image pixel by pixel and is about 70% slower than the previous one:

    private void removeBlueWithSinglePixelIteration()
    {
        // think of a big image with data
        Bitmap bmp = new Bitmap(15000, 15000, System.Drawing.Imaging.PixelFormat.Format32bppArgb);
        TimeSpan startTime, endTime;

        unsafe
        {
            UInt32 singlePixel;

            const int readSize = sizeof(UInt32);

            PerformanceCounter pf = new PerformanceCounter("System", "System Up Time");
            pf.NextValue();

            BitmapData bd = bmp.LockBits(new Rectangle(0, 0, bmp.Width, bmp.Height),
                System.Drawing.Imaging.ImageLockMode.ReadWrite, bmp.PixelFormat);
            byte* image = (byte*)bd.Scan0.ToPointer();

            startTime = TimeSpan.FromSeconds(pf.NextValue());

            for (byte* line = image; line < image + bd.Stride * bd.Height; line += bd.Stride)
            {
                for (var pointer = line; pointer < line + bd.Stride; pointer += readSize)
                {
                    singlePixel = *((UInt32*)pointer) >> 8; // lose B
                    *((UInt32*)pointer) = singlePixel << 8; // shift ARG back
                }
            }

            endTime = TimeSpan.FromSeconds(pf.NextValue());

            bmp.UnlockBits(bd);
            bmp.Dispose();
        }

        MessageBox.Show((endTime - startTime).TotalMilliseconds.ToString());
    }

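Both methods above take their timestamps from a performance counter. For comparisons like this, `System.Diagnostics.Stopwatch` is the usual way to measure elapsed time in .NET. The helper below is a minimal sketch of that alternative, not what the measurements in this question used; the class and method names are hypothetical.

```csharp
using System;
using System.Diagnostics;

static class Timing
{
    // Hypothetical helper: runs the action once and returns the elapsed wall-clock time in milliseconds.
    public static double MeasureMilliseconds(Action action)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed.TotalMilliseconds;
    }
}
```

The pixel loop would go inside the delegate, for example `double ms = Timing.MeasureMilliseconds(() => { /* pixel loop here */ });`, with `LockBits` and `UnlockBits` kept outside so only the loop is timed.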
Can someone clarify why incrementing a pointer is a more expensive operation than doing a few bitwise operations?
I am using .NET Framework 4.
Would something similar hold for C++?
NB. The ratio between the two methods is the same in 32-bit and 64-bit builds; however, both methods are about 20% slower in 64-bit than in 32-bit?
EDIT: As suggested by Porges and arul, this could be due to the reduced number of memory reads and branches.
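To put rough numbers on "fewer reads and branches", the inner-loop iteration count for the 15000 x 15000 test image can be worked out directly. This is a small sketch with hypothetical names. It assumes 4 bytes per pixel and a stride equal to width times 4 (no row padding), and it only counts iterations; it says nothing about the time each one takes.

```csharp
static class IterationCount
{
    // Hypothetical helper: number of inner-loop iterations (one read, one write and
    // one loop branch each) when every iteration handles pixelsPerIteration pixels.
    public static long InnerIterations(int width, int height, int pixelsPerIteration)
    {
        const int bytesPerPixel = 4;                      // Format32bppArgb
        long strideBytes = (long)width * bytesPerPixel;   // assumption: no row padding
        long iterationsPerRow = strideBytes / (pixelsPerIteration * bytesPerPixel);
        return iterationsPerRow * height;
    }
}
```

For the test image this gives 225,000,000 iterations at one pixel per iteration, 112,500,000 at two, and 45,000,000 at five.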
EDIT2:
After some testing, it seems that reading from memory fewer times is the answer.
With the following code, which assumes that the image width is divisible by 5, it runs about 400% faster:

    [StructLayout(LayoutKind.Sequential, Pack = 1)]
    struct PixelContainer
    {
        public UInt32 pixel1;
        public UInt32 pixel2;
        public UInt32 pixel3;
        public UInt32 pixel4;
        public UInt32 pixel5;
    }

Then use this:

    int readSize = sizeof(PixelContainer);

    // .....

    for (var pointer = line; pointer < line + bd.Stride; pointer += readSize)
    {
        multiPixel = *((PixelContainer*)pointer);

        multiPixel.pixel1 &= 0xFFFFFF00u;
        multiPixel.pixel2 &= 0xFFFFFF00u;
        multiPixel.pixel3 &= 0xFFFFFF00u;
        multiPixel.pixel4 &= 0xFFFFFF00u;
        multiPixel.pixel5 &= 0xFFFFFF00u;

        *((PixelContainer*)pointer) = multiPixel;
    }
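The five-pixel loop relies on the width being divisible by 5. One common way around that restriction is to process full blocks first and then finish the leftover pixels one at a time. The sketch below shows only that shape on a managed `uint[]` (a hypothetical stand-in for one row of pixels), so it is not the unsafe pointer code above and is not expected to reproduce its timings.

```csharp
static class BlockedClear
{
    const uint KeepArg = 0xFFFFFF00u; // keep A, R and G; zero the low (blue) byte

    // Hypothetical helper: clears blue in every pixel, five per iteration plus a tail loop.
    public static void ClearBlue(uint[] pixels)
    {
        const int blockSize = 5;
        int blockedLength = pixels.Length - pixels.Length % blockSize;

        int i = 0;
        // Main loop: one loop branch per five pixels.
        for (; i < blockedLength; i += blockSize)
        {
            pixels[i] &= KeepArg;
            pixels[i + 1] &= KeepArg;
            pixels[i + 2] &= KeepArg;
            pixels[i + 3] &= KeepArg;
            pixels[i + 4] &= KeepArg;
        }

        // Tail loop: the zero to four pixels that do not fill a whole block.
        for (; i < pixels.Length; i++)
        {
            pixels[i] &= KeepArg;
        }
    }
}
```

The same split between a main loop and a tail loop would apply to the pointer version, using the row's pixel width instead of the array length.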