What does (r + 1 + (r >> 8)) >> 8 do?

In some old C/C++ graphics code that I have to port to Java and JavaScript, I found this:

b = (b+1 + (b >> 8)) >> 8; // very fast 

where b is a short int for blue, and the same code appears for r and g (red and green). The comment is not helpful.

I can't understand what it does, apart from the obvious shifts and addition. I can port it without understanding it; I'm just asking out of curiosity.

+5
7 answers
 y = ( x + 1 + (x>>8) ) >> 8 // very fast 

This is a fixed-point approximation of division by 255. It is useful for normalizing calculations based on pixel values, so that 255 (usually the maximum pixel value) maps to exactly 1.

It is described as very fast because fully general integer division is a relatively slow operation on many processors (although your compiler may well do this kind of optimization for you if it can infer restrictions on the input).

This works on the basis of the idea that 257/(256*256) is a very close approximation of 1/255, and that x*257/256 can be formulated as x + (x>>8). The +1 is rounding support, which makes the formula match the integer division x/255 exactly for all x in [0..65534].
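That claim is easy to check by brute force; here is a quick sketch (class and method names are my own, not from the question's code base):

```java
// Brute-force check that (x + 1 + (x >> 8)) >> 8 matches integer
// division by 255 over the claimed range [0..65534].
public class Div255Check {
    static int div255(int x) {
        return (x + 1 + (x >> 8)) >> 8;
    }

    public static void main(String[] args) {
        for (int x = 0; x <= 65534; x++) {
            if (div255(x) != x / 255) {
                throw new AssertionError("mismatch at x = " + x);
            }
        }
        // 65535 is the first input where the approximation breaks down:
        System.out.println("65535: approx = " + div255(65535)
                + ", exact = " + (65535 / 255)); // prints 256 vs 257
    }
}
```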

A little algebra on the inner term can make things a bit clearer...

  x*257/256 = (x*256+x)/256 = x + x/256 = x + (x>>8) 
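That identity can also be verified numerically for non-negative x, since >> 8 is exactly flooring division by 256 (the helper names below are hypothetical):

```java
// Check that integer x*257/256 equals x + (x >> 8) for non-negative x.
public class Mul257Check {
    static int viaMul(int x)   { return x * 257 / 256; }
    static int viaShift(int x) { return x + (x >> 8); }

    public static void main(String[] args) {
        for (int x = 0; x <= 65535; x++) { // 65535 * 257 still fits in an int
            if (viaMul(x) != viaShift(x)) {
                throw new AssertionError("mismatch at x = " + x);
            }
        }
        System.out.println("identity holds for x in [0..65535]");
    }
}
```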

There is more discussion here: How to make an alpha blend fast? and here: Division through Multiplication


By the way, if you want round-to-nearest and your processor can multiply quickly, then the following holds for all uint16_t dividend values (actually for all x in [0 .. 2^16 + 126], i.e. [0..65662]):

 y = ((x+128)*257)>>16 // divide by 255 with round-to-nearest for x in [0..65662] 
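A brute-force sketch of that claim (my own harness, not the answerer's code):

```java
// Check ((x + 128) * 257) >> 16 against round-to-nearest division by 255
// over the claimed range [0..65662].
public class RoundDiv255Check {
    static int div255Round(int x) { return ((x + 128) * 257) >> 16; }

    public static void main(String[] args) {
        for (int x = 0; x <= 65662; x++) {
            if (div255Round(x) != Math.round(x / 255.0f)) {
                throw new AssertionError("mismatch at x = " + x);
            }
        }
        // 65663 is the first input where the formula drifts from exact rounding:
        System.out.println("65663: approx = " + div255Round(65663)
                + ", exact = " + Math.round(65663 / 255.0f)); // prints 257 vs 258
    }
}
```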
+10

It seems to be designed to check whether blue (or red or green) is fully on. It evaluates to 1 when b is 255, and to 0 for all lower values.

+3

A common use case where you want a formula more accurate than 257/256 is when you have to composite many alpha values per pixel. As one example, when downscaling an image you need to combine 4 alphas per source pixel that contributes to the destination, and then combine all the source pixels that contribute to the destination.

I posted an infinitely accurate version of /255, but it was rejected for no reason. So I will add that I implement alpha blending hardware for a living, I write real-time graphics code and game engines for a living, and I have published articles on this subject at conferences such as MICRO, so I really do know what I am talking about. And it may be helpful, or at least entertaining, to understand the more accurate formula that is EXACTLY 1/255:

Version 1: x = (x + (x >> 8)) >> 8. No constant is added; it does not satisfy (x*255)/255 = x, but in most cases it will look fine.

Version 2: x = (x + (x >> 8) + 1) >> 8. This WILL satisfy (x*255)/255 = x for integers, but will not hit the correct integer values for all alphas.

Version 3 (simple integer rounding): x = (x + (x >> 8) + 128) >> 8. This will not hit the correct integer values for all alphas either, but on average it is closer than version 2 at the same cost.
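To make the difference between the versions concrete, here is a small sketch (hypothetical method names) evaluating them at x = 255*255, i.e. a fully-on channel multiplied by a fully-on alpha:

```java
// Compare versions 1-3 at x = 255 * 255, where (x*255)/255 should give 255.
public class VersionCompare {
    static int v1(int x) { return (x + (x >> 8)) >> 8; }
    static int v2(int x) { return (x + (x >> 8) + 1) >> 8; }
    static int v3(int x) { return (x + (x >> 8) + 128) >> 8; }

    public static void main(String[] args) {
        int x = 255 * 255;
        System.out.println("v1: " + v1(x)); // prints 254 -- loses full brightness
        System.out.println("v2: " + v2(x)); // prints 255 -- round-trips exactly
        System.out.println("v3: " + v3(x)); // prints 255
    }
}
```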

Version 4: an infinitely accurate version, to any desired level of precision, for any number of composited alphas (useful for image resizing, rotation, etc.):

 [(x + (x >> 8)) >> 8] + [((x & 255) + (x >> 8)) >> 8]

Why is version 4 infinitely accurate? Because 1/255 = 1/256 + 1/65536 + 1/256^3 + 1/256^4 + ...

The simplest expression above (version 1) does not handle rounding, and it also does not handle the carries that come out of this infinite series of identical sum columns. The newly added term computes the carry (0 or 1) out of that infinite series of base-256 digits. By adding it in, you get the same result as summing all the infinite terms. At that point you can round by adding a half-bit at whatever precision point you want.

The OP probably doesn't need this, but people should know that you don't have to approximate at all. The formula above is actually more accurate than double-precision floating point.

As for speed: in hardware, this method is faster than even a single (narrow) multiply. In software you have to consider throughput and latency. On latency it can be faster than a narrow multiply (and is definitely faster than a full-width multiply), but in the OP's context you can process many pixels at once, and since modern multiply units are pipelined you are still fine. In a Java port you probably don't have narrow multiplies available, so this can still be faster, but you will need to measure.

WRT the person who said "why not use the OS's built-in capabilities for alpha blending?": if the OS already has a significant graphics code base, that may be a great option. If not, you are looking at hundreds to thousands of lines of code to use the OS version; code that is much harder to write and debug than this code. And in the end, the OS code you end up with is not portable at all, while this code can be used anywhere.

+3

The value b + 1 + b/256 is computed, and that result is then divided by 256.

Thus, by using a bit shift, the compiler translates this into a processor-level shift instruction instead of using the FPU or library division routines.
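As a sanity check (hypothetical helper names), the shift form and the written-out division form agree for all non-negative inputs, since >> 8 is exactly flooring division by 256:

```java
// Check that the shift form equals the explicit division-by-256 form.
public class ShiftDivCheck {
    static int viaShift(int b) { return (b + 1 + (b >> 8)) >> 8; }
    static int viaDiv(int b)   { return (b + 1 + b / 256) / 256; }

    public static void main(String[] args) {
        for (int b = 0; b <= 65535; b++) {
            if (viaShift(b) != viaDiv(b)) {
                throw new AssertionError("mismatch at b = " + b);
            }
        }
        System.out.println("shift and division forms agree for b in [0..65535]");
    }
}
```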

+2

I suspect the author is trying to do the following:

 boolean isBFullyOn = false;
 if (b == 0xff) {
     isBFullyOn = true;
 }

Back in the days of slow processors, clever bit tricks like the one above could be faster than the obvious if-then-else logic. It avoids a jump instruction, which was expensive.

It probably also sets an overflow flag in the processor that was used in some later logic. It all depends heavily on the target processor.

This is all speculation on my part, though!

+2

b = (b + (b >> 8)) >> 8; is basically b = b * 257 / 256.

I think the +1 is an ugly hack to compensate for the average bias of -0.5 caused by the inner >> 8.

I would write it as b = (b + 128 + ((b +128)>> 8)) >> 8; .
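A few spot checks of that variant against floating-point round-to-nearest (the harness names are mine):

```java
// Compare the bare shift form with the bias-corrected variant at a few points.
public class RoundedShift {
    static int plain(int x)   { return (x + (x >> 8)) >> 8; }
    static int rounded(int x) { return (x + 128 + ((x + 128) >> 8)) >> 8; }

    public static void main(String[] args) {
        int[] samples = {127, 128, 255, 382, 383, 65025};
        for (int x : samples) {
            System.out.println(x + ": plain = " + plain(x)
                    + ", rounded = " + rounded(x)
                    + ", exact = " + Math.round(x / 255.0));
        }
    }
}
```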

+1

Running this test code:

 public void test() {
     Set<Integer> results = new HashSet<Integer>();
     // short int ranges between -32767 and 32767
     for (int i = -32767; i <= 32767; i++) {
         int b = (i + 1 + (i >> 8)) >> 8;
         if (!results.contains(b)) {
             System.out.println(i + " -> " + b);
             results.add(b);
         }
     }
 }

This produces all possible values between -129 and 128. However, if you work with 8-bit colors (0 - 255), then the only possible outputs are 0 (for 0 - 254) and 1 (for 255), so it is likely he is trying to do what @kaykay described.

+1
