Convert BYTE buffer (0-255) to floating point buffer (0.0-1.0)

Question

How can I convert a BYTE buffer (values 0 to 255) to a floating-point buffer (values 0.0 to 1.0)? The values should map linearly: 0 in the byte buffer becomes 0.0f in the float buffer, 128 becomes about 0.5f, and 255 becomes 1.0f.

This is the code I currently have:

```cpp
for (int y=0;y<height;y++) {
    for (int x=0;x<width;x++) {
        float* floatpixel = floatbuffer + (y * width + x) * 4;
        BYTE* bytepixel = (bytebuffer + (y * width + x) * 4);
        floatpixel[0] = bytepixel[0]/255.f;
        floatpixel[1] = bytepixel[1]/255.f;
        floatpixel[2] = bytepixel[2]/255.f;
        floatpixel[3] = 1.0f; // A
    }
}
```

This is very slow. My friend suggested using a lookup table, but I'd like to know whether anyone can suggest a different approach.

c++ arrays floating-point byte bytearray

Veehmot Jun 25 '09 at 13:00

7 answers

I know this is an old question, but since no one has given a solution using the IEEE 754 float representation, here is one.

```cpp
#include <cstdint>

// Use three unions instead of one to avoid pipeline stalls
// (t holds the constant; u, v, w are the three working copies)
union { float f; uint32_t i; } t, u, v, w;
t.f = 32768.0f;
float const b = 256.f / 255.f;
for(int size = width * height; size > 0; --size) {
    u.i = t.i | bytepixel[0]; floatpixel[0] = (u.f - t.f) * b;
    v.i = t.i | bytepixel[1]; floatpixel[1] = (v.f - t.f) * b;
    w.i = t.i | bytepixel[2]; floatpixel[2] = (w.f - t.f) * b;
    floatpixel[3] = 1.0f; // A
    floatpixel += 4;
    bytepixel += 4;
}
```

This is more than twice as fast as converting int to float on my computer (Core 2 Duo CPU).
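To make the arithmetic behind this trick explicit, here is a single-value sketch (the function name `byte_to_unit_float` is hypothetical). It uses `std::memcpy` instead of the union, because reading an inactive union member is allowed in C but formally undefined in C++ (GCC documents it as supported); compilers typically reduce the `memcpy` to a plain register move. 32768.0f is 2^15, whose unit in the last place is 2^15 × 2^-23 = 1/256, so ORing a byte b into the low mantissa bits yields exactly 32768 + b/256.

```cpp
#include <cstdint>
#include <cstring>

// Convert one byte to [0, 1] with the IEEE 754 "magic number" trick.
inline float byte_to_unit_float(std::uint8_t b)
{
    const float magic = 32768.0f;               // bit pattern 0x47000000
    std::uint32_t bits;
    std::memcpy(&bits, &magic, sizeof bits);    // read the float's bits
    bits |= b;                                  // now encodes 32768 + b/256
    float f;
    std::memcpy(&f, &bits, sizeof f);           // back to float
    return (f - magic) * (256.0f / 255.0f);     // b/256 * 256/255 == b/255
}
```

Subtracting 32768.0f is exact, so the only rounding happens in the final multiply.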

Here is an SSE version of the above code (it only uses SSE2 intrinsics), which produces 16 floats (four RGBA pixels) per iteration. It requires bytepixel and floatpixel to be 16-byte (128-bit) aligned and the number of pixels to be a multiple of 4. Note that the built-in SSE int-to-float conversions would not help much here, since they would require an additional multiplication anyway. I believe this is the shortest instruction sequence, but if your compiler is not smart enough, you may want to unroll and schedule things by hand.

```cpp
#include <emmintrin.h> // SSE2 intrinsics

/* Magic values */
__m128i zero = _mm_set_epi32(0, 0, 0, 0);
__m128i magic1 = _mm_set_epi32(0xff000000, 0xff000000, 0xff000000, 0xff000000);
__m128i magic2 = _mm_set_epi32(0x47004700, 0x47004700, 0x47004700, 0x47004700);
__m128 magic3 = _mm_set_ps(32768.0f, 32768.0f, 32768.0f, 32768.0f);
__m128 magic4 = _mm_set_ps(256.0f / 255.0f, 256.0f / 255.0f, 256.0f / 255.0f, 256.0f / 255.0f);

for(int size = width * height / 4; size > 0; --size) {
    /* Load bytes in vector and force alpha value to 255 so that
     * the output will be 1.0f as expected. */
    __m128i in = _mm_load_si128((__m128i *)bytepixel);
    in = _mm_or_si128(in, magic1);

    /* Shuffle bytes into four ints ORed with 32768.0f and cast
     * to float (the cast is free). */
    __m128i tmplo = _mm_unpacklo_epi8(in, zero);
    __m128i tmphi = _mm_unpackhi_epi8(in, zero);
    __m128 in1 = _mm_castsi128_ps(_mm_unpacklo_epi16(tmplo, magic2));
    __m128 in2 = _mm_castsi128_ps(_mm_unpackhi_epi16(tmplo, magic2));
    __m128 in3 = _mm_castsi128_ps(_mm_unpacklo_epi16(tmphi, magic2));
    __m128 in4 = _mm_castsi128_ps(_mm_unpackhi_epi16(tmphi, magic2));

    /* Subtract 32768.0f and multiply by 256.0f/255.0f */
    __m128 out1 = _mm_mul_ps(_mm_sub_ps(in1, magic3), magic4);
    __m128 out2 = _mm_mul_ps(_mm_sub_ps(in2, magic3), magic4);
    __m128 out3 = _mm_mul_ps(_mm_sub_ps(in3, magic3), magic4);
    __m128 out4 = _mm_mul_ps(_mm_sub_ps(in4, magic3), magic4);

    /* Store 16 floats */
    _mm_store_ps(floatpixel, out1);
    _mm_store_ps(floatpixel + 4, out2);
    _mm_store_ps(floatpixel + 8, out3);
    _mm_store_ps(floatpixel + 12, out4);

    floatpixel += 16;
    bytepixel += 16;
}
```

Edit: improved accuracy by using (f + c/b) * b instead of f * b + c.

Edit: added the SSE version.

sam hocevar Mar 19 '11 at 14:46

Use a static lookup table for this. When I worked at a computer graphics company, we had a hard-coded lookup table for this that we linked into the project.
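If C++17 is available, the compiler can generate such a table itself, which keeps it baked into the binary without a separate generation step. A minimal sketch; the names `make_byte_to_float_table` and `kByteToFloat` are illustrative, not from any library:

```cpp
#include <array>

// Build the 256-entry table at compile time. C++17 is needed because
// std::array's non-const operator[] is only constexpr from C++17 on.
constexpr std::array<float, 256> make_byte_to_float_table()
{
    std::array<float, 256> table{};
    for (int i = 0; i < 256; ++i)
        table[i] = static_cast<float>(i) / 255.0f;
    return table;
}

// Evaluated by the compiler and stored as constant data in the binary.
inline constexpr std::array<float, 256> kByteToFloat = make_byte_to_float_table();
```

Indexing is then just `kByteToFloat[bytepixel[0]]`.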

Mats fredriksson Jun 25 '09 at 13:12

You need to find out what the bottleneck is:

If you iterate over your data tables in the “wrong” direction, you will constantly hit cache misses. No lookup table will ever get around that.
If your processor is slower at scaling than at looking up values, you can gain performance by using a lookup table, provided the table fits in the cache.
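The iteration-order point can be illustrated on a single-channel (grayscale) buffer, which is an assumption made here for brevity. Both hypothetical functions below compute the same result; the first walks memory sequentially, while the second strides by `width` elements per step, which on large images tends to cause cache misses:

```cpp
#include <cstddef>
#include <cstdint>

// Cache-friendly order: the inner loop walks consecutive addresses.
void convert_row_major(const std::uint8_t* src, float* dst, int width, int height)
{
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            const std::size_t i = static_cast<std::size_t>(y) * width + x;
            dst[i] = src[i] * (1.0f / 255.0f);
        }
}

// Same result, but the inner loop jumps `width` elements per step, so on wide
// images nearly every access can land on a different cache line.
void convert_column_major(const std::uint8_t* src, float* dst, int width, int height)
{
    for (int x = 0; x < width; ++x)
        for (int y = 0; y < height; ++y) {
            const std::size_t i = static_cast<std::size_t>(y) * width + x;
            dst[i] = src[i] * (1.0f / 255.0f);
        }
}
```

How large the difference is depends on the image size and the cache hierarchy, so it is worth timing both on your own data.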

Another tip:

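```cpp
#include <algorithm>

// Maps one byte (0..255) to a float (0.0..1.0).
struct Scale {
    float operator()( BYTE b ) const { return b * (1.f/255.f); }
};

// Converts every byte, including alpha (alpha is scaled rather than forced to 1.0f).
std::transform( bytebuffer, bytebuffer + width * height * 4, floatbuffer, Scale() );
```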

xtofl Jun 25 '09 at 13:19

Yes, a lookup table is definitely faster than doing lots of divisions in a loop. Just create a table of 256 precomputed floating-point values and use the byte value to index into it.

You can also optimize the loop a bit by removing the index calculation and doing something like

```cpp
float *floatpixel = floatbuffer;
BYTE *bytepixel = bytebuffer;
for (...) {
    *floatpixel++ = float_table[*bytepixel++];
    *floatpixel++ = float_table[*bytepixel++];
    *floatpixel++ = float_table[*bytepixel++];
    *floatpixel++ = 1.0f;
    bytepixel++; // skip the source alpha byte (4 bytes per pixel)
}
```
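A self-contained sketch of the table approach as a function, assuming tightly packed RGBA8 input (4 bytes per pixel) as in the question and using `std::uint8_t` in place of `BYTE`; the function name `convert_rgba_lut` is hypothetical:

```cpp
#include <cstddef>
#include <cstdint>

// Converts `pixels` RGBA8 pixels to RGBA floats via a 256-entry table.
// Alpha is forced to 1.0f, matching the question's code.
void convert_rgba_lut(const std::uint8_t* bytepixel, float* floatpixel, std::size_t pixels)
{
    // Filled once, on first call (thread-safe initialization since C++11).
    static const struct Table {
        float v[256];
        Table() { for (int i = 0; i < 256; ++i) v[i] = i / 255.0f; }
    } table;

    for (std::size_t n = 0; n < pixels; ++n) {
        *floatpixel++ = table.v[*bytepixel++];  // R
        *floatpixel++ = table.v[*bytepixel++];  // G
        *floatpixel++ = table.v[*bytepixel++];  // B
        *floatpixel++ = 1.0f;                   // A
        ++bytepixel;                            // skip the source alpha byte
    }
}
```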

laalto Jun 25 '09 at 13:14

A lookup table is the fastest way to convert :) Here you go:

Python code to generate a byte_to_float.h file to include:

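```python
#!/usr/bin/env python3
"""Print byte_to_float.h: a 256-entry table mapping 0..255 to 0.0..1.0."""

def main():
    print("static const float byte_to_float[] = {")
    # Entries 0..254; 255 is emitted separately as an exact 1.0f.
    for ii in range(0, 255):
        print("%sf," % (ii / 255.0))
    print("1.0f };")
    return 0

if __name__ == "__main__":
    main()
```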

And the C++ code to convert:

```cpp
#include "byte_to_float.h"

floatpixel[0] = byte_to_float[ bytepixel[0] ];
```

Simple, right?

Viet Mar 01 '10 at 11:22

Don't compute 1/255 every time. I don't know whether the compiler will be smart enough to optimize that away. Compute it once and reuse it; better still, define it as a constant.
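A minimal sketch of hoisting the constant (the names `kInv255`, `scale_div`, and `scale_mul` are illustrative). Note that `b / 255.0f` and `b * (1.0f / 255.0f)` can differ in the last bit, so compilers generally will not make this substitution on their own unless you allow it, for example with GCC's `-ffast-math` or `-freciprocal-math`:

```cpp
#include <cstdint>

// Computed once at compile time and shared by every call.
constexpr float kInv255 = 1.0f / 255.0f;

// Divides on every call (what the question's loop does).
inline float scale_div(std::uint8_t b) { return b / 255.0f; }

// Multiplies by the precomputed reciprocal instead.
inline float scale_mul(std::uint8_t b) { return b * kInv255; }
```

For image data the last-bit difference is usually irrelevant, and a multiply is typically much cheaper than a divide.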

Rodyland Jun 26 '09 at 5:28

moonshadow · Accepted Answer · 2009-06-25T13:13:59+0000

Whether or not you decide to use a lookup table, your code does a lot of work in each loop iteration that it really doesn't need to, probably enough to overshadow the cost of the conversion and the multiply.

Declare your pointers restrict, and the pointers you only read from const. Multiply by 1/255 instead of dividing by 255. Don't recompute the pointers in every iteration of the inner loop; compute the starting values once and increment them. Unroll the inner loop a few times. Use vector SIMD operations if your target supports them. Don't increment and compare against the maximum; decrement and compare against zero.

Something like

```cpp
// restrict is C99; in C++ use the compiler extension __restrict (GCC, Clang, MSVC)
float* __restrict floatpixel = floatbuffer;
BYTE const* __restrict bytepixel = bytebuffer;
for( int size = width*height; size > 0; --size ) {
    floatpixel[0] = bytepixel[0]*(1.f/255.f);
    floatpixel[1] = bytepixel[1]*(1.f/255.f);
    floatpixel[2] = bytepixel[2]*(1.f/255.f);
    floatpixel[3] = 1.0f; // A
    floatpixel += 4;
    bytepixel += 4;
}
```

would be a start.
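Taking the "unroll" and "count down to zero" suggestions one step further, here is a sketch that handles two pixels per iteration plus an odd leftover pixel. It assumes packed RGBA8 input and uses `std::uint8_t` for `BYTE`; the function name `convert_rgba_unrolled` is hypothetical, and whether manual unrolling beats what the compiler already does at `-O2`/`-O3` is something to measure rather than assume:

```cpp
#include <cstddef>
#include <cstdint>

// __restrict is a compiler extension (GCC, Clang, MSVC), not standard C++.
void convert_rgba_unrolled(const std::uint8_t* __restrict src,
                           float* __restrict dst,
                           std::size_t pixels)
{
    const float scale = 1.0f / 255.0f;
    // Two pixels per iteration, counting down to zero.
    for (std::size_t n = pixels / 2; n > 0; --n) {
        dst[0] = src[0] * scale; dst[1] = src[1] * scale;
        dst[2] = src[2] * scale; dst[3] = 1.0f;          // first pixel
        dst[4] = src[4] * scale; dst[5] = src[5] * scale;
        dst[6] = src[6] * scale; dst[7] = 1.0f;          // second pixel
        dst += 8;
        src += 8;
    }
    if (pixels % 2) {                                    // odd leftover pixel
        dst[0] = src[0] * scale; dst[1] = src[1] * scale;
        dst[2] = src[2] * scale; dst[3] = 1.0f;
    }
}
```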
