Loading 8-bit uint8_t as uint32_t?

My image processing project works with grayscale images. My platform is an ARM Cortex-A8 processor, and I want to use NEON.

I have a grayscale image (see the example below), and in my algorithm I only have to add up the columns.

How can I load four 8-bit pixel values, which are uint8_t, as four uint32_t into one of the 128-bit NEON registers? Which intrinsic should I use for this?

I mean:

I have to load them as 32-bit values because, if you look carefully, 255 + 255 is 510, which cannot be stored in an 8-bit element.

E.g.:

255 255 255 255 ......... (640 pixels)
255 255 255 255
255 255 255 255
255 255 255 255
.
.
.
.
.
(480 pixels) 
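
For reference, here is a minimal scalar sketch of the column sum described above. It assumes a row-major image with `width` bytes per row. The function names `add_u8_wrapping` and `column_sums_scalar` are illustrative only and do not come from any library.

```c
#include <stddef.h>
#include <stdint.h>

/* Adding two full-white pixels in 8 bits wraps around: 510 mod 256 = 254. */
static uint8_t add_u8_wrapping(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Scalar reference: sums[x] = sum over all rows y of img[y * width + x].
   uint32_t is wide enough for the example image: 480 * 255 = 122400. */
static void column_sums_scalar(const uint8_t *img, size_t width, size_t height,
                               uint32_t *sums)
{
    for (size_t x = 0; x < width; ++x)
        sums[x] = 0;
    for (size_t y = 0; y < height; ++y)
        for (size_t x = 0; x < width; ++x)
            sums[x] += img[y * width + x];
}
```

A vectorised version can be checked against `column_sums_scalar` on the same buffer.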
+5

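To sum up to 480 8-bit values, you technically need 17 bits of intermediate storage. However, if you do the additions in two stages, the top 240 rows and then the bottom 240 rows, 16 bits are enough for each stage. You can then add the two partial results to get the final answer.

There is a NEON instruction that suits your algorithm: vaddw. It adds a dword vector to a qword vector, where the elements of the latter are twice as wide as those of the former. In your case, vaddw.u8 can add 8 pixels to 8 16-bit accumulators. vaddw.u16 can then add 8 16-bit accumulators to 8 32-bit ones; note that you have to use the instruction twice, once for each half.

If necessary, you can also convert the values back to 16-bit or 8-bit with vmovn or vqmovn.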
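
The two-stage accumulation above could look roughly like the following with intrinsics. This is only a sketch under assumptions: the helper name `sum8_columns`, the row-major layout with `stride` bytes per row, and the 240-row block size are illustrative choices. The scalar branch is there only so the sketch also compiles on targets without NEON.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* 240 * 255 = 61200, so a block of 240 rows fits in a 16-bit accumulator. */
#define ROWS_PER_BLOCK 240

/* Hypothetical helper: sums 8 adjacent columns over `rows` rows into out[0..7]. */
static void sum8_columns(const uint8_t *img, size_t stride, size_t rows,
                         uint32_t out[8])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint32x4_t lo = vdupq_n_u32(0);   /* 32-bit sums for columns 0..3 */
    uint32x4_t hi = vdupq_n_u32(0);   /* 32-bit sums for columns 4..7 */
    for (size_t y0 = 0; y0 < rows; y0 += ROWS_PER_BLOCK) {
        size_t y1 = (rows - y0 < ROWS_PER_BLOCK) ? rows : y0 + ROWS_PER_BLOCK;
        uint16x8_t acc16 = vdupq_n_u16(0);
        for (size_t y = y0; y < y1; ++y)
            acc16 = vaddw_u8(acc16, vld1_u8(img + y * stride)); /* vaddw.u8 */
        /* vaddw.u16, used twice: once per half of the 16-bit accumulator */
        lo = vaddw_u16(lo, vget_low_u16(acc16));
        hi = vaddw_u16(hi, vget_high_u16(acc16));
    }
    vst1q_u32(out, lo);
    vst1q_u32(out + 4, hi);
#else
    /* Scalar fallback that mirrors the same 16-bit / 32-bit staging. */
    for (int i = 0; i < 8; ++i)
        out[i] = 0;
    for (size_t y0 = 0; y0 < rows; y0 += ROWS_PER_BLOCK) {
        size_t y1 = (rows - y0 < ROWS_PER_BLOCK) ? rows : y0 + ROWS_PER_BLOCK;
        uint16_t acc16[8] = {0};
        for (size_t y = y0; y < y1; ++y)
            for (int i = 0; i < 8; ++i)
                acc16[i] = (uint16_t)(acc16[i] + img[y * stride + i]);
        for (int i = 0; i < 8; ++i)
            out[i] += acc16[i];
    }
#endif
}
```

For a 640x480 image you would call this once per group of 8 columns, e.g. `sum8_columns(img + x, 640, 480, sums + x)` for x = 0, 8, 16, ...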

+3

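As far as I know, there is no instruction that loads 4 8-bit values directly into 4 32-bit elements.

You have to load them first and then widen them, for example with a widening shift (vshll). NEON has no 32-bit vector registers, so you will have to work on 8 pixels at a time (not 4).

You could also use only 16-bit elements. That should be enough...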

+2

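Load the 4 bytes with a single-lane load instruction (vld1 <register>[<lane>], [<address>]) into the low half of a q-register, then use two move-long instructions (vmovl) to promote them first to 16, then to 32 bits. The result should look something like this (in GNU syntax):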

vld1.32 {d0[0]}, [<address>] @ Now d0 = (*<addr>, *<addr+1>, *<addr+2>, *<addr+3>, <junk>, ... <junk>)
vmovl.u8 q0, d0 @ Now q0 = (d0, d1) = ((uint16_t)*<addr>, ... (uint16_t)*<addr+3>, <junk>, ... <junk>)
vmovl.u16 q0, d0 @ Now q0 = ((uint32_t)*<addr>, ... (uint32_t)*<addr+3>)

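Also, if you can guarantee that <address> is 4-byte aligned, write [<address>:32] in the load instruction instead, which may save a cycle or two. If you do that and the address is not aligned, however, you will get a fault.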

Um, I just realized that you want to use intrinsics rather than assembly, so here is the same thing with intrinsics.

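uint32x2_t v8 = vdup_n_u32(0); // Lane 0 will actually hold 4 uint8_t
v8 = vld1_lane_u32((const uint32_t *)ptr, v8, 0); // Assumes ptr points at the 4 pixels
const uint16x4_t v16 = vget_low_u16(vmovl_u8(vreinterpret_u8_u32(v8))); // 8-bit -> 16-bit
const uint32x4_t v32 = vmovl_u16(v16); // 16-bit -> 32-bit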
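
Wrapped into a function, the same widening could be sketched as below. This is a variation rather than the exact code above: it fetches the 4 bytes with `memcpy` instead of casting the pointer for `vld1_lane_u32`, it assumes a little-endian target, and the helper name `load4_u8_as_u32` is made up for illustration. The scalar branch is only there so the sketch compiles without NEON.

```c
#include <stdint.h>
#include <string.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: widens the 4 pixels at src into out[0..3]. */
static void load4_u8_as_u32(const uint8_t *src, uint32_t out[4])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint32_t packed;
    memcpy(&packed, src, sizeof packed);   /* 4-byte load, no alignment needed */
    /* Little-endian assumption: bytes 0..3 of the d-register are the pixels
       in memory order (they are repeated in bytes 4..7, which we ignore). */
    const uint8x8_t v8 = vreinterpret_u8_u32(vdup_n_u32(packed));
    const uint16x4_t v16 = vget_low_u16(vmovl_u8(v8));  /* 8-bit -> 16-bit */
    const uint32x4_t v32 = vmovl_u16(v16);              /* 16-bit -> 32-bit */
    vst1q_u32(out, v32);
#else
    for (int i = 0; i < 4; ++i)
        out[i] = src[i];
#endif
}
```

In real code you would probably keep the `uint32x4_t` in a register and accumulate with it rather than storing it; the store is here only to make the result easy to inspect.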