Sum all elements of a quadword vector in ARM assembly with NEON

I'm rather new to assembly. The ARM Infocenter is often useful, but its instructions can be a little confusing for beginners. Basically, I need to sum the 4 float values in a quadword register and store the result in a single-precision register. I think the VPADD instruction can do what I need, but I'm not quite sure.
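To see why a single VPADD is not quite enough, here is a scalar C model of its lane semantics. This is a sketch for illustration, not NEON code, and the `dreg` type and the `vpadd_model` / `sum_q_model` names are hypothetical. VPADD.F32 adds adjacent lanes of each of its two D-register operands. One VPADD on the two halves of a Q register therefore leaves two partial sums, and a second pairwise add is needed to get the total.

```c
/* Hypothetical scalar stand-in for a 64-bit D register with two float lanes. */
typedef struct { float lane[2]; } dreg;

/* Models VPADD.F32 Dd, Dn, Dm: lane 0 = Dn[0] + Dn[1], lane 1 = Dm[0] + Dm[1]. */
dreg vpadd_model(dreg n, dreg m)
{
    dreg d = { { n.lane[0] + n.lane[1], m.lane[0] + m.lane[1] } };
    return d;
}

/* Sums the four lanes of a Q register, passed as its low and high D halves. */
float sum_q_model(dreg lo, dreg hi)
{
    dreg partial = vpadd_model(lo, hi);          /* {q[0]+q[1], q[2]+q[3]} */
    dreg total = vpadd_model(partial, partial);  /* both lanes hold the full sum */
    return total.lane[0];
}
```

With `lo = {1, 2}` and `hi = {3, 4}`, the first pairwise add gives `{3, 7}` and the second gives `10` in both lanes.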

3 answers

It seems that you want the sum of an array of arbitrary length, not just of four float values.

In that case your code will work, but it is far from optimized:

  • many multi-cycle pipeline interlocks

  • an unnecessary 32-bit addition in every iteration
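A scalar C model may make the second point concrete. It uses plain C arrays standing in for NEON registers, so it says nothing about actual timings, and the function names are hypothetical. The first function reduces each 4-float chunk to a scalar inside the loop, which is the pattern being criticized. The second keeps four per-lane partial sums and reduces them only once after the loop.

```c
#include <stddef.h>

/* Pattern being criticized: each 4-float chunk is reduced horizontally and
   folded into one scalar, so every iteration waits on the previous add. */
float sum_reduce_per_chunk(const float *src, size_t n)  /* n: multiple of 4 */
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i += 4) {
        float chunk = (src[i] + src[i + 1]) + (src[i + 2] + src[i + 3]);
        sum += chunk;  /* extra scalar add on the loop-carried dependency */
    }
    return sum;
}

/* Deferred reduction: four independent lane accumulators (one Q register's
   worth), combined only once after the loop. */
float sum_reduce_once(const float *src, size_t n)  /* n: multiple of 4 */
{
    float acc[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    for (size_t i = 0; i < n; i += 4)
        for (int lane = 0; lane < 4; lane++)
            acc[lane] += src[i + lane];  /* like a vector VADD.F32 per chunk */
    return (acc[0] + acc[1]) + (acc[2] + acc[3]);
}
```

Both return the same value for small integer inputs. With general floats the results can differ slightly, because the additions happen in a different order.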

Assuming the array length is a multiple of 8 and at least 16:

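    vld1.32   {q0-q1}, [pSrc]!   @ first 8 floats seed the accumulators q0 and q1
    sub       count, count, #8
    loop:
        pld       [pSrc, #32]
        vld1.32   {q2-q3}, [pSrc]!   @ q2-q3 are scratch; q4-q7 (d8-d15) are callee-saved under AAPCS
        subs      count, count, #8
        vadd.f32  q0, q0, q2
        vadd.f32  q1, q1, q3
        bgt       loop
    vadd.f32  q0, q0, q1
    vpadd.f32 d0, d0, d1
    vadd.f32  s0, s0, s1
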
  • pld, although it is an ARM instruction rather than a NEON one, is critical to performance: it greatly increases the cache hit rate.

Hopefully the rest of the code above is self-evident.

You should find this version many times faster than your original one.
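For readers who want to trace the data flow, here is a scalar C model of the two-accumulator loop. It is a sketch, not a drop-in replacement for the assembly, and `sum_two_accumulators` is a hypothetical name. `acc0` and `acc1` stand in for `q0` and `q1`, and the precondition is the one stated in the answer: the length is a multiple of 8 and at least 16.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of the two-accumulator NEON loop; acc0/acc1 play q0/q1. */
float sum_two_accumulators(const float *src, size_t n)
{
    assert(n % 8 == 0 && n >= 16);  /* precondition from the answer */

    float acc0[4], acc1[4];
    /* First load: the first 8 floats seed both accumulators. */
    for (int l = 0; l < 4; l++) {
        acc0[l] = src[l];
        acc1[l] = src[4 + l];
    }
    src += 8;
    size_t count = n - 8;  /* sub count, count, #8 */

    do {  /* loop: ... bgt loop */
        /* Load the next 8 floats and add them into two independent accumulators. */
        for (int l = 0; l < 4; l++) {
            acc0[l] += src[l];
            acc1[l] += src[4 + l];
        }
        src += 8;
        count -= 8;  /* subs count, count, #8 */
    } while (count > 0);

    /* vadd.f32 q0, q0, q1 */
    for (int l = 0; l < 4; l++)
        acc0[l] += acc1[l];
    /* vpadd.f32 d0, d0, d1 -> {acc0[0] + acc0[1], acc0[2] + acc0[3]} */
    float s0 = acc0[0] + acc0[1];
    float s1 = acc0[2] + acc0[3];
    /* vadd.f32 s0, s0, s1 */
    return s0 + s1;
}
```

Two accumulators let consecutive VADDs proceed without waiting on each other. The model only shows which values get added together.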


You can try this (it isn't assembly, but it should be easy to convert):

    float32x2_t r = vadd_f32(vget_high_f32(m_type), vget_low_f32(m_type));
    return vget_lane_f32(vpadd_f32(r, r), 0);

In assembly, these would probably map to just a VADD and a VPADD.
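A self-contained version of the two intrinsic lines might look like the sketch below. The names `sum_f32x4` and `sum4` are hypothetical, and `v` stands in for `m_type`, which is assumed to be a `float32x4_t`. The `#else` branch is a scalar fallback so the file also compiles on non-NEON targets. It performs the same additions in the same order.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* The answer's two lines wrapped in a function; v plays the role of m_type. */
float sum_f32x4(float32x4_t v)
{
    float32x2_t r = vadd_f32(vget_high_f32(v), vget_low_f32(v)); /* {v2+v0, v3+v1} */
    return vget_lane_f32(vpadd_f32(r, r), 0);                    /* r[0] + r[1] */
}
#endif

/* Entry point for a plain float[4]. */
float sum4(const float v[4])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return sum_f32x4(vld1q_f32(v));
#else
    float r0 = v[2] + v[0];  /* lane 0 of vadd_f32(high, low) */
    float r1 = v[3] + v[1];  /* lane 1 of vadd_f32(high, low) */
    return r0 + r1;          /* lane 0 of vpadd_f32(r, r) */
#endif
}
```

On AArch64, `arm_neon.h` also provides `vaddvq_f32`, which adds all four lanes in one intrinsic. It is not available when targeting 32-bit ARM, so the VADD/VPADD pair above is the portable choice.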

I'm not sure whether this is the only (or the most efficient) way to do it, but I haven't been able to find a better one.

P.S. I'm also new to NEON.


Here is the code in ASM:

    vpadd.f32 d1, d6, d7   @ q3 is the register whose contents all need to be summed
    vadd.f32  s1, s2, s3   @ now add the two lanes of d1 together (the sum)
    vadd.f32  s0, s0, s1   @ sum += s1;
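These three instructions rely on register aliasing. On 32-bit ARM, D<n> overlays S<2n> and S<2n+1> (for d0 to d15), and Q<n> overlays D<2n> and D<2n+1>. So q3 is d6:d7, that is s12 to s15, and d1 is s2:s3. Below is a hypothetical C model of that register bank. The `regfile` type and the function names are invented for illustration.

```c
/* Hypothetical model of the low VFP/NEON register bank: 32 S registers,
   with D n overlaying s[2n], s[2n+1] (valid for n = 0..15). */
typedef struct { float s[32]; } regfile;

static float *d_reg(regfile *rf, int n) { return &rf->s[2 * n]; }

/* vpadd.f32 d1, d6, d7 ; vadd.f32 s1, s2, s3 ; vadd.f32 s0, s0, s1 */
void add_q3_into_s0(regfile *rf)
{
    float *d6 = d_reg(rf, 6), *d7 = d_reg(rf, 7), *d1 = d_reg(rf, 1);
    float lo = d6[0] + d6[1];          /* s12 + s13 */
    float hi = d7[0] + d7[1];          /* s14 + s15 */
    d1[0] = lo;                        /* d1 lane 0 is s2 */
    d1[1] = hi;                        /* d1 lane 1 is s3 */
    rf->s[1] = rf->s[2] + rf->s[3];    /* vadd.f32 s1, s2, s3 */
    rf->s[0] = rf->s[0] + rf->s[1];    /* vadd.f32 s0, s0, s1: sum += s1 */
}
```

Writing s1 overwrites the upper half of d0. In this sequence that is harmless only because s0 alone holds the running sum.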

I should have mentioned that in C the code looks like this:

    float sum = 1.0f;
    sum += number1 * number2;

I have omitted the multiplication from this small piece of assembly.
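With the multiplication restored, the first answer's advice still applies. Keep per-lane partial sums of `number1 * number2` inside the loop, roughly what a vector VMLA.F32 on a Q register would do, and reduce them only once at the end. This is a hypothetical scalar C model. `dot_model`, `a`, `b`, and `n` are illustrative names, `n` is assumed to be a multiple of 4, and `init` carries the starting value of `sum` from the C snippet.

```c
#include <stddef.h>

/* Hypothetical scalar model of a vectorized dot product: acc[] stands in for
   one Q register of partial sums. n is assumed to be a multiple of 4. */
float dot_model(const float *a, const float *b, size_t n, float init)
{
    float acc[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    for (size_t i = 0; i < n; i += 4)
        for (int l = 0; l < 4; l++)
            acc[l] += a[i + l] * b[i + l];  /* per-lane multiply-accumulate */
    /* One horizontal reduction after the loop, then add the initial value. */
    return init + ((acc[0] + acc[1]) + (acc[2] + acc[3]));
}
```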

