LLVM IR: efficient vector summation

I am writing a compiler that generates LLVM IR, and the generated code works heavily with vectors.

I would like to be able to sum all the elements of a vector. Right now I extract each element individually and add them one at a time, but this seems like exactly the kind of fairly common operation the hardware should be able to help with. However, I can't find any such support.

What is the best way to do this? I am using LLVM 3.2.


First of all, even without intrinsics, you can reduce the vector in log2(n) steps (where n is the vector length), halving it at each step with shuffles and a vector add, instead of performing n - 1 scalar additions. Here is an example with a vector of 8 elements:

    define i32 @sum(<8 x i32> %a) {
      ; step 1: add the upper 4 lanes to the lower 4 lanes (8 -> 4)
      %v1 = shufflevector <8 x i32> %a, <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
      %v2 = shufflevector <8 x i32> %a, <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
      %sum1 = add <4 x i32> %v1, %v2
      ; step 2: add the upper 2 lanes to the lower 2 lanes (4 -> 2)
      %v3 = shufflevector <4 x i32> %sum1, <4 x i32> undef, <2 x i32> <i32 0, i32 1>
      %v4 = shufflevector <4 x i32> %sum1, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
      %sum2 = add <2 x i32> %v3, %v4
      ; step 3: one final scalar add (2 -> 1)
      %v5 = extractelement <2 x i32> %sum2, i32 0
      %v6 = extractelement <2 x i32> %sum2, i32 1
      %sum3 = add i32 %v5, %v6
      ret i32 %sum3
    }

If your target supports these vector additions, it is very likely that the code above will be lowered to use those instructions, which will give you the performance you are after.
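Since your compiler emits the IR itself, the halving pattern generalizes to any power-of-two width. Below is a minimal C sketch, assuming the IR is written as text into a caller-provided buffer; `irbuf`, `emit`, `emit_mask`, and `emit_vector_sum` are hypothetical names (not an LLVM API), and the sketch only handles `i32` elements and power-of-two lengths. For n = 8 it produces the same shuffle masks and adds as the hand-written example, with different value names.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical text buffer the IR is written into (not an LLVM API). */
typedef struct {
    char *data;   /* caller-owned storage */
    size_t cap;   /* total size of data, including the terminating NUL */
    size_t len;   /* characters written so far */
} irbuf;

/* Appends printf-style text; output is silently truncated if the buffer fills up. */
static void emit(irbuf *b, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(b->data + b->len, b->cap - b->len, fmt, ap);
    va_end(ap);
    if (n > 0) {
        b->len += (size_t)n;
        if (b->len >= b->cap)
            b->len = b->cap - 1; /* keep room for the terminator */
    }
}

/* Emits a shuffle mask "<w x i32> <i32 start, i32 start+1, ...>". */
static void emit_mask(irbuf *b, unsigned w, unsigned start) {
    emit(b, "<%u x i32> <", w);
    for (unsigned i = 0; i < w; i++)
        emit(b, "%si32 %u", i ? ", " : "", start + i);
    emit(b, ">");
}

/* Emits a function @name that sums a <n x i32> in log2(n) steps.
 * n must be a power of two >= 2; returns -1 (and emits nothing) otherwise. */
static int emit_vector_sum(irbuf *b, const char *name, unsigned n) {
    if (n < 2 || (n & (n - 1)) != 0)
        return -1;
    emit(b, "define i32 @%s(<%u x i32> %%v0) {\n", name, n);
    unsigned w = n, step = 0;
    while (w > 2) { /* halve the vector until only 2 lanes remain */
        unsigned h = w / 2;
        emit(b, "  %%lo%u = shufflevector <%u x i32> %%v%u, <%u x i32> undef, ",
             step, w, step, w);
        emit_mask(b, h, 0);
        emit(b, "\n  %%hi%u = shufflevector <%u x i32> %%v%u, <%u x i32> undef, ",
             step, w, step, w);
        emit_mask(b, h, h);
        emit(b, "\n  %%v%u = add <%u x i32> %%lo%u, %%hi%u\n", step + 1, h, step, step);
        w = h;
        step++;
    }
    /* final scalar add of the last two lanes */
    emit(b, "  %%e0 = extractelement <2 x i32> %%v%u, i32 0\n", step);
    emit(b, "  %%e1 = extractelement <2 x i32> %%v%u, i32 1\n", step);
    emit(b, "  %%sum = add i32 %%e0, %%e1\n");
    emit(b, "  ret i32 %%sum\n}\n");
    return 0;
}
```

A caller would set up `char storage[4096]; irbuf b = { storage, sizeof storage, 0 };`, call `emit_vector_sum(&b, "sum", 8)`, and write `storage` into its `.ll` output. If your compiler builds IR through the C++ `IRBuilder` instead of text, the same loop maps onto `CreateShuffleVector`, `CreateAdd`, and `CreateExtractElement`.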

As for intrinsics, there are no target-independent intrinsics for this (at least in LLVM 3.2). However, if you compile for x86, you have access to the horizontal-add intrinsics (e.g. `llvm.x86.ssse3.phadd.d.128`, which adds adjacent pairs of elements from two <4 x i32> vectors). You still need a reduction structure similar to the one above; only the add instructions can be replaced.
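To show how a horizontal-add version would be structured, here is a portable scalar model in C rather than a call to the real intrinsic. It assumes PHADDD semantics (result lanes are `a0+a1, a2+a3, b0+b1, b2+b3`, with wraparound on overflow); `v4u32`, `phaddd_model`, and `sum8_via_phadd` are hypothetical names. Unsigned lanes are used so the wraparound matches LLVM's `add i32`.

```c
#include <stdint.h>

/* Hypothetical 4-lane vector type; uint32_t lanes wrap like LLVM's add i32. */
typedef struct {
    uint32_t lane[4];
} v4u32;

/* Scalar model of PHADDD, i.e. the IR call
 *   call <4 x i32> @llvm.x86.ssse3.phadd.d.128(<4 x i32> %a, <4 x i32> %b)
 * which yields { a0+a1, a2+a3, b0+b1, b2+b3 }. */
static v4u32 phaddd_model(v4u32 a, v4u32 b) {
    v4u32 r;
    r.lane[0] = a.lane[0] + a.lane[1];
    r.lane[1] = a.lane[2] + a.lane[3];
    r.lane[2] = b.lane[0] + b.lane[1];
    r.lane[3] = b.lane[2] + b.lane[3];
    return r;
}

/* Sums an <8 x i32> split into two halves, using three horizontal adds. */
static uint32_t sum8_via_phadd(const uint32_t a[8]) {
    v4u32 lo = {{a[0], a[1], a[2], a[3]}}; /* shufflevector lanes 0..3 */
    v4u32 hi = {{a[4], a[5], a[6], a[7]}}; /* shufflevector lanes 4..7 */
    v4u32 h1 = phaddd_model(lo, hi);       /* {a0+a1, a2+a3, a4+a5, a6+a7} */
    v4u32 h2 = phaddd_model(h1, h1);       /* {a0+..+a3, a4+..+a7, same two again} */
    v4u32 h3 = phaddd_model(h2, h2);       /* full sum in every lane */
    return h3.lane[0];                     /* extractelement ..., i32 0 */
}
```

Here the shuffle-and-add steps are replaced by three horizontal adds, but fewer instructions does not necessarily mean faster code: on many x86 CPUs PHADDD decodes into several micro-ops, so it is worth comparing against the plain shuffle version on your target.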

For more information, you can search for "horizontal sum" or "horizontal vector sum"; there are several relevant Stack Overflow questions about computing horizontal sums on x86.
