First of all, even without using intrinsics, you can get by with log2(n) additions on progressively narrower vectors (where n is the vector length) instead of n - 1 scalar additions. Here is an example with vector size 8:
    define i32 @sum(<8 x i32> %a) {
      ; Step 1: split the 8 lanes into two halves and add them (8 -> 4 lanes).
      %v1 = shufflevector <8 x i32> %a, <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
      %v2 = shufflevector <8 x i32> %a, <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
      %sum1 = add <4 x i32> %v1, %v2
      ; Step 2: split the 4 partial sums into two halves and add them (4 -> 2 lanes).
      %v3 = shufflevector <4 x i32> %sum1, <4 x i32> undef, <2 x i32> <i32 0, i32 1>
      %v4 = shufflevector <4 x i32> %sum1, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
      %sum2 = add <2 x i32> %v3, %v4
      ; Step 3: add the last two partial sums as scalars.
      %v5 = extractelement <2 x i32> %sum2, i32 0
      %v6 = extractelement <2 x i32> %sum2, i32 1
      %sum3 = add i32 %v5, %v6
      ret i32 %sum3
    }
If your target supports these vector additions, it is very likely that the above will be lowered to those instructions, which should give you a performance gain.
As for intrinsics, there are no target-independent ones for handling this. However, if you are compiling for x86, you have access to the horizontal-add (hadd) intrinsics (e.g. llvm.x86.ssse3.phadd.d.128 to horizontally add two <4 x i32> vectors). You will still have to do something similar to the above; only the add instructions can be replaced.
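To make the x86 variant concrete, here is a minimal sketch of the same 8-element sum written with the SSSE3 horizontal-add intrinsic for 32-bit lanes. The function name @sum_phadd and the attribute group #0 are made up for illustration, and the sketch assumes an x86 target with SSSE3 available (enabled here through the "target-features" attribute; passing -mattr=+ssse3 to llc should work as well).

```llvm
; Sketch only: @sum_phadd and attribute group #0 are illustrative names.
declare <4 x i32> @llvm.x86.ssse3.phadd.d.128(<4 x i32>, <4 x i32>)

define i32 @sum_phadd(<8 x i32> %a) #0 {
  ; Split the input into its low and high halves, as in the generic version.
  %lo = shufflevector <8 x i32> %a, <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  %hi = shufflevector <8 x i32> %a, <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  ; h1 = <a0+a1, a2+a3, a4+a5, a6+a7>
  %h1 = call <4 x i32> @llvm.x86.ssse3.phadd.d.128(<4 x i32> %lo, <4 x i32> %hi)
  ; h2 = <a0+a1+a2+a3, a4+a5+a6+a7, a0+a1+a2+a3, a4+a5+a6+a7>
  %h2 = call <4 x i32> @llvm.x86.ssse3.phadd.d.128(<4 x i32> %h1, <4 x i32> %h1)
  ; Every lane of h3 now holds the full sum of all 8 elements.
  %h3 = call <4 x i32> @llvm.x86.ssse3.phadd.d.128(<4 x i32> %h2, <4 x i32> %h2)
  %r = extractelement <4 x i32> %h3, i32 0
  ret i32 %r
}

; Assumption: the function is compiled with SSSE3 enabled.
attributes #0 = { "target-features"="+ssse3" }
```

This still takes three addition steps, just like the generic version; the pairwise adds simply absorb the shuffles between steps. Whether it actually beats the shuffle-and-add form depends on the CPU, so it is worth measuring on your target.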
For more information about this, you can search for "horizontal sum" or "horizontal vector sum"; for example, here are some relevant Stack Overflow questions about horizontal sums on x86: