If-else slaloms are a nightmare for almost every processor, and especially for vector units such as NEON, which has no conditional branch of its own.
Therefore, we apply “eager execution” to this kind of problem:
- A boolean mask is created
- Both the if and the else blocks are computed
- The "correct" result is selected by the mask
I think you will have no trouble converting the aarch32 code below to intrinsics.

    //aarch32
    vadd.f32    vecElse, vecA, vecTen       // vecTen contains 10.0f
    vcgt.f32    vecMask, vecA, vecTen
    vadd.f32    vecA, vecA, vecFive
    vbif        vecA, vecElse, vecMask

    //aarch64
    fadd        vecElse.4s, vecA.4s, vecTen.4s
    fcmgt       vecMask.4s, vecA.4s, vecTen.4s
    fadd        vecA.4s, vecA.4s, vecFive.4s
    bif         vecA.16b, vecElse.16b, vecMask.16b
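
For reference, here is a rough sketch of how the sequence above might look with intrinsics in C. The function name `add_by_threshold`, the in-place array interface, and the requirement that `n` be a multiple of 4 are assumptions made for illustration and are not part of the assembly above. There is no direct intrinsic for `vbif`; the select is written with `vbslq_f32`, and the compiler is expected to pick a suitable select instruction. A scalar fallback that emulates the same mask-and-select is included so the sketch also builds on non-ARM machines.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: a[i] = (a[i] > 10.0f) ? a[i] + 5.0f : a[i] + 10.0f.
 * Assumption: n is a multiple of 4, to keep the sketch short. */
void add_by_threshold(float *a, size_t n)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    const float32x4_t ten  = vdupq_n_f32(10.0f);
    const float32x4_t five = vdupq_n_f32(5.0f);

    for (size_t i = 0; i < n; i += 4) {
        float32x4_t v      = vld1q_f32(a + i);
        float32x4_t else_v = vaddq_f32(v, ten);   /* else block: a + 10 */
        uint32x4_t  mask   = vcgtq_f32(v, ten);   /* all ones where a > 10 */
        float32x4_t if_v   = vaddq_f32(v, five);  /* if block: a + 5 */

        /* Take if_v where the mask is set, else_v elsewhere. */
        vst1q_f32(a + i, vbslq_f32(mask, if_v, else_v));
    }
#else
    /* Portable fallback: the same mask-and-select, one element at a time. */
    for (size_t i = 0; i < n; ++i) {
        float else_v = a[i] + 10.0f;
        float if_v   = a[i] + 5.0f;
        /* 0xFFFFFFFF when a > 10, otherwise 0. */
        uint32_t mask = (uint32_t)0 - (uint32_t)(a[i] > 10.0f);
        uint32_t if_bits, else_bits, out_bits;

        memcpy(&if_bits, &if_v, sizeof if_bits);
        memcpy(&else_bits, &else_v, sizeof else_bits);
        out_bits = (if_bits & mask) | (else_bits & ~mask);
        memcpy(&a[i], &out_bits, sizeof out_bits);
    }
#endif
}
```

Calling it on the array `{1.0f, 10.0f, 11.0f, 20.0f}` with `n = 4` leaves `{11.0f, 20.0f, 16.0f, 25.0f}`: only the lanes strictly greater than 10 take the `+ 5` result.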