As phooji pointed out, your implementation has a small problem: it quickly generates a long chain of nested calls, which compilers will soon choke on.
You can work around this with a slightly more complex version based on binary decomposition. I'll also make it generic over the functor, because I'm lazy.
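To get a feel for why the binary split helps, here is a rough sketch that counts how many nested levels each approach needs. It assumes the linear version recurses from N to N - 1 one step at a time; `linear_depth` and `binary_depth` are names I made up for illustration, not part of the solution.

```cpp
#include <cassert>

// Hypothetical helper: nesting depth if each level peels off one iteration
// (assumed shape of the linear version: N -> N-1 -> ... -> 0).
constexpr unsigned linear_depth(unsigned n) { return n; }

// Hypothetical helper: nesting depth if each level splits N into
// N/2 and N - N/2, following the larger half down to a single call.
constexpr unsigned binary_depth(unsigned n) {
    return n <= 1 ? 0 : 1 + binary_depth(n - n / 2);
}

// The binary split grows roughly like log2(N) instead of N.
static_assert(linear_depth(1000) == 1000, "linear depth grows with N");
static_assert(binary_depth(8) == 3, "8 -> 4 -> 2 -> 1");
static_assert(binary_depth(1000) == 10, "about log2(1000) levels");
```

The total number of calls to the functor is the same either way; only the depth of the instantiation chain changes.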
We need a helper template that keeps track of the offset to pass as the parameter:
// Primary template, defined below after the base cases.
template <typename F, unsigned N, unsigned OffSet>
struct UnrolledImpl;

// Base case: nothing left to call.
template <typename F, unsigned OffSet>
struct UnrolledImpl<F, 0, OffSet> {
    static F run(F f) { return f; }
};

// Base case: a single call with the current offset.
template <typename F, unsigned OffSet>
struct UnrolledImpl<F, 1, OffSet> {
    static F run(F f) { f(OffSet); return f; }
};

// General case: run the first N/2 calls, then the remaining N - N/2.
template <typename F, unsigned N, unsigned OffSet>
struct UnrolledImpl {
    static F run(F f) {
        F f2 = UnrolledImpl<F, N/2, OffSet>::run(f);
        return UnrolledImpl<F, N - N/2, OffSet + N/2>::run(f2);
    }
};
And then you can implement UnrolledLoop simply:
template <typename F, unsigned N>
struct UnrolledLoop {
    static F run(F f) { return UnrolledImpl<F, N, 0>::run(f); }
};
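As a quick sanity check, here is a minimal self-contained sketch of how this could be used, written with `typename F` for the functor parameter. The `RecordIndices` functor and the `unrolled_indices_5` helper are hypothetical names I added for the example; they just record the indices the functor is called with.

```cpp
#include <cassert>
#include <vector>

template <typename F, unsigned N, unsigned OffSet>
struct UnrolledImpl;

template <typename F, unsigned OffSet>
struct UnrolledImpl<F, 0, OffSet> {
    static F run(F f) { return f; }
};

template <typename F, unsigned OffSet>
struct UnrolledImpl<F, 1, OffSet> {
    static F run(F f) { f(OffSet); return f; }
};

template <typename F, unsigned N, unsigned OffSet>
struct UnrolledImpl {
    static F run(F f) {
        F f2 = UnrolledImpl<F, N / 2, OffSet>::run(f);
        return UnrolledImpl<F, N - N / 2, OffSet + N / 2>::run(f2);
    }
};

template <typename F, unsigned N>
struct UnrolledLoop {
    static F run(F f) { return UnrolledImpl<F, N, 0>::run(f); }
};

// Hypothetical functor for illustration: remembers every index it receives.
struct RecordIndices {
    std::vector<unsigned> seen;
    void operator()(unsigned i) { seen.push_back(i); }
};

// Hypothetical helper: unrolls five calls and returns the recorded indices.
inline std::vector<unsigned> unrolled_indices_5() {
    return UnrolledLoop<RecordIndices, 5>::run(RecordIndices()).seen;
}
```

Because the functor is passed and returned by value, any state it accumulates is carried from the first half to the second half, so the indices come out in order: 0, 1, 2, 3, 4.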
Note that you can provide specializations for more values of N (3 and 4, for example) to go easier on the compiler.
Matthieu M.