I am writing a template for expressions parameterized by an arbitrary number of char labels.
Given a list of arguments, a factory function returns an expression of a different type depending on whether all arguments have distinct types or at least two of them share the same type.
Case study: suppose A is a "tagged" object whose overloaded operator() creates a ?Expression<...> . Let a, b, ... be declared as labels LabelName<'a'>, LabelName<'b'>, ... Then A(a,b,c,d) will create a UniqueExpression<'a','b','c','d'> , while A(a,c,b,c) will instead produce a RepeatedExpression<'a','c','b','c'> .
To achieve this, I had to define the ?Expression factory function with auto and decltype . Moreover, each decltype has to cascade into another decltype until the metaprogram finishes recursing through the arguments and the return type is finally resolved. As an illustration, I have isolated a fairly minimal version of the factory method code.
    template <typename... T>
    struct TypeList { };

    template <char C>
    struct LabelName { };

    template <typename... T>
    class UniqueExpression {
        // Contains implementation details in actual code
    };

    template <typename... T>
    class RepeatedExpression {
        // Contains implementation details in actual code
    };

    class ExpressionFactory {
    private:
        // Only one label left to check and nothing left to compare it with:
        // no repeat was found.
        template <char _C, typename... T, typename... _T>
        static UniqueExpression<T...>
        _do_build(TypeList<T...>,
                  TypeList<LabelName<_C>>,
                  TypeList<>,
                  TypeList<_T...>)
        {
            return UniqueExpression<T...> ();
        }

        // The label being checked equals the head of the remaining list:
        // a repeat was found.
        template <char _C, typename... T, typename... _T1, typename... _T2, typename... _T3>
        static RepeatedExpression<T...>
        _do_build(TypeList<T...>,
                  TypeList<LabelName<_C>, _T1...>,
                  TypeList<LabelName<_C>, _T2...>,
                  TypeList<_T3...>)
        {
            return RepeatedExpression<T...> ();
        }

        // The two heads differ: move the head of the remaining list to the
        // list of labels already compared, then recurse.
        template <char _C1, char _C2, typename... T, typename... _T1, typename... _T2, typename... _T3>
        static auto
        _do_build(TypeList<T...>,
                  TypeList<LabelName<_C1>, _T1...>,
                  TypeList<LabelName<_C2>, _T2...>,
                  TypeList<_T3...>)
        -> decltype(_do_build(TypeList<T...>(),
                              TypeList<LabelName<_C1>, _T1...>(),
                              TypeList<_T2...>(),
                              TypeList<_T3..., LabelName<_C2>>()))
        {
            return _do_build(TypeList<T...>(),
                             TypeList<LabelName<_C1>, _T1...>(),
                             TypeList<_T2...>(),
                             TypeList<_T3..., LabelName<_C2>>());
        }

        // The remaining list is exhausted without a match: drop the label
        // being checked and start over with the next one.
        template <char _C1, char _C2, typename... T, typename... _T1, typename... _T2>
        static auto
        _do_build(TypeList<T...>,
                  TypeList<LabelName<_C1>, LabelName<_C2>, _T1...>,
                  TypeList<>,
                  TypeList<LabelName<_C2>, _T2...>)
        -> decltype(_do_build(TypeList<T...>(),
                              TypeList<LabelName<_C2>, _T1...>(),
                              TypeList<_T2...>(),
                              TypeList<>()))
        {
            return _do_build(TypeList<T...>(),
                             TypeList<LabelName<_C2>, _T1...>(),
                             TypeList<_T2...>(),
                             TypeList<>());
        }

    public:
        template <char C, typename... T>
        static auto
        build_expression(LabelName<C>, T...)
        -> decltype(_do_build(TypeList<LabelName<C>, T...>(),
                              TypeList<LabelName<C>, T...>(),
                              TypeList<T...>(),
                              TypeList<>()))
        {
            return _do_build(TypeList<LabelName<C>, T...>(),
                             TypeList<LabelName<C>, T...>(),
                             TypeList<T...>(),
                             TypeList<>());
        }
    };
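To make the recursion easier to follow: the four _do_build overloads walk the label list in the same order as the plain pairwise comparison below. This is only an illustrative sketch that works on characters rather than types; has_repeat and its parameters are hypothetical names that do not appear in the actual code.

```cpp
// Illustrative only: mirrors the comparison order of the _do_build overloads
// on a plain character array. has_repeat is a hypothetical helper.
//   s: the labels, n: number of labels,
//   i: index of the label being checked, j: index of the current candidate.
constexpr bool has_repeat(const char* s, int n, int i = 0, int j = 1) {
    return i >= n - 1   ? false                           // last label, nothing left to compare: unique
         : j >= n       ? has_repeat(s, n, i + 1, i + 2)  // candidates exhausted: check the next label
         : s[i] == s[j] ? true                            // match found: repeated
                        : has_repeat(s, n, i, j + 1);     // no match: try the next candidate
}

static_assert(!has_repeat("abcd", 4), "a,b,c,d: all labels are unique");
static_assert(has_repeat("acbc", 4), "a,c,b,c: the label c is repeated");
```

In the worst case (all labels unique) every pair of labels is compared once, so the number of recursion steps grows quadratically with the number of labels.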
The factory can be called in a program as follows (in the real program, another class with an overloaded operator() calls the factory):
    int main()
    {
        LabelName<'a'> a;
        LabelName<'b'> b;
        ...
        LabelName<'j'> j;

        auto expr = ExpressionFactory::build_expression(a,b,c,d,e,f,g,h,i,j);
    }
The above code works as intended and is compiled correctly by both GCC and the Intel compiler. I understand that the compiler will need more time to perform the recursive template deduction as I increase the number of labels.
On my computer, if build_expression is called with one argument, then GCC 4.7.1 takes about 0.26 seconds to compile on average. Compilation time scales to about 0.29 seconds for five arguments and to 0.62 seconds for ten arguments. All this is quite reasonable.
The story is different with the Intel compiler. ICPC 13.0.1 compiles the one-argument code in 0.35 seconds, and the compilation time stays fairly constant up to four arguments. With five arguments the compilation time rises to 12 seconds, and with six arguments it shoots above 9600 seconds (i.e. more than 2 hours and 40 minutes). Needless to say, I did not wait long enough to find out how long the seven-argument version takes to compile.
Two questions immediately come to mind:
Is the Intel compiler known to be particularly slow at compiling recursive decltype?
Is there a way to rewrite this code to achieve the same effect in a way that is perhaps more compiler-friendly?
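For the second question, one possible direction is to move the recursion out of the function return types and into class template specializations, so that the factory has a single non-recursive signature. The following is a minimal sketch under that assumption, not a drop-in replacement: Contains, HasDuplicate, SelectExpression and ExpressionFactorySketch are hypothetical names, and the expression classes are reduced to empty placeholders. Whether this actually compiles faster with ICPC is not something the sketch demonstrates.

```cpp
#include <type_traits>

template <char C> struct LabelName { };
template <typename... T> class UniqueExpression { };    // placeholder
template <typename... T> class RepeatedExpression { };  // placeholder

// Contains<X, Ts...>::value is true if X is one of Ts... (hypothetical helper).
template <typename X, typename... Ts>
struct Contains : std::false_type { };

template <typename X, typename Head, typename... Tail>
struct Contains<X, Head, Tail...>
    : std::conditional<std::is_same<X, Head>::value,
                       std::true_type,
                       Contains<X, Tail...>>::type { };

// HasDuplicate<Ts...>::value is true if any type occurs more than once.
template <typename... Ts>
struct HasDuplicate : std::false_type { };

template <typename Head, typename... Tail>
struct HasDuplicate<Head, Tail...>
    : std::conditional<Contains<Head, Tail...>::value,
                       std::true_type,
                       HasDuplicate<Tail...>>::type { };

// Picks the expression type from the duplicate check.
template <typename... Ts>
struct SelectExpression {
    typedef typename std::conditional<HasDuplicate<Ts...>::value,
                                      RepeatedExpression<Ts...>,
                                      UniqueExpression<Ts...>>::type type;
};

// Hypothetical factory: one signature, no decltype and no overload recursion.
struct ExpressionFactorySketch {
    template <char C, typename... T>
    static typename SelectExpression<LabelName<C>, T...>::type
    build_expression(LabelName<C>, T...)
    {
        return typename SelectExpression<LabelName<C>, T...>::type();
    }
};

// Usage: the result type depends only on whether a label repeats.
static_assert(std::is_same<
                  decltype(ExpressionFactorySketch::build_expression(
                      LabelName<'a'>(), LabelName<'b'>())),
                  UniqueExpression<LabelName<'a'>, LabelName<'b'>>>::value,
              "distinct labels give a UniqueExpression");
static_assert(std::is_same<
                  decltype(ExpressionFactorySketch::build_expression(
                      LabelName<'a'>(), LabelName<'a'>())),
                  RepeatedExpression<LabelName<'a'>, LabelName<'a'>>>::value,
              "a repeated label gives a RepeatedExpression");
```

The design choice here is that the compiler only has to instantiate a chain of small class templates instead of resolving a four-way overload set inside each nested decltype.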