Intel C ++ Compiler is very slow for compiling recursive declarations

I am writing a template for expressions parameterized by an arbitrary number of char labels.

Given a list of arguments, the factory function returns an expression of different types depending on whether two arguments of the same type exist or are unique.

Case study: suppose A is a β€œtagged” object with its overloaded operator() to create ?Expression<...> . Let a, b, ... be declared as labels LabelName<'a'>, LabelName<'b'>, ... Then A(a,b,c,d) will create a UniqueExpression<'a','b','c','d'> , while A(a,c,b,c) will produce instead of RepeatedExpression<'a','c','b','c'> .

To achieve this, I had to define the ?Expression factory function with auto and decltype . In addition, decltype must be cascaded to another decltype until the metaprogram completes the recursion via arguments and the final return type is finally resolved. As an illustration, I highlighted a fairly minimal code for the factory method.

 template <typename... T> struct TypeList { }; template <char C> struct LabelName { }; template <typename... T> class UniqueExpression { // Contains implementation details in actual code }; template <typename... T> class RepeatedExpression { // Contains implementation details in actual code }; class ExpressionFactory { private: template <char _C, typename... T, typename... _T> static UniqueExpression<T...> _do_build(TypeList<T...>, TypeList<LabelName<_C>>, TypeList<>, TypeList<_T...>) { return UniqueExpression<T...> (); } template <char _C, typename... T, typename... _T1, typename... _T2, typename... _T3> static RepeatedExpression<T...> _do_build(TypeList<T...>, TypeList<LabelName<_C>, _T1...>, TypeList<LabelName<_C>, _T2...>, TypeList<_T3...>) { return RepeatedExpression<T...> (); } template <char _C1, char _C2, typename... T, typename... _T1, typename... _T2, typename... _T3> static auto _do_build(TypeList<T...>, TypeList<LabelName<_C1>, _T1...>, TypeList<LabelName<_C2>, _T2...>, TypeList<_T3...>) -> decltype(_do_build(TypeList<T...>(), TypeList<LabelName<_C1>, _T1...>(), TypeList<_T2...>(), TypeList<_T3..., LabelName<_C2>>())) { return _do_build(TypeList<T...>(), TypeList<LabelName<_C1>, _T1...>(), TypeList<_T2...>(), TypeList<_T3..., LabelName<_C2>>()); } template <char _C1, char _C2, typename... T, typename... _T1, typename... _T2> static auto _do_build(TypeList<T...>, TypeList<LabelName<_C1>, LabelName<_C2>, _T1...>, TypeList<>, TypeList<LabelName<_C2>, _T2...>) -> decltype(_do_build(TypeList<T...>(), TypeList<LabelName<_C2>, _T1...>(), TypeList<_T2...>(), TypeList<>())) { return _do_build(TypeList<T...>(), TypeList<LabelName<_C2>, _T1...>(), TypeList<_T2...>(), TypeList<>()); } public: template <char C, typename... T> static auto build_expression(LabelName<C>, T...) -> decltype(_do_build(TypeList<LabelName<C>, T...>(), TypeList<LabelName<C>, T...>(), TypeList<T...>(), TypeList<>())) { return _do_build(TypeList<LabelName<C>, T...>(), TypeList<LabelName<C>, T...>(), TypeList<T...>(), TypeList<>()); } }; 

factory can be called in the program as follows: (in a real program there is another class with operator() overloaded that calls factory)

 int main() { LabelName<'a'> a; LabelName<'b'> b; ... LabelName<'j'> j; auto expr = ExpressionFactory::build_expression(a,b,c,d,e,f,g,h,i,j); // Perhaps do some cool stuff with expr return 0; } 

The above code works as intended and is correctly compiled by both GCC and the Intel compiler. Now I understand that the compiler will take more time to do the recursive subtraction of the template when I run the number of labels that I use.

On my computer, if build_expression is called with one argument, then GCC 4.7.1 takes about 0.26 seconds to compile on average. Compilation time scales to about 0.29 seconds for five arguments and to 0.62 seconds for ten arguments. All this is quite reasonable.

The story is different from the Intel compiler. ICPC 13.0.1 compiles code with one argument in 0.35 seconds, and compilation time remains fairly constant for four arguments. With five arguments, the compilation time increases to 12 seconds, and for six arguments it fires above 9600 seconds (i.e. more than 2 hours and 40 minutes). Needless to say, I did not wait long enough to find out how long it took to compile the version with seven arguments.


Two questions come at once:

  • Is the Intel compiler known to be slow to compile recursive decltype ?

  • Is there a way to rewrite this code to achieve the same effect, which is perhaps more compiler friendly?

+7
source share
1 answer

Here is a hit. Instead of making pairwise comparisons of each of the elements, I sort the list of types and then use a unique algorithm for the brain to see if there are duplicates.

I implemented type merge sorting because it was fun. A naive bubble type would probably work better on a reasonable number of arguments. Please note that a little work will allow us to sort the merge into long lists and specialize in sorting bubbles (or even sorting inserts) in short lists. I am not going to write a template shortcut.

This gives me boolean compilation time, which tells me if there are duplicates in the list. Then I can use enable_if to select the overload that I am going to use.

Please note that your solution included n ^ 2 layers of template recursion, at each stage of which the return type requires evaluating the type of the 1st simpler class, and then the return type also requires the same! If Intel's compiler memoization fails, you're talking about exponential amounts of work.

I added some of your classes with some helpers. I made my LabelName inherit from std::integral_constant , so I have easy access to compile time. This makes sorting the code easier. I also set enum out of duplicate and unique return values ​​so that I can perform simple debugging of printf from the result.

Most of this work records merge sorting - is there a standard type of compile time type that we could use?

 #include <type_traits> #include <iostream> template <typename... T> struct TypeList { }; // NOTE THIS CHANGE: template <char C> struct LabelName:std::integral_constant<char, C> {}; template <typename... T> class UniqueExpression { // Contains implementation details in actual code public: enum { is_unique = true }; }; template <typename... T> class RepeatedExpression { // Contains implementation details in actual code public: enum { is_unique = false }; }; // A compile time merge sort for types // Split takes a TypeList<>, and sticks the even // index types into Left and odd into Right template<typename T> struct Split; template<> struct Split<TypeList<>> { typedef TypeList<> Left; typedef TypeList<> Right; }; template<typename T> struct Split<TypeList<T>> { typedef TypeList<T> Left; typedef TypeList<> Right; }; // Prepends First into the TypeList List. template<typename First, typename List> struct Prepend; template<typename First, typename... ListContents> struct Prepend<First,TypeList<ListContents...>> { typedef TypeList<First, ListContents...> type; }; template<typename First, typename Second, typename... Tail> struct Split<TypeList<First, Second, Tail...>> { typedef typename Prepend< First, typename Split<TypeList<Tail...>>::Left>::type Left; typedef typename Prepend< Second, typename Split<TypeList<Tail...>>::Right>::type Right; }; // Merges the sorted TypeList<>s Left and Right to the end of TypeList<> MergeList template< typename Left, typename Right, typename MergedList=TypeList<> > struct Merge; template<typename MergedList> struct Merge< TypeList<>, TypeList<>, MergedList > { typedef MergedList type; }; template<typename L1, typename... Left, typename... Merged> struct Merge< TypeList<L1, Left...>, TypeList<>, TypeList<Merged... >> { typedef TypeList<Merged..., L1, Left...> type; }; template<typename R1, typename... Right, typename... Merged> struct Merge< TypeList<>, TypeList<R1, Right...>, TypeList<Merged...> > { typedef TypeList<Merged..., R1, Right...> type; }; template<typename L1, typename... Left, typename R1, typename... Right, typename... Merged> struct Merge< TypeList<L1, Left...>, TypeList<R1, Right...>, TypeList<Merged...>> { template<bool LeftIsSmaller, typename LeftList, typename RightList, typename MergedList> struct MergeHelper; template<typename FirstLeft, typename... LeftTail, typename FirstRight, typename... RightTail, typename... MergedElements> struct MergeHelper< true, TypeList<FirstLeft, LeftTail...>, TypeList<FirstRight, RightTail...>, TypeList<MergedElements...> > { typedef typename Merge< TypeList<LeftTail...>, TypeList< FirstRight, RightTail... >, TypeList< MergedElements..., FirstLeft > >::type type; }; template<typename FirstLeft, typename... LeftTail, typename FirstRight, typename... RightTail, typename... MergedElements> struct MergeHelper< false, TypeList<FirstLeft, LeftTail...>, TypeList<FirstRight, RightTail...>, TypeList<MergedElements...> > { typedef typename Merge< TypeList<FirstLeft, LeftTail...>, TypeList<RightTail... >, TypeList< MergedElements..., FirstRight > >::type type; }; typedef typename MergeHelper< (L1::value < R1::value), TypeList<L1, Left...>, TypeList<R1, Right...>, TypeList<Merged...> >::type type; }; // Takes a TypeList<T...> and sorts it via a merge sort: template<typename List> struct MergeSort; template<> struct MergeSort<TypeList<>> { typedef TypeList<> type; }; template<typename T> struct MergeSort<TypeList<T>> { typedef TypeList<T> type; }; template<typename First, typename Second, typename... T> struct MergeSort<TypeList<First, Second, T...>> { typedef Split<TypeList<First, Second, T...>> InitialSplit; typedef typename MergeSort< typename InitialSplit::Left >::type Left; typedef typename MergeSort< typename InitialSplit::Right >::type Right; typedef typename Merge< Left, Right >::type type; }; // Sorts a TypeList<T..>: template<typename List> struct Sort: MergeSort<List> {}; // Checks sorted TypeList<T...> SortedList for adjacent duplicate types // return value is in value template<typename SortedList> struct Unique; template<> struct Unique< TypeList<> >:std::true_type {}; template<typename T> struct Unique< TypeList<T> >:std::true_type {}; template<typename First, typename Second, typename... Tail> struct Unique< TypeList< First, Second, Tail... > > { enum { value = (!std::is_same<First, Second>::value) && Unique< TypeList<Second, Tail...> >::value }; }; // value is true iff there is a repeated type in Types... template<typename... Types> struct RepeatedType { typedef TypeList<Types...> MyListOfTypes; typedef typename Sort< MyListOfTypes >::type MyListOfTypesSorted; enum { value = !Unique< MyListOfTypesSorted >::value }; }; // A struct that creates an rvalue trivial constructed type // of any type requested. struct ProduceRequestedType { template<typename Result> operator Result() { return Result(); }; }; struct ExpressionFactory { template<typename... T> typename std::enable_if< !RepeatedType<T...>::value, UniqueExpression<T...> >::type build_expression(T...) const { return ProduceRequestedType(); }; template<typename... T> typename std::enable_if< RepeatedType<T...>::value, RepeatedExpression<T...> >::type build_expression(T...) const { return ProduceRequestedType(); }; }; // Simple testing code for above: int main() { auto foo1 = ExpressionFactory().build_expression( LabelName<'a'>(), LabelName<'b'>(), LabelName<'a'>() ); typedef decltype(foo1) foo1Type; if (foo1Type::is_unique) std::cout << "foo1 is unique\n"; else std::cout << "foo1 is repeated\n"; auto foo2 = ExpressionFactory().build_expression( LabelName<'q'>(), LabelName<'a'>(), LabelName<'b'>(), LabelName<'d'>(), LabelName<'t'>(), LabelName<'z'>() ); typedef decltype(foo2) foo2Type; if (foo2Type::is_unique) std::cout << "foo2 is unique\n"; else std::cout << "foo2 is repeated\n"; } 

In addition, I would like to add criticism of your code: template programming is programming - your names are equivalent to using "i1" through "i9" for integer variables in a function. Apply your names to meaningful names when doing something non-trivial.

How does it compile on Intel?

+3
source

All Articles