Optimal recursive template structure packaging without loss of alignment

Question

Optimal recursive template structure packaging without loss of alignment

I have a structure of 4 type fields that come from the template parameters:

template <typename T1, typename T2, typename T3, typename T4> struct __attribute__((aligned(8))) four_tuple { typedef struct { T1 t1; T2 t2; T3 t3; T4 t4; } payload; payload p; };

Each type T1 , T2 , T3 and T4 guaranteed to be a primitive type or type four_tuple<...>::payload . Warranties are recursive — you can think of a structure as quadtree encoding, whose leaf nodes are primitive types.

My goal is for the structure to have the smallest possible sizeof , provided that all leaf nodes are properly aligned. The tools allowed for optimization are specializations of class templates using:

reordering of fields T1 , T2 , T3 , T4
adding filler fields
gcc attribute packed on payload
could there be others?

I feel that there is a smart solution to this problem using enable_if and SFINAE. Can anyone find it?

To illustrate the problem, if we use the above implementation of as-is using Foo = four_tuple<char,double,char,double> , we will have size 32 for the payload and in general. If we just declare the packed payload, double will not be aligned. The specialization of the template, which reorders the fields in descending order (here, double, double, char, char ), will give a payload and a total size of 24. But the additional 6 bytes that it uses are wasteful, as can be seen from the consideration using Bar = four_tuple<Foo::payload,int,int,int> . The optimal Bar packing can fit in 32 bytes, but with this scheme 40 is required. Overusing the reordering of fields with packed will result in the int to Bar - some filler is needed.

I know that with a general restructuring, the layout memory of structure fields may have performance implications due to caching considerations, and that in general these consequences will be at least as significant as any potential benefits of better packaging. I would like to learn the trade-offs though, and I cannot do it right in my context without solving this problem.

+6

c ++ c ++ 11 memory-alignment

dshin Jan 08 '16 at 7:28

source share

1 answer

hvd · Accepted Answer · 2016-01-08T23:35:24+0000

The big problem with your nested tuple is that you want to have a field of type four_tuple<char,double,char,double>::payload aligned just as if it were four_tuple<char,double,char,double> , but not requiring the container type to inherit its alignment. It's complicated. Doing this is possible, but it makes your code very incapable of anything other than GCC. I assume that everything is in order as you already offer the GCC extension in your question. The basic idea is that bit fields can be used to insert gaskets to ensure alignment:

 struct __attribute__((packed)) S { char c; // at offset 0 int i; // at offset 1, not aligned int : 0; int j; // at offset 8, aligned int : 0; int k; // at offset 12, no extra padding between j and k };

int , of course, a very specific type with very specific alignment, and you need dynamically defined alignment. Fortunately, GCC allows char -type bit fields, which usually only provide byte alignment, which should be combined with alignas , guaranteeing arbitrary alignment.

With this, you can check all 24 possible field orders and select the payload that gives the lowest overall size. I made the payload a global type and gave it an additional template parameter to indicate the order of the fields. This allows tuple4<T1, T2, T3, T4> to check tuple4_payload<T1, T2, T3, T4, 1234> , tuple4_payload<T1, T2, T3, T4, 1243> etc. In order and choose depending on which is better.

 template <typename...> struct smallest; template <typename...T> using smallest_t = typename smallest<T...>::type; template <typename T> struct smallest<T> { using type = T; }; template <typename T, typename...Ts> struct smallest<T, Ts...> { using type = std::conditional_t<sizeof(T) <= sizeof(smallest_t<Ts...>), T, smallest_t<Ts...>>; }; template <typename T1, typename T2, typename T3, typename T4> struct tuple4; template <typename T1, typename T2, typename T3, typename T4, int fieldOrder> struct tuple4_payload; template <typename T1, typename T2, typename T3, typename T4> struct tuple4_simple { T1 t1; T2 t2; T3 t3; T4 t4; }; template <typename T> struct extract_payload { using type = T; }; template <typename...T> struct extract_payload<tuple4<T...>> { using type = typename tuple4<T...>::payload; }; template <typename T> using extract_payload_t = typename extract_payload<T>::type; #define PERMS \ PERM(1,2,3,4) PERM(1,2,4,3) PERM(1,3,2,4) PERM(1,3,4,2) PERM(1,4,2,3) PERM(1,4,3,2) \ PERM(2,1,3,4) PERM(2,1,4,3) PERM(2,3,1,4) PERM(2,3,4,1) PERM(2,4,1,3) PERM(2,4,3,1) \ PERM(3,1,2,4) PERM(3,1,4,2) PERM(3,2,1,4) PERM(3,2,4,1) PERM(3,4,1,2) PERM(3,4,2,1) \ PERM(4,1,2,3) PERM(4,1,3,2) PERM(4,2,1,3) PERM(4,2,3,1) PERM(4,3,1,2) PERM(4,3,2,1) #define PERM(a,b,c,d) \ template <typename T1, typename T2, typename T3, typename T4> \ struct __attribute__((packed)) tuple4_payload<T1, T2, T3, T4, a##b##c##d> { \ char : 0 alignas(T##a); extract_payload_t<T##a> t##a; \ char : 0 alignas(T##b); extract_payload_t<T##b> t##b; \ char : 0 alignas(T##c); extract_payload_t<T##c> t##c; \ char : 0 alignas(T##d); extract_payload_t<T##d> t##d; \ }; PERMS #undef PERM #define PERM(a,b,c,d) , tuple4_payload<T1, T2, T3, T4, a##b##c##d> template <typename, typename...T> using tuple4_smallest_payload_t = smallest_t<T...>; template <typename T1, typename T2, typename T3, typename T4> struct alignas(tuple4_simple<T1, T2, T3, T4>) tuple4 : tuple4_smallest_payload_t<void PERMS> { using payload = tuple4_smallest_payload_t<void PERMS>; }; #undef PERM

In your case, you would use this as tuple4<int, tuple4<char, double, char, double>, int, int> . Note that although the payload type is not explicitly mentioned here, it will still be used for the t2 member.

Optimal recursive template structure packaging without loss of alignment

More articles: