C++ Low Latency Design: Function Dispatch vs CRTP for Factory Implementation

As part of a system design, we need to implement a factory pattern. In combination with the factory pattern, we also use CRTP to provide a base set of functions that can then be customized in derived classes.

Sample code below:

 class FactoryInterface {
 public:
     virtual void doX() = 0; // force all derived classes to implement custom_X_impl
 };

 template <typename Derived, typename Base = FactoryInterface>
 class CRTP : public Base {
 public:
     void doX() {
         // do common processing..... then
         static_cast<Derived*>(this)->custom_X_impl();
     }
 };

 class Derived : public CRTP<Derived> {
 public:
     void custom_X_impl() {
         // do custom stuff
     }
 };

Although this design is somewhat convoluted, it offers several advantages: every call after the initial virtual function call can be inlined, and the call to the derived class's custom_X_impl is also efficient.

I wrote a comparison program that benchmarks similar implementations (a hard loop with callbacks) using function pointers and virtual functions. This design came out on top with gcc 4.8 at -O2 and -O3.

However, a C++ guru told me yesterday that any virtual function call in a large running program can take variable time because of cache misses, and that I could achieve potentially better performance using a C-style function table together with gcc's function hot-listing. Still, I see a 2x cost in my sample program mentioned above.

My questions:

1. Is the guru's statement true? Either way, are there references I can read on this?
2. Is there a low-latency implementation I can reference in which a base class calls a user-defined function in a derived class via function pointers?
3. Any suggestions for improving the design?

Any other feedback is always welcome.

c++ gcc low-latency class-design crtp
1 answer

Your guru is referring to gcc's hot function attribute. The effect of this attribute:

The function is optimized more aggressively and on many targets it is placed into a special subsection of the text section so all hot functions appear close together, improving locality.

So yes, in a very large code base, a hot-attributed function is more likely to stay in the instruction cache, ready to execute without delay, because it avoids instruction cache misses.

You can use this attribute for member functions:

 #include <iostream>

 struct X {
     // For a definition, the attribute must precede the declarator.
     __attribute__((hot)) void test() { std::cout << "hello, world!\n"; }
 };

But...

When you use virtual functions, the compiler usually generates a vtable that is shared by all objects of the class. This table is a table of function pointers. And indeed, your guru is right: nothing guarantees that this table will stay in cache.

But if you manually build a "C-style" function pointer table, the problem is exactly the same. Although the function itself may stay in cache, nothing guarantees that your function table stays in cache too.

The main difference between the two approaches is as follows:

  • with virtual functions, the compiler knows that the virtual call site is a hot spot, so it could decide to keep the vtable in cache as well (I don't know whether gcc does this, or whether there are plans to);

  • with a manual function pointer table, the compiler cannot easily deduce that the table belongs to a hot spot. So this attempt at manual optimization can very well achieve the opposite.

My opinion: never try to optimize by hand what the compiler can do much better.

Conclusion

Trust your benchmarks. And trust your OS: if your function or your data is used frequently, there is a good chance that a modern OS will take this into account in its virtual memory management, regardless of what the compiler generates.

