The way I see it, object-level polymorphism is inherently expensive if you use it at the level of very thin, granular objects, such as an abstract IPixel interface. In that case, video processing software revolving around IPixel dependencies would be hopelessly screwed in terms of efficiency, since it leaves no breathing room for optimization. On top of the cost of dynamic dispatch per pixel, even the virtual pointer required here can be larger than the entire pixel, doubling or tripling memory use. Beyond that, we can no longer play with pixel representations in ways that span more than one pixel, and, worst of all, neighboring pixels in the image might not even be stored adjacently in memory.
Meanwhile, IImage can offer plenty of room for optimization, since an image models a collection/container of pixels while still offering great flexibility (ex: a different concrete image representation for each pixel format). Now dynamic dispatch to an image is cheap, and the size of the virtual pointer is negligible relative to the entire image. We can also explore how we represent pixels to our heart's content, in ways that let us process multiple pixels at once. So I see it as you do: designing objects at an appropriate level of coarseness, which often means modeling collections of things, in order to reduce all that overhead and the barriers to optimization.
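To make that contrast concrete, here is a minimal sketch; the RgbaImage representation and the brighten operation are hypothetical examples, not from any real library:

#include <algorithm>
#include <cstdint>
#include <vector>

// Fine-grained: a vptr per pixel (often bigger than the pixel itself) and a
// dynamic dispatch per pixel, with no control over how pixels sit in memory.
struct IPixel {
    virtual ~IPixel() = default;
    virtual void brighten(int amount) = 0;
};

// Coarse-grained: one vptr and one dynamic dispatch for the whole image. The
// concrete subclass is free to pick any internal pixel representation.
struct IImage {
    virtual ~IImage() = default;
    virtual void brighten(int amount) = 0;
};

struct RgbaImage : IImage {
    std::vector<std::uint8_t> texels; // contiguous channel data

    void brighten(int amount) override {
        // Homogeneous inner loop over raw channel data: easy for the
        // compiler to vectorize, and the representation can change freely.
        for (std::uint8_t& t : texels)
            t = static_cast<std::uint8_t>(std::min(255, t + amount));
    }
};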
As for the idea that, instead of objects storing their own data members, they only store a reference to a collection in which their data members are stored contiguously, with their member methods fetching data from those containers: that should reduce the chances of unneeded data making its way to the CPU, and increase the chances that data needed in the near future is already close by.
I like this idea, but you can work your way back towards needing specialized memory allocators and sorting base pointers if you take it too far in a polymorphic context. The cases where I most often find use for this type of design are ones that need to improve the usability of a single element in contexts where elements have to be aggregated for efficiency (one case is a container using an SoA representation; the other I'll get into below).
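A rough sketch of the SoA case, with hypothetical names (Particles, integrate), might look like this:

#include <cstddef>
#include <vector>

struct Particles {
    // Each field lives in its own contiguous array, so a loop that only
    // needs some of the fields never pulls the others into cache.
    std::vector<float> x, y;   // positions (SoA layout)
    std::vector<float> vx, vy; // velocities

    // Homogeneous hot loop over contiguous data.
    void integrate(float dt) {
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += vx[i] * dt;
            y[i] += vy[i] * dt;
        }
    }

    // Lightweight proxy restoring the convenience of "one particle" access.
    struct Ref {
        Particles& p;
        std::size_t i;
        float& x() { return p.x[i]; }
        float& y() { return p.y[i]; }
    };
    Ref operator[](std::size_t i) { return Ref{*this, i}; }
};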
The polymorphic cases don't always benefit from this so easily, though, because the inherent problem there is the heterogeneous processing of granular things one at a time. To restore efficiency, we have to restore homogeneity to the critical loops.
Heterogeneous Critical Loops
Take an example where Orc inherits Creature, Human inherits Creature, and Elf inherits Creature, but humans, orcs, and elves have different sizes/fields, different alignment requirements, and different vtables. In that case, when the client code wants to process a heterogeneous list of them by storing polymorphic base pointers to such creatures:
for each creature in creatures: creature.do_something();
... as opposed to this, which sacrifices the polymorphism:
for each orc in orcs: orc.do_something();
for each human in humans: human.do_something();
for each elf in elves: elf.do_something();
... which would be a real PITA for extensibility if we have to do this in many places every time we introduce a new creature type ...
... then if we want to keep the polymorphic solution but still process each creature one at a time, we still lose temporal and spatial locality, regardless of whether the base pointers themselves are stored contiguously in the container or not. We lose temporal locality on the vtables, since we might access one vtable in one iteration and then a different vtable in the next. The memory access patterns here can also be random and sporadic, which loses spatial locality, so we end up with cache misses everywhere.
So the solution for me in this case, if you want inheritance and polymorphism, is to abstract at the container level: Orcs inherits Creatures, Humans inherits Creatures, Elves inherits Creatures. That introduces some extra complexity to the client code when it wants to express operations to perform on one specific creature, but now the aforementioned sequential loop can be written like this:
for each creatures in creature_types: creatures.do_something();
In this loop, the first iteration might do something to an entire list of orcs (which might be something like a million orcs stored in an array). Now all the orcs in that list can be stored contiguously, and we're applying uniform functionality to every orc in it. We have a boatload of breathing room to optimize this case without changing the design.
We still have a heterogeneous loop using polymorphism, but now it's far cheaper, since we only pay the overhead for an entire container of creatures rather than for each individual creature. The loops processing the individual creatures are now homogeneous. It's analogous to using an abstract IImage to, say, brighten a bunch of images (a bunch of pixel containers) instead of doing it through one abstract pixel object implementing IPixel at a time.
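A minimal sketch of that container-level abstraction (the Orc fields and the update operation are made up for illustration):

#include <memory>
#include <vector>

struct Creatures {
    virtual ~Creatures() = default;
    virtual void update_all() = 0; // one dynamic dispatch per *container*
};

struct Orc { float health = 100.0f; /* orc-specific fields */ };

struct Orcs : Creatures {
    std::vector<Orc> orcs; // possibly millions, stored contiguously

    void update_all() override {
        // Homogeneous critical loop: one type, one code path, sequential
        // memory access, no per-element virtual calls.
        for (Orc& orc : orcs)
            orc.health += 1.0f;
    }
};

// Humans and Elves subclasses would follow the same pattern.

void update_world(std::vector<std::unique_ptr<Creatures>>& creature_types) {
    for (auto& creatures : creature_types)
        creatures->update_all(); // overhead paid once per creature type
}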
Homogeneous Loops and Views
So the idea is to shift the critical, heavy-lifting loops away from heterogeneous loops processing all kinds of different data one at a time, and towards homogeneous loops processing homogeneous data stored contiguously.
And that's the general strategy with which I look at interface design. If it tends to yield hotspots in ways that are difficult to optimize, then the inherent problem, as I see it, is that the interface was designed at too granular a level (Creature, not Creatures).
So that's how I'd approach this problem if you want to use OOP. Where I think your design idea might come in handy is in easing the cases where the client code needs to express an operation that applies to just one specific creature, at which point it can work through some kind of proxy object that points back to a container and perhaps stores an index or pointer to a specific entry to make that convenient, e.g., a CreatureHandle that refers to a specific entry in one of the abstract Creatures containers.
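A rough sketch of what that handle might look like, extending the Creatures interface above with a hypothetical per-element operation (damage_one is made up for illustration):

#include <cstddef>

struct Creatures {
    virtual ~Creatures() = default;
    virtual void update_all() = 0;                                // bulk path
    virtual void damage_one(std::size_t index, float amount) = 0; // single-element path
};

struct CreatureHandle {
    Creatures* container = nullptr;
    std::size_t index = 0;

    // Forwards to the container, which knows the concrete layout of the
    // entry at 'index'; the creature's data never leaves contiguous storage.
    void damage(float amount) { container->damage_one(index, amount); }
};

One caveat: if a container reorders or removes entries, stored indices have to be kept valid somehow, which is where the specialized allocators and pointer-sorting concerns from earlier can start creeping back in.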