Shared memory access in C++

I have a huge tree that can occupy up to several gigabytes. The node structure is shown below. You will notice that I made the last member an array of size 1. The reason is so that I can overlay Node onto storage of flexible size, similar to the flexible array member that C supports. I could use std::unique_ptr<T[]> or std::vector<T> instead, but the problem is that every tree node would then need a double dynamic allocation, double indirection, and extra cache misses. In my last test this made my program take about 50% more time, which is simply not acceptable for my application.

    template<typename T>
    class Node {
    public:
        Node<T> *parent;
        Node<T> *child;
        /* ... */
        T &operator[](int);
    private:
        int size;
        T array[1];
    };

The easiest way to implement operator[] is this.

    template<typename T>
    T &Node<T>::operator[](int n) {
        return array[n];
    }

It should work fine on any reasonable C++ implementation. But since the C++ standard allows an implementation to bounds-check array accesses, and since, as far as I know, this is technically undefined behavior, can I do this instead?

    template<typename T>
    T &Node<T>::operator[](int n) {
        return (&array[0])[n];
    }

I am a bit confused. The [] operator on built-in types is just syntactic sugar for * . Thus (&array[0])[n] is equivalent to (&*(array + 0))[n] , which I think can be simplified to array[n] , making it exactly the same as the first version. Fine, but I can still do this:

    template<typename T>
    T &Node<T>::operator[](int n) {
        return *(reinterpret_cast<T *>(
            reinterpret_cast<char *>(this) + offsetof(Node<T>, array)) + n);
    }

Hopefully I am now free of any possible undefined behavior. Perhaps inline assembly would express my intent even better, but do I really have to go that far? Can someone clarify all this for me?

By the way, T is always a POD type, and Node itself is also POD.

+5
3 answers

First of all, an implementation is free to reorder class members in all but the trivial cases. Your case is not trivial because it mixes access specifiers. Unless you make your class POD, or whatever it is called in C++11 (standard-layout?), you are not guaranteed that your array is actually laid out last.

And of course, flexible array members do not exist in C++.

All is not lost, though. Allocate a chunk of memory large enough to hold both your class and your array, placement-new the class at the beginning, and treat the part that comes after the object (plus any padding needed for proper alignment) as the array.

Given this , the array can be accessed with

    reinterpret_cast<T*>(reinterpret_cast<char*>(this) + sizeof(*this) + padding)

where padding is chosen so that sizeof(T) divides sizeof(*this) + padding .

See std::make_shared for inspiration; it also packs two objects into a single allocated block of memory.
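
Here is a minimal sketch of what this could look like, assuming C++11; FlexNode and make_flex_node are illustrative names, not anything from the question:

    #include <cstddef>
    #include <new>

    // Hypothetical simplified node; the trailing array lives in the same
    // allocation, right after the object (offset rounded up to alignof(T)).
    template<typename T>
    struct FlexNode {
        FlexNode<T> *parent = nullptr;
        FlexNode<T> *child  = nullptr;
        int size = 0;

        static std::size_t array_offset() {
            return (sizeof(FlexNode<T>) + alignof(T) - 1) / alignof(T) * alignof(T);
        }
        T *data() {
            return reinterpret_cast<T *>(reinterpret_cast<char *>(this) + array_offset());
        }
        T &operator[](int n) { return data()[n]; }
    };

    // One allocation holds the node and its n trailing elements.
    template<typename T>
    FlexNode<T> *make_flex_node(int n) {
        void *raw = ::operator new(FlexNode<T>::array_offset() + n * sizeof(T));
        FlexNode<T> *node = new (raw) FlexNode<T>();  // placement-new at the front
        node->size = n;
        return node;  // for POD T, the trailing elements can simply be assigned afterwards
    }

Freeing such a node has to mirror the allocation: destroy the object (a no-op for POD) and pass the original pointer back to ::operator delete.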

+1

The main problem with accessing the array "out of bounds" is that there is no object there; it is not the over-large index itself that causes the problem. Now, in your case there presumably is raw memory at the intended location, which means you can create a POD object there simply by assigning to it. Any subsequent read access will find the object there.

The underlying reason is that C arrays really have no bounds: a[n] is just *(a+n) , by definition. So your first two proposed forms are already identical.

I would be a little more worried about any padding after T array[1] , which you would end up addressing as part of the array.
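
To illustrate that concern, here is a small probe (an illustrative stand-in, not code from the question) showing where tail padding can appear when T is small:

    #include <cstddef>
    #include <cstdio>

    // Stand-in for Node<T> with T = char: the struct is padded out to pointer
    // alignment, so a few bytes of tail padding sit after array[1].
    struct Probe {
        Probe *parent;
        Probe *child;
        int    size;
        char   array[1];
    };

    int main() {
        std::printf("sizeof(Probe)          = %zu\n", sizeof(Probe));
        std::printf("offsetof(Probe, array) = %zu\n", offsetof(Probe, array));
        // Any gap beyond sizeof(char) is tail padding; a naive allocation of
        // sizeof(Probe) + (n - 1) * sizeof(char) silently counts it as storage.
        return 0;
    }

On a typical 64-bit ABI this prints 24 and 20, i.e. three bytes of padding sit after the declared array.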

+1

You also wondered whether there is an alternative approach. Given your recent comment about "no reallocation", I would store the array data as a pointer to heap-allocated storage, but:

Trees have predictable access patterns, from root to child. So I would give Node its own operator new and make sure that child nodes are allocated right after their parent. This gives you locality of reference when walking the tree. Secondly, I would use a separate allocator for the array data and make it return contiguous memory for a parent's array and that of its first child (with the first grandchild following, of course).

As a result, a node and its array do not have locality of reference between them, but instead you get locality of reference for both the tree structure and the associated array data.

The array data allocator can probably be a trivial per-tree pool allocator: just grab 256 KB chunks at a time and hand them out a few Ts at a time. The only state you need to track is how much of the current chunk has been handed out. This is much faster than std::vector<T, std::allocator<T>> can manage, because the default allocator cannot know that the memory lives exactly as long as the tree does.
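
A minimal sketch of such a pool, assuming everything is released at once when the tree is destroyed (TreePool and its members are illustrative names, not from the answer):

    #include <cstddef>
    #include <new>
    #include <vector>

    // Bump-style pool: grabs 256 KB chunks and hands out aligned pieces;
    // nothing is freed individually, everything dies with the pool.
    class TreePool {
    public:
        static constexpr std::size_t kChunkSize = 256 * 1024;

        // bytes must not exceed kChunkSize; align must be a power of two.
        void *allocate(std::size_t bytes, std::size_t align) {
            std::size_t offset = (used_ + align - 1) / align * align;
            if (chunks_.empty() || offset + bytes > kChunkSize) {
                chunks_.push_back(static_cast<char *>(::operator new(kChunkSize)));
                offset = 0;
            }
            used_ = offset + bytes;
            return chunks_.back() + offset;
        }

        ~TreePool() {
            for (char *chunk : chunks_) ::operator delete(chunk);
        }

    private:
        std::vector<char *> chunks_;
        std::size_t used_ = 0;
    };

Array storage for a node would then come from something like pool.allocate(n * sizeof(T), alignof(T)), and all of the tree's arrays vanish together with the pool.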

0
