C++ structure: more members, MUCH slower member access time?

I have a linked list of structures. Suppose I insert x million nodes into the linked list, then iterate over all the nodes looking for a given value.

The strange thing (for me, at least) if I have a structure like this:

struct node { int a; node *nxt; }; 

Then I can iterate through the list and check the value ten times faster compared to when I have another member in the structure, for example:

 struct node_complex { int a; string b; node_complex *nxt; }; 

I also tried it with C-style strings (char arrays) and the result was the same: just because there was an extra member (the string), the whole iteration (plus checking the value) was 10 times slower, even though I never touched that member! I don't know how structures work internally, but that looks like a high price to pay...
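Roughly, the traversal looks like this (a simplified sketch, not my exact code, which I'll post later):

 #include <iostream>

 struct node { int a; node *nxt; };

 int main()
 {
     // Build a list of a million nodes (sketch only; the values are made up).
     const int n = 1000000;
     node *head = nullptr;
     for (int i = 0; i < n; ++i)
         head = new node{i % 1000, head};

     // Iterate through the list and check each value.
     int hits = 0;
     for (node *p = head; p != nullptr; p = p->nxt)
         if (p->a == 42)
             ++hits;

     std::cout << hits << " matches\n";
     // (Nodes are leaked here just to keep the sketch short.)
 }

The slow case is the same loop, only with node_complex as the node type.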

What is the trick?

Edit: I'm a newbie and this is the first time I'm using pointers, so it may well be an error on my part. I'll post the code as soon as I can (I'm not at home right now).

Update: I checked the values again, and the difference is actually much smaller: 2x instead of 10x. That is much more reasonable.

How that happened I'm not sure; it's possible that yesterday I was simply too exhausted to keep the two numbers apart. I've since run more tests, and the earlier results look inflated.

Time for the same number of nodes:

  • One int and a pointer (time to iterate through): 0.101
  • One int and one string: 0.196
  • One int and 2 strings: 0.274
  • One int and 3 strings: 0.147 (!!!)
  • Two ints: 0.107

Look at what happens when there are more than two strings in the structure: it gets faster! Did someone put LSD in my coffee? No, I don't drink coffee.

This is too much for my brain at the moment, so I think I'll just puzzle over it on my own instead of using up more public resources here on SO.

(P.S.: I don't think my profiling class is buggy, and in any case I can see the time difference with my own eyes.)

Anyway, thanks for the help. Greetings.

+4
5 answers

It has to be connected with memory access. You are talking about millions of linked nodes. With only an int and a pointer, a node takes 8 bytes (assuming 32-bit pointers), so a million nodes already take about 8 MB of memory; whether that fits depends on the size of your cache.

When you add other members, you increase the total size of your data. It no longer fits in the cache, and you fall back to plain memory accesses, which are much slower.
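To see the difference in footprint, something like this (the structs are the ones from your question; exact numbers depend on compiler, architecture and std::string implementation) shows how much each node grows:

 #include <iostream>
 #include <string>

 struct node         { int a; node *nxt; };
 struct node_complex { int a; std::string b; node_complex *nxt; };

 int main()
 {
     // The absolute values vary; the point is only the relative growth per node.
     std::cout << "sizeof(node)         = " << sizeof(node) << " bytes\n";
     std::cout << "sizeof(node_complex) = " << sizeof(node_complex) << " bytes\n";
     std::cout << "1M simple nodes  ~ " << sizeof(node) * 1000000 / (1024 * 1024) << " MiB\n";
     std::cout << "1M complex nodes ~ " << sizeof(node_complex) * 1000000 / (1024 * 1024) << " MiB\n";
 }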

+7

It could also be that during iteration you are creating a copy of your structures, i.e.:

 node* pHead;
 // ...
 for (node* p = pHead; p; p = p->nxt)
 {
     node myNode = *p; // here you create a copy!
     // ...
 }

Copying a simple structure is very fast. But the member you added is a string, which is a complex object, and copying it is a relatively expensive operation that touches the heap.
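For comparison, a rough sketch of the same loop without the copy (struct layout assumed from the question); working through the pointer means the string member is never copied:

 #include <string>

 struct node_complex { int a; std::string b; node_complex *nxt; };

 // Same traversal, but no per-node copy: only 'a' and 'nxt' are read,
 // and the string member 'b' is never touched.
 int count_hits_no_copy(const node_complex* pHead, int wanted)
 {
     int hits = 0;
     for (const node_complex* p = pHead; p != nullptr; p = p->nxt)
     {
         if (p->a == wanted)
             ++hits;
     }
     return hits;
 }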

+5

Most likely, the problem is that your large structure no longer fits into one cache line.

As I recall, conventional processors usually use a 32-byte cache line. That means data is read into the cache in 32-byte blocks, and if you go past those 32 bytes, a second memory fetch is required.

Looking at your structure, it starts with an int, which is 4 bytes (usually), followed by a std::string (I'm guessing, since no namespace is given), which in my standard library implementation (from VS2010) takes 28 bytes, bringing us to exactly 32 bytes. That means the initial int and the next pointer end up in different cache lines, using twice as much cache space and requiring twice as many memory accesses if both members are touched during iteration.

If only the pointer is accessed, this shouldn't matter much, though, since then only the second cache line needs to be fetched from memory.

If you always access the int and the pointer, and the string is needed less often, reordering the members can help:

 struct node_complex
 {
     int a;
     node_complex *nxt;
     string b;
 };

In this case, the int and the next pointer sit next to each other in the same cache line, so they can be read without an additional memory fetch. You then pay the extra cost only when you actually need to read the string.
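If you want to check where the members actually land, something like this prints the offsets (implementation-dependent; I've written std::string explicitly so the sketch compiles on its own, and the second struct is the original order from the question):

 #include <cstddef>
 #include <iostream>
 #include <string>

 struct node_reordered { int a; node_reordered *nxt; std::string b; };
 struct node_original  { int a; std::string b; node_original *nxt; };

 int main()
 {
     // Note: offsetof on non-standard-layout types (these contain a
     // std::string) is only conditionally supported; most compilers
     // accept it here, possibly with a warning.
     std::cout << "reordered: a@" << offsetof(node_reordered, a)
               << " nxt@" << offsetof(node_reordered, nxt)
               << " b@"   << offsetof(node_reordered, b)
               << " size " << sizeof(node_reordered) << '\n';
     std::cout << "original:  a@" << offsetof(node_original, a)
               << " b@"   << offsetof(node_original, b)
               << " nxt@" << offsetof(node_original, nxt)
               << " size " << sizeof(node_original) << '\n';
 }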

Of course, it's also possible that your benchmarking code includes the creation of the nodes, or that copies of the nodes are being made (intentionally or not), which obviously also affects performance.

+3

I'm no specialist in this area, but "cache miss" is what comes to mind while reading your problem.

When you add a member, the structure gets larger, and you can get more cache misses while traversing the linked list (which is cache-unfriendly anyway, unless the nodes are allocated in one block and close to each other in memory).
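Just to illustrate what "allocated in one block" could mean (a hypothetical helper, not something from your code): keep all the nodes in one contiguous buffer, so traversal walks nearby memory:

 #include <vector>

 struct node { int a; node *nxt; };

 // Build the whole list inside one contiguous std::vector; traversal then
 // touches adjacent memory instead of scattered heap allocations.
 std::vector<node> make_pool(int n)
 {
     std::vector<node> pool(n);
     for (int i = 0; i < n; ++i)
     {
         pool[i].a   = i;
         pool[i].nxt = (i + 1 < n) ? &pool[i + 1] : nullptr;
     }
     return pool;   // moving the vector keeps the element pointers valid
 }

The head of the list is then &pool.front(), and everything is freed together when the vector goes away.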

I can't find another explanation.

However, we don't have your creation code or your search loop, so it's still hard to guess; you might simply be searching the list in an inefficient way.

+1

Perhaps a solution would be a linked list of pointers to your object. It may complicate things (unless you use smart pointers, etc.), but it might improve the search time.
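A minimal sketch of that idea (the names are made up for illustration): the list stores pointers to the objects, so the list nodes themselves stay small:

 #include <list>
 #include <string>

 struct item { int a; std::string b; };

 // The list holds pointers; whether this actually helps depends on where
 // the items live, since every check still dereferences one pointer.
 int count_hits(const std::list<item*>& items, int wanted)
 {
     int hits = 0;
     for (const item* p : items)
         if (p->a == wanted)
             ++hits;
     return hits;
 }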

0
