Why are new and delete so slow in a loop in MSVC 2010?

I have a problem when I create and delete instances of a class in a loop: the time per iteration varies wildly. As I understand it, this has to do with freeing the objects from memory, but I do not understand the behavior of this operation. Why does the time differ, and how can I fix it? The time stays the same when I delete the object in a separate thread.

#include <iostream>
#include <vector>
#include <ctime>
using namespace std;

class NODE {
public:
    NODE() {}

    NODE* add(NODE* node) {
        children.push_back(node);
        return node;
    }

    virtual ~NODE() {
        for (vector<NODE*>::iterator it = children.begin(); it != children.end(); ++it)
            delete *it;
    }

    vector<NODE*> children;
};

NODE* create() {
    NODE* node(new NODE());
    for (int i = 0; i < 200; i++) {
        NODE* subnode = node->add(new NODE());
        for (int k = 0; k < 20; k++)
            subnode->add(new NODE());
    }
    return node;
}

int main() {
    NODE* root;
    unsigned t;
    for (int i = 0; i < 30; i++) {
        t = clock();
        cout << "Create... ";
        root = create();
        delete root;
        cout << clock() - t << endl;
    }
}

ADDED: I'm confused. When I run the program from VS, it works fine ...

+4
5 answers

In addition to what the other answers say, you have to consider that the heap of the Visual C++ 10 runtime is thread-safe. This means that even when you have only one thread, you pay some synchronization overhead that makes heap operations thread-safe. So one of the reasons you see such poor results is that you are using a general-purpose, but rather slow, heap implementation.

The reason you get different times with and without the debugger is that a special debug heap is used when the program is started under the debugger (even in the Release configuration), and this special heap is also relatively slow.
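As a side note on the serialization point above, here is a minimal sketch of routing NODE allocations through a private, non-serialized Win32 heap created with HEAP_NO_SERIALIZE. This assumes Windows and strictly single-threaded use of NODE; the heap size and the class-level operator new/delete shown here are illustrative, not part of the original program.

#include <windows.h>
#include <cstddef>
#include <new>

// Private heap created once; HEAP_NO_SERIALIZE skips the thread-safety locking,
// so this heap must only ever be used from one thread.
static HANDLE g_nodeHeap = HeapCreate(HEAP_NO_SERIALIZE, 1 << 20, 0);

class NODE {
public:
    // Class-level operator new/delete so that only NODE objects use the private heap.
    void* operator new(size_t size) {
        void* p = HeapAlloc(g_nodeHeap, 0, size);
        if (!p) throw std::bad_alloc();
        return p;
    }
    void operator delete(void* p) {
        if (p) HeapFree(g_nodeHeap, 0, p);
    }
    // ... the rest of NODE as in the question ...
};

Whether this helps in practice depends on how much of the cost is serialization versus block coalescing, so it is worth measuring.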

0

When you create an object, sometimes a new block of memory is allocated, and sometimes the object fits into an existing block. That is why two allocations can take different amounts of time.

If you want allocation and free times to be consistent, handle them inside your application: grab a large chunk of memory up front and allocate from that chunk yourself. Of course, when you need another chunk, the allocation that triggers it will take longer, but with large chunks this should not happen often.
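As a rough illustration of the "grab a large chunk and allocate from it yourself" idea, here is a minimal pool/bump-allocator sketch. The chunk size, slot size, and class name are illustrative assumptions, and it only supports releasing everything at once.

#include <cassert>
#include <cstddef>
#include <vector>

class ChunkPool {
public:
    explicit ChunkPool(size_t perChunk)
        : slotsPerChunk(perChunk), used(perChunk) {}

    ~ChunkPool() {
        // Everything is released in one go; there is no per-object free in this sketch.
        for (size_t i = 0; i < chunks.size(); ++i)
            delete[] chunks[i];
    }

    void* allocate(size_t size) {
        assert(size <= slotSize);             // fixed-size slots in this sketch
        if (used == slotsPerChunk) {          // current chunk exhausted:
            chunks.push_back(new char[slotsPerChunk * slotSize]);  // occasional slow path
            used = 0;
        }
        return chunks.back() + (used++) * slotSize;
    }

private:
    enum { slotSize = 64 };                   // assumed upper bound on object size
    size_t slotsPerChunk;
    size_t used;
    std::vector<char*> chunks;
};

You would plug something like this in through a class-level operator new/delete (or placement new); only the occasional "new chunk" allocation then touches the system heap.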

The only way to make every allocation and release take a completely consistent amount of time would be to slow the fast ones down until they take the maximum time any request could possibly need. That would be the Harrison Bergeron approach to performance optimization; I do not recommend it.

+5

Real-time heaps do exist, but in general, heap operations (dynamic allocation and deallocation) are the canonical example of a non-deterministic operation. That means their execution time varies and does not even have a very good upper bound.

The problem is that adjacent free blocks usually need to be coalesced into one contiguous block when they appear. If this is not done, you eventually end up with nothing but tiny blocks, and a large allocation can fail even though there is actually enough total memory for it. For any given call, coalescing may or may not happen, and the amount of work it involves varies. It all depends on the allocation/deallocation pattern your program has performed recently, which you generally cannot plan for. That is why we call it "non-deterministic".

If you do not like this behavior, there are two possibilities:

  • Switch to a real-time heap. Your OS probably does not have one built in, so you would have to buy or download one and use it for all of your memory operations. The one I have used in the past is TLSF.
  • Do not dynamically allocate or free memory in your main loop (IOW: only do it during initialization). This is how we real-time programmers have learned to program for ages. A sketch of this pattern follows below.
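Here is a minimal sketch of the second bullet, assuming a fixed-size object type and a known upper bound on how many objects are live at once (both of which are assumptions, not facts from the question): everything is allocated during initialization, and the main loop only takes objects from and returns them to a free list.

#include <cstddef>
#include <vector>

struct Item { int payload; };                 // illustrative payload type

class FreeList {
public:
    explicit FreeList(size_t count) : storage(count), freeSlots(count) {
        for (size_t i = 0; i < count; ++i)
            freeSlots[i] = &storage[i];       // every slot starts out free
    }
    Item* take() {                            // O(1), no heap call in the hot path
        if (freeSlots.empty()) return 0;      // caller must handle exhaustion
        Item* p = freeSlots.back();
        freeSlots.pop_back();
        return p;
    }
    void giveBack(Item* p) {                  // O(1), capacity was reserved up front
        freeSlots.push_back(p);
    }
private:
    std::vector<Item> storage;                // all objects live here, allocated once
    std::vector<Item*> freeSlots;
};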
+1

In the general case, it is impossible to predict how long a memory allocation or deallocation will take. For example, times can vary greatly if the user-space heap runs out of pages and has to request more from the kernel, and again later when first touching a newly allocated page triggers a page fault.

So even if you go ahead and implement your own heap on top of large chunks of memory, your allocation times will still vary, because this laziness is in the nature of the underlying memory system.
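To illustrate the laziness point, one common workaround is to commit and touch every page of a big block during initialization, so the page faults happen before the timed loop. This sketch assumes Windows; the function name and block size are illustrative.

#include <windows.h>

char* commitAndTouch(SIZE_T blockSize) {
    // Reserve and commit the whole block up front.
    char* block = static_cast<char*>(
        VirtualAlloc(NULL, blockSize, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE));
    if (!block) return NULL;

    SYSTEM_INFO si;
    GetSystemInfo(&si);                       // gives the machine's page size
    for (SIZE_T off = 0; off < blockSize; off += si.dwPageSize)
        block[off] = 0;                       // touching each page faults it in now
    return block;                             // later accesses should not page-fault
}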

0

If you really want to know why it is slow, you need to run a real profiler on it, such as AMD CodeAnalyst; clock() is not an accurate high-resolution timer.

How it performs on each run depends on the paging behavior of the underlying system, the processor load, and whether your data is already in the processor's cache.
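As a sketch of measuring with something better than clock(), here is the question's loop timed with QueryPerformanceCounter (Windows-only). NODE and create() are assumed to be the poster's definitions from the question; everything else here is illustrative.

#include <windows.h>
#include <iostream>

// Assumes NODE and create() from the question are defined in the same file.
int main() {
    LARGE_INTEGER freq, start, stop;
    QueryPerformanceFrequency(&freq);          // counter ticks per second

    for (int i = 0; i < 30; i++) {
        QueryPerformanceCounter(&start);
        NODE* root = create();                 // the question's create()
        delete root;
        QueryPerformanceCounter(&stop);

        double ms = 1000.0 * (stop.QuadPart - start.QuadPart) / freq.QuadPart;
        std::cout << "Create+delete: " << ms << " ms" << std::endl;
    }
}

Even then, a sampling profiler will tell you where inside the allocator the time goes, which a wall-clock timer cannot.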

0
