Best strategy for profiling memory usage of my code (open source) and third-party code (closed source)

I will soon be tasked with profiling the memory usage of C/C++ code that also uses CUDA to take advantage of GPU processing.

My initial thought was to create macros and operator overloads that would track calls to malloc, free, new, and delete in my source code. I could just include a different header and use __FILE__ and __LINE__ to print the memory calls to a log file. This type of strategy is described here: http://www.almostinfinite.com/memtrack.html

What is the best way to track this usage in a linked third-party library? I figure I can at least track memory usage before and after function calls, right? In my macro/overload scheme I can simply record the size of each request to find out how much memory is being asked for. But how can I tell how much a third-party library uses? I also understand that tracking calls to free doesn't really tell you how much memory is in use at any given time, since freed memory is not necessarily returned to the OS. I appreciate any discussion of this.

I really don't want to use memory profiling tools like TotalView or Valgrind, because they usually do a lot of other things (bounds checking, etc.) that seem to make the program very slow. Another requirement is that it be somewhat thread safe; the software uses MPI, which I believe spawns processes. I am going to analyze this in real time so that I can flush log files, or produce something that another process can read to visualize memory usage as the software runs. It will primarily be run on Linux.

thanks

+7
7 answers

Maybe Valgrind and its Massif tool?

+1

To track the real-time memory consumption of my Linux programs, I just read /proc/[pid]/stat . This is a fairly cheap operation; its cost may be insignificant in your case if the third-party library you want to track does a substantial amount of work per call. If you want memory information while a third-party function is running, you can read the stat file from an independent thread or from another process. (A memory peak rarely happens exactly before or after a function call!)

For CUDA / GPU work, I think gDEBugger can help you. I'm not sure, but I believe its memory analyzer does not have a big impact on performance.

+1

You can try the Google PerfTools heap profiler:

http://google-perftools.googlecode.com/svn/trunk/doc/heapprofile.html

It is very lightweight; it literally replaces malloc/calloc/realloc/free with instrumented versions. It has mainly been tested on Linux platforms.

If you compile with debugging symbols, and your third-party libraries come with debug versions, PerfTools should work just fine. If you don't have debug libraries, build your own code with debug symbols anyway. This will give you detailed numbers for your code, and everything else can be attributed to the third-party library.

+1

If you do not want to use an external tool, you can try things like:

  • mtrace

It installs handlers for malloc, realloc, and free and writes each operation to a file. See the linked Wikipedia article for code usage examples.

  • dmalloc

This is a library you can compile into your code to find memory leaks, fence-post (boundary) errors, and use of invalid addresses. You can also disable it at compile time with -DDMALLOC_DISABLE.

In any case, I would rather not use this approach in production. Instead, I suggest you stress test your application on a test server under Valgrind (or any equivalent tool) to make sure you are allocating memory correctly, and then let the application run without any memory allocation checking in production for better speed. But really, it depends on what your application does and what your needs are.

+1

Maybe the linker option --wrap=symbol can help you. A really good example can be found in man ld.

+1

You can use the profiler included with Visual Studio 2010 Premium and Ultimate.

It allows you to choose between different methods of measuring performance; the most useful for you will probably be CPU sampling, since it freezes your program at arbitrary time intervals and records which functions it is currently executing, and therefore does not slow your program down much.

0

I believe this question has two very separate answers: one for CPU-land C/C++, and one for CUDA-land.

On the CPU:

I wrote my own replacements for new and delete. They were terribly slow and didn't help much. I used TotalView; I like TotalView for debugging OpenMP, but I found it very slow for memory debugging. I have never tried Valgrind, but I've heard similar things about it.

The only memory debugging tool I've encountered that is worth its salt is Intel Parallel Inspector's memory checker. Note: since I am a student, I was able to get an educational license at a low price. That said, it is amazing. It took me twelve minutes to find a memory leak in what amounts to half a million lines of code: I was not releasing a thrown error object that I caught and ignored. I like this piece of software so much that when my RAID failed / Win 7 ate my computer (I think from auto-update and a RAID rebuild running at the same time), I stopped everything and rebuilt the machine, because I knew it would take less time to restore the dual boot (48 hours) than it would to find the memory leak any other way. If you don't believe my outlandish claims, download the evaluation version.

On the GPU:

I think you're out of luck. For all my memory problems in CUDA, I essentially had to roll my own tools and wrappers around cudaMalloc and friends. It isn't pretty. nSight buys you something, but at the moment it's not much more than "here's how much you have allocated riiiight now". And on that sad note, almost every performance problem I've had in CUDA came down to my memory access patterns (that, or my thread-block size).

0
