In managed code, what should I keep in mind to maintain good performance?

I am originally a native C++ programmer. In C++, everything your program does is tied to your code, i.e. nothing happens unless you ask for it, and every bit of memory is allocated (and freed) according to what you wrote. So performance is your responsibility: if you do well, you get excellent performance.

(Note: please do not object about code you haven't written yourself, such as the STL; it is unmanaged C++ code in the end, and that is the significant part.)

But in managed code, such as Java and C#, you do not control everything that happens, and memory is "hidden" or, to some extent, not under your control. That makes performance relatively unknown territory; basically, you fear poor performance.

So my question is: what issues and broad guidelines should I keep in mind in order to achieve good performance in managed code?

I could only think of some practices, such as:

  • Awareness of boxing and unboxing.
  • Choosing the collection that best suits your needs and has the lowest overhead.
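As a rough illustration of both points, here is a minimal Java sketch (the class and method names are made up for this example). It computes the same sum twice: once through a List<Integer>, where every element is boxed into a heap object, and once through a plain int[]. The result is identical; what differs is the number of allocations, and how much that matters is something to measure rather than assume.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: same computation, boxed vs. primitive storage.
final class BoxingSketch {

    // Stores values in a List<Integer>: each add() boxes an int into an Integer.
    static long sumBoxed(int n) {
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            values.add(i);            // autoboxing: int -> Integer
        }
        long total = 0;
        for (Integer v : values) {
            total += v;               // unboxing: Integer -> int
        }
        return total;
    }

    // Stores values in an int[]: one allocation, no per-element objects.
    static long sumPrimitive(int n) {
        int[] values = new int[n];
        for (int i = 0; i < n; i++) {
            values[i] = i;
        }
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }
}
```

The same idea applies in C#, where a List<int> avoids boxing but an ArrayList of object does not.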

But these never seem sufficient, or even convincing! In fact, perhaps I should not have mentioned them.

Please note that I am not asking for a comparison of C++ vs. C# (or Java) code; I only mentioned C++ to explain the problem.

+7
6 answers

There is no single answer. The only way to answer this question is: profile. Measure early and often. Bottlenecks are usually not where you expect them. Optimize what really hurts. For this we use mvc-mini-profiler, but any similar tool will work.
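If no profiler is at hand, even a crude timing harness is better than guessing. The following is a minimal Java sketch, not a substitute for a real profiler; the class and method names are hypothetical. It runs a task several times and keeps the best wall-clock time, which smooths out some noise from JIT warm-up and GC pauses.

```java
// Hypothetical sketch of a crude measurement helper; a real profiler
// gives far more detail (call trees, allocations, GC activity).
final class TimingSketch {

    // Runs the task `runs` times and returns the shortest observed
    // wall-clock duration in nanoseconds.
    static long bestNanos(Runnable task, int runs) {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            long elapsed = System.nanoTime() - start;
            best = Math.min(best, elapsed);
        }
        return best;
    }
}
```

A call such as TimingSketch.bestNanos(() -> doWork(), 10), with doWork standing in for the code under suspicion, gives a first number to compare against after a change.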

It seems you are focusing on the GC. It can sometimes be a problem, but usually only in specific cases; for most systems, the generational GC works great.

Obviously, external resources will be slow, so caching can be critical. In odd scenarios with very long-lived data, there are tricks you can do with structs to avoid long gen-2 collections. Serialization (files, network, etc.), materialization (ORM), or simply a bad choice of collection or algorithm may be the biggest problem; you cannot know until you measure.


Two things:

  • make sure you understand what IDisposable and "using" mean
  • do not concatenate strings in loops; mass concatenation is a job for StringBuilder
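Both points have close counterparts in Java, which is used for the sketch below: AutoCloseable with try-with-resources plays the role of IDisposable with "using", and StringBuilder has the same purpose in both platforms. All class and method names here are hypothetical.

```java
import java.util.List;

// Hypothetical sketch: string building and deterministic cleanup.
final class JoinSketch {

    // Repeated += creates a new String each iteration and copies
    // everything accumulated so far, so the total work grows quadratically.
    static String joinSlow(List<String> parts) {
        String out = "";
        for (String p : parts) {
            out += p;
        }
        return out;
    }

    // StringBuilder appends into a growable buffer and builds the
    // final String once.
    static String joinFast(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p);
        }
        return sb.toString();
    }

    // Stand-in for a resource that must be released promptly
    // (file handle, socket, database connection).
    static final class TrackedResource implements AutoCloseable {
        boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }
    }

    // try-with-resources calls close() when the block exits, even on
    // exceptions, instead of waiting for the garbage collector.
    static boolean closesDeterministically() {
        TrackedResource resource = new TrackedResource();
        try (TrackedResource inUse = resource) {
            // work with inUse here
        }
        return resource.closed;
    }
}
```

The point of the second half is that the GC manages memory only; anything else the object holds has to be released explicitly.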
+5

Reusing large objects is very important in my experience.

Objects on the large object heap are implicitly generation 2, so a full GC is required to reclaim them. And that is expensive.
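The large object heap is a .NET concept, but the reuse pattern itself is not tied to one runtime. Here is a minimal Java sketch of it, with a hypothetical class name and an arbitrarily chosen 1 MiB buffer size: the buffer is allocated once per instance and reused for every call instead of being reallocated each time.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical sketch: one large scratch buffer, allocated once and reused.
// Not thread-safe: each thread would need its own instance.
final class BufferReuseSketch {

    // 1 MiB is an arbitrary size chosen for illustration.
    private final byte[] buffer = new byte[1 << 20];

    // Drains the stream through the shared buffer and returns the number
    // of bytes read; no new large array is allocated per call.
    long countBytes(InputStream in) {
        try {
            long total = 0;
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Whether pooling pays off depends on the runtime and the allocation pattern, so it is worth confirming with a profiler before adding the extra state.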

+4

The main thing to keep in mind about performance in managed languages is that your code can be restructured at runtime to be better optimized.

For example, the default JVM that most people use, the Sun HotSpot VM, actually optimizes your code while it runs by compiling parts of the program to native code, inlining on the fly, and applying other optimizations (as the CLR and other managed runtimes also do) that you never get with C++. In addition, HotSpot detects which parts of your code are used most and optimizes them accordingly. So you can see that optimizing performance on a managed system is a bit trickier than on an unmanaged one, because there is an intermediate layer that can make code faster without your intervention.

Now I will invoke the rule about premature optimization and say that you should first build a correct solution; then, if performance becomes a problem, go back and measure what is actually slow before trying to optimize.

+1

I would suggest getting a better understanding of garbage collection. You can find good books on the subject, e.g. The Garbage Collection Handbook (Richard Jones, Antony Hosking, Eliot Moss).

Then, your question is in practice tied to a particular implementation, and perhaps even to a specific version of it. For example, Mono used to use Boehm's garbage collector (e.g. in version 2.4), but now uses a copying generational one.

And don't forget that some GC techniques can be remarkably efficient. Remember A. Appel's old paper "Garbage collection can be faster than stack allocation" (although cache performance matters much more today, so the details are different).

I think that being aware of boxing (and unboxing) and of allocation is enough. Some compilers can optimize these (by avoiding some of them).

Remember that GC performance can vary greatly. There are good GCs (for your application) and bad ones.

And some GC implementations are quite fast, for example the one inside OCaml.

I would not worry so much: premature optimization is evil.

(And memory management in C++, even with smart pointers or reference counting, can often be viewed as a poor man's garbage collection technique. You also do not have full control over what C++ is doing, unless you re-implement your ::operator new using operating-system-specific calls, so you do not really know its performance a priori.)

0

.NET generics are not specialized for reference types, which severely limits how much inlining can be done. It may (in certain performance hotspots) make sense to forgo a generic container type in favor of a specific implementation that will be better optimized. (Note: this does not mean using the .NET 1.x containers with an element type of object.)

0

You should reuse large objects; this is very important in my experience.

Objects on the large object heap are implicitly generation 2, so a full GC is required to reclaim them. And that is expensive.

-1
