Should I rewrite my DSP routines in C/C++, or stick with unsafe pointers in C#?

I am currently writing a C# application that does a lot of digital signal processing, which involves a lot of small, heavily tuned memory-transfer routines. I wrote these routines using unsafe pointers, and they seem to perform much better than I expected. However, I want the application to be as fast as possible.

Do I get any benefit from rewriting these routines in C or C++, or should I stick with unsafe pointers? I would like to know what unsafe pointers bring to the table in terms of performance compared to C/C++.

EDIT: I am not doing anything exotic inside these routines, just normal DSP stuff: data is copied from one array to another, with lots of multiplications, additions, bit shifts and so on along the way. I would expect the C/C++ routines to look about the same (if not identical) as their C# counterparts.

EDIT: Many thanks to everyone for all the thoughtful answers. What I found out is that I won't get any significant performance improvement just by making a direct port, unless some kind of SSE optimization comes into play. Assuming modern C/C++ compilers can take advantage of SSE, I look forward to trying it. If anyone is interested in the results, just let me know and I will post them somewhere. (It may take some time.)

+4
13 answers

In fact, I have done exactly what you are asking about, only in the field of image processing. I started with unsafe C# pointers, then switched to C++/CLI, and now I code everything in C++. From there I moved from C++ pointers to SSE processor instructions, so I have gone all the way down. I haven't reached assembler yet, and I don't know if I need to: I saw an article on CodeProject showing that SSE intrinsics can be as fast as hand-written assembler; I can find it if you want.

What happened over the course of that move: my algorithm went from 1.5-2 frames per second in C# with unsafe pointers up to 40 frames per second. C# and C++/CLI were definitely slower than C++; even with pointers, I couldn't get above 10 frames per second in those languages. As soon as I switched to C++, I got about 15-20 frames per second straight away. A few more clever changes and SSE got me up to 40 frames per second. So yes, in my experience it is worth going down that road if you want speed. There is a clear performance gain.

+16

Another way to optimize DSP code is to make it cache-friendly. If you have many filters to apply to your signal, you should apply all the filters to each point, i.e. your innermost loop should be over the filters, not over the data, for example:

for each n do t'[n] = h(g(f(t[n])))

This way you will thrash the cache much less and will most likely get a good speedup.

+9

I think you should write your DSP routines in either C++ (managed or unmanaged) or C# using a solid design, without trying to optimize everything from the start; then you should profile your code, find the bottlenecks, and try to optimize those.

Attempting to create "optimal" code from the very beginning will distract you from writing working code in the first place. Remember that 80% of your optimization effort will affect only 20% of your code, since in many cases only 10% of your code is responsible for 90% of the CPU time. (YMMV, as it depends on the type of application.)

When I tried to optimize the alpha blending in our graphics toolkit, I first went for bare-metal SIMD: inline assembler. I soon learned that it is better to use the SIMD intrinsic functions rather than raw assembly, since the compiler can optimize readable C++ with intrinsics by reordering individual opcodes and maximizing the use of the various execution units in the CPU.

Do not underestimate the power of your compiler!

+6

Do I get any performance benefit from rewriting these routines in C/C++, or should I stick with unsafe pointers?

In theory, it would not matter: an ideal compiler would optimize the code, whether C or C++, into the best possible assembly.

In practice, however, C is almost always faster, especially for pointer-heavy algorithms. It is as close as you can get to machine code without writing assembly.

C++ does not bring anything to the table in terms of performance: it was built as an object-oriented extension of C, with much greater power and ease of use for the programmer. Although for some things it will perform better, because the application benefits from an object-oriented design, it was not intended to perform better; it was intended to provide one more level of abstraction to simplify the programming of complex applications.

So, no, you most likely will not see a performance increase by switching to C ++.

However, what probably matters more to you is finding out whether the rewrite is worth it, so as not to waste time on it; I think it would be worthwhile to port it and measure. It is possible that if your processor has specialized instructions, and the compiler knows about them, the port may use features not available from C#. It is unlikely, but possible.

However, DSP processors are, as everyone knows, complex beasts, and the closer you get to assembly, the better the performance you can squeeze out (that is, the more you can tune your code by hand). C is much closer to assembly than C++.

-Adam

+4

First, let me address the "safe" vs "unsafe" question: you said in your post "I want the application to be as fast as possible", which means you do not want to deal with "safe" or "managed" pointers (not to mention garbage collection).

As for your choice of language: C/C++ makes working with raw data much easier, without the overhead of the fancy containers everyone uses these days. Yes, it's nice to be coddled by containers that keep you from seg-faulting... but the higher level of abstraction that containers bring costs you performance.

In my work, our code has to run fast. Examples are our polyphase resamplers, which play with pointers and masking operations, and our fixed-point DSP filters... none of these clever tricks are possible without low-level memory management and bit manipulation ==> so I say stick with C/C++.

If you really want to be clever, write all your DSP code at the low C level, and only then mix it with the safer containers / managed pointers elsewhere... when it comes to speed, you need to take off the training wheels; they slow you down too much.

(FYI, regarding removing the training wheels: you need to test your C DSP code offline to make sure its pointer handling is correct... otherwise it will crash.)

EDIT: p.s. "seg faulting" is a LUXURY for all you PC/x86 developers. When you write embedded code... a seg fault means your processor goes into the weeds and will only recover after a power cycle ;).

+3

To know whether you will get a performance gain, it helps to know which parts of the code could be the bottlenecks.

Since you are talking about small memory transfers, I assume all the data fits in the processor cache. In that case, the only advantage you can gain comes from a compiler that knows how to use processor intrinsics. Typically, the compiler most familiar with intrinsics is a C compiler. Therefore, I think you could improve performance by porting.

Another bottleneck would be CPU cache misses caused by the large number of memory transfers in your application. The biggest gain would come from minimizing cache misses, which depends on the platform you are using and on the layout of your data (packed locally, or scattered through memory?).

But since you are already using unsafe pointers, you have that aspect under your own control, so my guess is: on this front, you are unlikely to benefit from a port to C (or C++).

Conclusion: you might port small parts of your application to C.

+2

Seeing that you are writing unsafe code already, I suppose it would be relatively easy to convert those routines into a C DLL and call them from C#. Do this after you have identified the slowest parts of your program, and then replace just those with C.

+1

Your question is largely philosophical. The answer is: don't optimize until you have profiled.

You ask whether you would get an improvement. Well, you would get an improvement of some N percent. If that is enough (for example, you need code that executes 200 times in 20 milliseconds on some embedded system), you're fine. But what if it is not enough?

First you need to measure, and then find out whether some parts of the code can be rewritten in the same language, but faster. Perhaps you can rearrange data structures to avoid unnecessary calculations. Perhaps you can skip some memory reallocations. Perhaps something is done with quadratic complexity when it could be done with linear complexity. You will not see it until you measure. This is usually much less time-consuming than rewriting everything in another language.

+1

C# does not support SSE (though there is the Mono.Simd project for SSE operations). Therefore, C/C++ with SSE will definitely be faster.

However, you must be careful with the transitions from managed to native and from native back to managed, as they are quite expensive. Stay in either world as long as possible.

+1

Do you really want the application to be as fast as possible, or just fast enough? The answer tells you what to do next.

+1

If you insist on sticking with hand-rolled code, without hand optimization in assembler or the like, C# should be fine. Unfortunately, this is a question that can only really be answered experimentally. You are already in unmanaged-pointer territory, so my feeling is that a direct port to C++ would not show a significant speed difference.

I must say, however, that I had a similar problem recently, and we ended up discarding the hand-rolled code after trying the Intel Integrated Performance Primitives. The performance improvements we saw there were very impressive.

+1

Mono 2.2 now has SIMD support, so you can get the best of both worlds: managed code and raw speed.

Perhaps you should also take a look at "Using SSE in C#".

+1

I would suggest that if you have algorithms in the DSP code that really need optimizing, you should write them in assembly, not in C or C++.

In general, with modern processors and hardware, there are not many scenarios that require or warrant that optimization effort. Have you actually identified a performance problem? If not, then it is probably best to stick with what you have. Unsafe C# is unlikely to be significantly slower than C/C++ for what is, in most cases, simple arithmetic.

Have you considered C++/CLI? Then you could get the best of both worlds. It would even allow you to use inline assembler if required.

0
