TI DSP Programming - Is C Enough or Do I Need Assembler?

I am going to write some image processing software for the Texas Instruments DaVinci platform. There are tools suitable for programming in C, but I wonder whether you can really use the DSP to its full potential without resorting to assembler. Do you know of any speed comparisons between programs written in C and in assembler on this DSP platform?

+6
performance assembly signal-processing texas-instruments davinci
8 answers

I have used some other DSPs, and C was usually fine. The usual approach is to start by writing everything in C and then profile the code to see whether anything needs to be optimized by hand.

You can often do the optimization in C by tuning the C code until the compiler produces the assembly output you want. It is important to know how the DSP works and which techniques run faster or slower.

+10

The TI compiler for the C64x/C64x+ DSP on OMAP3 supports what TI calls "intrinsic" function calls. They are not really function calls; they are just a way of telling the compiler which assembly opcode to use for an operation that cannot be expressed directly in C. This is especially useful for using the SIMD opcodes of the C64x/C64x+ DSP from C.

An example could be:

A = _add2(B, C);

This SIMD instruction adds the low 16 bits of B and C and, separately, the high 16 bits of B and C, and stores the two results in the low and high 16 bits of A. You cannot express that in regular C, but you can do it with a C intrinsic.
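To make the packed add concrete, here is a minimal portable sketch of what it computes. The helper name `add2_emulated` and the unsigned types are assumptions chosen for illustration; on the TI compiler you would call `_add2` itself and let it map to a single instruction.

```c
#include <stdint.h>

/* Portable emulation of a packed 16-bit add (hypothetical helper, not a
 * TI API): the low and high halves are added independently, so a carry
 * out of the low half never reaches the high half. */
static uint32_t add2_emulated(uint32_t b, uint32_t c)
{
    /* Low halves: keep only bits 0..15 of the sum, dropping the carry. */
    uint32_t lo = (b + c) & 0x0000FFFFu;

    /* High halves: add them on their own, then move the result back up.
     * Any carry out of bit 31 is discarded by unsigned wraparound. */
    uint32_t hi = ((b >> 16) + (c >> 16)) << 16;

    return hi | lo;
}
```

For example, `add2_emulated(0x0001FFFF, 0x00010001)` gives `0x00020000`, whereas an ordinary 32-bit add would give `0x00030000` because the low-half carry would leak into the high half.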

I have used C intrinsics to get very close to what you could do in hand-written assembly language (within 5-10%). They are especially useful for video functions such as filtering and motion compensation (_dotpsu4!).
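As a rough illustration of why `_dotpsu4` suits filtering, the sketch below emulates a signed-by-unsigned dot product over four packed bytes in portable C. The function name `dotpsu4_emulated` and its parameter names are assumptions; the real intrinsic performs the equivalent work in one instruction, which is where the speedup comes from.

```c
#include <stdint.h>

/* Hypothetical portable emulation of a 4-way packed 8-bit dot product:
 * each byte of `coeffs` is treated as signed, each byte of `pixels` as
 * unsigned, and the four products are summed. */
static int32_t dotpsu4_emulated(uint32_t coeffs, uint32_t pixels)
{
    int32_t sum = 0;
    for (int i = 0; i < 4; i++) {
        int32_t c = (int32_t)((coeffs >> (8 * i)) & 0xFFu);
        int32_t p = (int32_t)((pixels >> (8 * i)) & 0xFFu);
        if (c >= 128) {
            c -= 256;           /* reinterpret the byte as signed */
        }
        sum += c * p;
    }
    return sum;
}
```

With filter taps (-1, 2, 2, -1) packed as `0xFF0202FF` and pixels (10, 20, 30, 40) packed as `0x281E140A`, the result is -10 + 40 + 60 - 40 = 50, i.e. one output sample of a 4-tap filter per call.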

I usually compile with the -al switch and look at the pipeline to determine which functional units are overloaded, and then look at my intrinsics to see whether I can rebalance the loop (if I am using too many .S units, I might see whether I can change the opcode to one that uses an .M unit).

It is also useful to remember that the C64x DSP has 64 registers, so use plenty of local variables and never assign the output of an instruction back to the same variable; doing so will hurt the compiler's ability to pipeline the loop properly.
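A minimal sketch of that coding style, assuming a made-up row-blending kernel (`blend_rows` is a hypothetical name, not from any TI library): every intermediate result gets its own local instead of being written back into an existing variable.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel: out = (3*a + b + 2) / 4 for each pixel.
 * Each step writes to a fresh local rather than reusing one variable. */
static void blend_rows(const uint8_t *a, const uint8_t *b,
                       uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        const int pa      = a[i];
        const int pb      = b[i];
        const int scaled  = 3 * pa;        /* not: pa = 3 * pa */
        const int summed  = scaled + pb;   /* not: pa = pa + pb */
        const int rounded = summed + 2;    /* rounding before the shift */
        out[i] = (uint8_t)(rounded >> 2);
    }
}
```

The result is the same as a version that keeps overwriting one variable; the only difference is the style of the source. Whether it changes the schedule on a given compiler version is something to confirm in the generated assembly.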

+9

The C compiler (as far as I have checked) does not fully exploit the architecture.

But you can often get away with that, because the DSP may be fast enough for the operations you need to perform.

So it comes down to testing and profiling the code to find the parts that need to be sped up to make the system work.

+6

C is usually a good place to start. You can get the overall structure and algorithms hacked out quickly and write most of the plumbing that moves data around the real math. Once that works, and you are happy that your data structures are correct, you can look in the profiler and find out which routines should be hand-optimized.

+6

It depends on the C compiler and your definition of "fast enough." Standard C compilers often struggle to make efficient use of special DSP hardware, for example:

  • Several memory banks that can be accessed in parallel
  • Fixed-point data types
  • Circular buffers
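
To show what two of these features look like when written in plain C, here is a small sketch of a Q15 fixed-point multiply and a circular delay line. All names (`q15_mul`, `delay_line`, `delay_push`, `DELAY_LEN`) are hypothetical; the point is that standard C has to spell out the rounding, saturation, and index wrapping by hand, and the compiler then has to recognise the pattern to map it onto dedicated hardware.

```c
#include <stdint.h>

#define DELAY_LEN 8u   /* power of two, so the index can wrap with a mask */

typedef struct {
    int16_t  samples[DELAY_LEN];
    unsigned head;     /* next write position */
} delay_line;

/* Q15 multiply: both inputs represent values in [-1, 1) scaled by 2^15.
 * Assumes >> on a negative int32_t is an arithmetic shift, which is
 * implementation-defined in C but true on mainstream compilers. */
static int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;  /* Q30 intermediate */
    product += 1 << 14;                         /* round to nearest */
    product >>= 15;                             /* back to Q15 */
    if (product > INT16_MAX) {
        product = INT16_MAX;                    /* saturate -1 * -1 */
    }
    return (int16_t)product;
}

/* Circular buffer write: the mask does the wraparound in software. */
static void delay_push(delay_line *d, int16_t sample)
{
    d->samples[d->head] = sample;
    d->head = (d->head + 1u) & (DELAY_LEN - 1u);
}
```

For instance, `q15_mul(16384, 16384)` (0.5 times 0.5) returns 8192 (0.25), and pushing nine samples into an eight-entry line overwrites the oldest one.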
+2

A simple speed comparison means nothing. C is definitely more convenient than assembler. You need to measure the timing of your own system: if the C code meets your speed requirements, you do not need assembler. If it is not fast enough, profile your code, find the most expensive parts, such as loops, and then optimize those!
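
A minimal sketch of that kind of measurement, using only the standard C `clock()` function. The kernel `invert_pixels` and the helper `seconds_per_call` are hypothetical names for illustration, and `clock()` is an assumption here: on the actual DSP a vendor profiler or hardware cycle counter would normally give finer resolution.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef void (*kernel_fn)(uint8_t *buf, size_t n);

/* Hypothetical stand-in for the routine being measured. */
static void invert_pixels(uint8_t *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        buf[i] = (uint8_t)(255 - buf[i]);
    }
}

/* Average time per call in seconds, over `repeats` runs (repeats > 0).
 * Repeating the call helps when one run is shorter than the clock tick. */
static double seconds_per_call(kernel_fn fn, uint8_t *buf, size_t n,
                               int repeats)
{
    clock_t start = clock();
    for (int r = 0; r < repeats; r++) {
        fn(buf, n);
    }
    clock_t end = clock();
    return (double)(end - start) / (double)CLOCKS_PER_SEC / (double)repeats;
}
```

Calling `seconds_per_call(invert_pixels, frame, frame_size, 100)` on a representative frame and comparing the result with the per-frame time budget tells you whether the C version is already fast enough.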

+2

I would stick with C until I find that there is a hot spot that can benefit from assembly. This is the "profiling" method that I use. You may be surprised to find that the ways to speed up code are often not "hot spots" at all, but rather intermediate function calls that can be removed.

+1

Compile with -O3 optimization. It is very powerful.
If that is not good enough, you can further optimize the generated assembler code to your liking, instead of coding everything yourself in ASM from scratch.

0