ARM Thumb / Thumb-2 Performance

I am working on an ARM Cortex-M3 controller with a Thumb-2 instruction set.

Thumb mode is used to compress instructions to a 16-bit size, so the code size is reduced. But why is performance said to go down with plain Thumb mode?

In the case of Thumb-2, performance is said to be improved, according to these two links:

Improved performance in cases where a single 16-bit instruction restricts the functions available to the compiler.

A stated aim for Thumb-2 was to achieve code density similar to Thumb with performance similar to the ARM instruction set on 32-bit memory.

What exactly is this performance difference? Can someone give some examples of it?

+6
2 answers

Compared to the 32-bit ARM instruction set, the 16-bit Thumb instruction set (not counting the Thumb-2 extensions) takes up less space because the instructions are half the size, but in general it is a performance hit because it takes more instructions to do the same thing as in ARM. The instruction set has fewer features, and most instructions operate only on registers r0-r7. Comparing apples to apples, needing more instructions to perform the same task is slower.

Now the Thumb-2 extensions take formerly undefined Thumb encodings and use them to create 32-bit Thumb instructions. Understand that there is more than one set of Thumb-2 extensions. ARMv6-M adds perhaps a couple dozen instructions. ARMv7-M adds something like 150 instructions to the Thumb instruction set; I don't know what ARMv8 or the future holds. So, assuming ARMv7-M, the extensions bridge the gap between what you can do in Thumb and what you can do in ARM. Thumb-2 is a reduced ARM instruction set, as Thumb is, but not as reduced. It may therefore still take more instructions to do the same thing in Thumb-2 (that is, Thumb plus the Thumb-2 extensions) than in ARM.

This gives a taste of the problem: one instruction in ARM and its equivalent in Thumb.

ARM
    and r8,r9,r10        @ r8 = r9 AND r10

THUMB
    push {r0,r1}         @ free two low registers
    mov r0,r9
    mov r1,r10
    and r0,r1            @ r0 = r9 AND r10
    mov r8,r0
    pop {r0,r1}

Now the compiler will not do this; it knows it is targeting Thumb and does something different, choosing other registers. You still have fewer registers and fewer features per instruction:

    mov r0,r1
    and r0,r2

It still takes two instructions / execution cycles to AND two registers together without modifying the operands and put the result in a third register. Thumb-2 has a three-register AND, so with the Thumb-2 extensions you are back to a single instruction. And that Thumb-2 instruction allows the high registers as well (nominally r0-r15, although the architecture manual restricts SP and PC in many encodings) in any of the three positions, where Thumb is limited to r0-r7.
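The instruction-count difference described above can be sketched with a toy register model in C. This is only an illustration under stated assumptions: `cpu_model`, `t16_mov`, `t16_and`, `t32_and` and the two sequence functions are hypothetical names made up for this sketch, the model counts instructions rather than real cycles, and it ignores flags entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: 16 registers plus a count of executed instructions. */
typedef struct {
    uint32_t r[16];
    unsigned executed;
} cpu_model;

/* 16-bit Thumb MOV Rd, Rm (the 16-bit MOV can reach high registers). */
void t16_mov(cpu_model *c, unsigned rd, unsigned rm)
{
    c->r[rd] = c->r[rm];
    c->executed++;
}

/* 16-bit Thumb AND Rdn, Rm: low registers only, and the destination
   is also the first operand, so it gets overwritten. */
void t16_and(cpu_model *c, unsigned rdn, unsigned rm)
{
    assert(rdn < 8 && rm < 8);
    c->r[rdn] &= c->r[rm];
    c->executed++;
}

/* 32-bit Thumb-2 AND.W Rd, Rn, Rm: three operands, high registers
   allowed (SP and PC are excluded here to stay on the safe side). */
void t32_and(cpu_model *c, unsigned rd, unsigned rn, unsigned rm)
{
    assert(rd < 13 && rn < 13 && rm < 13);
    c->r[rd] = c->r[rn] & c->r[rm];
    c->executed++;
}

/* r0 = r1 & r2 with 16-bit Thumb only: copy first, then AND in place.
   Returns the number of instructions used. */
unsigned and_low_t16(cpu_model *c)
{
    unsigned before = c->executed;
    t16_mov(c, 0, 1);
    t16_and(c, 0, 2);
    return c->executed - before;
}

/* r8 = r9 & r10 with a single 32-bit Thumb-2 instruction. */
unsigned and_high_t32(cpu_model *c)
{
    unsigned before = c->executed;
    t32_and(c, 8, 9, 10);
    return c->executed - before;
}
```

In this model the 16-bit sequence needs two instructions (four bytes in total) and the Thumb-2 form needs one 32-bit instruction (also four bytes), which is the trade the answer describes: similar size, fewer instructions.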

Look at the ARMv5 Architecture Reference Manual: for each Thumb instruction it shows the equivalent ARM instruction. Then go to that ARM instruction and compare what you can do with it that you cannot do with the Thumb instruction. This is a one-way path: Thumb (not Thumb-2) instructions map one-to-one onto ARM instructions. All Thumb instructions have equivalent ARM instructions, but not all ARM instructions have equivalent Thumb instructions. From this exercise you should be able to see the limits the compiler faces when using the Thumb instruction set.

Then get the ARMv7-M Architecture Reference Manual, look through the instruction set, and compare the "all Thumb variants" encodings (those that include ARMv4T) with those limited to ARMv6 and/or v7 to see how the features expand between Thumb and Thumb-2. There are also Thumb-2 instructions that have no Thumb counterpart at all. This should clarify what compilers have to work with in Thumb and Thumb-2.

You can go as far as comparing Thumb plus Thumb-2 with the full-blown ARM instruction set (ARMv7-AR, is that what it is called?). You will see that Thumb-2 gets much closer to ARM, but you lose, for example, the condition field on every instruction, so conditional execution in Thumb turns into comparing and branching over code, where in ARM you can sometimes have an if-then-else without branching.

+6

Thumb-2 introduced variable-length instructions to the original Thumb; instructions can now be a mix of 16-bit and 32-bit. This means you keep the size advantage of the original Thumb in everyday code, but now have access to nearly the full set of ARM functionality in more complex code, without the ARM interworking overhead that Thumb previously incurred.
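As a rough illustration of the mixed 16-bit / 32-bit stream, here is a small C sketch of how the length of each instruction can be told from its first halfword on a Thumb-2 capable core: the top five bits 0b11101, 0b11110 or 0b11111 mark the first half of a 32-bit instruction, and anything else is a complete 16-bit instruction. The function names `thumb_is_32bit` and `thumb_count_instructions` are hypothetical helpers invented for this sketch, not part of any toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if hw is the first halfword of a 32-bit Thumb-2
   instruction, i.e. bits [15:11] are 0b11101, 0b11110 or 0b11111. */
int thumb_is_32bit(uint16_t hw)
{
    unsigned top5 = hw >> 11;
    return top5 == 0x1D || top5 == 0x1E || top5 == 0x1F;
}

/* Walks a stream of halfwords and counts whole instructions,
   stepping by one or two halfwords as the encoding dictates. */
size_t thumb_count_instructions(const uint16_t *code, size_t halfwords)
{
    assert(code != NULL);
    size_t count = 0;
    size_t i = 0;
    while (i < halfwords) {
        i += thumb_is_32bit(code[i]) ? 2 : 1;
        count++;
    }
    return count;
}
```

For example, the halfword sequence 0x4008, 0xEA09, 0x080A, 0xBF00 is `ands r0, r1` (16-bit), `and.w r8, r9, r10` (32-bit, two halfwords) and `nop` (16-bit), so the counter reports three instructions in four halfwords.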

In addition to the aforementioned access to the full register set from all register operations, Thumb-2 added back branch-free conditional execution in the form of the IF-THEN (IT) block. The original Thumb removed ARM's trademark feature of conditional execution on almost all instructions; Thumb-2 restores it by adding an IT instruction that supplies conditions for up to the next four instructions.
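To make the IT block concrete, here is a small C function together with hand-written assembly for the three instruction sets in the comments. The assembly is an illustrative sketch written by hand, not captured compiler output; what a real compiler emits depends on its version and options. The name `max_i32` is just an example.

```c
#include <stdint.h>

/*
 * Signed maximum of two values (a in r0, b in r1, result in r0).
 *
 * ARM, conditional execution on the instruction itself:
 *     cmp   r0, r1
 *     movlt r0, r1
 *     bx    lr
 *
 * Thumb-2, the same idea expressed with an IT block:
 *     cmp   r0, r1
 *     it    lt
 *     movlt r0, r1
 *     bx    lr
 *
 * Original 16-bit Thumb, no conditional execution, so branch over the move:
 *     cmp   r0, r1
 *     bge   1f
 *     mov   r0, r1
 * 1:  bx    lr
 */
int32_t max_i32(int32_t a, int32_t b)
{
    if (a < b) {
        a = b;
    }
    return a;
}
```

The Thumb-2 version costs one extra 16-bit IT instruction compared with ARM, but it avoids the branch that the original Thumb version needs.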

In addition, the instruction set itself has been expanded considerably; for example, the Cortex-M4F implements the DSP extension as well as the FPv4-SP floating-point extension. In fact, I believe even NEON can be encoded in Thumb-2.

+6