I have the following C and ASM versions of (supposedly) the same code. It loads two 128-bit ints, each represented by two 64-bit ints, into registers (first the four 32-bit words of the two low halves, then the four 32-bit words of the two high halves) and ADDS/ADCS them. This is fairly simple code, and the ARM/ST manuals actually give the same example with 96 bits (3 ADD/ADCs).
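To make the intended carry chain explicit, here is a minimal portable sketch of the same addition on four 32-bit limbs. The helper name `i128_add_limbs` is made up for illustration and is not part of my code; the struct layout is the one I use in the listing further down. Each loop iteration plays the role of one ADDS/ADCS step.

```c
#include <stdint.h>

typedef struct I128 { int64_t high; uint64_t low; } I128;

/* Hypothetical helper: 128-bit add as four 32-bit adds with carry. */
static I128 i128_add_limbs(I128 a, I128 b)
{
    /* Split both operands into 32-bit limbs, least significant first. */
    uint32_t al[4] = {
        (uint32_t)a.low, (uint32_t)(a.low >> 32),
        (uint32_t)(uint64_t)a.high, (uint32_t)((uint64_t)a.high >> 32)
    };
    uint32_t bl[4] = {
        (uint32_t)b.low, (uint32_t)(b.low >> 32),
        (uint32_t)(uint64_t)b.high, (uint32_t)((uint64_t)b.high >> 32)
    };
    uint32_t carry = 0;

    for (int i = 0; i < 4; i++) {
        /* The 64-bit sum cannot overflow: max is 2 * (2^32 - 1) + 1. */
        uint64_t s = (uint64_t)al[i] + bl[i] + carry;
        al[i] = (uint32_t)s;          /* low 32 bits are the result limb */
        carry = (uint32_t)(s >> 32);  /* bit 32 is the carry into the next limb */
    }

    a.low = al[0] | ((uint64_t)al[1] << 32);
    /* Unsigned-to-signed conversion is implementation-defined; GCC wraps modulo 2^64. */
    a.high = (int64_t)(al[2] | ((uint64_t)al[3] << 32));
    return a;
}
```

The carry out of the last limb is dropped, which matches the final plain ADC in the assembly.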
Both versions work for simple calls (repeatedly adding (1 << x++) or 1..x). But on the longer test suite the ARM assembly version fails (the board hangs). At the moment I have no trap/debugging capability and cannot use printf() or the like to find the failing test, which does not matter anyway, because there must be some basic error in the ASM version, since the C version works as expected.
I don't get it: it is simple enough and very close to the assembly GCC generates for the C version (sans branching). I tried the "memory" clobber (it should not be needed), I tried saving the carry between the lower and upper 64 bits in a register and adding it later, using ADD(C).W, alignment, using two LDR/STR instead of LDRD/STRD, etc. I assume the board faults because some addition goes wrong and leads to a division by 0 or something like that. The GCC ASM is below and uses a similar basic technique, so I don't see the problem.
I'm really looking for the fastest way to do the addition, rather than for a fix to this particular code. It's a shame that you have to use fixed register names, because there is no constraint for specifying rX and rX+1. It is also impossible to use as many registers as GCC does, because it runs out of them during compilation.
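One variant that might sidestep the rX/rX+1 issue, sketched here under assumptions and not verified on my board: pass the eight 32-bit words as plain "r" operands and let the compiler do the loads and stores, so no LDRD/STRD and no fixed register names appear in the asm. The name `i128_add_regs` is hypothetical. The "+&r" early-clobber is there so that no input can share a register with an output that is written before that input is read. Whether the compiler has eight registers to spare at a given optimization level is an open question; the portable branch is only there so the sketch builds on non-ARM hosts.

```c
#include <stdint.h>

typedef struct I128 { int64_t high; uint64_t low; } I128;

/* Hypothetical sketch: register operands instead of memory operands. */
static I128 i128_add_regs(I128 a, I128 b)
{
    uint32_t a0 = (uint32_t)a.low, a1 = (uint32_t)(a.low >> 32);
    uint32_t a2 = (uint32_t)(uint64_t)a.high;
    uint32_t a3 = (uint32_t)((uint64_t)a.high >> 32);
    uint32_t b0 = (uint32_t)b.low, b1 = (uint32_t)(b.low >> 32);
    uint32_t b2 = (uint32_t)(uint64_t)b.high;
    uint32_t b3 = (uint32_t)((uint64_t)b.high >> 32);

#if defined(__GNUC__) && defined(__arm__)
    /* One ADDS, two ADCS and a final ADC; only the flags are clobbered. */
    __asm__ ("adds %0, %0, %4\n\t"
             "adcs %1, %1, %5\n\t"
             "adcs %2, %2, %6\n\t"
             "adc  %3, %3, %7"
             : "+&r" (a0), "+&r" (a1), "+&r" (a2), "+&r" (a3)
             : "r" (b0), "r" (b1), "r" (b2), "r" (b3)
             : "cc");
#else
    /* Portable fallback with the same carry chain, for non-ARM builds. */
    uint64_t s = (uint64_t)a0 + b0;
    a0 = (uint32_t)s;
    s = (uint64_t)a1 + b1 + (s >> 32);
    a1 = (uint32_t)s;
    s = (uint64_t)a2 + b2 + (s >> 32);
    a2 = (uint32_t)s;
    a3 = a3 + b3 + (uint32_t)(s >> 32);
#endif

    a.low = a0 | ((uint64_t)a1 << 32);
    /* Unsigned-to-signed conversion is implementation-defined; GCC wraps modulo 2^64. */
    a.high = (int64_t)(a2 | ((uint64_t)a3 << 32));
    return a;
}
```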
typedef struct I128 {
    int64_t high;
    uint64_t low;
} I128;

I128 I128add(I128 a, const I128 b) {
#if defined(USEASM) && defined(ARMx)
    __asm(
        "LDRD %%r2, %%r3, %[alo]\n"
        "LDRD %%r4, %%r5, %[blo]\n"
        "ADDS %%r2, %%r2, %%r4\n"
        "ADCS %%r3, %%r3, %%r5\n"
        "STRD %%r2, %%r3, %[alo]\n"
        "LDRD %%r2, %%r3, %[ahi]\n"
        "LDRD %%r4, %%r5, %[bhi]\n"
        "ADCS %%r2, %%r2, %%r4\n"
        "ADC %%r3, %%r3, %%r5\n"
        "STRD %%r2, %%r3, %[ahi]\n"
        : [alo] "+m" (a.low), [ahi] "+m" (a.high)
        : [blo] "m" (b.low), [bhi] "m" (b.high)
        : "r2", "r3", "r4", "r5", "cc"
    );
    return a;
#else
GCC output for the C version, using "armv7m-none-eabi-gcc-4.7.2 -O3 -ggdb -fomit-frame-pointer -falign-functions=16 -std=gnu99 -march=armv7e-m":
b082        sub     sp, #8
e92d 0ff0   stmdb   sp!, {r4, r5, r6, r7, r8, r9, sl, fp}
a908        add     r1, sp,
GCC output for my ASM version:
b082        sub     sp, #8
b430        push    {r4, r5}
a902        add     r1, sp,