ARM Assembly Loop

for (int i = 0; i < 10000; i++) a[i] = b[i] + c[i] 

What is the ARM assembly for this high-level code?

Edit: I am also assuming that the base address of A is in R8, the base address of B is in R9, and the base address of C is in R10, and that A, B, and C are all int arrays.

Very much appreciated

I tried:

        MOV R0, #0      ; Init r0 (i = 0)
    Loop:
        a[i] = b[i] + c[i] //How to fix this?
        ADD R0, R0, #1  ;Increment it
        CMP R0, #1000   ;Check the limit
        BLE Loop        ;Loop if not finished
+4
4 answers

Assuming this high-level language is compatible with C, you can use an ARM C compiler to generate assembly code from your fragment. For example, if test.c contains the following:

    void test()
    {
        register int i asm("r0");
        register int *a asm("r8");
        register int *b asm("r9");
        register int *c asm("r10");

        for (i = 0; i < 10000; i++) {
            a[i] = b[i] + c[i];
        }
    }

you can run

 arm-linux-androideabi-gcc -O0 -S test.c 

to create a test.s file containing the assembly code for your test function, along with some additional boilerplate. Below you can see how your loop was compiled into assembly.

    <snipped>
    .L3:
        mov r2, r8
        mov r3, r0
        mov r3, r3, asl #2
        add r3, r2, r3
        mov r1, r9
        mov r2, r0
        mov r2, r2, asl #2
        add r2, r1, r2
        ldr r1, [r2, #0]
        mov ip, sl
        mov r2, r0
        mov r2, r2, asl #2
        add r2, ip, r2
        ldr r2, [r2, #0]
        add r2, r1, r2
        str r2, [r3, #0]
        mov r3, r0
        add r3, r3, #1
        mov r0, r3
    .L2:
        mov r2, r0
        ldr r3, .L5
        cmp r2, r3
        ble .L3
        sub sp, fp, #12
        ldmfd sp!, {r8, r9, sl, fp}
        bx lr
    <snipped>

The drawback of this approach is that you have to trust the compiler to generate the best code for your purposes, which may not always be the case. What you gain is quick answers to questions like the one above, without waiting for people :)

- extra -

GCC allows you to put variables in specific registers; see the relevant documentation.

You can get the cheat sheet here.

Newer versions of GCC generate better ARM code, as expected. The snippet above was generated by version 4.4.3, and I can confirm that Linaro 4.7.1 does better. So if you take my approach, use the latest toolchain you can get.

+6

http://www.peter-cockerell.net/aalp/html/ch-5.html

    ;Print characters 32..126 using a FOR loop-type construct
    ;R0 holds the character
        MOV R0, #32     ;Init the character
    .loop
        SWI WriteC      ;Print it
        ADD R0, R0, #1  ;Increment it
        CMP R0, #126    ;Check the limit
        BLE loop        ;Loop if not finished
    ;
+4
    for (int i = 0; i < 10000; i++) a[i] = b[i] + c[i]

        mov r0,#0x2700
        orr r0,#0x0010
    top:
        ldr r1,[r9],#4
        ldr r2,[r10],#4
        add r1,r1,r2
        str r1,[r8],#4
        subs r0,#1
        bne top
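
For readers more comfortable with C, the structure of the assembly above can be sketched roughly as follows. This is only an illustration: the function name add_arrays_countdown and the parameter n are assumptions, not part of the answer. The pointers advance the way the post-indexed ldr/str do, and the counter runs down to zero the way subs/bne does (0x2700 | 0x0010 = 0x2710 = 10000).

```c
#include <stddef.h>

/* Hypothetical helper mirroring the assembly loop above.
 * Assumes n > 0, exactly as the assembly does (it always
 * executes the body at least once). */
void add_arrays_countdown(int *a, const int *b, const int *c, size_t n)
{
    do {
        int sum = *b++ + *c++;  /* ldr r1,[r9],#4 ; ldr r2,[r10],#4 ; add r1,r1,r2 */
        *a++ = sum;             /* str r1,[r8],#4 */
    } while (--n != 0);         /* subs r0,#1 ; bne top */
}
```

Calling add_arrays_countdown(a, b, c, 10000) would correspond to the loop in the question.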
+1

To build on @alpera's answer, you can also unroll the loop to do 4 operations at once, although whether you get a performance advantage depends on whether memory access or the pipeline stall around the branch is the bigger effect.

        mov r11,#0x2700
        orr r11,#0x0010
    top:
        ldmia r9!, {r0-r3}
        ldmia r10!, {r4-r7}
        add r0,r0,r4
        add r1,r1,r5
        add r2,r2,r6
        add r3,r3,r7
        stmia r8!, {r0-r3}
        subs r11,#4
        bne top
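
As a rough C-level picture of the same unrolling, here is a minimal sketch. The name add_arrays_unrolled4 and the parameter n are hypothetical, and it assumes n is a non-zero multiple of 4 (10000 is), just as the assembly does.

```c
#include <stddef.h>

/* Hypothetical 4x-unrolled version of the loop.
 * Assumes n is a non-zero multiple of 4. */
void add_arrays_unrolled4(int *a, const int *b, const int *c, size_t n)
{
    do {
        /* ldmia r9!, {r0-r3}: load four elements of b */
        int b0 = b[0], b1 = b[1], b2 = b[2], b3 = b[3];
        /* ldmia r10!, {r4-r7}: load four elements of c */
        int c0 = c[0], c1 = c[1], c2 = c[2], c3 = c[3];

        /* four adds, then stmia r8!, {r0-r3}: store four results */
        a[0] = b0 + c0;
        a[1] = b1 + c1;
        a[2] = b2 + c2;
        a[3] = b3 + c3;

        a += 4;
        b += 4;
        c += 4;
        n -= 4;             /* subs r11,#4 */
    } while (n != 0);       /* bne top */
}
```

If the element count were not a multiple of 4, a short scalar loop would be needed to handle the remainder; that is left out here to stay close to the assembly.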

If you have a NEON unit, you can use it here too. In that case it will parallelize the loads, stores and additions, effectively reducing the problem to 5 instructions that execute two iterations of the loop at the same time.

A C compiler does not generate this code by default (or vectorize it for NEON), since it must assume that the buffers used for reading and writing (r8, r9 and r10) can potentially overlap, so that a value written through r8 might be read back in the next iteration through r9 or r10. You can use the restrict qualifier (__restrict in C++) to tell the compiler that they do not overlap.
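
A minimal sketch of what that looks like in C99 follows; the function name add_arrays_restrict and the parameter n are assumptions for illustration. Whether the compiler actually unrolls or vectorizes the loop still depends on the compiler version, target and optimization flags.

```c
#include <stddef.h>

/* Hypothetical example: restrict is a promise from the caller that
 * a, b and c do not overlap, so the compiler is free to reorder or
 * batch the loads and stores. Calling this with overlapping buffers
 * is undefined behavior. */
void add_arrays_restrict(int *restrict a,
                         const int *restrict b,
                         const int *restrict c,
                         size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}
```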

+1
