Rust compiles the loop into:
    .LBB0_1:
        movupd  xmm0, xmmword ptr [rcx + 8*rax - 48]
        movupd  xmm1, xmmword ptr [rcx + 8*rax - 32]
        addpd   xmm0, xmm0
        addpd   xmm1, xmm1
        movupd  xmmword ptr [rcx + 8*rax - 48], xmm0
        movupd  xmmword ptr [rcx + 8*rax - 32], xmm1
        movupd  xmm0, xmmword ptr [rcx + 8*rax - 16]
        movupd  xmm1, xmmword ptr [rcx + 8*rax]
        addpd   xmm0, xmm0
        addpd   xmm1, xmm1
        movupd  xmmword ptr [rcx + 8*rax - 16], xmm0
        movupd  xmmword ptr [rcx + 8*rax], xmm1
        add     rax, 8
        cmp     rax, 100006
        jne     .LBB0_1
Whereas GCC 7.1.0 compiles it to:
    L6:
        movsd   (%rbx), %xmm0
        addq    $8, %rbx
        addsd   %xmm0, %xmm0
        movsd   %xmm0, -8(%rbx)
        cmpq    %rbp, %rbx
        jne     L6
Rust places the array in a data section, whereas the C version actually writes to the memory (a memset with a pattern). This means that the OS running the Rust program most likely just maps the range and relies on virtual memory to do the right thing, so the pages are probably only faulted in when the timed loop touches them for the first time.
If you change the code to run the same loop once before the measurement, the runtime goes down considerably. It is then actually faster than the C version on my machine (possibly because of the loop unrolling).
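    // Warm-up pass: touch every element once before timing.
    unsafe {
        for i in 0..STREAM_ARRAY_SIZE {
            A[i] = 2.0E0 * A[i];
        }
    }

    // Timed pass: the same loop, now over memory that has already been touched.
    let now = Instant::now();
    unsafe {
        for i in 0..STREAM_ARRAY_SIZE {
            A[i] = 2.0E0 * A[i];
        }
    }
    let duration = now.elapsed();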
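To see the warm-up effect without editing the benchmark by hand, the same idea can be sketched as two separately timed passes over one buffer. This is only an illustration, not the code from the question: the names `scale_in_place` and `time_two_passes` are made up here, and the sketch takes a mutable slice instead of the `static mut A` array, so the memory layout may differ from the original program.

```rust
use std::time::{Duration, Instant};

/// Doubles every element in place (the loop being measured above).
fn scale_in_place(a: &mut [f64]) {
    for x in a.iter_mut() {
        *x = 2.0 * *x;
    }
}

/// Hypothetical helper: runs the loop twice over the same buffer and
/// times each pass separately. The first pass doubles as the warm-up.
fn time_two_passes(a: &mut [f64]) -> (Duration, Duration) {
    let start = Instant::now();
    scale_in_place(a);
    let first = start.elapsed();

    let start = Instant::now();
    scale_in_place(a);
    let second = start.elapsed();

    (first, second)
}
```

Calling `time_two_passes` on a large buffer and printing both durations with `{:?}` should show whether the first pass is the expensive one. How big the gap is depends on how the buffer was allocated and on the OS, so treat the numbers as a hint rather than a precise measurement.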
viraptor