Is memcpy somehow accelerated on an iPhone?

Question

A few days ago I was writing some code and noticed that copying RAM with memcpy was much faster than copying it in a for loop.

I don't have any measurements right now (maybe I'll take some later), but as far as I remember, the same block of RAM that the for loop copied in 300 ms or more was copied by memcpy in 20 ms or less.

Is memcpy hardware accelerated?

+4

iphone memcpy

grunge fightr Apr 24 '11 at 18:49

5 answers

Chris Jester-Young · Answer 1 · 2011-04-24T18:52:06+0000

Well, I can't speak for Apple's compilers, but GCC definitely treats memcpy as a built-in.
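To illustrate what "built-in" means in practice, here is a minimal sketch. Because the compiler knows what memcpy does, a small call with a constant size can usually be expanded inline into a plain load and store instead of a library call. The helper name `float_bits` is made up for this example, and it assumes a 32-bit `float`; whether the call is really inlined depends on the compiler and its optimization flags.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the raw bytes of a float as a 32-bit integer.
   Assumes sizeof(float) == 4. The size is a compile-time constant, so GCC
   and Clang typically expand this memcpy inline when optimizing (typical
   behaviour, not a guarantee). */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* copies exactly sizeof(uint32_t) bytes */
    return u;
}
```

Comparing the generated assembly at `-O0` and `-O2` is one way to check what a particular compiler does with the call.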

Matti Virkkunen · Answer 2 · 2011-04-24T18:57:54+0000

The built-in memcpy implementation tends to be optimized quite heavily for the platform in question, so it will usually be faster than a naive for loop.

Some optimizations include copying as much as possible at a time (not single bytes but whole words, or even more if the processor supports it), some degree of loop unrolling, etc. Of course, the best way to optimize depends on the platform, so it is usually best to stick with the built-in function.

In most cases it was written by more experienced people than the user anyway.
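The difference between a byte-at-a-time loop and a wider copy can be sketched roughly as follows. The function names `copy_bytes` and `copy_words` are made up for this example, and the 8-byte chunk size is an assumption; a real library memcpy is tuned per platform and is considerably more elaborate.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Naive version: one byte per iteration. */
void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
}

/* Sketch of a wider copy: 8 bytes per iteration, then a byte-wise tail.
   The fixed-size memcpy calls keep the code valid for unaligned pointers;
   optimizing compilers typically turn each one into a single load or store. */
void copy_words(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n >= sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, s, sizeof w);   /* load 8 bytes */
        memcpy(d, &w, sizeof w);   /* store 8 bytes */
        d += sizeof w;
        s += sizeof w;
        n -= sizeof w;
    }
    while (n--)                    /* remaining 0..7 bytes */
        *d++ = *s++;
}
```

Note that with optimizations enabled, some compilers recognize the naive loop and replace it with a memcpy call themselves, so any timing comparison depends heavily on the build flags.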

Pete Wilson · Answer 3 · 2011-04-24T18:58:23+0000

Mem-to-mem DMA is sometimes implemented in processors, so yes, if such a thing exists on the iPhone, then memcpy() most likely uses it. Even if it isn't implemented, I'm not surprised by the 15-to-1 advantage that memcpy() seems to have over your character-by-character copy.

Moral 1: Always prefer memcpy() to strcpy(), if possible.
Moral 2: Always prefer memmove() to memcpy(); always.
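
The answer doesn't say why, but the usual argument for Moral 2 is that memcpy() has undefined behaviour when the source and destination overlap, while memmove() is specified to handle that case. A minimal sketch, with a made-up helper name `shift_right_by_one`:

```c
#include <string.h>

/* Hypothetical helper: shift the first `len` bytes of buf one position to
   the right. Source (buf .. buf+len) and destination (buf+1 .. buf+1+len)
   overlap, so memcpy would be undefined behaviour here; memmove is defined
   for overlapping regions. The caller must provide at least len + 1 bytes. */
void shift_right_by_one(char *buf, size_t len)
{
    memmove(buf + 1, buf, len);
}
```

For example, calling it on a buffer holding "abcdef" with `len` 6 leaves "aabcdef" in the buffer, provided the buffer has room for the extra byte.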

Toad · Answer 4 · 2011-11-19T22:01:29+0000

The newest iPhone has SIMD instructions on the ARM chip, allowing 4 simultaneous calculations. This includes moving memory.

In addition, if you create a highly optimized memcpy, you usually unroll the loops by a certain amount and implement it as a Duff's device.
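
For readers who haven't seen one, a Duff's device applied to a byte copy looks roughly like this. It is only a sketch of the unrolling technique: the name `duff_copy` and the unroll factor of 8 are assumptions, and this is not how any particular library memcpy is written.

```c
#include <stddef.h>

/* Byte copy unrolled 8x using Duff's device: the switch jumps into the
   middle of the loop body so the n % 8 leftover bytes are handled on the
   first pass, and every later pass copies a full 8 bytes. */
void duff_copy(unsigned char *dst, const unsigned char *src, size_t n)
{
    if (n == 0)
        return;
    size_t rounds = (n + 7) / 8;   /* number of passes through the loop */
    switch (n % 8) {
    case 0: do { *dst++ = *src++;
    case 7:      *dst++ = *src++;
    case 6:      *dst++ = *src++;
    case 5:      *dst++ = *src++;
    case 4:      *dst++ = *src++;
    case 3:      *dst++ = *src++;
    case 2:      *dst++ = *src++;
    case 1:      *dst++ = *src++;
            } while (--rounds > 0);
    }
}
```

Modern compilers can often unroll a plain loop on their own, so this form is mostly of historical and illustrative interest.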

Jay · Answer 5 · 2011-04-24T18:59:12+0000

It looks like the ARM processor has instructions that can copy 48 bits per access. I would bet that the lower overhead of copying in larger chunks is what you are seeing.
