What does "rep; nop;" mean in x86 assembly?

  • What does rep; nop mean?
  • Is it the same as the pause instruction?
  • Is it the same as rep nop (no semicolon)?
  • How does it differ from a plain nop instruction?
  • Does this apply equally to AMD and Intel processors?
  • (bonus) Where is the official documentation for these instructions?



Motivation for this question:

After some discussion in the comments of another question, I realized that I do not know what rep; nop; means in x86 (or x86-64) assembly. I also could not find a good explanation on the Internet.

I know that rep is a prefix that means "repeat the next instruction cx times" (or at least it did in old 16-bit x86 assembly). According to this summary table on Wikipedia, it seems that rep can only be used with movs, stos, cmps, lods, scas (but perhaps this restriction was removed on newer processors). So I would think that rep nop (without a semicolon) would repeat a nop operation cx times.

However, after further searching, I got even more confused. It seems that rep; nop and pause map to exactly the same opcode, and pause behaves slightly differently from a plain nop. Some old mails from 2005 said different things:

  • "try not to burn too much power"
  • "it is equivalent to 'nop', just with a 2-byte encoding."
  • "it is magic on Intel. It's like 'nop, but let the other HT sibling run'"
  • "it is pause on Intel and fast padding on Athlon."

With these different opinions, I could not understand the correct meaning.

It is used in the Linux kernel (on both i386 and x86_64), along with this comment: /* REP NOP (PAUSE) is a good thing to insert into busy-wait loops. */ It is also used in BeRTOS, with the same comment.

assembly x86 x86-64 cpu
Aug 16 '11 at 23:12
2 answers

rep; nop is indeed the same as the pause instruction (opcode F3 90). It can be used with assemblers that do not yet support the pause mnemonic. On earlier processors it simply did nothing, just like nop, but in two bytes. On newer processors that support hyperthreading, it serves as a hint to the processor that you are executing a spin loop, which improves performance. From Intel's instruction reference:

Improves the performance of spin-wait loops. When executing a "spin-wait loop," a Pentium 4 or Intel Xeon processor suffers a severe performance penalty when exiting the loop because it detects a possible memory order violation. The PAUSE instruction provides a hint to the processor that the code sequence is a spin-wait loop. The processor uses this hint to avoid the memory order violation in most situations, which greatly improves processor performance. For this reason, it is recommended that a PAUSE instruction be placed in all spin-wait loops.
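
As a rough illustration of the spin-wait loop the quote describes, here is a minimal sketch in C11. The names cpu_relax, demo_spinlock, demo_trylock, demo_lock and demo_unlock are made up for this example (cpu_relax only echoes the name Linux uses), and the inline assembly assumes a GCC- or Clang-compatible compiler. On non-x86 targets the helper compiles to nothing.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: emit PAUSE (bytes F3 90) on x86, nothing elsewhere. */
static inline void cpu_relax(void)
{
#if defined(__i386__) || defined(__x86_64__)
    /* "rep; nop" assembles to the same two bytes as "pause". */
    __asm__ __volatile__("rep; nop" ::: "memory");
#endif
}

/* Hypothetical test-and-set spin lock, for illustration only. */
typedef struct {
    atomic_flag locked;
} demo_spinlock;

/* Returns true if the lock was free and is now held by the caller. */
static inline bool demo_trylock(demo_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

/* Busy-wait until the lock is acquired, hinting the CPU on each failed try. */
static inline void demo_lock(demo_spinlock *l)
{
    while (!demo_trylock(l))
        cpu_relax();
}

static inline void demo_unlock(demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

A lock would be declared as demo_spinlock l = { ATOMIC_FLAG_INIT }; and used with demo_lock(&l) / demo_unlock(&l). The only part that matters here is the cpu_relax() call inside the wait loop; a production lock would usually also spin on a plain load before retrying the atomic operation.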

Aug 16 '11 at 23:22

Prefixes (other than lock) that do not apply to an instruction are ignored by existing processors. However, future processors may use that byte sequence to encode a new instruction. (Yes, the x86 opcode space is so limited that they do crazy things like this, and yes, it complicates the decoders.)

In this case, it means that you can use pause in spin loops without breaking backward compatibility. Older processors that do not know about pause will decode it as a NOP, with no harm done. On newer processors, you get the benefit of power saving / HT friendliness, and you avoid memory-order mis-speculation when the memory you are spinning on changes and you leave the spin loop.
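
The compatibility argument above can be sketched as a toy decoder in C. This is purely illustrative: toy_decode, insn_kind and the knows_pause flag are invented for this example, and the function recognizes only the two encodings discussed here (90 and F3 90), not real x86 decoding.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { INSN_UNKNOWN, INSN_NOP, INSN_PAUSE } insn_kind;

/* Toy model: `knows_pause` says whether the modeled CPU implements PAUSE. */
static insn_kind toy_decode(const uint8_t *bytes, size_t len, int knows_pause)
{
    if (len == 1 && bytes[0] == 0x90)
        return INSN_NOP;                  /* plain one-byte nop */

    if (len == 2 && bytes[0] == 0xF3 && bytes[1] == 0x90)
        return knows_pause ? INSN_PAUSE   /* newer CPU: spin-wait hint */
                           : INSN_NOP;    /* older CPU: rep prefix ignored */

    return INSN_UNKNOWN;                  /* anything else is out of scope */
}
```

The same two bytes give INSN_PAUSE when knows_pause is nonzero and INSN_NOP otherwise, which is why a binary containing pause still runs on processors that predate the instruction.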




Links to Intel's manuals and lots of other useful material are on the x86 tag wiki info page: /tags/x86/info

Another case of a meaningless rep prefix becoming a new instruction on newer processors: lzcnt is F3 0F BD /r. On processors that do not support that instruction (no LZCNT feature flag in their CPUID), it decodes as rep bsr, which runs the same as bsr. So on older processors it produces 31 - expected_result (for 32-bit operands), and the result is undefined when the input was zero.
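
To make the difference between the two decodings concrete, here is a small software model in C. The functions model_bsr32 and model_lzcnt32 are hypothetical names for this sketch and only mimic the documented results of the 32-bit instructions; they do not execute bsr or lzcnt themselves. For a nonzero 32-bit input the two results are related by lzcnt(x) = 31 - bsr(x).

```c
#include <stdint.h>

/* Model of 32-bit bsr: index of the highest set bit.
 * The real instruction leaves its destination undefined for x == 0,
 * so this model must only be called with x != 0. */
static unsigned model_bsr32(uint32_t x)
{
    unsigned idx = 0;
    while (x >>= 1)
        idx++;
    return idx;
}

/* Model of 32-bit lzcnt: number of leading zero bits, 32 when x == 0. */
static unsigned model_lzcnt32(uint32_t x)
{
    unsigned n = 0;
    for (uint32_t mask = 0x80000000u; mask != 0 && !(x & mask); mask >>= 1)
        n++;
    return n;
}
```

For example, an input of 0x00010000 gives 16 from the bsr model and 15 from the lzcnt model, so code that relies on lzcnt silently gets a different number on a processor that decodes the bytes as rep bsr.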




One case of a meaningless rep prefix that will probably never decode differently: rep ret is used by default by gcc when targeting "generic" CPUs (i.e. not targeting a specific CPU with -march or -mtune, and not targeting AMD K8 or K10). It will be decades before anyone could make a processor that decodes rep ret as anything other than ret, because it is present in most binaries in most Linux distributions. See What does `rep ret` mean?

Nov 10 '15 at 20:46