Combining Prefixes in SSE

In SSE, the prefixes 066h (operand-size override), 0F2h (REPNE), and 0F3h (REPE) are part of the opcode.

Outside SSE, 066h switches between 32-bit (or 64-bit) and 16-bit operand size, and 0F2h and 0F3h are used as repeat prefixes for string operations. There, 066h can be combined with 0F2h (or 0F3h) in the same instruction, because that combination makes sense. But what is the behavior in an SSE instruction? For example, we have (ignoring the ModR/M byte for now):

0f 58 → addps

66 0f 58 → addpd

f2 0f 58 → addsd

f3 0f 58 → addss

But what is this?

66 f2 0f 58

And what about this?

f2 66 0f 58

Not to mention an encoding with two conflicting REP prefixes:

f2 f3 0f 58

What is the specification for them?
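To make the question concrete, here is a minimal Python sketch that encodes only the four documented forms listed above and flags everything else as unspecified. The names `DOCUMENTED`, `PREFIX_BYTES`, and `classify` are hypothetical and chosen for illustration; the sketch does not model what any real processor does with the combined prefixes, it only shows which encodings fall outside the documented table.

```python
# Documented forms from the list above: prefix bytes -> mnemonic for opcode 0F 58.
DOCUMENTED = {
    (): "addps",
    (0x66,): "addpd",
    (0xF2,): "addsd",
    (0xF3,): "addss",
}

# The three prefixes the question is about.
PREFIX_BYTES = {0x66, 0xF2, 0xF3}


def classify(encoding: bytes) -> str:
    """Return the mnemonic for a documented 0F 58 form, else 'unspecified'."""
    prefixes = []
    i = 0
    # Collect every leading 66/F2/F3 byte, preserving the order they appear in.
    while i < len(encoding) and encoding[i] in PREFIX_BYTES:
        prefixes.append(encoding[i])
        i += 1
    if encoding[i:i + 2] != bytes([0x0F, 0x58]):
        raise ValueError("not an 0F 58 opcode")
    # Any prefix combination not in the table is treated as unspecified here.
    return DOCUMENTED.get(tuple(prefixes), "unspecified")


if __name__ == "__main__":
    for hexstr in ("0f 58", "66 0f 58", "f2 0f 58", "f3 0f 58",
                   "66 f2 0f 58", "f2 66 0f 58", "f2 f3 0f 58"):
        print(f"{hexstr:<12} -> {classify(bytes.fromhex(hexstr))}")
```

The last three encodings are exactly the ones the question asks about: the table has no entry for them.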

1 answer

I don't remember seeing any specification of what to expect when random prefixes are combined in crazy ways, so I think the processor's behavior may be "undefined" and possibly processor-specific. (Some things are clearly specified, for example in Intel's documents, but many cases are not covered.) And some combinations may be reserved for future use.

My naive assumption has generally been that extra prefixes would be no-ops, but there is no guarantee. That seems reasonable given that, for example, some optimization guides recommend building multi-byte NOPs by prefixing the canonical NOP (90h) with 66h:

    db 66h, 90h              ; 2-byte NOP
    db 66h, 66h, 90h         ; 3-byte NOP
    db 66h, 66h, 66h, 90h    ; 4-byte NOP

However, I also know that the CS and DS segment override prefixes have acquired new meanings as the SSE2 branch hint prefixes (predict branch taken = 3Eh = DS override; predict branch not taken = 2Eh = CS override) when applied to conditional branch instructions.

Anyway, I tried your examples above, always setting XMM1 to all 0s and XMM7 to all 0FFh bytes first with

    pxor    xmm1, xmm1    ; xmm1 <- 0s
    pcmpeqw xmm7, xmm7    ; xmm7 <- FFs

and then running the code in question with the operands xmm1, xmm7. What I observed (32-bit code on Win64 and an Intel T7300 Core 2 Duo):

1) Adding the 66h prefix to addsd changes nothing:

    db 66h
    addsd xmm1, xmm7    ; total sequence = 66 F2 0F 58 CF

2) Adding the 0F2h prefix to addss changes nothing:

    db 0F2h
    addss xmm1, xmm7    ; total sequence = F2 F3 0F 58 CF

3) However, I did notice a change when prefixing addpd with 0F2h:

    db 0F2h
    addpd xmm1, xmm7    ; total sequence = F2 66 0F 58 CF

In this case, the result in XMM1 was 0000000000000000FFFFFFFFFFFFFFFFh instead of FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFh.
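One way to read the value in case 3 is to compare it with what a packed-double add and a scalar-double add would each leave in the register. The Python sketch below is a deliberately simplified model, not an emulator: registers are 128-bit integers, and the only arithmetic case modeled is +0.0 plus a quiet NaN, which is assumed to come back as the same NaN bit pattern (an all-ones 64-bit lane is a quiet NaN). The function names are hypothetical stand-ins for the instruction semantics.

```python
MASK64 = (1 << 64) - 1
EXP_MASK = 0x7FF0000000000000
FRAC_MASK = 0x000FFFFFFFFFFFFF

XMM1 = 0                 # all 0s, as set up by pxor
XMM7 = (1 << 128) - 1    # all FFs, as set up by pcmpeqw


def is_nan_bits(q: int) -> bool:
    """True if the 64-bit pattern q encodes a double-precision NaN."""
    return (q & EXP_MASK) == EXP_MASK and (q & FRAC_MASK) != 0


def add_lane(dst: int, src: int) -> int:
    """Model one 64-bit lane for the single case used in the experiment."""
    # Assumption: +0.0 + quiet NaN yields the NaN operand unchanged.
    if dst == 0 and is_nan_bits(src):
        return src
    raise NotImplementedError("only 0.0 + NaN is modeled in this sketch")


def addpd_model(dst: int, src: int) -> int:
    """Packed double: both 64-bit lanes are added."""
    lo = add_lane(dst & MASK64, src & MASK64)
    hi = add_lane(dst >> 64, src >> 64)
    return (hi << 64) | lo


def addsd_model(dst: int, src: int) -> int:
    """Scalar double: only the low lane is added, the high lane of dst is kept."""
    lo = add_lane(dst & MASK64, src & MASK64)
    return ((dst >> 64) << 64) | lo


if __name__ == "__main__":
    print(f"{addpd_model(XMM1, XMM7):032X}")  # FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
    print(f"{addsd_model(XMM1, XMM7):032X}")  # 0000000000000000FFFFFFFFFFFFFFFF
```

Under these assumptions the observed value matches the scalar model rather than the packed one, which suggests that on this particular CPU the F2 66 0F 58 sequence behaved like addsd. That is an inference from one measurement on one processor, not a documented rule.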

So my conclusion is that you cannot make any assumptions and should expect undefined behavior. I would not be surprised if you could find some clues in Agner Fog's documents.
