I don't remember seeing any specification of what to expect when arbitrary prefixes are combined at random, so I think the processor's behavior may be "undefined" and possibly specific to the processor model. (Some cases are clearly spelled out, for example in the Intel documents, but many are not covered.) Some combinations may also be reserved for future use.
My naive assumption has generally been that extra prefixes act as no-ops, but there is no guarantee. The assumption seems reasonable given that, for example, some optimization guides recommend building multi-byte NOPs by putting one or more 66h prefixes in front of the canonical NOP byte (90h), for example:
db 66h, 90h ; 2-byte NOP
db 66h, 66h, 90h ; 3-byte NOP
db 66h, 66h, 66h, 90h ; 4-byte NOP
However, I also know that the CS and DS segment override prefixes take on a new meaning as the SSE2 branch hint prefixes (branch predicted taken = 3Eh = DS override; branch predicted not taken = 2Eh = CS override) when applied to conditional branch instructions.
Anyway, I tried your examples above, always setting XMM1 to all 0s and XMM7 to all 0FFh bytes with
pxor xmm1, xmm1 ; xmm1 <- 0s
pcmpeqw xmm7, xmm7 ; xmm7 <- FFs
and then running the instruction in question with operands xmm1, xmm7. What I observed (32-bit code on Win64, on an Intel T7300 Core 2 Duo):
1) Adding a 66h prefix to addsd makes no difference:
db 66h
addsd xmm1, xmm7 ; total sequence = 66 F2 0F 58 CF
2) Adding a 0F2h prefix to addss makes no difference either:
db 0f2h
addss xmm1, xmm7 ; total sequence = F2 F3 0F 58 CF
3) However, adding a 0F2h prefix to addpd does change the result:
db 0f2h
addpd xmm1, xmm7 ; total sequence = F2 66 0F 58 CF
In this case, the result in XMM1 was 0000000000000000FFFFFFFFFFFFFFFFh instead of FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFh .
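The three sequences above differ only in which legacy prefix bytes sit in front of the 0F 58 opcode. If you want to see that at a glance, here is a minimal Python sketch that splits the leading legacy prefixes off an encoding. The helper name split_prefixes and the table LEGACY_PREFIXES are my own, purely for illustration. It only classifies bytes by their generic legacy meaning; it says nothing about how a given CPU resolves conflicting prefixes. Keep in mind that in front of 0F 58 the bytes 66h, F2h and F3h serve as the mandatory prefixes that select addpd, addsd and addss.

```python
# Legacy x86 prefix bytes with their conventional generic names.
LEGACY_PREFIXES = {
    0xF0: "LOCK",
    0xF2: "REPNE/REPNZ",
    0xF3: "REP/REPE",
    0x2E: "CS override",
    0x36: "SS override",
    0x3E: "DS override",
    0x26: "ES override",
    0x64: "FS override",
    0x65: "GS override",
    0x66: "operand-size override",
    0x67: "address-size override",
}


def split_prefixes(encoding: bytes):
    """Return (prefixes, remainder): the leading legacy prefix bytes and the rest."""
    i = 0
    while i < len(encoding) and encoding[i] in LEGACY_PREFIXES:
        i += 1
    return encoding[:i], encoding[i:]


if __name__ == "__main__":
    # The three encodings tested in cases 1) to 3).
    for text in ("66 F2 0F 58 CF", "F2 F3 0F 58 CF", "F2 66 0F 58 CF"):
        prefixes, rest = split_prefixes(bytes.fromhex(text))
        names = [LEGACY_PREFIXES[b] for b in prefixes]
        rest_hex = " ".join(f"{b:02X}" for b in rest)
        print(text, "->", names, "+", rest_hex)
```

For each of the three cases this reports two prefix bytes followed by the same 0F 58 CF remainder, which makes it easy to compare the prefix order against the observed results.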
So my conclusion is that you cannot make any assumptions and should expect "undefined" behavior. I would not be surprised if you could find some clues in Agner Fog's manuals.