Restricted to SSE and SSE2 instructions only, I need to replace the least significant element (element 0) of a 4-element __m128i vector with element 0 of another vector.
For floating-point vectors the task is simple: the intrinsic _mm_move_ss() replaces element 0 of one vector with element 0 of another. It generates a single movss instruction, so it is quite efficient.
With the help of the two cast intrinsics (_mm_castsi128_ps and _mm_castps_si128), you can also convince the compiler to use a single SSE movss instruction to move integer data. The source code is as follows:
__m128i NewVector = _mm_castps_si128(_mm_move_ss(_mm_castsi128_ps(Take3FromThisVector), _mm_castsi128_ps(Take1FromThisVector)));
It looks a little dirty, but with suitable comments it may be acceptable, especially since it generates a minimum of instructions. In typical use, everything is optimized to stay in XMM registers.
My question is this:
Since this is a movss instruction, where "ss" means scalar single-precision floating point, is it okay to use it to move integer data that could contain "special" or "illegal" floating-point bit patterns in any of the vector positions?
The obvious alternative, which I also implemented and tested, is to AND the first vector with a mask and then OR in the second vector, which contains only the one value in the least significant element, with all other elements zero. As you can imagine, this generates more instructions.
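A minimal sketch of the mask-based alternative, using only SSE2 integer intrinsics. The helper name replace_lane0_mask is made up for illustration. Unlike the description above, this version also masks the second vector so that it works for arbitrary input; if the second vector is already zero in elements 1 to 3, the second AND can be dropped.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: elements 1..3 come from `upper`, element 0 from `low`. */
static __m128i replace_lane0_mask(__m128i upper, __m128i low)
{
    /* All ones in element 0, zero elsewhere (the last argument is element 0). */
    const __m128i lane0 = _mm_set_epi32(0, 0, 0, -1);

    /* _mm_andnot_si128(a, b) computes (~a) & b, which clears element 0 of `upper`. */
    __m128i kept_upper = _mm_andnot_si128(lane0, upper);

    /* Keep only element 0 of `low`. */
    __m128i kept_low = _mm_and_si128(lane0, low);

    return _mm_or_si128(kept_upper, kept_low);
}
```

This stays entirely in the integer domain, at the cost of a mask constant and two or three logical instructions instead of one movss.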
I tried the casting approach shown above and it does not seem to cause any problems, but I note in particular that there is no intrinsic that performs the same operation on integer data. It seems that Intel would have provided one, something like _mm_move_epi32, if this were fine for integer data. So I am skeptical about whether this is a good idea.
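A minimal sketch of the kind of test meant here, assuming a compiler that provides emmintrin.h. The helper and check function names are made up for illustration, and the bit patterns are merely examples of integers that would be a signaling NaN, a denormal and a quiet NaN if read as floats. Passing only shows that the bits came through unchanged on the machine it ran on; it does not by itself show that the approach is sanctioned.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics; also provides the SSE ones */
#include <stdint.h>

/* Hypothetical wrapper around the cast + _mm_move_ss expression:
   elements 1..3 come from `upper`, element 0 from `low`. */
static __m128i replace_lane0_movss(__m128i upper, __m128i low)
{
    return _mm_castps_si128(
        _mm_move_ss(_mm_castsi128_ps(upper), _mm_castsi128_ps(low)));
}

/* Hypothetical check: feed in integers whose bits are "special" as floats
   and verify that every element comes out bit-for-bit unchanged. */
static void check_replace_lane0_movss(void)
{
    /* Arguments run from element 3 down to element 0.
       0x7FA00000: signaling-NaN pattern, 0x00000001: denormal pattern. */
    __m128i upper = _mm_set_epi32(0x7FA00000, 0x00000001, 2, 1);
    /* -1 is 0xFFFFFFFF, a quiet-NaN pattern, placed in element 0. */
    __m128i low = _mm_set_epi32(0, 0, 0, -1);

    int32_t out[4];
    _mm_storeu_si128((__m128i *)out, replace_lane0_movss(upper, low));

    assert(out[0] == -1);          /* taken from `low` */
    assert(out[1] == 2);           /* the rest taken from `upper` */
    assert(out[2] == 0x00000001);
    assert(out[3] == 0x7FA00000);
}
```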
I did some searching, for example for "can the movss instruction raise a floating point exception", but did not find anything that answers my question.
Thanks in advance for the knowledge you are willing to share.
-Noel