_mm_broadcast_ss has weaknesses imposed by the architecture that are largely hidden by the SSE mm API. The most important difference is the following:
- _mm_broadcast_ss can only load its value from memory.
What this means is that if you explicitly use _mm_broadcast_ss in a situation where the source is not in memory, the result will most likely be less efficient than using _mm_set1_ps. This usually happens when loading immediate values (constants) or when using the result of a recent calculation. In these situations the compiler will have mapped the value to a register. To use that value for a broadcast, the compiler must first spill it back to memory. Alternatively, pshufd can be used to splat directly from the register.
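To make the register case concrete, here is a minimal sketch (the helper names scale4_by_mean and splat_lane0 are made up for illustration). The first function splats a freshly computed scalar with _mm_set1_ps and leaves the choice of instruction to the compiler. The second splats lane 0 of a vector that is already in a register, using the float shuffle intrinsic _mm_shuffle_ps rather than pshufd itself; which instruction the compiler actually emits is not guaranteed.

```c
#include <xmmintrin.h>  /* SSE: _mm_set1_ps, _mm_shuffle_ps, _mm_mul_ps, ... */

/* Hypothetical helper: scale four floats by the mean of a and b.
   `scale` is the result of a recent calculation, so it most likely
   lives in a register when the splat happens. */
static void scale4_by_mean(float *dst, const float *src, float a, float b)
{
    float scale = 0.5f * (a + b);
    __m128 vscale = _mm_set1_ps(scale);   /* compiler picks the splat instruction */
    __m128 v = _mm_loadu_ps(src);         /* unaligned load of four floats */
    _mm_storeu_ps(dst, _mm_mul_ps(v, vscale));
}

/* Hypothetical helper: copy lane 0 of v into all four lanes,
   register to register, with no trip through memory. */
static __m128 splat_lane0(__m128 v)
{
    return _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 0));
}
```

Writing _mm_broadcast_ss(&scale) in the first function would instead force scale to have an address, which is exactly the spill to memory described above.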
_mm_set1_ps is implementation-defined rather than mapped to a specific underlying processor operation (instruction). This means it may use any of several SSE instructions to perform the splat. A smart compiler with AVX support enabled should use vbroadcastss internally when appropriate, but that depends on how well the compiler's optimizer handles AVX.
If you are very sure that you are loading from memory, for example when iterating over an array of data, then using the broadcast directly is fine. But if you have any doubts, I would recommend sticking with _mm_set1_ps.
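As a rough illustration of the array case, the sketch below broadcasts each element of an array straight from memory. The function expand_weights and its parameters are hypothetical, and the target("avx") attribute is a GCC/Clang convenience assumed here so the function builds without passing -mavx; it still needs a CPU with AVX to run.

```c
#include <immintrin.h>  /* AVX: _mm_broadcast_ss */
#include <stddef.h>

/* Hypothetical helper: for each i, write weights[i] * basis[0..3]
   to out[4*i .. 4*i+3]. `out` must hold 4*n floats. */
__attribute__((target("avx")))
static void expand_weights(float *out, const float *weights, size_t n,
                           const float *basis)
{
    __m128 vbasis = _mm_loadu_ps(basis);
    for (size_t i = 0; i < n; ++i) {
        /* The source is an array element, so it is already in memory:
           this is the case the broadcast load is suited to. */
        __m128 w = _mm_broadcast_ss(&weights[i]);
        _mm_storeu_ps(out + 4 * i, _mm_mul_ps(w, vbasis));
    }
}
```

Replacing the broadcast with _mm_set1_ps(weights[i]) should produce the same values, and leaves the instruction choice to the compiler.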
And in the specific case of a static const float, you absolutely do not want to use _mm_broadcast_ss().