I don't know if I'm missing something in my understanding of how AVX intrinsics interact with std::array, but I get a strange problem with Clang when I combine the two.
Code example:

    #include <array>
    #include <cstddef>
    #include <immintrin.h>
    #include <iostream>

    std::array<__m256, 1> gen_data()
    {
        std::array<__m256, 1> res;
        res[0] = _mm256_set1_ps(1);
        return res;
    }

    int main()
    {
        auto v = gen_data();
        float a[8];
        _mm256_storeu_ps(a, v[0]);
        for (size_t i = 0; i < 8; ++i) {
            std::cout << a[i] << std::endl;
        }
    }

Output from Clang 3.5.0 (the upper 4 floats are garbage):
1
1
1
1
8.82272e-39
0
5.88148e-39
0
Output from GCC 4.8.2 / 4.9.1 (expected):
1
1
1
1
1
1
1
1
If I instead pass v into gen_data as an output parameter, it works fine on both compilers. I accept that this might be a bug in Clang, but I don't know whether it could be undefined behavior (UB). Testing with Clang 3.7* (svn build), Clang now seems to give my expected result. If I switch to 128-bit SSE intrinsics (__m128), all compilers give the same expected results.
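For reference, a minimal sketch of the output-parameter variant described above. The names fill_data and first_vector_lanes are hypothetical, and the target("avx") attribute is an assumption (GCC/Clang only) so that the snippet compiles even without -mavx; it is not part of my original code.

```cpp
#include <array>
#include <immintrin.h>

// Hypothetical output-parameter version of gen_data: the caller owns the
// array, so no std::array<__m256, 1> is ever returned by value.
__attribute__((target("avx")))
void fill_data(std::array<__m256, 1>& res)
{
    res[0] = _mm256_set1_ps(1.0f);
}

// Hypothetical helper: fills a local array through the output parameter
// and copies the eight float lanes of its only element into `out`.
__attribute__((target("avx")))
void first_vector_lanes(float (&out)[8])
{
    std::array<__m256, 1> v;
    fill_data(v);
    _mm256_storeu_ps(out, v[0]); // unaligned store, `out` needs no alignment
}
```

Calling first_vector_lanes(a) with a float a[8] and printing the elements corresponds to the loop in main above.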
So my questions are:
- Is there any UB here? Or is it just a bug in Clang 3.5.0?
- As far as I understand, __m256 is just a 32-byte aligned chunk of memory? Or is there something special about it that I have to be careful with?