std::array of AVX intrinsics

I don't know if I'm missing something in my understanding of how AVX intrinsics work with std::array, but I have a strange problem with Clang when I combine the two.

Code example:

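    #include <array>
    #include <cstddef>
    #include <iostream>
    #include <immintrin.h>

    std::array<__m256, 1> gen_data()
    {
        std::array<__m256, 1> res;
        res[0] = _mm256_set1_ps(1);  // broadcast 1.0f to all 8 lanes
        return res;
    }

    int main()
    {
        auto v = gen_data();
        float a[8];
        _mm256_storeu_ps(a, v[0]);   // unaligned store of all 8 lanes
        for (std::size_t i = 0; i < 8; ++i)
        {
            std::cout << a[i] << std::endl;
        }
    }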

Output from Clang 3.5.0 (the upper 4 floats are garbage):

  1
 1
 1
 1
 8.82272e-39
 0
 5.88148e-39
 0 

Output from GCC 4.8.2 / 4.9.1 (expected):

  1
 1
 1
 1
 1
 1
 1
 1 

If I instead pass v to gen_data as an output parameter, it works fine on both compilers. I suspect this could be a bug in Clang, but I don't know whether it could be undefined behavior (UB). Testing with Clang 3.7* (svn build) seems to give the expected result. If I switch to 128-bit SSE intrinsics (__m128), all compilers give the same expected results.
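For reference, the output-parameter variant mentioned above might look roughly like the sketch below. The names vec8, splat, store8, all_lanes_equal and self_check are hypothetical helpers introduced only for this illustration, and the non-AVX branch is a stand-in so the sketch still builds when the compiler is not invoked with AVX enabled; it is not the original poster's code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

#if defined(__AVX__)
#include <immintrin.h>
using vec8 = __m256;  // real AVX vector type
inline vec8 splat(float x) { return _mm256_set1_ps(x); }
inline void store8(float* dst, const vec8& v) { _mm256_storeu_ps(dst, v); }
#else
// Hypothetical stand-in (assumption): 8 floats, 32-byte aligned,
// used only when AVX is not enabled at compile time.
struct alignas(32) vec8 { float lane[8]; };
inline vec8 splat(float x) {
    vec8 v;
    for (std::size_t i = 0; i < 8; ++i) v.lane[i] = x;
    return v;
}
inline void store8(float* dst, const vec8& v) {
    for (std::size_t i = 0; i < 8; ++i) dst[i] = v.lane[i];
}
#endif

// Output-parameter variant of gen_data: the caller owns the storage,
// so no std::array<vec8, 1> is returned by value.
inline void gen_data(std::array<vec8, 1>& out) {
    out[0] = splat(1.0f);
}

// Returns true when all eight lanes of v[0] equal `expected`.
inline bool all_lanes_equal(const std::array<vec8, 1>& v, float expected) {
    float a[8];
    store8(a, v[0]);
    for (std::size_t i = 0; i < 8; ++i) {
        if (a[i] != expected) return false;
    }
    return true;
}

// Small self-check that exercises the output-parameter call.
inline void self_check() {
    std::array<vec8, 1> v;
    gen_data(v);
    assert(all_lanes_equal(v, 1.0f));
}
```

Calling self_check() from main is enough to exercise the path; this variant sidesteps returning the array by value, which is the case that misbehaved in the question.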

So my questions are:

  • Is there any UB here? Or is it just a bug in Clang 3.5.0?
  • As far as I understand, __m256 is just a 32-byte aligned chunk of memory. Or is there something special about it that I have to be careful with?
1 answer

This looks like a Clang bug that has since been fixed. There is a bug report that demonstrates a very similar problem using regular arrays.

Assuming std::array implements its storage like this:

 T elems[N]; 

which both libc++ and libstdc++ seem to do, the two cases should be similar. One comment on the report says:

However, libc++ std::array<__m256i, 1> does not work at any optimization level.

The bug report was based on this SO question: "Is this incorrect code generation with arrays of __m256 values a clang bug?", which is very similar but deals with the regular array case.
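The storage assumption above (a plain T elems[N] member) can be probed with a small compile-time check. This is only a sketch: vec8 is a hypothetical stand-in for a 32-byte aligned type such as __m256, array_matches_raw_layout is a helper name invented here, and the C++ standard does not strictly promise that the sizes match, although common standard libraries behave this way.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in (assumption) for a 32-byte, 32-byte-aligned
// vector type such as __m256.
struct alignas(32) vec8 { float lane[8]; };

// True when std::array<T, N> has the same size and alignment as T[N],
// i.e. the wrapper adds no padding or extra members.
template <typename T, std::size_t N>
constexpr bool array_matches_raw_layout() {
    return sizeof(std::array<T, N>) == sizeof(T[N]) &&
           alignof(std::array<T, N>) == alignof(T[N]);
}
```

If array_matches_raw_layout<vec8, 1>() holds for a given library, std::array<vec8, 1> is laid out like vec8[1], which is why the regular-array bug report is relevant to the std::array case.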

The bug report also contains one possible workaround, which may be sufficient for the OP:

In my actual code, num_vectors is computed based on some C++ template parameters for the simd_pack type. In many cases it is 1, but it is also often greater than 1. Your observation gives me an idea: I could try introducing a template specialization that catches the case where num_vectors == 1. It could then use a single __m256 member instead of an array of size 1. I will have to check how feasible that is.
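A minimal sketch of that idea, assuming a simd_pack template parameterized on the vector type and the vector count, might look as follows. The member names data and value, the operator[] interface, and the vec8 stand-in type are assumptions made for this illustration; the real simd_pack from the bug report is not shown in the source, and in practice V would be __m256.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in (assumption) for __m256; swap in the real
// type when building with AVX enabled.
struct alignas(32) vec8 { float lane[8]; };

// General case: num_vectors > 1, stored in a std::array.
template <typename V, std::size_t NumVectors>
struct simd_pack {
    std::array<V, NumVectors> data;

    V&       operator[](std::size_t i)       { return data[i]; }
    const V& operator[](std::size_t i) const { return data[i]; }
    static constexpr std::size_t size() { return NumVectors; }
};

// Specialization for num_vectors == 1: a single member, so no array
// of size 1 is ever created, passed, or returned.
template <typename V>
struct simd_pack<V, 1> {
    V value;

    V& operator[](std::size_t i) {
        assert(i == 0);  // only index 0 is valid here
        (void)i;
        return value;
    }
    const V& operator[](std::size_t i) const {
        assert(i == 0);
        (void)i;
        return value;
    }
    static constexpr std::size_t size() { return 1; }
};
```

Both variants expose the same operator[] and size() interface, so generic code written against simd_pack would not need to know which one it received.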


Source: https://habr.com/ru/post/1215964/

