Weird SSE behavior with functions

I have been playing with D's inline assembler and SSE, but ran into something I do not understand. When I add two float4 vectors right after their declaration, the result is correct. If I move the same calculation into a separate function, I get a series of NaNs.

    //function contents identical to code section in unittest
    float4 add(float4 lhs, float4 rhs)
    {
        float4 res;
        auto lhs_addr = &lhs;
        auto rhs_addr = &rhs;
        asm
        {
            mov RAX, lhs_addr;
            mov RBX, rhs_addr;
            movups XMM0, [RAX];
            movups XMM1, [RBX];
            addps XMM0, XMM1;
            movups res, XMM0;
        }
        return res;
    }

    unittest
    {
        float4 lhs = {1, 2, 3, 4};
        float4 rhs = {4, 3, 2, 1};

        println(add(lhs, rhs)); //float4(nan, nan, nan, nan)

        //identical code starts here
        float4 res;
        auto lhs_addr = &lhs;
        auto rhs_addr = &rhs;
        asm
        {
            mov RAX, lhs_addr;
            mov RBX, rhs_addr;
            movups XMM0, [RAX];
            movups XMM1, [RBX];
            addps XMM0, XMM1;
            movups res, XMM0;
        }
        //end identical code

        println(res); //float4(5, 5, 5, 5)
    }

As far as I can tell, the generated assembly is functionally identical in both cases (see this link).

Edit: I am using a custom float4 structure (for now it is just an array) because I want to have an add function like float4 add(float4 lhs, float rhs). At the moment, this leads to a compiler error:

Error: expected floating point constant expression instead of rhs

Note: I am using DMD 2.071.0.

1 answer

Your code is weird. Which version of DMD are you using? This works as intended:

    import std.stdio;
    import core.simd;

    float4 add(float4 lhs, float4 rhs)
    {
        float4 res;
        auto lhs_addr = &lhs;
        auto rhs_addr = &rhs;
        asm
        {
            mov RAX, lhs_addr;
            mov RBX, rhs_addr;
            movups XMM0, [RAX];
            movups XMM1, [RBX];
            addps XMM0, XMM1;
            movups res, XMM0;
        }
        return res;
    }

    void main()
    {
        float4 lhs = [1, 2, 3, 4];
        float4 rhs = [4, 3, 2, 1];

        auto r = add(lhs, rhs);
        writeln(r.array); //float4(5, 5, 5, 5)

        //identical code starts here
        float4 res;
        auto lhs_addr = &lhs;
        auto rhs_addr = &rhs;
        asm
        {
            mov RAX, lhs_addr;
            mov RBX, rhs_addr;
            movups XMM0, [RAX];
            movups XMM1, [RBX];
            addps XMM0, XMM1;
            movups res, XMM0;
        }
        //end identical code
        writeln(res.array); //float4(5, 5, 5, 5)
    }
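For comparison, the same addition can probably be written without inline assembly at all, since core.simd vector types support arithmetic operators directly. The following is only a minimal sketch, assuming an x86-64 target where float4 from core.simd is available; the name addVectors is hypothetical and not part of the code above.

```d
import core.simd;
import std.stdio;

// Hypothetical helper: element-wise addition using the vector type's
// built-in + operator instead of hand-written movups/addps.
float4 addVectors(float4 lhs, float4 rhs)
{
    return lhs + rhs;
}

void main()
{
    float4 lhs = [1, 2, 3, 4];
    float4 rhs = [4, 3, 2, 1];

    // .array exposes the vector as a float[4] so it can be printed.
    writeln(addVectors(lhs, rhs).array);
}
```

Letting the compiler pick the instructions avoids having to manage registers and operand addresses by hand, which is where a hand-written asm block is easiest to get wrong.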