Well, complex multiplication is defined as:
((c1a * c2a) - (c1b * c2b)) + ((c1b * c2a) + (c1a * c2b))i
So the two components (real and imaginary) of the resulting complex number will be
((c1a * c2a) - (c1b * c2b)) and ((c1b * c2a) + (c1a * c2b))i
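As a scalar reference for the formula above, here is a minimal sketch in plain C. The `cplx` struct and the `cplx_mul` name are hypothetical, introduced only for illustration; `re` and `im` play the roles of the `a` and `b` components.

```c
/* Hypothetical complex type: re is the "a" part, im is the "b" part. */
typedef struct {
    float re;
    float im;
} cplx;

/* Scalar complex multiply, written term by term from the definition:
 *   real = (xa * ya) - (xb * yb)
 *   imag = (xb * ya) + (xa * yb)
 */
static cplx cplx_mul(cplx x, cplx y)
{
    cplx r;
    r.re = (x.re * y.re) - (x.im * y.im);
    r.im = (x.im * y.re) + (x.re * y.im);
    return r;
}
```

For example, multiplying 1 + 2i by 4 + 3i this way gives -2 + 11i, which is a handy value to check a vectorised version against.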
Suppose you use 8 floats to represent 4 complex numbers, defined as follows:
c1a, c1b, c2a, c2b
c3a, c3b, c4a, c4b
If you want to compute (c1 * c3) and (c2 * c4) at the same time, your SSE code will look something like this:
(Note: I used MSVC under Windows, but the principle will be the same with other compilers.)
__declspec( align( 16 ) ) float c1c2[] = { 1.0f, 2.0f, 3.0f, 4.0f };
__declspec( align( 16 ) ) float c3c4[] = { 4.0f, 3.0f, 2.0f, 1.0f };
__declspec( align( 16 ) ) float mulfactors[] = { -1.0f, 1.0f, -1.0f, 1.0f };
__declspec( align( 16 ) ) float res[] = { 0.0f, 0.0f, 0.0f, 0.0f };
__asm {
    movaps xmm0, xmmword ptr [c1c2]
In the code above I rearranged the math a bit. Assume the following two input vectors:
c1a c1b c2a c2b
c3a c3b c4a c4b
After shuffling, I end up with the following four vectors:
0 => c1a c1b c2a c2b
1 => c3b c3b c4b c4b
2 => c3a c3a c4a c4a
3 => c1b c1a c2b c2a
Then I multiply 0 and 2 together to get:
0 => c1a * c3a, c1b * c3a, c2a * c4a, c2b * c4a
Then I multiply 3 and 1 together to get:
3 => c1b * c3b, c1a * c3b, c2b * c4b, c2a * c4b
Finally, I flip the sign of the first float in each pair of 3 (by multiplying it with the -1, 1, -1, 1 factors):
3 => -(c1b * c3b), c1a * c3b, -(c2b * c4b), c2a * c4b
Now I can add 0 and 3 together and get:
(c1a * c3a) - (c1b * c3b), (c1b * c3a) + (c1a * c3b), (c2a * c4a) - (c2b * c4b), (c2b * c4a) + (c2a * c4b)
This is what we were after :)
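The same sequence of steps can also be sketched with compiler intrinsics from `<xmmintrin.h>` instead of inline assembly, which should build on MSVC, GCC and Clang alike. This is only an illustration of the vectors 0 to 3 described above, not a drop-in replacement for the assembly: the function name `cmul2_sse` is hypothetical, and it uses unaligned loads and stores (`_mm_loadu_ps` / `_mm_storeu_ps`) so the arrays do not need 16-byte alignment.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_shuffle_ps, ... */

/* Hypothetical helper: multiplies two pairs of complex numbers at once.
 *   c1c2 = { c1a, c1b, c2a, c2b }
 *   c3c4 = { c3a, c3b, c4a, c4b }
 *   res  = { re(c1*c3), im(c1*c3), re(c2*c4), im(c2*c4) }
 */
static void cmul2_sse(const float *c1c2, const float *c3c4, float *res)
{
    /* -1, 1, -1, 1 in memory order: the sign-flip factors. */
    const __m128 signs = _mm_setr_ps(-1.0f, 1.0f, -1.0f, 1.0f);

    __m128 v0  = _mm_loadu_ps(c1c2);  /* 0 => c1a c1b c2a c2b */
    __m128 c34 = _mm_loadu_ps(c3c4);  /*      c3a c3b c4a c4b */

    /* _MM_SHUFFLE lists source indices from the highest lane down. */
    __m128 v1 = _mm_shuffle_ps(c34, c34, _MM_SHUFFLE(3, 3, 1, 1)); /* 1 => c3b c3b c4b c4b */
    __m128 v2 = _mm_shuffle_ps(c34, c34, _MM_SHUFFLE(2, 2, 0, 0)); /* 2 => c3a c3a c4a c4a */
    __m128 v3 = _mm_shuffle_ps(v0,  v0,  _MM_SHUFFLE(2, 3, 0, 1)); /* 3 => c1b c1a c2b c2a */

    v0 = _mm_mul_ps(v0, v2);     /* c1a*c3a, c1b*c3a, c2a*c4a, c2b*c4a */
    v3 = _mm_mul_ps(v3, v1);     /* c1b*c3b, c1a*c3b, c2b*c4b, c2a*c4b */
    v3 = _mm_mul_ps(v3, signs);  /* negate the first float of each pair */

    _mm_storeu_ps(res, _mm_add_ps(v0, v3));
}
```

With the sample data from the answer (c1c2 = 1, 2, 3, 4 and c3c4 = 4, 3, 2, 1), this should produce -2, 11, 2, 11, i.e. (1 + 2i)(4 + 3i) = -2 + 11i and (3 + 4i)(2 + i) = 2 + 11i.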