If you try to work around the casting in a sane manner by hacking with data structures, you will end up shuffling memory / words around, which will kill any performance you hoped to get from NEON.
You can probably cast quad registers down to double registers easily, but the other way around may not be possible.
It all comes down to this. Each instruction has a few bits for register indexing. If an instruction expects quad registers, it counts registers two by two, such as D (2 * n), D (2 * n + 1), and uses only n in the encoded instruction; (2 * n + 1) is implicit for the core. If at any point in your code you try to cast two doubles into a quad, you may be in a position where they are not consecutive, forcing the compiler to shuffle registers onto the stack and back to get a consecutive layout.
I think this is still the same answer in different words: fooobar.com/questions/1215783 / ...
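To make the quad / double relationship concrete, here is a minimal sketch using the standard `arm_neon.h` intrinsics. The function names `add_halves_u16` and `join_halves_u16` are made up for illustration, and the scalar fallback is only there so the snippet also builds on a non-ARM machine. Whether the compiler really has to emit extra moves for `vcombine_u16` depends on its register allocation, so treat the comments as the likely behaviour, not a guarantee.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: out[i] = in[i] + in[i + 4] for i = 0..3. */
void add_halves_u16(const uint16_t in[8], uint16_t out[4])
{
    assert(in && out);
#if HAVE_NEON
    uint16x8_t q  = vld1q_u16(in);      /* one 128-bit (quad) value        */
    uint16x4_t lo = vget_low_u16(q);    /* quad -> double: just one half   */
    uint16x4_t hi = vget_high_u16(q);   /* quad -> double: the other half  */
    vst1_u16(out, vadd_u16(lo, hi));
#else
    /* Scalar fallback with the same wrap-around semantics. */
    for (int i = 0; i < 4; ++i)
        out[i] = (uint16_t)(in[i] + in[i + 4]);
#endif
}

/* Hypothetical helper: out = lo followed by hi. */
void join_halves_u16(const uint16_t lo[4], const uint16_t hi[4],
                     uint16_t out[8])
{
    assert(lo && hi && out);
#if HAVE_NEON
    uint16x4_t d_lo = vld1_u16(lo);
    uint16x4_t d_hi = vld1_u16(hi);
    /* double -> quad: if d_lo and d_hi do not already sit in an adjacent
       register pair, the compiler has to move them first. */
    vst1q_u16(out, vcombine_u16(d_lo, d_hi));
#else
    for (int i = 0; i < 4; ++i) {
        out[i]     = lo[i];
        out[i + 4] = hi[i];
    }
#endif
}
```

Going from a quad to its halves is the cheap direction; building a quad out of two unrelated doubles is where the extra register traffic can show up.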
NEON instructions are designed for streaming: you load big chunks from memory, process them, and then store what you want. The mechanics should all be very simple; if they are not, you will lose the extra performance NEON offers, which will make people ask why you are using NEON in the first place and making life harder for yourself.
Think of NEON as immutable value types and operations.
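A rough sketch of that load / process / store pattern, assuming a simple element-wise operation. `scale_add_f32` is a hypothetical name, not something from the question; the intrinsics are the standard ones from `arm_neon.h`. Each intrinsic takes values and returns a new value, nothing is modified in place, and the scalar loop handles the tail (and the whole array on a non-NEON build).

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical example: dst[i] = a[i] * scale + b[i] for i = 0..n-1. */
void scale_add_f32(float *dst, const float *a, const float *b,
                   float scale, size_t n)
{
    assert(dst && a && b);
    size_t i = 0;
#if HAVE_NEON
    for (; i + 4 <= n; i += 4) {
        float32x4_t va     = vld1q_f32(a + i);        /* load a chunk       */
        float32x4_t vb     = vld1q_f32(b + i);
        float32x4_t scaled = vmulq_n_f32(va, scale);  /* new value, va kept */
        float32x4_t sum    = vaddq_f32(scaled, vb);   /* another new value  */
        vst1q_f32(dst + i, sum);                      /* store the result   */
    }
#endif
    /* Scalar tail (or the whole array when NEON is not available). */
    for (; i < n; ++i)
        dst[i] = a[i] * scale + b[i];
}
```

There is no casting between register widths and no reshuffling here: everything stays in quad-sized values from load to store.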