The answer to my previous question showed that Haskell represents plusWord2# as llvm.uadd.with.overflow. I want to do long addition with carry, the way the x86 ADC instruction works. That instruction not only adds its two arguments but also adds in the contents of the carry flag.
Then you can add long numbers as follows:
    ADD x1 y1
    ADC x2 y2
    ADC x3 y3
    ...
This results in one instruction per word (ignoring any moves, etc.).
I looked at the GMP library to see how it does long addition in its generic C code. Here is an excerpt from mpn/generic/add_n.c:
    sl = ul + vl;
    cy1 = sl < ul;
    rl = sl + cy;
    cy2 = rl < sl;
    cy = cy1 | cy2;
Note that it saves the carry bit from both the original addition and the addition of the incoming carry. At most one of these two additions can carry, so OR-ing the two carry bits afterwards is sufficient.
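To make the excerpt concrete, here is a minimal sketch of the same pattern wrapped in a complete loop. It is not GMP's actual source: the names `limb_t` and `add_n_sketch` are hypothetical, and the least-significant-word-first layout is an assumption made for the example.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;  /* hypothetical name for one machine word */

/* Adds the n-word numbers u and v (least significant word first),
   writes the n-word sum to r, and returns the final carry (0 or 1). */
limb_t add_n_sketch(limb_t *r, const limb_t *u, const limb_t *v, size_t n)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t ul  = u[i];
        limb_t vl  = v[i];
        limb_t sl  = ul + vl;   /* word sum, may wrap around */
        limb_t cy1 = sl < ul;   /* carry out of ul + vl */
        limb_t rl  = sl + cy;   /* add the incoming carry */
        limb_t cy2 = rl < sl;   /* carry out of adding the carry */
        cy = cy1 | cy2;         /* at most one of the two can be set */
        r[i] = rl;
    }
    return cy;
}
```

For example, adding the two-word value with both words all ones to the value 1 gives two zero words and a returned carry of 1.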
GMP obviously has platform-specific assembler code for certain platforms, but I thought the generic code would be a good basis, since it was presumably written so that it compiles to reasonable code. The plusWord2# primitive means that I don’t need to do silly comparisons to get the carry bit, so I implemented the generic GMP code in Haskell, as shown below:
addWithCarry :: Word
Unfortunately, this results in x86 code that saves the carry bit into a register and then adds the carry bit on its own, in addition to adding the two numbers, giving three or four instructions per word instead of one.
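For readers who don't read primops, the shape of the computation described above can be modeled in C as two overflow-reporting additions whose carries are OR-ed. This is only a sketch of the idea, not GHC's output: `add_with_carry_sketch` is a hypothetical name, and `__builtin_add_overflow` is a GCC/Clang builtin used here as a stand-in for an add that also reports overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns x + y + carry_in (mod 2^64) and stores
   the outgoing carry (0 or 1) in *carry_out. */
uint64_t add_with_carry_sketch(uint64_t carry_in, uint64_t x, uint64_t y,
                               uint64_t *carry_out)
{
    uint64_t s, r;
    bool c1 = __builtin_add_overflow(x, y, &s);         /* first add + its carry */
    bool c2 = __builtin_add_overflow(s, carry_in, &r);  /* add incoming carry */
    *carry_out = (uint64_t)(c1 | c2);                   /* combine the two carries */
    return r;
}
```

The two carries are produced separately and combined explicitly, which matches the extra instructions per word described above.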
So I'm wondering how I can chain llvm.uadd.with.overflow so that it produces an x86 ADC instruction chain for implementing multi-precision arithmetic. If I had LLVM code that produced an efficient long addition, I was hoping I could then translate it back into Haskell primops to get an efficient addition directly from Haskell code.
Notes:
I could, of course, use the Haskell FFI to call inline assembly or GMP, but that would prevent inlining, and I suspect it would be relatively slow compared to inlined code for small (i.e. <= 256 bits) operands.
I found that “clang” has __builtin_addc, a three-argument form of addition that takes not only the two numbers but also the incoming carry. However, GHC does not compile through clang, so I don’t see how useful this is.
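For reference, this is roughly how the carry-in/carry-out form would be used from C. It is a sketch under assumptions: `word_t`, `addc_word` and `add_n_addc` are hypothetical names, the builtin is used only when the compiler reports it via `__has_builtin`, and otherwise the code falls back to the comparison-based pattern from the GMP excerpt. Whether the result actually becomes an ADC chain depends on the compiler version and optimization settings.

```c
#include <stddef.h>

#ifndef __has_builtin
#define __has_builtin(x) 0  /* compilers without __has_builtin use the fallback */
#endif

typedef unsigned long long word_t;  /* hypothetical name for one word */

/* Returns x + y + carry_in (wrapping) and stores the carry out (0 or 1). */
static word_t addc_word(word_t x, word_t y, word_t carry_in, word_t *carry_out)
{
#if __has_builtin(__builtin_addcll)
    return __builtin_addcll(x, y, carry_in, carry_out);
#else
    word_t s  = x + y;
    word_t c1 = s < x;         /* carry from x + y */
    word_t r  = s + carry_in;
    word_t c2 = r < s;         /* carry from adding carry_in */
    *carry_out = c1 | c2;
    return r;
#endif
}

/* Adds two n-word numbers (least significant word first); returns the final carry. */
word_t add_n_addc(word_t *r, const word_t *u, const word_t *v, size_t n)
{
    word_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        word_t cout;
        r[i] = addc_word(u[i], v[i], cy, &cout);
        cy = cout;  /* the carry out of this word feeds the next word */
    }
    return cy;
}
```

The point of the three-argument form is that the carry is threaded through as a value from one word to the next, rather than being recomputed with comparisons in the source.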