As Michael Barr said, the overhead depends heavily on your platform, but it is definitely less than the time it takes to send the data over the wire.
A rough estimate:
800 Mbit/s of payload on an excellent gigabit link is 25 million floats per second.
On one 2 GHz core, that gives you a whopping 80 clock cycles per converted value just to break even. The conversion costs far less, so you will save time; this budget should be more than enough on any architecture :)
A simple load-convert-store cycle (barring all cache stalls) should take fewer than 5 cycles per value. With instruction interleaving, SIMD extensions and/or parallelization across multiple cores, you are likely to get multiple conversions done per cycle.
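The load-convert-store loop in question can be sketched as below. This is an assumed illustration, not code from the answer; a plain loop like this is exactly what compilers auto-vectorize on x86 (scalar `cvtsd2ss`, or packed `cvtpd2ps` converting several values per iteration when SIMD kicks in):

```c
#include <stddef.h>

/* Narrow a buffer of doubles to floats before sending them over the
   network: one load, one conversion, one store per value. */
static void narrow_to_float(const double *src, float *dst, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = (float)src[i];
}
```

Splitting the buffer across cores parallelizes trivially, since each output element depends on exactly one input element.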
In addition, the receiver will be happy to process only half as much data; remember that memory access time is non-linear, so halving the working set can pay off more than proportionally.
The only argument against converting is that the transfer itself should put a minimal load on the CPU: modern architectures can move data from disk/memory onto the bus without CPU intervention (DMA). With the figures above, however, I'd say this doesn't matter in practice.
[edit]
I checked some numbers: the 387 coprocessor would indeed take about 70 cycles for a load-store cycle. On the original Pentium, you're at 3 cycles without any parallelization.
So, unless you are running a gigabit network on a 386...
peterchen