Obviously, this is a complex question. It may also involve processor cores on the FPGA, and there is probably no single answer that holds for every related question.
In my experience, any implementation done in an abstract way, that is, compiled from a high-level description or built from generic machinery, inevitably carries an execution cost, especially for a complex algorithm. This applies to FPGAs and to processors of any kind. An FPGA designed specifically to implement one complex algorithm will perform better than an FPGA whose processing elements are generic, i.e., programmed through control registers, data I/O registers, and so on.
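A rough software analogy for that generality cost, as a minimal C sketch (the opcode set here is made up purely for illustration): a tiny interpreter that can run any sequence of generic operations pays to fetch and dispatch each one, while a function hardwired to a single computation does the same work with no dispatch overhead, much as register-programmed generic processing elements compare to logic built for one algorithm.

```c
#include <stddef.h>

/* Made-up generic operations, analogous to programmable processing
 * elements driven by control registers. */
enum op { OP_ADD3, OP_MUL2 };

/* Generic path: every step pays fetch-and-dispatch overhead. */
int run_generic(int x, const enum op *prog, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        switch (prog[i]) {
        case OP_ADD3: x += 3; break;
        case OP_MUL2: x *= 2; break;
        }
    }
    return x;
}

/* Specialised path: the same computation hardwired, no dispatch. */
int run_specialised(int x)
{
    return (x + 3) * 2;
}
```

Both return the same result; the generic version simply spends extra work deciding what to do at each step.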
Another common example where FPGAs can be much faster is cascaded processing, where the outputs of one process become the inputs of another and the two cannot run simultaneously on a processor. Cascading processes in an FPGA is simple and can dramatically reduce memory I/O requirements, whereas a processor has to go through main memory to cascade two or more processes that have data dependencies.
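To make the memory I/O point concrete, here is a minimal C sketch (the stage functions are hypothetical placeholders): a processor cascades two dependent stages by writing the intermediate result to memory and reading it back, whereas a fused version, analogous to an FPGA pipeline where one stage feeds the next directly, never touches an intermediate buffer.

```c
#include <stddef.h>

/* Hypothetical per-sample stages of a cascaded algorithm. */
static int stage1(int x) { return x * 3 + 1; }
static int stage2(int x) { return x >> 1; }

/* Processor-style cascade: stage1 writes a full intermediate buffer
 * to memory, then stage2 reads it back, doubling memory traffic. */
void cascade_two_pass(const int *in, int *tmp, int *out, size_t n)
{
    for (size_t i = 0; i < n; i++) tmp[i] = stage1(in[i]);
    for (size_t i = 0; i < n; i++) out[i] = stage2(tmp[i]);
}

/* Fused cascade, analogous to an FPGA pipeline: each sample flows
 * straight from stage1 into stage2 with no intermediate buffer. */
void cascade_fused(const int *in, int *out, size_t n)
{
    for (size_t i = 0; i < n; i++) out[i] = stage2(stage1(in[i]));
}
```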
The same can be said for GPUs and CPUs. An algorithm written in C and run on a processor without regard for the inherent performance characteristics of the cache or the main memory system will not perform as well as one that takes them into account. Granted, ignoring those characteristics simplifies the implementation, but at a cost in execution speed.
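As one concrete illustration of the cache point, a short C sketch (the matrix dimensions are arbitrary): summing a matrix row by row follows the memory layout and uses each fetched cache line fully, while summing it column by column strides through memory and typically runs several times slower despite doing the same arithmetic.

```c
#include <stddef.h>

#define ROWS 1024
#define COLS 1024

/* Cache-friendly: walks the array in memory-layout order, so each
 * cache line fetched is fully used before the next one is needed. */
long sum_row_major(const int m[ROWS][COLS])
{
    long s = 0;
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            s += m[r][c];
    return s;
}

/* Cache-hostile: jumps COLS * sizeof(int) bytes between accesses,
 * touching a new cache line on almost every read. Same result, same
 * operation count, but far more memory traffic. */
long sum_col_major(const int m[ROWS][COLS])
{
    long s = 0;
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            s += m[r][c];
    return s;
}
```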
I have no direct experience with GPUs, but knowing their inherent memory-system performance characteristics, they too will be prone to similar performance problems.
RobW Aug 15 '09 at 14:42