CUDA vs FPGA?

I am developing a product that does heavy 3D graphics computations, largely closest-point and range searches. Some hardware optimization would help. While I know little about this, my boss (who has no software experience) advocates FPGA (because it can be tailored to the task), while our junior developer advocates GPGPU with CUDA, because it is cheap, hot and open. While I feel I lack judgement on this question, I believe CUDA is the way to go, also because I am worried about flexibility: our product is still under heavy development.

So, to rephrase the question: are there any reasons to choose FPGA at all? Or is there a third option?
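Concretely, the inner loop we need to accelerate looks something like this naive Python sketch (the function names and sample points are mine, just to illustrate the workload):

```python
import math

# Brute-force closest-point and range search over 3-D points:
# O(n) distance evaluations per query, which is exactly the kind of
# embarrassingly parallel workload an FPGA or GPGPU would accelerate.
def closest_point(query, points):
    """Return the point nearest to `query` (naive reference version)."""
    return min(points, key=lambda p: math.dist(p, query))

def range_search(query, points, radius):
    """Return all points within `radius` of `query`."""
    return [p for p in points if math.dist(p, query) <= radius]

pts = [(0.0, 0.0, 0.0), (1.0, 2.0, 2.0), (5.0, 5.0, 5.0)]
print(closest_point((1.0, 1.0, 1.0), pts))  # -> (1.0, 2.0, 2.0)
```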

+50
hardware cuda fpga
Nov 25 '08 at 15:35
14 answers

I investigated the same question a while back. After chatting with people who have worked on FPGAs, this is what I got:

  • FPGAs are great for real-time systems, where even 1 ms of delay might be too long. This does not apply in your case;
  • FPGAs can be very fast, especially for well-defined digital signal processing usages (e.g. radar data), but the good ones are much more expensive and specialised than even professional GPGPUs;
  • FPGAs are quite cumbersome to program. Since there is a hardware configuration component to compiling, a build can take hours. They seem more suited to electronic engineers (who are generally the ones who work on FPGAs) than to software developers.

If you can make CUDA work for you, it is probably the best option at the moment. It will certainly be more flexible than an FPGA.

Other options include Brook from ATI, but until something big happens, it is simply not as well adopted as CUDA. After that, there are still all the traditional HPC options (x86 / PowerPC / Cell clusters), but they are all quite expensive.

Hope this helps.

+39
Nov 25 '08 at 15:48

We did some comparisons between FPGA and CUDA. One thing where CUDA shines is if you can really formulate your problem in a SIMD manner and access memory coalesced. If the memory accesses are not coalesced (1), or if you have different control flow in different threads, the GPU can lose its performance drastically and the FPGA can outperform it. Another case is when your operations are relatively small but you have a huge number of them, and you cannot (e.g. due to synchronisation) run them in a loop in one kernel; then the invocation time for the GPU kernel exceeds the computation time.

In addition, the power consumption of an FPGA can be better (depending on your application scenario; the GPU is only cheaper, in terms of Watts/FLOP, when it is computing all the time).

Of course, the FPGA also has some drawbacks: IO can be one (we had an application where we needed 70 GB/s; no problem for the GPU, but to get that amount of data into an FPGA, a conventional design needs more pins than are available). Another drawback is time and money: an FPGA is much more expensive than the best GPU, and the development times are very long.

(1) Simultaneous accesses from different threads to memory have to be to sequential addresses. This is sometimes really hard to achieve.
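The coalescing rule in footnote (1) can be made concrete with a toy model: count how many 128-byte memory transactions a 32-thread warp needs for a given access pattern. This is a simplification of the real hardware (segment sizes and merging rules vary by GPU generation), but it shows why strided access is expensive:

```python
# Toy model of memory coalescing: a warp of 32 threads each reads a
# 4-byte word, and the hardware services the warp in 128-byte segments.
# The number of distinct segments touched approximates the number of
# memory transactions issued. (Simplified; real rules vary per GPU.)

WARP_SIZE = 32
WORD_BYTES = 4
SEGMENT_BYTES = 128

def transactions(addresses):
    """Count the distinct 128-byte segments touched by the warp."""
    return len({addr // SEGMENT_BYTES for addr in addresses})

# Coalesced: thread i reads element i, so all 32 accesses fall in one segment.
coalesced = [tid * WORD_BYTES for tid in range(WARP_SIZE)]

# Strided: thread i reads element 8*i, spreading accesses over many segments.
strided = [tid * 8 * WORD_BYTES for tid in range(WARP_SIZE)]

print(transactions(coalesced))  # 1 transaction for the whole warp
print(transactions(strided))    # 8 transactions: an 8x bandwidth penalty
```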

+46
Dec 02 '08 at 13:26

I would go with CUDA.
I work in image processing and have been trying hardware add-on boards for years. First we had the i860, then the Transputer, then DSPs, then FPGAs and direct-compilation-to-hardware.
What inevitably happened was that by the time the hardware boards were really debugged and reliable, and the code had been ported to them, regular CPUs had advanced to beat them, or the hosting machine architecture had changed and we could not use the old boards, or the board makers went bust.

By sticking to something like CUDA, you are not tied to one small specialist maker of FPGA boards. GPU performance is improving faster than CPU performance and is funded by gamers. It is a mainstream technology, so it will probably merge with multi-core CPUs in the future and thus protect your investment.

+14
Nov 25 '08 at 16:20

FPGA

  • What you need:
    • Learn VHDL / Verilog (and trust me, you don't want to)
    • Buy hardware for testing and licenses for the synthesis tools
    • If you choose a good framework (e.g. RSoC):
      • Develop the design (and it can take years)
    • If you do not:
      • DMA driver, hardware, super expensive synthesis tools
      • Tons of knowledge about buses, memory mapping, hardware synthesis
      • Build the hardware, buy IP cores
      • Develop the design
  • For example, an average PCIe FPGA card with a Xilinx Virtex-6 chip costs more than $3,000.
  • Result:
    • If you are not paid by the government, you do not have enough money.

GPGPU (CUDA / OpenCL)

  • You already have hardware to test on.
  • Compared with the FPGA stuff:
    • Everything is well documented.
    • Everything is cheap.
    • Everything works.
    • Everything is well integrated with programming languages.
  • There is a GPU cloud as well.
  • Result:
    • You just need to download the SDK and you can start.
+8
Feb 21 '15 at 17:26

An FPGA-based solution is likely to be far more expensive than a CUDA one.

+4
Jun 24 '09 at 6:54

CUDA has a fairly substantial base of example code and an SDK, including a BLAS back-end. Try to find some examples similar to what you are doing, perhaps also looking at the GPU Gems series of books, to gauge how well CUDA will fit your application. From a logistics point of view, I would say CUDA is easier to work with and much cheaper than any professional FPGA development toolkit.

At one point I looked into CUDA for claims reserve simulation modelling. There is quite a good series of lectures linked off the web site for learning. On Windows, you need to make sure CUDA runs on a card with no displays attached, since the graphics subsystem has a watchdog timer that will kill any process running for more than 5 seconds. This does not occur on Linux.

Any machine with two PCI-e x16 slots should support this. I used an HP XW9300, which you can pick up on eBay quite cheaply. If you do, make sure it has two CPUs (not one dual-core CPU), since the PCI-e slots live on separate Hypertransport buses and you need two CPUs in the machine to activate both buses.

+3
Nov 25 '08 at 16:02

Obviously, this is a complex question. The question might also include the Cell processor. And there is probably not a single answer that is correct for all the related questions.

In my experience, any implementation done in an abstract fashion, i.e. a compiled high-level-language implementation versus a machine-level one, will inevitably have a performance cost, especially for a complex algorithm. This is true of both FPGAs and processors of any type. An FPGA designed specifically to implement a complex algorithm will perform better than an FPGA whose processing elements are generic, allowing a degree of programmability via input control registers, data I/O, etc.

Another general example where an FPGA can perform much better is cascaded processes, where the outputs of one process become the inputs to another and they cannot be run concurrently. Cascading processes in an FPGA is simple and can dramatically lower memory I/O requirements, whereas on a processor, memory must be used to cascade two or more processes that have data dependencies.
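A software analogy for the cascading point: running two dependent stages separately forces the intermediate result through memory, while a fused pipeline streams each element straight from stage to stage, which is what an FPGA pipeline does in hardware. A minimal sketch (the stage functions are invented for illustration):

```python
# Two dependent stages: stage2 consumes stage1's output.
def stage1(x):
    return x * x          # illustrative first stage

def stage2(x):
    return x + 1          # illustrative second stage

data = list(range(1000))

# Processor-style cascading: the intermediate result is materialised
# in memory between the stages (extra memory traffic).
intermediate = [stage1(x) for x in data]
result_two_pass = [stage2(x) for x in intermediate]

# FPGA-style cascading: each element flows straight from stage to
# stage with no intermediate buffer, like a hardware pipeline.
result_fused = [stage2(stage1(x)) for x in data]

assert result_fused == result_two_pass  # same answer, less memory traffic
```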

The same can be said of a GPU and a CPU. Algorithms implemented in C, executing on a CPU, developed without regard to the inherent performance characteristics of the cache or main memory system, will not perform as well as ones that do take them into account. Granted, ignoring those performance characteristics simplifies implementation, but at a performance cost.
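As a concrete instance of the cache point: traversing a 2-D array in the order it is laid out in memory is far friendlier to the cache than striding across rows, even though both loops compute the same thing. In Python the timing effect is muted, but the two access patterns look like this:

```python
# A row-major "matrix": consecutive elements of a row are adjacent in memory.
ROWS, COLS = 500, 500
matrix = [[r * COLS + c for c in range(COLS)] for r in range(ROWS)]

# Cache-friendly: walk memory in layout order (row by row).
def sum_row_major(m):
    total = 0
    for row in m:
        for value in row:
            total += value
    return total

# Cache-hostile on a real CPU: column by column, so each access lands
# COLS elements away from the previous one.
def sum_col_major(m):
    total = 0
    for c in range(COLS):
        for r in range(ROWS):
            total += m[r][c]
    return total

# Both compute the same sum; only the memory access order differs.
assert sum_row_major(matrix) == sum_col_major(matrix)
```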

Having no direct experience with a GPU, but knowing its inherent memory system performance issues, I expect it too will be subject to performance problems.

+3
Aug 15 '09 at 14:42

I am a CUDA developer with very little experience with FPGAs; however, I have been trying to find comparisons between the two.

What I have concluded so far:

The GPU has far higher (accessible) peak performance. It has a more favourable FLOP/Watt ratio. It is cheaper. It is developing faster (quite soon you will literally have a "real" TFLOP available). And it is easier to program (read an article on this; it is not just personal opinion).

Note that I say "real/accessible" to distinguish from the numbers you will see in GPGPU marketing.

BUT the GPU is not favourable when you need to do random accesses to data. This will hopefully change with the new Nvidia Fermi architecture, which has an optional L1/L2 cache.

my 2 cents

+3
Nov 20 '09 at 12:47

This is an old thread that started in 2008, but it is worth recounting what has happened to FPGA programming since then: 1. C-to-gates on FPGAs is now mainstream development at many companies, with huge time savings versus Verilog / SystemVerilog HDL; with C-to-gates, the system-level design is the hard part. 2. OpenCL on FPGAs has been around for 4+ years, including floating point and cloud deployments by Microsoft (Azure) and Amazon F1 (API). With OpenCL, system design is relatively easy because of the very well defined memory model and API between the host and compute devices.

Software folks just need to learn a bit about FPGA architecture to be able to do things that are NOT even possible with GPUs and CPUs, since both are fixed silicon and lack broadband (100 Gbit+) interfaces to the outside world. Scaling chip geometry down is no longer possible, nor is extracting more heat from a single chip package without melting it, so this looks like the end of the road for single-package chips. My thesis here is that the future belongs to parallel programming of multi-chip systems, and FPGAs have a great chance to be ahead of the game. Check out http://isfpga.org/ if you have concerns about performance, etc.

+3
May 05 '17 at 19:49

What are you deploying on? Who is your customer? Without even knowing the answers to these questions, I would not use an FPGA unless you are building a real-time system and have electrical engineers on your team who know hardware description languages such as VHDL and Verilog. There is a lot to it, and it takes a different frame of mind than conventional programming.

+2
Oct 21 '09 at 20:26

Others have given good answers; I just wanted to add a different point of view. Here is a survey paper published in ACM Computing Surveys 2015 (see the permalink here) that compares GPUs with FPGAs and CPUs on energy-efficiency metrics. Most papers report that FPGAs are more energy efficient than GPUs, which in turn are more energy efficient than CPUs. Since power budgets are fixed (depending on cooling capability), the energy efficiency of an FPGA means you can do more computation within the same power budget, and thus get better performance with an FPGA than with a GPU. Of course, the limitations of FPGAs mentioned by others also apply.
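The fixed-power-budget argument reduces to simple arithmetic: at a fixed number of watts, sustained throughput is energy efficiency times the budget. The efficiency figures below are hypothetical, purely to illustrate the reasoning (real numbers are workload- and device-specific):

```python
# Sustained throughput under a fixed power budget:
#   GFLOPS = (GFLOPS per watt) * (watts available)
# The efficiency figures below are hypothetical, for illustration only.

POWER_BUDGET_W = 200          # fixed by the cooling capability

efficiency_gflops_per_w = {   # hypothetical energy-efficiency figures
    "CPU":  2.0,
    "GPU":  10.0,
    "FPGA": 20.0,
}

for device, eff in efficiency_gflops_per_w.items():
    print(device, eff * POWER_BUDGET_W, "GFLOPS within the budget")
# Under these assumed numbers the FPGA delivers twice the GPU's
# throughput at the same power draw, mirroring the survey's conclusion.
```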

+2
Jun 10 '15 at 19:16

FPGAs fell out of favor in the HPC sector because they are a horror to program. CUDA is in because it is much nicer to program and will still give you good performance. I would go with what the HPC community has gone with and do it in CUDA. It is easier, it is cheaper, it is more maintainable.

+1
Feb 11 '13 at 22:20

At the latest GTC '13, many HPC people agreed that CUDA is here to stay. FPGAs are cumbersome; CUDA is getting more mature, supporting Python / C / C++ / ARM... either way, it was a dated question.

+1
Mar 23 '13 at 2:15

FPGAs will not be favoured by those with a software bias, because they need to learn an HDL or at least understand SystemC.

For those with a hardware bias, the FPGA will be the first option considered.

In reality, a firm grasp of both is required; only then can an objective decision be made.

OpenCL is designed to run on both FPGA and GPU, and even CUDA can be ported to FPGA.

FPGA and GPU accelerators can also be used together.

So it is not a case of one being better than the other. There is also the debate about CUDA vs OpenCL.

Again, unless you have optimized and benchmarked both for your specific application, you cannot know with 100% certainty.

Many will simply go with CUDA because of its commercial nature and resources. Others will go with OpenCL because of its versatility.

+1
Jul 20 '16 at 1:44