Intel FFT Performance

Which processor will perform better, the i5-2500K or the i7-960, in terms of FFT operations per second, for example an in-place complex FFT over a 16 KB buffer?

I ask because I would like to saturate all the cores and all the threads. The i7 has 8 threads and the i5 only 4, so my main question is whether SSE instructions can run in parallel on all 8 logical threads.

2 answers

This benchmark: http://ixbtlabs.com/articles3/cpu/ci7-turbo-ht-p1.html?pages=ci7-turbo-ht-p1.html

shows that enabling HT on the i7 gave a 0% gain for FFT (see the table of scientific applications, FFT row). The FFT was MATLAB's, which is based on the FFTW library.

The i7-960 has 4 cores and 8 threads through Hyper-Threading (HT). As the ixbt test shows, HT will not help compute more FFTs, so I recommend buying the newer i5-2500, which has the same four cores but a higher clock frequency, more aggressive Turbo Boost (dynamic overclocking), and newer technology.

In addition, this i5 uses the next microarchitecture (SNB, Sandy Bridge) and has AVX (twice as many FLOPS per GHz). If the FFT can use it (with a modern library and a modern compiler), FFT performance should almost double, memory bandwidth limits aside. Intel says its new MKL gets a 1.8x speedup from AVX: http://software.intel.com/en-us/articles/intel-avx-optimization-in-intel-mkl-v103/
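The "twice as many FLOPS per GHz" figure follows from the register width alone. A minimal sketch of that arithmetic, assuming single-precision floats and one vector add plus one vector multiply issued per cycle per core on both chips (the function name and the issue-rate parameter are illustrative, not taken from the linked articles):

```python
def peak_flops_per_cycle(vector_bits, element_bits=32, ops_per_cycle=2):
    """Peak floating-point operations per cycle per core.

    ops_per_cycle=2 is an assumption: one vector add and one vector
    multiply can be issued in the same cycle.
    """
    lanes = vector_bits // element_bits  # floats packed into one register
    return lanes * ops_per_cycle


sse = peak_flops_per_cycle(128)  # 128-bit SSE registers (i7-960, Nehalem)
avx = peak_flops_per_cycle(256)  # 256-bit AVX registers (i5-2500, Sandy Bridge)

print(sse, avx, avx / sse)  # 8 16 2.0
```

This is a theoretical peak only. Real FFT code rarely reaches it, which is consistent with Intel reporting 1.8x rather than 2x.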

The AVX/NHM speedup (AVX-enabled compared with Nehalem, NHM) is 1.8x for 1D complex radix-2 FFTs with N = 1024

So the i5-2500 is 1.8 times faster per clock thanks to AVX, has slightly more GHz (both base and Turbo Boost), and supports faster memory (DDR3-1066 for NHM versus DDR3-1333 for the SNB i5).
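As a rough back-of-the-envelope check, the per-clock gain and the clock ratio can be multiplied together. The sketch below assumes base clocks of 3.3 GHz for the i5-2500 and 3.2 GHz for the i7-960 (Turbo Boost and memory bandwidth are ignored) and reuses the 1.8x MKL figure quoted above. It also shows how many complex points fit in the 16 KB buffer from the question. All names are illustrative.

```python
AVX_SPEEDUP = 1.8   # per-clock gain from the MKL article (N = 1024)
I5_2500_GHZ = 3.3   # assumed base clock, Turbo Boost ignored
I7_960_GHZ = 3.2    # assumed base clock, Turbo Boost ignored


def relative_fft_throughput(avx_speedup, new_ghz, old_ghz):
    """Estimated throughput ratio: per-clock gain times clock ratio."""
    return avx_speedup * (new_ghz / old_ghz)


def points_in_buffer(buffer_bytes, bytes_per_complex):
    """Number of complex samples that fit in the buffer."""
    return buffer_bytes // bytes_per_complex


estimate = relative_fft_throughput(AVX_SPEEDUP, I5_2500_GHZ, I7_960_GHZ)
n_double = points_in_buffer(16 * 1024, 16)  # complex double: 2 x 8 bytes
n_single = points_in_buffer(16 * 1024, 8)   # complex float: 2 x 4 bytes

print(f"{estimate:.2f}x, N = {n_double} (double) or {n_single} (single)")
# 1.86x, N = 1024 (double) or 2048 (single)
```

With double-precision complex data, the 16 KB buffer holds exactly the N = 1024 case that Intel measured. This is an estimate, not a measurement; only a benchmark on both machines would settle it.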


I would say no. One issue with the i7's 8 threads is that during context switches (which will happen more often because of the logical cores) the FPU state is NOT SAVED, so after resuming, the FPU structures have to be refilled before the operation can complete. From what I can tell, the i5-2500K will be faster, since its threads compete only for a core, rather than also competing with a sibling logical thread for the FPU (of which there are only 4).

PS: I may be wrong, since I'm not sure about the specifics of the 960, but this is what I found in work I did in the past.
