Just to add to the other answers, which are already good, here is a high-level discussion of what kinds of problems GPUs are suited to, and why.
GPUs have evolved differently from CPUs because of their different origins. Compared to CPU cores, GPU cores contain more ALUs and floating-point hardware, and less control logic and cache. This means GPUs can deliver much higher raw throughput, but only code with regular control flow and well-behaved memory access patterns gets the full benefit: up to TFLOPS for single-precision floating-point code.

GPUs are designed as high-throughput devices that tolerate high latency in both control and memory. Global memory sits behind a wide bus, so coalesced (contiguous and aligned) accesses achieve good bandwidth despite the long latency. The latencies are hidden by requiring massive parallelism: hardware keeps many threads in flight and switches between them at essentially zero cost.

GPUs use a SIMD-like execution model called SIMT, in which groups of cores execute in SIMD lockstep (different groups can diverge freely) without forcing the programmer to account for this, except when chasing peak performance: on Fermi, divergence within a group can cost up to 32x. SIMT lends itself to a data-parallel programming model, in which the independence of data elements is exploited to apply similar processing across a large array. Ongoing work aims to generalize GPUs and their programming model, and to make it easier to program them for good performance.
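To make the divergence cost concrete, here is a minimal sketch (in Python, purely illustrative; the function name and warp size of 32 are assumptions matching Fermi-era hardware) of why divergence within a SIMT group serializes execution: the group must make one pass per distinct branch path, masking off the lanes not on that path.

```python
def divergence_passes(lane_paths):
    """Number of serialized passes a SIMT warp needs to execute a branch:
    one pass per distinct path taken by any lane in the group, since lanes
    on other paths are masked off during that pass."""
    return len(set(lane_paths))

# All 32 lanes take the same path: a single pass, full throughput.
uniform = divergence_passes([0] * 32)       # 1 pass

# Every lane takes its own path (e.g. branching on the lane index):
# 32 serialized passes, the worst-case 32x slowdown mentioned above.
worst = divergence_passes(list(range(32)))  # 32 passes
```

This is why data-parallel code, where all lanes follow the same control flow, maps so well onto SIMT hardware.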