CUDA "cores" can be thought of as SIMD lanes.
First, recall that the term "CUDA core" is NVIDIA marketing-speak. These are not cores in the way a CPU has cores. Similarly, "CUDA threads" are not the same as the threads we know from CPUs.
The equivalent of a CPU core on a GPU is a "streaming multiprocessor" (SM): it has its own instruction scheduler/dispatcher, its own L1 cache, its own shared memory, etc. It is CUDA thread blocks, not warps, that are assigned to a GPU core, i.e. to a streaming multiprocessor. Within an SM, warps get selected to have instructions scheduled, and an instruction is issued for the entire warp. From a CUDA point of view, these are 32 separate threads that execute in lockstep; but that is really no different from saying that a warp is like a single thread which only executes 32-lane-wide SIMD instructions. Of course, this is not a perfect analogy, but I feel it's pretty sound. Something you don't always have with CPU SIMD lanes is masking of which lanes are actively executing: inactive lanes are not affected by what the active lanes do, such as setting register values or writing to memory.
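To make the "warp as one thread issuing 32-wide masked SIMD instructions" picture concrete, here is a small CPU-side simulation in plain C++. It is only a sketch of the mental model, not how the hardware or the CUDA toolchain is implemented. The names `WarpRegister`, `LaneMask`, `masked_op` and `run_warp` are made up for this illustration. The per-thread code being modelled is `if (x < 0) y = -x; else y = 2 * x;`.

```cpp
#include <array>
#include <bitset>
#include <cstddef>

// Hypothetical names for illustration only; none of this is a CUDA API.
constexpr std::size_t kWarpSize = 32;
using WarpRegister = std::array<int, kWarpSize>;  // one value per lane
using LaneMask = std::bitset<kWarpSize>;          // bit i set => lane i is active

// One "SIMD instruction" issued for the whole warp:
// only lanes whose mask bit is set commit a result to dst.
template <typename Op>
void masked_op(WarpRegister& dst, const WarpRegister& src,
               const LaneMask& active, Op op) {
    for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
        if (active[lane]) {
            dst[lane] = op(src[lane]);
        }
    }
}

// Models a warp running: if (x < 0) y = -x; else y = 2 * x;
// Both branches are issued one after the other, each under its own mask.
WarpRegister run_warp(const WarpRegister& x) {
    WarpRegister y{};

    // Evaluate the branch condition per lane to build the mask.
    LaneMask negative;
    for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
        negative[lane] = x[lane] < 0;
    }

    // "Then" branch: only lanes holding a negative value are active.
    masked_op(y, x, negative, [](int v) { return -v; });

    // "Else" branch: the complementary set of lanes is active.
    masked_op(y, x, ~negative, [](int v) { return 2 * v; });

    return y;
}
```

Calling `run_warp` on a register where some lanes hold negative values and others do not shows each lane ending up with the result of its own branch only, even though both branches were issued for the whole warp.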
I hope this makes intuitive sense to you (or perhaps you figured it out yourself in the last 2 years).
einpoklum