Is multi-threaded memory access faster than single-threaded memory access?
Suppose we are working in C. A simple example: I have a giant array A, and I want to copy it to an array B of the same size. Is copying the memory with multiple threads faster than with a single thread? How many threads are suitable for this kind of memory operation?
EDIT: Let me narrow the question. First, we are not considering the GPU case. Optimizing memory access is very important and effective when programming on a GPU; in my experience, one must always be careful with memory operations there. That is not always the case when we work on a CPU. Also, ignore SIMD instructions such as AVX and SSE. Memory performance also becomes a problem when a program has many memory accesses and few computational operations. Suppose we work on an x86 machine with 1-2 processors, each with several cores and a four-channel memory interface, and the main memory is DDR4, as is common today.
My array holds double-precision floating-point numbers and is roughly the size of the processor's L3 cache, about 50 MB. Now I have two cases: 1) copy this array to another array of the same size, either element by element or with memcpy; 2) merge many small arrays into this one giant array. Both are real-time operations, meaning they must complete as quickly as possible. Does multithreading give a speedup or a slowdown here? What factors affect the performance of these memory operations?
Someone said it will mainly depend on DMA performance; I think that applies when we use memcpy. If we do a plain element-by-element copy instead, will the data go through the processor cache first?
Tags: c, multithreading, memory
user3677630