In general, with modern GPUs, the larger the batch of data you send at once, the better. You don't say which API you are using (OpenGL, CUDA, etc.), but either way you can model the cost like this:
chunk_time = overhead_time + (num_of_elements / num_of_chunks) * per_element_time
total_time = chunk_time * num_of_chunks
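To see what that implies, here is a minimal sketch in C that evaluates the model for a few chunk counts. The constants (overhead, per-element cost, element count) are made-up illustrative values, not measurements:

```c
#include <stdio.h>

/* Toy model of the formula above. All times are in milliseconds;
 * the constants are illustrative values, not measurements. */
int main(void) {
    const double overhead_time    = 1.0;      /* fixed cost per chunk (transfer + launch) */
    const double per_element_time = 0.0001;   /* cost per data element */
    const double num_of_elements  = 1000000.0;

    for (int num_of_chunks = 1; num_of_chunks <= 64; num_of_chunks *= 4) {
        double chunk_time = overhead_time
                          + (num_of_elements / num_of_chunks) * per_element_time;
        double total_time = chunk_time * num_of_chunks;
        printf("chunks=%2d  total_time=%.1f ms\n", num_of_chunks, total_time);
    }
    return 0;
}
```

Expanding the formula gives total_time = overhead_time * num_of_chunks + num_of_elements * per_element_time: the per-element work is fixed, so total time grows linearly with the number of chunks.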
You pay the overhead, for both the memory transfer and the kernel/shader launch, on every chunk of data you send, so fewer, larger chunks amortize it better. You may have other restrictions depending on how large your data is: for example, the maximum texture size in OpenGL is implementation-dependent. I'd expect 1k x 1k to be a safe minimum on hardware from the last 5 years or so, while recent cards can handle 8k x 8k.
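Rather than assuming a safe minimum, in OpenGL you can query the actual limit at runtime with glGetIntegerv. A short sketch (it must run with a current OpenGL context; the function name is just for illustration):

```c
#include <stdio.h>
#include <GL/gl.h>

/* Query the largest texture dimension this implementation supports.
 * Requires a current OpenGL context. */
void print_max_texture_size(void) {
    GLint max_size = 0;
    glGetIntegerv(GL_MAX_TEXTURE_SIZE, &max_size);
    printf("Max texture dimension: %d x %d\n", max_size, max_size);
}
```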