Using a Map for a matrix is really inefficient, and the way you are using it, it won't even support sparse arrays especially well.
I suggest you use a double[][], where you lock each row (or column, if that works better). If the matrix is small enough, you might be better off using just one CPU, as this can save you a fair bit of overhead.
I wouldn't suggest creating more threads than you have cores. For CPU-intensive tasks, using more threads can make things slower rather than faster.
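For example, a minimal sketch of sizing the pool to the core count (the answer itself doesn't show how executorService is created, so this is just one reasonable way to set it up):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // One thread per core is usually the sweet spot for CPU-bound work;
    // extra threads only add context-switching overhead.
    int cores = Runtime.getRuntime().availableProcessors();
    ExecutorService executorService = Executors.newFixedThreadPool(cores);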
The matrix is 100k * 50 at maximum
EDIT: Depending on the operation being performed, I would try to make sure the shorter dimension comes first, so that you can process each long row in a different thread efficiently.
e.g.
    double[][] matrix = new double[50][100 * 1000];
    for (int i = 0; i < matrix.length; i++) {
        final double[] line = matrix[i];
        executorService.submit(new Runnable() {
            public void run() {
                synchronized (line) {
                    processOneLine(line);
                }
            }
        });
    }
This allows all the tasks you submit to execute concurrently, since they don't share any data structures. They can also access each double efficiently, because the values are contiguous in memory and stored as compactly as possible. That is, 100K doubles use about 800 KB, but a List<Double> uses about 2800 KB, and each value can end up anywhere in memory, which means your cache has to work much harder.
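Rough arithmetic behind those figures, assuming a typical 64-bit HotSpot JVM with compressed references: a double[] stores 8 bytes per element, so 100,000 * 8 B ≈ 800 KB. A List<Double> needs, per element, a ~24-byte boxed Double (12-byte object header plus the 8-byte value, padded) and a 4-byte reference in the backing array, about 28 bytes in total, so 100,000 * 28 B ≈ 2800 KB.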
thanks, but actually I have only 80 cores
To use 80 cores efficiently, you may need to break the longer rows into two or four segments so that you can keep all the cores busy, or find a way to perform more than one operation at a time.
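As a rough sketch of what that splitting might look like (processSegment is a hypothetical stand-in for the per-segment work, not something from the original code):

    // Split each of the 50 rows into two segments, giving up to 100 tasks
    // to keep ~80 cores busy. Segments are disjoint, so no locking is needed.
    double[][] matrix = new double[50][100 * 1000];
    int segments = 2;
    for (int i = 0; i < matrix.length; i++) {
        final double[] line = matrix[i];
        int chunk = line.length / segments;
        for (int s = 0; s < segments; s++) {
            final int from = s * chunk;
            final int to = (s == segments - 1) ? line.length : from + chunk;
            executorService.submit(new Runnable() {
                public void run() {
                    processSegment(line, from, to); // hypothetical helper
                }
            });
        }
    }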