parallelSort spreads the work across the processor cores available on your machine. In particular, it runs its tasks in the ForkJoin common (shared) thread pool. If you have only one core, you will not see any improvement over single-threaded sorting.
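To check how much parallelism is actually available on a given machine, something like the following sketch may help. The class name `PoolInfo` and its helper methods are made up for illustration; the two JDK calls they wrap are `Runtime.availableProcessors()` and `ForkJoinPool.getCommonPoolParallelism()`.

```java
import java.util.concurrent.ForkJoinPool;

// Hypothetical helper class, not part of the JDK.
public class PoolInfo {
    // Number of logical processors the JVM reports.
    public static int cores() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Target parallelism of the common pool, where parallelSort runs its tasks.
    public static int commonPoolParallelism() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    public static void main(String[] args) {
        System.out.println("Available processors:    " + cores());
        System.out.println("Common pool parallelism: " + commonPoolParallelism());
    }
}
```

If both numbers are 1, there is nothing for parallelSort to spread the work over.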
If you have only a few cores, you still pay a fixed start-up cost for creating the worker threads, which means that for relatively small arrays you will not see a linear speedup.
Comparing doubles is cheap. I think that in this case 1,000,000 elements can safely be considered small, so the benefit of using multiple threads is outweighed by the initial cost of starting them. Since that cost is fixed, you should see a performance gain with larger arrays.
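One way to test this on your own machine is to time both sorts on arrays of increasing size. The following is only a rough sketch: the class name `SortTiming`, its helper methods, and the chosen sizes are assumptions for illustration, and single-shot `System.nanoTime()` measurements are noisy (a harness such as JMH would give more trustworthy numbers).

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical timing sketch, not a rigorous benchmark.
public class SortTiming {
    // Builds an array of pseudo-random doubles; the fixed seed makes runs repeatable.
    public static double[] randomDoubles(int size, long seed) {
        return new Random(seed).doubles(size).toArray();
    }

    // Sorts a copy sequentially and returns the elapsed time in nanoseconds.
    public static long timeSequential(double[] data) {
        double[] copy = data.clone();
        long start = System.nanoTime();
        Arrays.sort(copy);
        return System.nanoTime() - start;
    }

    // Sorts a copy with parallelSort and returns the elapsed time in nanoseconds.
    public static long timeParallel(double[] data) {
        double[] copy = data.clone();
        long start = System.nanoTime();
        Arrays.parallelSort(copy);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int[] sizes = {100_000, 1_000_000, 10_000_000}; // assumed sizes, adjust to your heap
        for (int size : sizes) {
            double[] data = randomDoubles(size, 42L);
            // Warm-up pass so JIT compilation and pool start-up are not counted.
            timeSequential(data);
            timeParallel(data);
            long seq = timeSequential(data);
            long par = timeParallel(data);
            System.out.printf("%,d elements: sort %d ms, parallelSort %d ms%n",
                    size, seq / 1_000_000, par / 1_000_000);
        }
    }
}
```

The actual numbers depend on the core count, the JVM, and the data, so treat the output as a hint about where the crossover lies rather than a definitive result.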