You can use rdd.repartition(#partitions) after loading the file. This has an associated shuffle cost, so you need to evaluate whether the performance gain from the extra parallelism outweighs that initial cost.
Another way is to perform any transformations (map, filter, ...) on the initial partitions and use a shuffle stage already present in your pipeline to repartition the RDD, e.g.:
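As a rough way to reason about that trade-off, here is a toy model in plain Python (not Spark code). The function names, the even split of work across partitions, and the fixed shuffle cost are all assumptions made for illustration. Real jobs should be measured, not estimated this way.

```python
import math


def estimated_runtime(work_seconds, partitions, cores, shuffle_seconds=0.0):
    """Toy estimate: tasks of equal size run in waves of at most `cores` at a time."""
    waves = math.ceil(partitions / cores)          # rounds of tasks needed
    seconds_per_task = work_seconds / partitions   # assumes perfectly even partitions
    return shuffle_seconds + waves * seconds_per_task


def repartition_pays_off(work_seconds, initial_partitions, new_partitions,
                         cores, shuffle_seconds):
    """True if paying the (assumed) shuffle cost up front gives a lower estimate."""
    before = estimated_runtime(work_seconds, initial_partitions, cores)
    after = estimated_runtime(work_seconds, new_partitions, cores, shuffle_seconds)
    return after < before


# Heavy job stuck on one partition: the shuffle is likely worth it.
print(repartition_pays_off(600, 1, 16, cores=8, shuffle_seconds=60))
# Light job: the shuffle costs more than the parallelism saves.
print(repartition_pays_off(30, 1, 16, cores=8, shuffle_seconds=60))
```

Under these assumptions, repartitioning helps when the downstream work is large compared with the shuffle, and hurts when the job is already short.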
rdd.map().filter().flatMap().sortBy(f, numPartitions=new
maasg