No, because it is highly dependent on the application, the available resources and the data. There are some hard limits (like various 2 GB limits), but otherwise you need to tune the configuration to the particular job (a short sketch after the list below shows how the partition count can be inspected and adjusted). Some factors to consider:
- the size of a single row / element
- the cost of a typical operation. If partitions are small and operations are cheap, then scheduling costs can be much higher than the cost of actually processing the data.
- the cost of processing a whole partition when using partition-wise operations (mapPartitions or foreachPartition, for example).
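For illustration only (the path and the partition counts below are made up), a minimal sketch of how you can inspect and then adjust the number of partitions once you have a rough idea of the per-element and per-partition costs:

    // Hypothetical input path; assumes an existing SparkContext `sc`.
    val rdd = sc.textFile("hdfs:///data/events")

    // How many partitions did Spark create by default?
    println(rdd.getNumPartitions)

    // More, smaller partitions (full shuffle) if single tasks are too heavy.
    val wider = rdd.repartition(200)

    // Fewer partitions (no full shuffle) if scheduling overhead dominates,
    // e.g. after a selective filter left many nearly empty partitions.
    val narrower = wider.filter(_.nonEmpty).coalesce(50)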
If you are reading a large number of small files, you can pack them into fewer, larger partitions with a CombineFileInputFormat implementation such as CombineTextInputFormat:
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapred.lib.CombineTextInputFormat

    sc.hadoopFile(
      path,
      classOf[CombineTextInputFormat],
      classOf[LongWritable], classOf[Text]
    ).map(_._2.toString)  // keep only the line contents, drop the byte offsets
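If the combined splits still come out too large or too small, the maximum split size can be capped through the Hadoop configuration. As an assumption-laden sketch (the 128 MB value is arbitrary; CombineFileInputFormat generally honours the standard split.maxsize property when packing files):

    // Assumption: cap each combined split at ~128 MB before calling sc.hadoopFile.
    sc.hadoopConfiguration.set(
      "mapreduce.input.fileinputformat.split.maxsize",
      (128 * 1024 * 1024).toString
    )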