When you call sc.textFile, reading does not start. The call simply defines a driver-resident data structure (an RDD) that can be used for further processing.
Only when an action is invoked on the RDD does Spark build a plan to perform all the necessary transformations (including the read) and then return the result.
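The laziness described above can be illustrated without a cluster. The following is a toy model in plain Python, not Spark's implementation: ToyLazyDataset, read_file and the sample lines are hypothetical names made up for this sketch. A transformation such as map only records a function, and the action count is what finally pulls data from the source.

```python
from typing import Callable, Iterable, List, Optional


class ToyLazyDataset:
    """Hypothetical toy model of an RDD's laziness; not Spark's actual class."""

    def __init__(self, source: Callable[[], Iterable[str]],
                 steps: Optional[List[Callable]] = None):
        self._source = source            # invoked only when an action runs
        self._steps = list(steps or [])  # recorded, not yet applied, transformations

    def map(self, fn: Callable) -> "ToyLazyDataset":
        # Transformation: return a new dataset that remembers fn; no data is read.
        return ToyLazyDataset(self._source, self._steps + [fn])

    def count(self) -> int:
        # Action: only now is the source read and each recorded step applied.
        n = 0
        for record in self._source():
            for fn in self._steps:
                record = fn(record)
            n += 1
        return n


reads: List[str] = []  # log of lines the hypothetical reader has produced


def read_file() -> Iterable[str]:
    # Hypothetical stand-in for reading a file; records each line it yields.
    for line in ["a b", "c", "d e f"]:
        reads.append(line)
        yield line


ds = ToyLazyDataset(read_file).map(str.upper)
print(len(reads))   # 0: defining the pipeline read nothing
print(ds.count())   # 3: the action triggers the read
print(len(reads))   # 3
```

Calling count() a second time reads the source again, which loosely mirrors how Spark recomputes an uncached RDD for every action.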
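If an action is invoked and the next transformation after the read is a map, Spark reads only a small section of the file's lines at a time (according to the partitioning strategy, which depends on the number of cores) and immediately starts mapping it, continuing until it has to return a result to the driver or shuffle data before the next sequence of transformations.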
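To illustrate reading a partition line by line and mapping each line as soon as it is read, here is another toy model in plain Python. It is only an analogy under stated assumptions: read_partition, mapped, run_job and the event log are hypothetical names, partitions are plain lists, and they are processed one after another rather than in parallel, one task per core, as on a real cluster.

```python
from typing import Callable, Iterator, List, Tuple

events: List[Tuple[str, str]] = []  # order in which records are read and mapped


def read_partition(lines: List[str]) -> Iterator[str]:
    # Hypothetical reader for one partition: yields lines one at a time.
    for line in lines:
        events.append(("read", line))
        yield line


def mapped(records: Iterator[str], fn: Callable[[str], str]) -> Iterator[str]:
    # Map step chained onto the reader: each record is transformed as soon
    # as it is read, without materializing the whole partition first.
    for record in records:
        out = fn(record)
        events.append(("map", out))
        yield out


def run_job(partitions: List[List[str]], fn: Callable[[str], str]) -> List[int]:
    # Process partitions sequentially (a simplification) and return a
    # per-partition count, standing in for results sent back to the driver.
    return [sum(1 for _ in mapped(read_partition(part), fn)) for part in partitions]


print(run_job([["x", "y"], ["z"]], str.upper))  # [2, 1]
print(events[:4])  # "read" and "map" events alternate record by record
```

Because the steps are chained generators, at most one record per partition is in flight at a time here; Spark's real per-partition iterators behave similarly for narrow transformations such as map, but the memory details differ.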
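If your partitioning strategy (defaultMinPartitions) seems to overwhelm the workers because the Java representation of a partition (an InputSplit, in HDFS terms) is larger than the available executor memory, specify the number of partitions to read as the second parameter of textFile. Note that this parameter is a minimum (minPartitions), so Spark may still create more partitions than requested. You can estimate a suitable number of partitions by dividing the file size by the target partition size (leaving headroom for the data to grow in memory). A simple check that the file can be read is: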
sc.textFile(file, numPartitions).count()
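One way to estimate numPartitions is to divide the file size by a target partition size, shrinking that target by an assumed growth factor to account for data expanding once deserialized in executor memory. The function below is a hypothetical helper, and growth_factor is an assumption of this sketch, not a Spark setting.

```python
import math


def estimate_num_partitions(file_size_bytes: int,
                            target_partition_bytes: int,
                            growth_factor: float = 1.0) -> int:
    """Hypothetical helper: estimate a value for sc.textFile(path, minPartitions).

    growth_factor is an assumed multiplier for how much larger a partition
    becomes in memory than on disk; it is not a Spark configuration option.
    """
    if file_size_bytes < 0 or target_partition_bytes <= 0 or growth_factor <= 0:
        raise ValueError("file size must be >= 0; target and growth factor must be > 0")
    # Reduce the on-disk budget per partition to leave room for in-memory growth.
    effective_target = target_partition_bytes / growth_factor
    # Round up so no partition exceeds the budget; always ask for at least one.
    return max(1, math.ceil(file_size_bytes / effective_target))


GiB = 1024 ** 3
MiB = 1024 ** 2
print(estimate_num_partitions(10 * GiB, 128 * MiB))                    # 80
print(estimate_num_partitions(10 * GiB, 128 * MiB, growth_factor=2.0))  # 160
```

In PySpark the result would be passed as the second argument, e.g. sc.textFile(path, estimate_num_partitions(size, target)). Since that argument is only a minimum, a splittable file on HDFS may still be read with more partitions, typically at least one per block.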
See also the related question about reduceByKey.