How many partitions does Spark create when loading a file from an S3 bucket?

When a file is loaded from HDFS, Spark by default creates one partition per HDFS block. But how does Spark determine the number of partitions when a file is loaded from an S3 bucket?

2 answers

See the code in org.apache.hadoop.mapred.FileInputFormat.getSplits().

The block size depends on the implementation of the S3 filesystem (see FileStatus.getBlockSize()). For instance, S3AFileStatus just sets it to 0, and then FileInputFormat.computeSplitSize() comes into play.
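To make the split math concrete, here is a minimal Scala sketch of the formula used by computeSplitSize() in the old mapred API (the real method is Java; the three parameter names follow the Hadoop source):

// goalSize  - totalSize / numSplits (Spark passes its minPartitions as numSplits)
// minSize   - the configured minimum split size (default 1)
// blockSize - whatever FileStatus.getBlockSize() reports for the file
def computeSplitSize(goalSize: Long, minSize: Long, blockSize: Long): Long =
  math.max(minSize, math.min(goalSize, blockSize))

// If the filesystem reports a block size of 0, math.min(goalSize, 0) is 0,
// so the split size collapses to minSize.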

Also, you don't get multiple partitions if your InputFormat is not splittable :)
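For example, gzip is a non-splittable codec, so a .gz object becomes a single partition no matter how large it is (the bucket and path here are hypothetical):

val gz = sc.textFile("s3a://my-bucket/logs.gz") // gzip cannot be split
println(gz.partitions.length)                   // prints 1: the whole file is one partition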


Spark S3, , HDFS S3 : . :

val inputRDD = sc.textFile("s3a://...") // load the file from S3
println(inputRDD.partitions.length)     // number of partitions Spark created

This prints the number of partitions Spark created for the S3 input.
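Note that later versions of the S3A connector no longer report a block size of 0: they report the value of the fs.s3a.block.size property, which lets you influence how many partitions Spark creates. A sketch, assuming such a Hadoop version (the 64 MB value is just an example):

// Hypothetical tuning example: lower the block size S3A reports so that
// FileInputFormat produces more, smaller splits for large S3 objects.
sc.hadoopConfiguration.set("fs.s3a.block.size", "67108864") // 64 MB, example value
val tuned = sc.textFile("s3a://...") // same elided path as above
println(tuned.partitions.length)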
