Changing the output file name in Spark Streaming

I am launching a Spark job that works very well as far as the logic goes. However, when I use saveAsTextFile to save the files to an S3 bucket, the output files are named in the format part-00000, part-00001, etc. Is there a way to change the names of the output files?

Thanks.

+4
2 answers

In Spark, you can use saveAsNewAPIHadoopFile and set the mapreduce.output.basename parameter in the Hadoop configuration to change the prefix (only the "part" prefix):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat

val hadoopConf = new Configuration()
hadoopConf.set("mapreduce.output.basename", "yourPrefix")

yourRDD.map(str => (null, str))
  .saveAsNewAPIHadoopFile(s"$outputPath/$dirName", classOf[NullWritable], classOf[String],
    classOf[TextOutputFormat[NullWritable, String]], hadoopConf)

Your files will then be named like: yourPrefix-r-00001

In Hadoop and Spark you can have more than one file in the output, since there can be more than one reducer (Hadoop) or partition (Spark). Each file therefore needs a unique name, which is why you cannot override the sequence number at the end of the filename.

If you want full control over the filename, you can extend TextOutputFormat or FileOutputFormat and override the getUniqueFile method.
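A minimal sketch of that approach, assuming the new MapReduce API: there getUniqueFile is a static method, so what you actually override is the instance method getDefaultWorkFile, which calls it. The class name CustomNameTextOutputFormat and the myData prefix are illustrative, not part of any library:

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.TaskAttemptContext
import org.apache.hadoop.mapreduce.lib.output.{FileOutputCommitter, TextOutputFormat}

// Illustrative sketch: a TextOutputFormat whose files are named
// myData-00000, myData-00001, ... instead of part-r-00000, part-r-00001, ...
class CustomNameTextOutputFormat[K, V] extends TextOutputFormat[K, V] {
  override def getDefaultWorkFile(context: TaskAttemptContext, extension: String): Path = {
    // FileOutputFormat's committer is a FileOutputCommitter; its work path is
    // the temporary task directory the file is written to before being committed.
    val committer = getOutputCommitter(context).asInstanceOf[FileOutputCommitter]
    val taskId = context.getTaskAttemptID.getTaskID.getId
    new Path(committer.getWorkPath, f"myData-$taskId%05d$extension")
  }
}
```

You would then pass classOf[CustomNameTextOutputFormat[NullWritable, String]] to saveAsNewAPIHadoopFile in place of the plain TextOutputFormat.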

+4

[In Java]

Suppose your data is in an RDD like this:

JavaRDD<Text> rows;

To get output files named like customPrefix-r-00000, set the base name in the Hadoop configuration and save:

Configuration hadoopConf = new Configuration();
hadoopConf.set("mapreduce.output.basename", "customPrefix");

rows.mapToPair(row -> new Tuple2<>(null, row))
    .saveAsNewAPIHadoopFile(outputPath, NullWritable.class, Text.class,
        TextOutputFormat.class, hadoopConf);

Hope this helps!

0
