Changing the output file name in Spark Streaming

I am launching a Spark job that works very well as far as the logic goes. However, when I use saveAsTextFile to save the files to an S3 bucket, the output files are named in the format part-00000, part-00001, etc. Is there a way to change the names of the output files?

Thanks.

+4
2 answers

In Spark, you can use saveAsNewAPIHadoopFile and set the mapreduce.output.basename parameter in the Hadoop configuration to change the prefix (only the "part" prefix):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat

val hadoopConf = new Configuration()
hadoopConf.set("mapreduce.output.basename", "yourPrefix")

yourRDD.map(str => (null, str))
  .saveAsNewAPIHadoopFile(s"$outputPath/$dirName", classOf[NullWritable], classOf[String],
    classOf[TextOutputFormat[NullWritable, String]], hadoopConf)

Your files will then be named like: yourPrefix-r-00001

In Hadoop and Spark you can have more than one file in the output, since there can be more than one reducer (Hadoop) or partition (Spark). Each file therefore needs a unique name, which is why you cannot override the sequence number at the end of the filename.

If you want full control over the filename, you can extend TextOutputFormat or FileOutputFormat and override the getUniqueFile method.
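A minimal sketch of that approach, assuming the new MapReduce API: there getUniqueFile is a static method, so what you actually override is the instance method getDefaultWorkFile, which calls it. The class name CustomNameTextOutputFormat and the myData prefix are illustrative, not part of any library:

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.TaskAttemptContext
import org.apache.hadoop.mapreduce.lib.output.{FileOutputCommitter, TextOutputFormat}

// Illustrative sketch: a TextOutputFormat whose files are named
// myData-00000, myData-00001, ... instead of part-r-00000, part-r-00001, ...
class CustomNameTextOutputFormat[K, V] extends TextOutputFormat[K, V] {
  override def getDefaultWorkFile(context: TaskAttemptContext, extension: String): Path = {
    // FileOutputFormat's committer is a FileOutputCommitter; its work path is
    // the temporary task directory the file is written to before being committed.
    val committer = getOutputCommitter(context).asInstanceOf[FileOutputCommitter]
    val taskId = context.getTaskAttemptID.getTaskID.getId
    new Path(committer.getWorkPath, f"myData-$taskId%05d$extension")
  }
}
```

You would then pass classOf[CustomNameTextOutputFormat[NullWritable, String]] to saveAsNewAPIHadoopFile in place of the plain TextOutputFormat.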

+4

[In Java]

Suppose your data is in an RDD like this:

JavaRDD<Text> rows;

To get output files named like customPrefix-r-00000, set the base name in the Hadoop configuration and save:

Configuration hadoopConf = new Configuration();
hadoopConf.set("mapreduce.output.basename", "customPrefix");

rows.mapToPair(row -> new Tuple2<>(null, row))
    .saveAsNewAPIHadoopFile(outputPath, NullWritable.class, Text.class,
        TextOutputFormat.class, hadoopConf);

Hope this helps!

0
