1) There is no direct support in the saveAsTextFile method for controlling the output file name. You can use saveAsHadoopDataset instead to control the base name of the output files.
For example: instead of part-00000 you can get yourCustomName-00000.
Keep in mind that you cannot control the -00000 suffix with this method. Spark assigns that number to each partition automatically during the write, so that each partition writes to a unique file.
To control this, as mentioned in the comments above, you should write your own custom OutputFormat.
SparkConf conf = new SparkConf().setMaster("local").setAppName("yello");
JavaSparkContext sc = new JavaSparkContext(conf);

JobConf jobConf = new JobConf();
jobConf.set("mapreduce.output.basename", "customName"); // "customName-00000" instead of "part-00000"
jobConf.set("mapred.output.dir", "outputPath");
jobConf.setOutputKeyClass(NullWritable.class);
jobConf.setOutputValueClass(Text.class);
jobConf.setOutputFormat(TextOutputFormat.class);

// saveAsHadoopDataset is defined on pair RDDs, so wrap each line in a (key, value)
// pair; TextOutputFormat omits NullWritable keys, so only the line text is written
JavaRDD<String> input = sc.textFile("inputDir");
input.mapToPair(line -> new Tuple2<>(NullWritable.get(), new Text(line)))
     .saveAsHadoopDataset(jobConf);
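As for the custom OutputFormat mentioned above, one minimal sketch (the class name here is hypothetical) is to extend Hadoop's old-API MultipleTextOutputFormat and override generateFileNameForKeyValue, which receives the default "part-00000"-style name for each partition:

```java
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

// Hypothetical example class: reuse MultipleTextOutputFormat's machinery but
// rewrite the default "part-NNNNN" file name handed to each record writer.
public class CustomNameOutputFormat<K, V> extends MultipleTextOutputFormat<K, V> {
    @Override
    protected String generateFileNameForKeyValue(K key, V value, String name) {
        // "name" arrives as "part-00000", "part-00001", ...; keep the numeric suffix
        return name.replace("part", "customName");
    }
}
```

It can then be plugged in with jobConf.setOutputFormat(CustomNameOutputFormat.class), or via saveAsHadoopFile on a pair RDD.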
2) A workaround would be to write the output as usual, then use Hadoop's FileUtil.copyMerge to combine the part files in the output location into a single merged file.
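A sketch of that workaround, with placeholder paths (note that FileUtil.copyMerge exists in the Hadoop 1.x/2.x API but was removed in Hadoop 3):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class MergeParts {
    public static void main(String[] args) throws IOException {
        Configuration hadoopConf = new Configuration();
        FileSystem fs = FileSystem.get(hadoopConf);
        // Merge every file under outputPath (part-00000, part-00001, ...)
        // into one file; the last argument is an optional separator string.
        Path partsDir = new Path("outputPath");
        Path mergedFile = new Path("outputPath.merged");
        FileUtil.copyMerge(fs, partsDir, fs, mergedFile, false, hadoopConf, null);
    }
}
```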
— sujit