I am using the code below, which uploads the output files to S3. With it, about 60 GB of gz files download in 4-6 minutes.
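import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.TextOutputFormat;

// Separate each key and value with a comma instead of the default tab.
ctx.hadoopConfiguration().set("mapred.textoutputformat.separator", ",");
// Write one text part file per partition of counts under s3outputpath.
counts.saveAsHadoopFile(s3outputpath, Text.class, Text.class, TextOutputFormat.class);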
Make sure you write more output files (Spark writes one file per partition, so increase the number of partitions). Many smaller files download faster than a few large ones.
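One rough way to choose the number of output files is to divide the total output size by a target size per file. This is a minimal sketch, assuming you know or can estimate the total size; OutputFileCount and partitionsFor are hypothetical names, and a target such as 128 MB per file is an assumption, not a measured optimum.

```java
// Hypothetical helper: pick how many output files (partitions) to write
// so that, on average, each file is at most targetBytesPerFile.
final class OutputFileCount {
    private OutputFileCount() {}

    static int partitionsFor(long totalBytes, long targetBytesPerFile) {
        if (totalBytes <= 0 || targetBytesPerFile <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        // Ceiling division: round up so no file exceeds the target on average.
        long n = (totalBytes + targetBytesPerFile - 1) / targetBytesPerFile;
        // Spark partition counts are ints, so clamp to the int range.
        return (int) Math.min(n, Integer.MAX_VALUE);
    }
}
```

For example, 60 GB with a 128 MB target gives 480 files. You could pass the result to counts.repartition(n) before calling saveAsHadoopFile. repartition performs a full shuffle; coalesce(n) avoids the shuffle but can only reduce the number of partitions.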
API
saveAsHadoopFile[F <: org.apache.hadoop.mapred.OutputFormat[_, _]](path: String, keyClass: Class[_], valueClass: Class[_], outputFormatClass: Class[F], codec: Class[_ <: org.apache.hadoop.io.compress.CompressionCodec]): Unit
Output the RDD to any Hadoop-supported file system, compressing with the supplied codec.
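With a gzip codec, each part file holds the same comma-separated key,value lines, gzip-compressed. As a rough, Spark-free illustration of that file layout, here is a sketch using only java.util.zip. GzipPartSketch, writePart, and readPart are hypothetical names, and this is not how Spark or Hadoop write files internally.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch of the contents of one gzip-compressed text part file.
final class GzipPartSketch {
    private GzipPartSketch() {}

    // Writes each pair as key + separator + value + "\n", gzip-compressed,
    // mirroring the line layout TextOutputFormat uses with the separator set to ",".
    static byte[] writePart(Map<String, String> pairs, String separator) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(new GZIPOutputStream(bytes), StandardCharsets.UTF_8)) {
            for (Map.Entry<String, String> e : pairs.entrySet()) {
                w.write(e.getKey());
                w.write(separator);
                w.write(e.getValue());
                w.write('\n');
            }
        } // closing the writer finishes the gzip stream
        return bytes.toByteArray();
    }

    // Decompresses a part file and returns its lines.
    static List<String> readPart(byte[] gz) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(gz)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

In Spark itself, the gzip variant of the call in this answer would be counts.saveAsHadoopFile(s3outputpath, Text.class, Text.class, TextOutputFormat.class, GzipCodec.class), with org.apache.hadoop.io.compress.GzipCodec imported.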