I save a Spark RDD to an S3 bucket using the following code. Is there a way to compress the output (in gz format) instead of saving it as plain text files?
help_data.repartition(5).saveAsTextFile("s3://help-test/logs/help")
The saveAsTextFile method takes an optional compressionCodecClass argument that specifies the Hadoop compression codec to use when writing:
help_data.repartition(5).saveAsTextFile(
    path="s3://help-test/logs/help",
    compressionCodecClass="org.apache.hadoop.io.compress.GzipCodec"
)
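As a minimal end-to-end sketch (the sample data below is hypothetical, standing in for help_data): each of the 5 partitions is written as one gzipped part file under the given prefix, and sc.textFile decompresses .gz files transparently when reading them back.

from pyspark import SparkContext

sc = SparkContext(appName="gzip-save-example")

# Hypothetical data standing in for help_data
help_data = sc.parallelize(["line one", "line two", "line three"])

# Each of the 5 partitions becomes one part-0000N.gz object under the prefix
help_data.repartition(5).saveAsTextFile(
    path="s3://help-test/logs/help",
    compressionCodecClass="org.apache.hadoop.io.compress.GzipCodec"
)

# textFile recognizes the .gz extension and decompresses automatically
restored = sc.textFile("s3://help-test/logs/help")
print(restored.collect())

Note that the s3:// scheme works out of the box on EMR; on a self-managed cluster you may need the s3a:// scheme with the Hadoop AWS connector configured.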