How to save a Spark RDD in gzip format via PySpark

I save a Spark RDD to an S3 bucket using the following code. Is there a way to compress the output (in gz format) instead of saving it as plain text?

help_data.repartition(5).saveAsTextFile("s3://help-test/logs/help")
1 answer

The saveAsTextFile method takes an optional second argument that specifies the compression codec class:

help_data.repartition(5).saveAsTextFile(
    path="s3://help-test/logs/help",
    compressionCodecClass="org.apache.hadoop.io.compress.GzipCodec"
)
