Say I have a Spark DataFrame that I want to save to a CSV file on disk. In Spark 2.0.0+, you can convert a DataFrame (Dataset[Row]) to a DataFrameWriter and use its .csv method to write the file.
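For example (a minimal sketch, assuming an existing DataFrame df; "output_dir" is a hypothetical path):

df.write                     // returns a DataFrameWriter[Row]
  .option("header", "true")  // optional: include a header row
  .csv("output_dir")         // the path is a directory, not a file name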
The function is defined as:

def csv(path: String): Unit

where path is the location/folder name, not the file name.
Spark stores the CSV data at the specified location in files named part-*.csv.

Is there a way to save the CSV with a specified file name instead of part-*.csv? Or is it possible to specify a prefix other than part-r?
The code:
df.coalesce(1).write.csv("sample_path")
Current Output:
sample_path
|
+-- part-r-00000.csv
Desired Result:
sample_path
|
+-- my_file.csv
Note: coalesce(1) is used to output a single file, and the executor has enough memory to collect the DataFrame without an out-of-memory error.
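For context, one common workaround (a sketch under assumptions, not code from the question) is to write to a temporary directory and then rename the single part file using Hadoop's FileSystem API; spark (the SparkSession), tmp_path, and my_file.csv are assumed names here:

import org.apache.hadoop.fs.{FileSystem, Path}

// Write the single part file into a temporary directory.
df.coalesce(1).write.csv("tmp_path")

// Locate the part file Spark produced and rename it to the desired name.
// Assumes the destination directory sample_path already exists.
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val partFile = fs.globStatus(new Path("tmp_path/part-*.csv"))(0).getPath
fs.rename(partFile, new Path("sample_path/my_file.csv"))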
Tags: scala, csv, apache-spark, pyspark
Spandan brahmbhatt