Difference between map and udf

When I work with DataFrames in Spark, I sometimes need to edit only the values of a specific column. For example, if my DataFrame has a count field and I want to add 1 to each count value, I could either write a custom udf and apply it with the DataFrame withColumn function, or I could run a map over the DataFrame and then build another DataFrame from the resulting RDD.
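To make the two options concrete, here is a minimal PySpark sketch of both. The column names (word, count), the sample rows, and the helper names add_one and run_with_spark are assumptions made for illustration only. The Spark imports live inside run_with_spark so that the plain Python logic can be read and checked on its own.

```python
def add_one(count):
    """Per-value logic shared by both approaches; passes None (SQL NULL) through."""
    return None if count is None else count + 1


def run_with_spark():
    """Apply add_one via a udf and via an RDD map (requires pyspark)."""
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import LongType

    spark = SparkSession.builder.master("local[1]").appName("map-vs-udf").getOrCreate()
    # Hypothetical sample data: a word and its count.
    df = spark.createDataFrame([("a", 1), ("b", 2)], ["word", "count"])

    # Option 1: wrap the function in a udf and replace the column with withColumn.
    add_one_udf = udf(add_one, LongType())
    via_udf = df.withColumn("count", add_one_udf(col("count")))

    # Option 2: map over the underlying RDD of Rows, then rebuild a DataFrame
    # using the original schema.
    mapped = df.rdd.map(lambda row: (row["word"], add_one(row["count"])))
    via_map = spark.createDataFrame(mapped, df.schema)

    result = (via_udf.collect(), via_map.collect())
    spark.stop()
    return result
```

Both routes call the same Python function per row; they differ in how the result is attached back to the DataFrame.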

I would like to know how udf works under the hood. How do map and udf compare in this case, and what is the difference in performance?

Thanks!

1 answer

map is simply more flexible than udf. With map there is no limit on the number of columns you can manipulate within a row. Suppose you want to compute new values for 5 columns and drop 3 others. With udf you would need to call withColumn 5 times and then do a select. With a single map function you can do all of this at once.
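A rough PySpark sketch of that scenario, under stated assumptions: the column names (a to e kept, x to z dropped), the doubling transformation, and the helpers transform_row and run_with_spark are all hypothetical and stand in for whatever per-column logic is actually needed.

```python
KEEP = ["a", "b", "c", "d", "e"]   # hypothetical columns to transform
DROP = ["x", "y", "z"]             # hypothetical columns to discard


def transform_row(row):
    """Transform the 5 kept columns and drop the other 3 in one pass.

    `row` is anything indexable by column name (a dict or a pyspark Row).
    """
    return tuple(row[name] * 2 for name in KEEP)


def run_with_spark():
    """Compare repeated withColumn/udf calls with a single map (requires pyspark)."""
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import LongType

    spark = SparkSession.builder.master("local[1]").appName("multi-col").getOrCreate()
    df = spark.createDataFrame([(1, 2, 3, 4, 5, 6, 7, 8)], KEEP + DROP)

    # udf route: one withColumn per transformed column, then a select to drop the rest.
    double = udf(lambda v: v * 2, LongType())
    via_udf = df
    for name in KEEP:
        via_udf = via_udf.withColumn(name, double(col(name)))
    via_udf = via_udf.select(*KEEP)

    # map route: a single function handles every column of the row.
    via_map = spark.createDataFrame(df.rdd.map(transform_row), KEEP)

    result = (via_udf.collect(), via_map.collect())
    spark.stop()
    return result
```

In the map version, the whole row is available to one function, so adding, changing, and dropping columns happens in a single step.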
