When I work with DataFrames in Spark, I sometimes have to edit only the values of a specific column in the DataFrame. E.g. if I have a count column in my DataFrame, and I would like to add 1 to each count value, then I could either write a custom udf to do the job via the DataFrame's withColumn function, or I could map over the DataFrame's underlying RDD and then build a new DataFrame from the resulting RDD.
I would like to know how a udf works under the hood. How do map and udf compare in this case, and what is the difference in performance?
Thanks!