How to rename column names in Spark SQL

I have a DataFrame with configurable column names, for example:

 Journey  channelA  channelB  channelC
 j1       1         0         0
 j1       0         1         0
 j1       1         0         0
 j2       0         0         1
 j2       0         1         0

By configurable, I mean that there can be "n" channels in a data frame.

Now I need a transformation that finds the sum of each channel per Journey, something like

 df.groupBy("Journey").agg(sum("channelA"), sum("channelB"), sum("channelC")) 

Output Result:

 Journey  sum(channelA)  sum(channelB)  sum(channelC)
 j1       2              1              0
 j2       0              1              1

Now I want to rename the columns back to their original names. For a single column I could do this with

 .withColumnRenamed("sum(channelA)", "channelA")

but, as I mentioned, the channel list is configurable, so I need a generic rename operation that renames all the summarized columns back to the original column names, giving the expected DataFrame:

 Journey  channelA  channelB  channelC
 j1       2         1         0
 j2       0         1         1

Any suggestions on how to approach this?

2 answers

To rename the DataFrame columns dynamically, you can use the toDF(colNames: String*) method and fill colNames dynamically with the original column names.

So you can fill in a dynamic sequence like this:

 val columnsRenamed = Seq("Journey", "channelA", "channelB","channelC") 

and then call the toDF method:

 val dfRenamed = df.toDF(columnsRenamed: _*)

The : _* syntax expands the Seq[String] into the varargs parameter String* that toDF expects.
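
Putting this together, here is a minimal end-to-end sketch of the approach; it assumes, as in the question, that Journey is the only non-channel column, and the variable names (channelCols, aggExprs, summed, result) are illustrative:

 import org.apache.spark.sql.functions.sum

 // Derive the channel columns from the schema so this works for any "n" channels
 val channelCols = df.columns.filter(_ != "Journey")

 // Build one sum(...) expression per channel, then aggregate by Journey
 val aggExprs = channelCols.map(c => sum(c))
 val summed = df.groupBy("Journey").agg(aggExprs.head, aggExprs.tail: _*)

 // Rename sum(channelA), sum(channelB), ... back to the original names in one call
 val result = summed.toDF("Journey" +: channelCols: _*)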


Columns can also be renamed with selectExpr. Say the input is inputDf: DataFrame with columns _1 and _2:

 val newDf = inputDf.selectExpr("_1 as x1", "_2 as x2") // "as" maps to an alias
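
The same idea extends to the question's dynamic case by building the "x as y" expression strings from the column names themselves. A sketch, assuming a DataFrame summedDf with columns Journey, sum(channelA), sum(channelB), ... (summedDf, exprs, and renamed are illustrative names):

 // Strip the sum(...) wrapper from every aggregated column name
 val exprs = summedDf.columns.map {
   case c if c.startsWith("sum(") =>
     // backticks are needed because the column name contains parentheses
     s"`$c` as ${c.stripPrefix("sum(").stripSuffix(")")}"
   case c => c
 }
 val renamed = summedDf.selectExpr(exprs: _*)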

Other detailed answers can be found here: Renaming column names of data frame in spark scala

