I have a dataframe with configurable column names, like:

Journey  channelA  channelB  channelC
j1       1         0         0
j1       0         1         0
j1       1         0         0
j2       0         0         1
j2       0         1         0
By configurable, I mean that there can be "n" channels in a data frame.
Now I need a transformation that sums every channel per journey, something like:
df.groupBy("Journey").agg(sum("channelA"), sum("channelB"), sum("channelC"))
Output Result:
Journey  sum(channelA)  sum(channelB)  sum(channelC)
j1       2              1              0
j2       0              1              1
Now I want to rename the column names to the original names, and I could do this with
.withColumnRenamed("sum(channelA)", "channelA")
but, as I mentioned, the channel list is configurable, so I cannot hard-code one rename per channel. I would like a generic rename step that maps every summed column back to its original column name, giving the expected data frame:

Journey  channelA  channelB  channelC
j1       2         1         0
j2       0         1         1
Any suggestions for approaching this?
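For illustration, here is a minimal sketch of the kind of generic rename I have in mind, written for PySpark. It assumes the aggregated columns carry Spark's default names of the form sum(<channel>) and that every column other than Journey is a channel. The helper names (sum_column_renames, rename_columns, sum_channels) are hypothetical and not part of any library.

```python
from functools import reduce


def sum_column_renames(channels):
    """Map Spark's default aggregate names, e.g. 'sum(channelA)', to the original names."""
    return {f"sum({c})": c for c in channels}


def rename_columns(df, renames):
    """Apply every (old -> new) rename in turn via DataFrame.withColumnRenamed."""
    return reduce(
        lambda acc, pair: acc.withColumnRenamed(pair[0], pair[1]),
        renames.items(),
        df,
    )


def sum_channels(df, key="Journey"):
    """Group by `key`, sum every other column, then restore the original names."""
    # Imported lazily so the two helpers above stay usable without Spark installed.
    from pyspark.sql import functions as F

    # Assumption: every column except the grouping key is a channel.
    channels = [c for c in df.columns if c != key]
    summed = df.groupBy(key).agg(*[F.sum(c) for c in channels])
    return rename_columns(summed, sum_column_renames(channels))
```

Calling sum_channels(df) on the data frame above should then return the columns Journey, channelA, channelB, channelC. If the rename step is not needed as a separate pass, aliasing inside the aggregation, as in F.sum(c).alias(c), might avoid it altogether.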