I have a DataFrame with a MapType column where the key is an id and the value is a StructType holding two numeric fields, a counter and an income.
It looks like this:
+--------------------------------------+
| myMapColumn                          |
+--------------------------------------+
| Map(1 -> [1, 4.0], 2 -> [1, 1.5])    |
| Map()                                |
| Map(1 -> [3, 5.5])                   |
| Map(1 -> [4, 0.1], 2 -> [6, 101.56]) |
+--------------------------------------+
Now I need to sum both values per identifier, so the expected result is:
+----+-------+---------+
| id | count | revenue |
+----+-------+---------+
| 1  | 8     | 9.6     |
| 2  | 7     | 103.06  |
+----+-------+---------+
I really don't know how to do this and couldn't find documentation for this particular case. I tried DataFrame.groupBy but could not get it to work :(

Any ideas?
I am using Spark 1.5.2 with Python 2.6.6