I successfully created a row_number() with partitionBy in Spark using Window, but I would like to sort it in descending order instead of the default ascending order. Here is my working code:
    from pyspark import HiveContext
    from pyspark.sql.types import *
    from pyspark.sql import Row, functions as F
    from pyspark.sql.window import Window

    data_cooccur.select(
        "driver", "also_item", "unit_count",
        F.rowNumber().over(
            Window.partitionBy("driver").orderBy("unit_count")
        ).alias("rowNum")
    ).show()
This gives me this result:
    +------+---------+----------+------+
    |driver|also_item|unit_count|rowNum|
    +------+---------+----------+------+
    |   s10|      s11|         1|     1|
    |   s10|      s13|         1|     2|
    |   s10|      s17|         1|     3|
Then I add desc() to sort in descending order:
    data_cooccur.select(
        "driver", "also_item", "unit_count",
        F.rowNumber().over(
            Window.partitionBy("driver").orderBy("unit_count").desc()
        ).alias("rowNum")
    ).show()
And get this error:
AttributeError: WindowSpec object does not have desc attribute
What am I doing wrong here?