Spark SQL row_number() partitionBy sort desc

I successfully created a row_number() partitionBy window in Spark, but I would like to sort it in descending order rather than the default ascending. Here is my working code:

    from pyspark.sql import HiveContext
    from pyspark.sql.types import *
    from pyspark.sql import Row, functions as F
    from pyspark.sql.window import Window

    data_cooccur.select("driver", "also_item", "unit_count",
                        F.rowNumber().over(Window.partitionBy("driver").orderBy("unit_count")).alias("rowNum")).show()

This gives me this result:

    +------+---------+----------+------+
    |driver|also_item|unit_count|rowNum|
    +------+---------+----------+------+
    |   s10|      s11|         1|     1|
    |   s10|      s13|         1|     2|
    |   s10|      s17|         1|     3|

And here I add desc() to sort in descending order:

    data_cooccur.select("driver", "also_item", "unit_count",
                        F.rowNumber().over(Window.partitionBy("driver").orderBy("unit_count").desc()).alias("rowNum")).show()

And get this error:

AttributeError: 'WindowSpec' object has no attribute 'desc'

What am I doing wrong here?

3 answers

desc should be applied to the column, not to the window definition. You can either use the method on the column:

    from pyspark.sql.functions import col

    F.rowNumber().over(Window.partitionBy("driver").orderBy(col("unit_count").desc()))

or the standalone function:

    from pyspark.sql.functions import desc

    F.rowNumber().over(Window.partitionBy("driver").orderBy(desc("unit_count")))
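Note that in Spark 1.6 and later rowNumber() was renamed to row_number() (the old name was later removed). A minimal self-contained sketch of the same query against the newer API, with made-up sample rows standing in for the question's data_cooccur DataFrame:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.window import Window

    spark = SparkSession.builder.master("local[*]").appName("row-number-desc").getOrCreate()

    # Illustrative stand-in for the question's data_cooccur DataFrame
    data_cooccur = spark.createDataFrame(
        [("s10", "s11", 1), ("s10", "s13", 1), ("s10", "s17", 1)],
        ["driver", "also_item", "unit_count"])

    # Descending order comes from desc() on the column, not on the WindowSpec
    w = Window.partitionBy("driver").orderBy(F.col("unit_count").desc())
    data_cooccur.select("driver", "also_item", "unit_count",
                        F.row_number().over(w).alias("rowNum")).show()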

Or you can use plain SQL via Spark SQL:

    from pyspark.sql import SparkSession

    spark = SparkSession \
        .builder \
        .master('local[*]') \
        .appName('Test') \
        .getOrCreate()

    # Register the DataFrame as a temp view so Spark SQL can query it by name
    data_cooccur.createOrReplaceTempView("data_cooccur")

    spark.sql("""
        select driver
              ,also_item
              ,unit_count
              ,ROW_NUMBER() OVER (PARTITION BY driver ORDER BY unit_count DESC) AS rowNum
        from data_cooccur
    """).show()

Update: I tried to verify this, and it does not seem to work (in fact, it raises an error). The reason it appeared to work is that I had this code after a call to display() in Databricks, and the code after the display() call never ran. It looks like orderBy() on a DataFrame and orderBy() on a Window are not really the same. I will keep this answer up as a negative confirmation.

Starting with PySpark 2.4 (and possibly earlier), just adding the ascending=False keyword to the orderBy call works for me.

For example:

    personal_recos.withColumn("row_number", F.row_number().over(
        Window.partitionBy("COLLECTOR_NUMBER").orderBy("count", ascending=False)))

and

    personal_recos.withColumn("row_number", F.row_number().over(
        Window.partitionBy("COLLECTOR_NUMBER").orderBy(F.col("count").desc())))

seem to give me the same behavior.
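For what it's worth, the negative confirmation in the update above is consistent with the API: WindowSpec.orderBy() takes only column arguments (*cols) and no ascending keyword, unlike DataFrame.orderBy(), so the first variant should fail before producing any rows. A quick sketch under that assumption, reusing the hypothetical column names from the snippets above:

    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    # Window.orderBy accepts only *cols, so the keyword should raise
    # TypeError: orderBy() got an unexpected keyword argument 'ascending'
    try:
        w = Window.partitionBy("COLLECTOR_NUMBER").orderBy("count", ascending=False)
    except TypeError as e:
        print(e)

    # The column-level desc() variant builds a valid descending window spec
    w = Window.partitionBy("COLLECTOR_NUMBER").orderBy(F.col("count").desc())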

