How to collect a single column in Spark?

I would like to perform an action on a single column. Unfortunately, after I transform that column it is no longer part of the DataFrame it came from; it is a Column object, and as such it cannot be collected.

Here is an example:

from pyspark.sql import Row

df = sqlContext.createDataFrame([Row(array=[1, 2, 3])])
df['array'].collect()

This results in the following error:

 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
 TypeError: 'Column' object is not callable

How can I use the collect() function for a single column?
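
For reference, checking the type confirms that indexing the DataFrame gives back a lazy Column expression rather than data (a quick check, assuming the same df as above):

 type(df['array'])
 ## <class 'pyspark.sql.column.Column'>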

dataframe apache-spark pyspark apache-spark-sql spark-dataframe
1 answer

Spark >= 2.0

Starting with Spark 2.0.0, you have to explicitly access the underlying RDD via .rdd in order to use flatMap:

 df.select("array").rdd.flatMap(lambda x: x).collect() 

Spark & ​​lt; 2.0

Just select and flatMap:

 df.select("array").flatMap(lambda x: x).collect() ## [[1, 2, 3]] 
