Scala DataFrame: Explode Array

I use the Spark libraries in Scala. I created a DataFrame using:

    val searchArr = Array(
      StructField("log", IntegerType, true),
      StructField("user", StructType(Array(
        StructField("date", StringType, true),
        StructField("ua", StringType, true),
        StructField("ui", LongType, true))), true),
      StructField("what", StructType(Array(
        StructField("q1", ArrayType(IntegerType, true), true),
        StructField("q2", ArrayType(IntegerType, true), true),
        StructField("sid", StringType, true),
        StructField("url", StringType, true))), true),
      StructField("where", StructType(Array(
        StructField("o1", IntegerType, true),
        StructField("o2", IntegerType, true))), true)
    )
    val searchSt = new StructType(searchArr)
    val searchData = sqlContext.jsonFile(searchPath, searchSt)

Now I have to explode the what.q1 field, which should contain an array of integers, but the documentation is limited: http://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/sql/DataFrame.html#explode(java.lang.String,%20java.lang.String,%20scala.Function1,%20scala.reflect.api.TypeTags.TypeTag)

So far I have tried several things without much success.

    val searchSplit = searchData.explode("q1", "rb")(q1 => q1.getList[Int](0).toArray())

Any ideas or examples of using explode on an array column?
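For reference, a sketch of one way to do this, assuming Spark 1.4+ where org.apache.spark.sql.functions also provides an explode column function (the "q1Elem" output name here is made up for illustration):

```scala
// Sketch, assuming Spark 1.4+ and the searchData DataFrame defined above.
// functions.explode generates one output row per element of an array
// column, which avoids the typed DataFrame.explode overloads entirely.
import org.apache.spark.sql.functions.{col, explode}

// "what.q1" addresses the nested array field from the schema above;
// each Int in q1 becomes its own row in the hypothetical "q1Elem" column,
// with the remaining columns duplicated alongside it.
val searchSplit = searchData.select(col("*"), explode(col("what.q1")).as("q1Elem"))
```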

1 answer

Have you tried using a UDF on the what field? Something like this might work:

    import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
    import org.apache.spark.sql.functions.{col, udf}

    val explode = udf { (aStr: GenericRowWithSchema) =>
      aStr match {
        case null => ""
        case _    => aStr.getList(0).get(0).toString()
      }
    }
    val newDF = df.withColumn("newColumn", explode(col("what")))

Where:

  • getList(0) returns the q1 field
  • get(0) returns the first element of q1

I'm not sure, but you could also try getAs[T](fieldName: String) instead of getList(index: Int).
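A sketch of that variant, assuming the what struct from the question; the UDF name firstQ1 is made up, and the parameter is declared as Row, the type Spark expects for struct arguments (GenericRowWithSchema is only its runtime class):

```scala
// Sketch, assuming the "what" struct schema from the question.
// getAs looks the field up by name rather than by position, which is
// less brittle than getList(0) if the schema order ever changes.
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{col, udf}

val firstQ1 = udf { (what: Row) =>
  what match {
    case null => ""
    // getAs[Seq[Int]]("q1") fetches the array field by name; headOption
    // handles an empty array without throwing.
    case row  => row.getAs[Seq[Int]]("q1").headOption.map(_.toString).getOrElse("")
  }
}
val newDF = df.withColumn("newColumn", firstQ1(col("what")))
```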

