SparkSQL - access to nested Row structures (field1, field2 = Row (..))

I need help querying a nested structure in SparkSQL through the sql method. I created a data frame on top of an existing RDD (dataRDD) with a schema like this:

schema=StructType([ StructField("m",LongType()) ,
                  StructField("field2", StructType([
                     StructField("st",StringType()),
                     StructField("end",StringType()),
                     StructField("dr",IntegerType()) ]) )
                  ])

printSchema() returns this:

root
 |-- m: long (nullable = true)
 |-- field2: struct (nullable = true)
 |    |-- st: string (nullable = true)
 |    |-- end: string (nullable = true)
 |    |-- dr: integer (nullable = true)

Creating the data frame from the RDD and applying the schema works well.

df= sqlContext.createDataFrame( dataRDD, schema )
df.registerTempTable( "logs" )

But getting the data does not work:

res = sqlContext.sql("SELECT m, field2.st FROM logs") # <- This fails 

...org.apache.spark.sql.AnalysisException: cannot resolve 'field.st' given input columns msisdn, field2;

res = sqlContext.sql("SELECT m, field2[0] FROM logs") # <- Also fails
...org.apache.spark.sql.AnalysisException: unresolved operator 'Project [field2#1[0] AS c0#2];

res = sqlContext.sql("SELECT m, st FROM logs") # <- Also not working
...cannot resolve 'st' given input columns m, field2;

So how can I access the nested structure in SQL syntax? Thanks

1 answer

You must have had something else in your testing, because field2.st is the correct syntax:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val schema = StructType(
  Array(
    StructField("m",LongType),
    StructField("field2", StructType(Array(
      StructField("st",StringType),
      StructField("end",StringType),
      StructField("dr",IntegerType)
    )))
  )
)

// Values must match the schema: Long literals for LongType,
// and a nested Row for the struct column.
val df2 = sqlContext.createDataFrame(
  sc.parallelize(Seq(Row(1L,Row("this","is",1234)),Row(2L,Row("a","test",5678)))),
  schema
)

/* df2.printSchema
root
 |-- m: long (nullable = true)
 |-- field2: struct (nullable = true)
 |    |-- st: string (nullable = true)
 |    |-- end: string (nullable = true)
 |    |-- dr: integer (nullable = true)
*/

// Register the data frame under the table name used in the query.
df2.registerTempTable("df2")

val results = sqlContext.sql("select m,field2.st from df2")

/* results.show
m st
1 this
2 a
*/

Look at your error message: cannot resolve 'field.st' given input columns msisdn, field2. That is field vs. field2, and msisdn instead of m. Check your code again; the names do not line up.
