I need help accessing a nested structure in Spark SQL through the sql method. I created a DataFrame on top of an existing RDD (dataRDD) with the following schema:
from pyspark.sql.types import StructType, StructField, LongType, StringType, IntegerType

schema = StructType([
    StructField("m", LongType()),
    StructField("field2", StructType([
        StructField("st", StringType()),
        StructField("end", StringType()),
        StructField("dr", IntegerType()),
    ])),
])
printSchema() prints:
root
 |-- m: long (nullable = true)
 |-- field2: struct (nullable = true)
 |    |-- st: string (nullable = true)
 |    |-- end: string (nullable = true)
 |    |-- dr: integer (nullable = true)
Creating a DataFrame from the RDD and applying the schema works fine:
df = sqlContext.createDataFrame(dataRDD, schema)
df.registerTempTable("logs")
But querying the nested fields does not work:
res = sqlContext.sql("SELECT m, field2.st FROM logs")
...org.apache.spark.sql.AnalysisException: cannot resolve 'field.st' given input columns msisdn, field2;
res = sqlContext.sql("SELECT m, field2[0] FROM logs")
...org.apache.spark.sql.AnalysisException: unresolved operator 'Project [field2#1[0] AS c0#2];
res = sqlContext.sql("SELECT m, st FROM logs")
...cannot resolve 'st' given input columns m, field2;
So how can I access the fields of the nested structure using SQL syntax? Thanks.