I am trying to save a DataFrame to a persistent Hive table in Spark 1.3.0 (PySpark). This is my code:

    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext(appName="HiveTest")
    hc = HiveContext(sc)
    peopleRDD = sc.parallelize(['{"name":"Yin","age":30}'])
    peopleDF = hc.jsonRDD(peopleRDD)
    peopleDF.printSchema()
    # root
    #  |-- age: long (nullable = true)
    #  |-- name: string (nullable = true)
    peopleDF.saveAsTable("peopleHive")

The Hive table I expect is:

    Column  Data Type  Comments
    age     long       from deserializer
    name    string     from deserializer

But the Hive table that the code above actually produces is:

    Column  Data Type      Comments
    col     array<string>  from deserializer

Why doesn't the Hive table have the same layout as the DataFrame? How can I achieve the expected result?
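
One direction that may be worth trying, given here only as an untested sketch: as far as I understand, `saveAsTable` in Spark 1.3 writes the data in a Spark SQL data source format and keeps the real schema in table properties, so Hive itself only sees a placeholder column. Creating the table through HiveQL instead (register the DataFrame as a temporary table, then run `CREATE TABLE ... AS SELECT`) should let Hive define the columns. The helper names `ctas_statement` and `save_as_hive_table` and the temporary table name `people_tmp` are hypothetical and only for illustration; `registerTempTable` and `HiveContext.sql` are the existing Spark 1.3 calls this relies on.

```python
def ctas_statement(hive_table, temp_table):
    """Build a plain HiveQL CREATE TABLE ... AS SELECT statement.

    Hypothetical helper: it only formats the SQL string.
    """
    return "CREATE TABLE {0} AS SELECT * FROM {1}".format(hive_table, temp_table)


def save_as_hive_table(hc, df, hive_table, temp_table="people_tmp"):
    """Create hive_table from df via HiveQL instead of df.saveAsTable().

    Assumes hc is a HiveContext and df is a DataFrame created from it.
    """
    # Make the DataFrame visible to SQL under a temporary name.
    df.registerTempTable(temp_table)
    # Let Hive create and populate the table, so Hive defines the columns.
    hc.sql(ctas_statement(hive_table, temp_table))


# Usage with the objects from the code above (needs a Hive-enabled Spark build):
# save_as_hive_table(hc, peopleDF, "peopleHive")
```

With this approach Hive would normally report the `long` column as `bigint`, which is Hive's name for the same 64-bit integer type.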
hive apache-spark apache-spark-sql
Mirko