I tried to use a JSON file as a small DB. After registering a temp table from the DataFrame, I queried it with SQL and got an exception. Here is my code:
val df = sqlCtx.read.json("/path/to/user.json")
df.registerTempTable("user_tt")
val info = sqlCtx.sql("SELECT name FROM user_tt")
info.show()
df.printSchema() result:
root
 |-- _corrupt_record: string (nullable = true)
My JSON file:
{ "id": 1, "name": "Morty", "age": 21 }
Exception:
Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve 'name' given input columns: [_corrupt_record];
How can I fix this?
UPD
_corrupt_record is:
+--------------------+
|     _corrupt_record|
+--------------------+
|                   {|
|            "id": 1,|
|    "name": "Morty",|
|           "age": 21|
|                   }|
+--------------------+
UPD2
This is strange, but when I rewrite my JSON as a one-liner, everything works fine.
{"id": 1, "name": "Morty", "age": 21}
So the problem is the newlines.
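For reference, newer Spark releases (2.2+, where SparkSession replaces SQLContext) can read pretty-printed files like mine directly via the multiLine option. A minimal sketch, assuming a SparkSession named spark rather than my sqlCtx:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("multiline-json").getOrCreate()
// multiLine tells the JSON source to parse each file as one (possibly
// pretty-printed) JSON document instead of one object per line
val df = spark.read.option("multiLine", true).json("/path/to/user.json")
df.createOrReplaceTempView("user_tt")
spark.sql("SELECT name FROM user_tt").show()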
UPD3
I found the following sentence in the docs:
Note that the file that is offered as a json file is not a typical JSON file. Each line must contain a separate, self-contained valid JSON object. As a consequence, a regular multi-line JSON file will most often fail.
It is not always possible to save JSON in this format. Is there any workaround to read a regular multi-line JSON file, or to convert it to one object per line?
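One workaround I can think of (a sketch, not verified beyond my setup, assuming sc is the SparkContext): read each file whole with sc.wholeTextFiles so the newlines inside the object no longer act as record separators, then feed the resulting RDD[String] to read.json, which in Spark 1.x accepts an RDD of JSON strings:

// wholeTextFiles yields (path, content) pairs; keep only the content so
// each multi-line JSON document becomes a single RDD element
val jsonRDD = sc.wholeTextFiles("/path/to/user.json").map(_._2)
// read.json accepts an RDD[String] with one JSON document per element
val df = sqlCtx.read.json(jsonRDD)
df.registerTempTable("user_tt")
sqlCtx.sql("SELECT name FROM user_tt").show()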