SQL: explode an array

I have a table containing JSON objects. Each JSON object contains a comma-separated array in square brackets.

How can I access an individual element of that array, for example "Matt", using SQL?

{"str": [ 1, 134, 61, "Matt", {"action.type":"registered","application":491,"value":423,"value2":12344}, ["application"], [], "49:0" ] } 

I'm using Hive on Hadoop, but if you know how to do this in plain SQL, that's fine too :)

2 answers

You can do it in Hive as follows:

First, you need a JSON SerDe (Serializer/Deserializer). The most capable one I've seen is https://github.com/rcongiu/Hive-JSON-Serde/ . The SerDe from Peter Sankauskas cannot cope with JSON this complex. As of this writing, you will need to compile the SerDe with Maven and put the JAR somewhere your Hive session can reach it.

Next, you will need to change the JSON format. Hive arrays are strongly typed: every element of an array must have the same type, so an array that mixes integers, strings, objects, and nested arrays will not work. Consider switching to a structure like this:

 {"str": {"n1": 1, "n2": 134, "n3": 61, "s1": "Matt", "st1": {"type": "registered", "app": 491, "value": 423, "value2": 12344}, "ar1": ["application"], "ar2": [], "s2": "49:0"}}

Then you will need to put each JSON record on a single line. I'm not sure whether this is a quirk of Hive or of the SerDe, but it is required (Hive's default text input format reads one line per row).
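A small preprocessing script can handle both the restructuring and the one-record-per-line requirement before the data goes to HDFS. The sketch below is in Python and is not part of the original answer; it assumes every record has exactly the eight elements of the example, in that order, and reuses the field names n1 through s2 and the renames ("action.type" to "type", "application" to "app") from the restructured JSON above. The names POSITION_KEYS, restructure and to_json_line are hypothetical.

```python
import json

# Field names for positions 0-7 of the original "str" array, taken from the
# restructured example (n1, n2, n3, s1, st1, ar1, ar2, s2).
POSITION_KEYS = ["n1", "n2", "n3", "s1", "st1", "ar1", "ar2", "s2"]

# Key renames inside the nested object, also taken from the example.
NESTED_RENAMES = {"action.type": "type", "application": "app"}


def restructure(record: dict) -> dict:
    """Turn {"str": [mixed values]} into {"str": {named, typed fields}}."""
    values = record["str"]
    if len(values) != len(POSITION_KEYS):
        raise ValueError(f"expected {len(POSITION_KEYS)} elements, got {len(values)}")
    fields = dict(zip(POSITION_KEYS, values))
    # Rename keys in the nested object; keys not listed are kept unchanged.
    fields["st1"] = {NESTED_RENAMES.get(k, k): v for k, v in fields["st1"].items()}
    return {"str": fields}


def to_json_line(record: dict) -> str:
    """Serialize one restructured record on a single line (no newlines)."""
    return json.dumps(restructure(record), separators=(",", ":"))


if __name__ == "__main__":
    raw = ('{"str": [ 1, 134, 61, "Matt", {"action.type":"registered",'
           '"application":491,"value":423,"value2":12344}, ["application"], [], "49:0" ] }')
    print(to_json_line(json.loads(raw)))
```

Run on the record from the question, this prints {"str":{"n1":1,"n2":134,"n3":61,"s1":"Matt","st1":{"type":"registered","app":491,"value":423,"value2":12344},"ar1":["application"],"ar2":[],"s2":"49:0"}}, one line per record, ready to be copied to HDFS. The length check fails loudly instead of silently misaligning fields if a record does not have the expected shape.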

Then copy the data to HDFS.

Now you are ready to define the table and query it:

 ADD JAR /path/to/jar/json-serde-1.1.2-jar-with-dependencies.jar;

 CREATE EXTERNAL TABLE json (
   str struct<
     n1 : int, n2 : int, n3 : int,
     s1 : string,
     st1 : struct<type : string, app : int, value : int, value2 : int>,
     ar1 : array<string>,
     ar2 : array<string>,
     s2 : string
   >
 )
 ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
 -- LOCATION must be the HDFS directory that contains the JSON file(s), not a file
 LOCATION '/hdfs/path/to/dir';

With this, you can run interesting nested queries, for example:

 select str.st1.type from json; 
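If a query returns unexpected results, one thing worth checking is whether every line in the file actually matches the types declared in the CREATE TABLE statement above. The Python sketch below is an illustration under stated assumptions, not part of the answer and not how the SerDe itself validates data: it maps int to Python int, string to str, array<string> to a list of str, and struct to a JSON object, and it treats a missing field as a problem (you may prefer to allow it). The names STR_SPEC, check and validate_line are hypothetical.

```python
import json

# Expected types per field, mirroring the CREATE TABLE statement:
# int -> int, string -> str, array<string> -> [str], struct<...> -> dict of specs.
ST1_SPEC = {"type": str, "app": int, "value": int, "value2": int}
STR_SPEC = {"n1": int, "n2": int, "n3": int, "s1": str, "st1": ST1_SPEC,
            "ar1": [str], "ar2": [str], "s2": str}


def check(value, spec, path):
    """Return a list of human-readable mismatches between value and spec."""
    if isinstance(spec, dict):  # struct: must be an object with every field
        if not isinstance(value, dict):
            return [f"{path}: expected object"]
        problems = []
        for key, sub_spec in spec.items():
            if key not in value:
                problems.append(f"{path}.{key}: missing")
            else:
                problems.extend(check(value[key], sub_spec, f"{path}.{key}"))
        return problems
    if isinstance(spec, list):  # array<T>: every element must match T
        if not isinstance(value, list):
            return [f"{path}: expected array"]
        return [p for i, item in enumerate(value)
                for p in check(item, spec[0], f"{path}[{i}]")]
    # Scalars. bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, spec):
        return [f"{path}: expected {spec.__name__}"]
    return []


def validate_line(line: str) -> list:
    """Check one newline-delimited JSON record against the table schema."""
    record = json.loads(line)
    if not isinstance(record, dict) or "str" not in record:
        return ["str: missing"]
    return check(record["str"], STR_SPEC, "str")
```

Running validate_line over each line of the file before copying it to HDFS would flag, for example, the original mixed-type array from the question, which this sketch reports as "str: expected object".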

Finally, since this is so specific to Hive, it would help to update the question and its tags to mention Hive.


You can't, unless you use something specific to your database engine, and you haven't said which engine you are using.

The reason is that SQL/RDBMSs are not designed for this kind of storage. Depending on your needs, I recommend either normalizing the data properly or using a NoSQL solution.
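To illustrate what "normalizing" could mean here, the sketch below "explodes" the array into a child table with one row per element, keyed by the parent record and the element's position, so an ordinary WHERE clause can fetch "Matt". It uses Python's built-in sqlite3 module purely as a stand-in for an RDBMS (an assumption; the answer names no engine), and the table names record and record_element, the column val, and the helpers load and element_at are hypothetical. Storing every element as text (nested values as JSON text) is a simplification: a real schema would more likely give each kind of value its own typed column or table.

```python
import json
import sqlite3

# Hypothetical normalized layout: one row per array element instead of a blob.
SCHEMA = """
CREATE TABLE record (id INTEGER PRIMARY KEY);
CREATE TABLE record_element (
    record_id INTEGER NOT NULL REFERENCES record(id),
    position  INTEGER NOT NULL,   -- 0-based index in the original array
    val       TEXT,               -- strings as-is, everything else as JSON text
    PRIMARY KEY (record_id, position)
);
"""


def load(conn: sqlite3.Connection, record_id: int, raw_json: str) -> None:
    """Explode the "str" array of one JSON document into element rows."""
    values = json.loads(raw_json)["str"]
    conn.execute("INSERT INTO record (id) VALUES (?)", (record_id,))
    rows = [(record_id, pos, v if isinstance(v, str) else json.dumps(v))
            for pos, v in enumerate(values)]
    conn.executemany("INSERT INTO record_element VALUES (?, ?, ?)", rows)


def element_at(conn: sqlite3.Connection, record_id: int, position: int):
    """Plain SQL lookup of a single array element by its position."""
    row = conn.execute(
        "SELECT val FROM record_element WHERE record_id = ? AND position = ?",
        (record_id, position),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    raw = ('{"str": [ 1, 134, 61, "Matt", {"action.type":"registered",'
           '"application":491,"value":423,"value2":12344}, ["application"], [], "49:0" ] }')
    load(conn, 1, raw)
    print(element_at(conn, 1, 3))  # Matt
```

With this layout, "give me element 3" is just SELECT val FROM record_element WHERE record_id = 1 AND position = 3, which works on any SQL engine without JSON-specific functions.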

