How to query and index JSON data nested at several levels in PostgreSQL 9.3+?

In PostgreSQL 9.3, I store some pretty complex JSON objects with arrays nested in arrays. This snippet is not real data, but illustrates the same concept:

{ "customerId" : "12345", "orders" : [{ "orderId" : "54321", "lineItems" : [{ "productId" : "abc", "qty" : 3 }, { "productId" : "def", "qty" : 1 }] }] }

I want SQL queries to be able to work with lineItem objects, not only within this one JSON structure but across all the JSON objects in this column of the table. For example, an SQL query that returns all the distinct productId values and their total qty. To keep such a query from taking all day, I would probably need an index on lineItem or its child fields.

Using https://stackoverflow.com/a/2124324/2124324 ... , I figured out how to write a query that works:

    SELECT
      line_item->>'productId' AS product_id,
      SUM(CAST(line_item->>'qty' AS INTEGER)) AS qty_sold
    FROM
      my_table,
      -- "order" is a reserved word in SQL, so the alias is named ord here
      json_array_elements(my_table.my_json_column->'orders') AS ord,
      json_array_elements(ord->'lineItems') AS line_item
    GROUP BY product_id;

However, that original StackOverflow question was about data nested only one level deep, not two. I extended the same concept (that is, a "lateral join" in the FROM clause) by adding a second lateral join to go one level deeper. I'm not sure this is the best approach, so the first part of my question is: what is the best approach for querying JSON data that is an arbitrary number of levels deep in JSON objects?

As for the second part, creating an index on such nested data: https://stackoverflow.com/a/2124324/ ... again deals with data nested only one level deep. I got completely lost trying to work out how to apply this to deeper nesting. Can someone suggest a clear approach to indexing data that is at least two levels deep, as is the case with lineItems above?

1 answer

To handle an arbitrary nesting depth, you need a recursive CTE that visits each individual json element in each row of the table:

    WITH RECURSIVE raw_json as (
      SELECT *
      FROM (VALUES
        (1, '{ "customerId": "12345", "orders": [ { "orderId": "54321", "lineItems": [ { "productId": "abc", "qty": 3 }, { "productId": "def", "qty": 1 } ] } ] }'::json),
        (2, '{ "customerId": "678910", "artibitraryLevel": { "orders": [ { "orderId": "55345", "lineItems": [ { "productId": "abc", "qty": 3 }, { "productId": "ghi", "qty": 10 } ] } ] } }'::json)
      ) a(id, sample_json)
    ),
    json_recursive as (
      -- non-recursive term: expand the top level of each document
      SELECT a.id, b.k, b.v, b.json_type,
        -- track any arbitrary id when iterating through the json graph
        case when b.json_type = 'object' and not (b.v->>'customerId') is null then b.v->>'customerId' else a.customer_id end customer_id,
        case when b.json_type = 'object' and not (b.v->>'orderId') is null then b.v->>'orderId' else a.order_id end order_id,
        case when b.json_type = 'object' and not (b.v->>'productId') is null then b.v->>'productId' else a.product_id end product_id
      FROM (
        SELECT id, sample_json v,
          -- the choice of json accessor function depends on this type;
          -- 9.3 has no built-in for it (json_typeof was added in 9.4)
          case left(sample_json::text,1) when '[' then 'array' when '{' then 'object' else 'scalar' end json_type,
          sample_json->>'customerId' customer_id,
          sample_json->>'orderId' order_id,
          sample_json->>'productId' product_id
        FROM raw_json
      ) a
      CROSS JOIN LATERAL (
        -- for a standard object, get the key/value pairs of its members
        SELECT b.k, b.v,
          case left(b.v::text,1) when '[' then 'array' when '{' then 'object' else 'scalar' end json_type
        FROM json_each(case json_type when 'object' then a.v else null end) b(k,v)
        UNION ALL
        -- for an array, just get the elements (the top level has no parent key)
        SELECT null::text k, c.v,
          case left(c.v::text,1) when '[' then 'array' when '{' then 'object' else 'scalar' end json_type
        FROM json_array_elements(case json_type when 'array' then a.v else null end) c(v)
      ) b

      UNION ALL

      -- recursive term: expand every element found so far
      SELECT a.id, b.k, b.v, b.json_type,
        case when b.json_type = 'object' and not (b.v->>'customerId') is null then b.v->>'customerId' else a.customer_id end customer_id,
        case when b.json_type = 'object' and not (b.v->>'orderId') is null then b.v->>'orderId' else a.order_id end order_id,
        case when b.json_type = 'object' and not (b.v->>'productId') is null then b.v->>'productId' else a.product_id end product_id
      FROM json_recursive a
      CROSS JOIN LATERAL (
        SELECT b.k, b.v,
          case left(b.v::text,1) when '[' then 'array' when '{' then 'object' else 'scalar' end json_type
        FROM json_each(case json_type when 'object' then a.v else null end) b(k,v)
        UNION ALL
        -- array elements inherit the key of the parent array
        SELECT a.k, c.v,
          case left(c.v::text,1) when '[' then 'array' when '{' then 'object' else 'scalar' end json_type
        FROM json_array_elements(case json_type when 'array' then a.v else null end) c(v)
      ) b
    )

Then you can either sum "qty" grouped by any of the tracked ids:

 SELECT customer_id, sum(v::text::integer) FROM json_recursive WHERE k = 'qty' GROUP BY customer_id 

Or you can get the "lineItem" objects and manage them as you wish:

 SELECT * FROM json_recursive WHERE k = 'lineItems' and json_type = 'object' 
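
The total per product that the question asks for could be written the same way. This is a minimal sketch that assumes it is appended to the WITH RECURSIVE definition above, like the other two queries. It relies on each "qty" row inheriting the product_id of its enclosing lineItems object:

```sql
-- Must follow the WITH RECURSIVE ... json_recursive definition above.
-- Each scalar "qty" row carries the product_id tracked from its parent object.
SELECT product_id,
       sum(v::text::integer) AS qty_sold
FROM json_recursive
WHERE k = 'qty'
GROUP BY product_id
ORDER BY product_id;
```

For the two sample rows, the quantities add up to 6 for abc, 1 for def, and 10 for ghi.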

Regarding indexing, you could adapt the recursive query into a function that returns the unique keys for each json object in each row of the source table, and then create a functional index on your json column:

 SELECT array_agg(DISTINCT k) FROM json_recursive WHERE not k is null 
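
One way this could look is sketched below. The function name json_all_keys and the index name are hypothetical, my_table and my_json_column are taken from the question, and the use of a GIN index over the resulting text[] is an assumption rather than something stated in the answer. The function has to be declared IMMUTABLE to be usable in an index expression.

```sql
-- Hypothetical helper: collect every distinct key found at any depth of a document.
CREATE OR REPLACE FUNCTION json_all_keys(doc json)
RETURNS text[]
LANGUAGE sql
IMMUTABLE
AS $$
  WITH RECURSIVE walk(k, v, json_type) AS (
    -- start from the whole document; it has no key of its own
    SELECT NULL::text,
           doc,
           CASE left(doc::text, 1) WHEN '[' THEN 'array' WHEN '{' THEN 'object' ELSE 'scalar' END
    UNION ALL
    -- expand objects into key/value pairs and arrays into elements
    SELECT b.k,
           b.v,
           CASE left(b.v::text, 1) WHEN '[' THEN 'array' WHEN '{' THEN 'object' ELSE 'scalar' END
    FROM walk a
    CROSS JOIN LATERAL (
      SELECT e.key AS k, e.value AS v
      FROM json_each(CASE a.json_type WHEN 'object' THEN a.v END) e
      UNION ALL
      SELECT a.k, c.value
      FROM json_array_elements(CASE a.json_type WHEN 'array' THEN a.v END) c
    ) b
  )
  SELECT array_agg(DISTINCT k) FROM walk WHERE k IS NOT NULL
$$;

-- Functional (expression) index over the key array; GIN is an assumed choice.
CREATE INDEX my_table_json_keys_idx
  ON my_table USING gin (json_all_keys(my_json_column));

-- The WHERE clause must use the same expression as the index for it to be considered.
SELECT *
FROM my_table
WHERE json_all_keys(my_json_column) @> ARRAY['lineItems'];
```

An index like this only narrows down which rows contain a given key. It does not speed up the aggregation over qty itself, and the first-character type check assumes the stored JSON text has no leading whitespace.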