TL;DR
Using PostgreSQL 9.4, is there a way to get multiple values from a jsonb field at once, for example with an imaginary function:
jsonb_extract_path(x, ARRAY['a_dictionary_key', 'a_second_dictionary_key', 'a_third_dictionary_key'])
I hope this would reduce the almost linear growth in query time as more values are selected (1 value = 300 ms, 2 values = 450 ms, 3 values = 600 ms).
Background
I have the following table with a jsonb column:
CREATE TABLE "public"."analysis" ( "date" date NOT NULL, "name" character varying (10) NOT NULL, "country" character (3) NOT NULL, "x" jsonb, PRIMARY KEY(date,name) );
It has approximately 100,000 rows, and each row's jsonb dictionary holds 90+ keys with their values. I am trying to write an SQL query that selects several (<10) keys and their values quickly (<500 ms).
Index and query: 190 ms
I started by adding an index:
CREATE INDEX ON analysis USING GIN (x);
This makes queries that filter on values in the x dictionary fast, for example:
SELECT date, name, country FROM analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100 ;
It takes ~190 ms (acceptable for us).
Retrieving Dictionary Values
However, as soon as I start adding keys to return in the SELECT list, the execution time increases almost linearly:
1 value: 300 ms
select jsonb_extract_path(x, 'a_dictionary_key') from analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100 ;
Takes 366 ms (+175 ms)
select x#>'{a_dictionary_key}' as gear_down_altitude from analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100 ;
Takes 300 ms (+110 ms)
3 values: 600 ms
select jsonb_extract_path(x, 'a_dictionary_key'), jsonb_extract_path(x, 'a_second_dictionary_key'), jsonb_extract_path(x, 'a_third_dictionary_key') from analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100 ;
Takes 600 ms (+410 ms in total, i.e. more than +100 ms per selected value)
select x#>'{a_dictionary_key}' as a_dictionary_key, x#>'{a_second_dictionary_key}' as a_second_dictionary_key, x#>'{a_third_dictionary_key}' as a_third_dictionary_key from analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100 ;
Takes 600 ms (+410 ms in total, i.e. more than +100 ms per selected value)
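As a quick check on "almost linear", the marginal cost per selected value can be computed from the timings above. This sketch uses only the numbers reported in this question and assumes the 190 ms filter-only query is the right baseline for the #> variant.

```python
# Timings reported above, in milliseconds (#> operator variant).
BASE_MS = 190                     # filter-only query, no jsonb values selected
TIMINGS_MS = {1: 300, 3: 600}     # number of selected values -> total query time


def overhead_per_value(n_values: int, total_ms: float, base_ms: float = BASE_MS) -> float:
    """Average extra time per selected value, relative to the filter-only query."""
    return (total_ms - base_ms) / n_values


for n, total in sorted(TIMINGS_MS.items()):
    print(f"{n} value(s): +{total - BASE_MS} ms total, "
          f"{overhead_per_value(n, total):.1f} ms per value")
```

With these numbers the overhead is about 110 ms for one value and about 137 ms per value for three, so each extra extraction seems to cost roughly as much as the first rather than amortizing.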
Getting Multiple Values at Once
Is there a way to get multiple values from a jsonb field at once, for example with an imaginary function:
jsonb_extract_path(x, ARRAY['a_dictionary_key', 'a_second_dictionary_key', 'a_third_dictionary_key'])
This could speed up the query. It could return the values as columns, as a list/array, or even as a JSON object.
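The two non-column return shapes (an array in key order, or a smaller JSON object) can be pinned down with a small Python sketch that works on an already-parsed document. The names extract_values and extract_object are hypothetical; this only illustrates the intended semantics, is not a PostgreSQL feature, and says nothing by itself about query speed.

```python
import json
from typing import Any, Dict, List, Optional, Sequence


def extract_values(doc: Dict[str, Any], keys: Sequence[str]) -> List[Optional[Any]]:
    """Hypothetical: values for `keys` in order; missing keys yield None (like SQL NULL)."""
    return [doc.get(k) for k in keys]


def extract_object(doc: Dict[str, Any], keys: Sequence[str]) -> Dict[str, Any]:
    """Hypothetical: a sub-object holding only the requested keys that are present."""
    return {k: doc[k] for k in keys if k in doc}


# Example document and key list (values are made up for illustration).
row_x = json.loads('{"a_dictionary_key": 150.5, "a_second_dictionary_key": "abc", "other": 1}')
keys = ["a_dictionary_key", "a_second_dictionary_key", "a_third_dictionary_key"]
print(extract_values(row_x, keys))  # [150.5, 'abc', None]
print(extract_object(row_x, keys))  # {'a_dictionary_key': 150.5, 'a_second_dictionary_key': 'abc'}
```

The point of either shape is that the document is parsed once and all keys are looked up in that single pass, instead of one extraction per selected column.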
Getting an array using PL/Python
To test this, I wrote a custom PL/Python function, but it was much slower (5+ s), possibly because of json.loads:
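CREATE OR REPLACE FUNCTION retrieve_objects(data jsonb, k VARCHAR[])
RETURNS TEXT[] AS $$
    # On 9.4 a jsonb argument arrives as its text form, so it has to be parsed.
    import json  # standard library; avoids the simplejson dependency
    if not data:
        return []
    j = json.loads(data)
    # Return each value as JSON text; a missing key becomes a NULL array element.
    return [json.dumps(j[i]) if i in j else None for i in k]
$$ LANGUAGE plpython2u;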
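To check the json.loads hypothesis, the parsing cost can be measured outside the database. This sketch assumes a document of 90 numeric keys (the key names key_0, key_1, ... are hypothetical stand-ins) and uses Python 3's standard json module rather than simplejson. It times only parsing and key lookup, not PL/Python's per-call overhead or the jsonb-to-text conversion, so it gives a rough idea rather than a prediction of the in-database time.

```python
import json
import random
import timeit


def make_document(n_keys: int = 90, seed: int = 0) -> str:
    """Build a synthetic JSON object with n_keys numeric values (assumed shape)."""
    rng = random.Random(seed)
    return json.dumps({f"key_{i}": rng.uniform(0, 1000) for i in range(n_keys)})


def retrieve_objects(data, keys):
    """Same logic as the PL/Python body: parse once, then pick the requested keys."""
    if not data:
        return []
    doc = json.loads(data)
    return [json.dumps(doc[k]) if k in doc else None for k in keys]


def seconds_per_call(data: str, keys, number: int = 5_000) -> float:
    """Average wall-clock time of one retrieve_objects call on this machine."""
    return timeit.timeit(lambda: retrieve_objects(data, keys), number=number) / number


doc = make_document()
keys = ["key_0", "key_1", "key_2"]
per_call = seconds_per_call(doc, keys)
# Extrapolation assumes the function is called once for each of ~100,000 rows.
print(f"{per_call * 1e6:.1f} us per call, ~{per_call * 100_000:.2f} s for 100,000 rows")
```

If the extrapolated figure is far below the observed 5+ s, the remaining time is more likely spent in the PL/Python call machinery than in json.loads itself.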
Update 2015-05-21
I re-created the table using hstore with a GIN index, and the performance is almost identical to jsonb, so it does not help in my case.