Does JSONB make PostgreSQL arrays worthless?

Suppose you want to store "tags" on an object (say, a message). With PostgreSQL 9.4, you have three main options:

  • tags text[]
  • tags jsonb
  • tags text (storing the JSON string as plain text)

In many cases, the third option is ruled out, since it would not allow queries to use the "tags" value in conditions. In my current project I have no need for such queries: the tags are only displayed in the message list, not used to filter messages.

So the choice is mostly between text[] and jsonb. Both can be queried.
What would you use? And why?
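For reference, the two candidate column definitions and the matching containment queries might look like this (a sketch; the table and column names are mine):

```sql
-- Option 1: native Postgres array
CREATE TABLE message_a (
  message_id serial PRIMARY KEY,
  body       text,
  tags       text[]     -- e.g. '{postgres,jsonb}'
);

-- Option 2: jsonb (PostgreSQL 9.4+)
CREATE TABLE message_b (
  message_id serial PRIMARY KEY,
  body       text,
  tags       jsonb      -- e.g. '["postgres", "jsonb"]'
);

-- Both support containment queries, and both can be backed by a GIN index:
SELECT * FROM message_a WHERE tags @> '{postgres}';
SELECT * FROM message_b WHERE tags @> '["postgres"]';
```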

2 answers

In most cases, I would use a normalized schema with an option_tag table implementing the many-to-many relationship between the option and tag tables. Reference implementation here:

This may not be the fastest option in every respect, but it offers the full range of database functionality, including referential integrity, constraints, the complete set of data types, all indexing options, and cheap updates.
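A minimal sketch of the normalized many-to-many design described above, using message/tag naming for concreteness (the answer's own example uses option/tag):

```sql
-- Each tag exists exactly once.
CREATE TABLE tag (
  tag_id serial PRIMARY KEY,
  tag    text NOT NULL UNIQUE
);

CREATE TABLE message (
  message_id serial PRIMARY KEY,
  body       text
);

-- The junction table: referential integrity comes for free.
CREATE TABLE message_tag (
  message_id int NOT NULL REFERENCES message,
  tag_id     int NOT NULL REFERENCES tag,
  PRIMARY KEY (message_id, tag_id)
);

-- Messages carrying a given tag:
SELECT m.*
FROM   message     m
JOIN   message_tag mt USING (message_id)
JOIN   tag         t  USING (tag_id)
WHERE  t.tag = 'postgres';
```

Adding or removing a single tag on a message is a one-row insert or delete in message_tag, which is what makes updates cheap under concurrent load.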

For completeness, add to the list of options:

  • hstore (a good option)
  • xml, more verbose and more complex than hstore or jsonb, so I would only use it when working with XML anyway
  • "comma-separated string" (very simple, mostly a bad option)
  • EAV (entity-attribute-value) or "name-value pairs" (mostly a bad option)
    Details in this related question on dba.SE:

If the list is only for display and rarely updated, I would consider a plain array, which is usually smaller and performs better for that purpose than the rest.

Read the blog post by Josh Berkus that @a_horse linked in his comment. But keep in mind that it focuses on select-heavy read cases. Josh concedes:

I realize that I have not tested comparative write speeds.

And that is where the normalized approach wins big, especially when you change individual tags under concurrent load.

jsonb is only a good option if you are going to work with JSON anyway and can store and retrieve JSON "as is".
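A sketch of that "as is" case, assuming the application already exchanges JSON documents (names are mine):

```sql
CREATE TABLE message (
  message_id serial PRIMARY KEY,
  doc        jsonb   -- the JSON the application sent, stored verbatim
);

-- Store the document unchanged and hand it back unchanged:
INSERT INTO message (doc)
VALUES ('{"body": "hello", "tags": ["postgres", "jsonb"]}');

-- jsonb containment still lets you filter without unpacking:
SELECT doc FROM message WHERE doc @> '{"tags": ["postgres"]}';
```

Here Postgres never has to translate between JSON and a relational representation, which is the scenario where jsonb pays off.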


I have used both a normalized schema and a plain text field with comma-separated values instead of custom data types (instead of CSV you could use JSON or any other encoding, such as www-urlencoding or even XML attribute encoding). This is because many ORM libraries and database drivers do not support custom data types (hstore, jsonb, array, etc.) very well.

@ErwinBrandstetter missed a couple of other advantages of normalization: for example, it is much faster to query the list of all previously used tags in a normalized schema than with an array column. This is a very common scenario in many tagging systems.
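To illustrate that point (a sketch with assumed table names): listing every tag ever used is a trivial scan of the small tag table in the normalized design, while the array design has to unnest the tags of every row and de-duplicate them:

```sql
-- Normalized: the tag table already holds each tag exactly once.
SELECT tag FROM tag ORDER BY tag;

-- Array column: unnest all rows, then de-duplicate.
SELECT DISTINCT t
FROM   message, unnest(tags) AS t
ORDER  BY t;
```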

That said, I recommend using Solr (or Elasticsearch) for querying tags, since it handles tag counts and common-prefix tag lookups much better than anything Postgres can offer, provided you are willing to deal with keeping the search engine in sync. As a result, how the tags are stored becomes less important.


Source: https://habr.com/ru/post/1214944/