I would say that pg_column_size reports the compressed size of the TOASTed values, while octet_length reports the uncompressed sizes. I haven't verified this against the source code or the function definitions, but it would make sense, especially since strings of repeated digits will compress quite well. You're using EXTENDED storage, so the values are subject to TOAST compression. See the TOAST documentation.
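If you want to confirm which storage strategy applies to a column, one way (a sketch; blah is the demo table created further down) is to inspect pg_attribute, where attstorage 'x' means EXTENDED, 'e' EXTERNAL, 'm' MAIN, and 'p' PLAIN:

-- Show the storage strategy of each user column of the table.
SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'blah'::regclass
  AND attnum > 0
  AND NOT attisdropped;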
As for calculating the expected size of the database, that's really a separate question. As you can see from the demo below, it depends on how well your rows compress.
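If you just want to measure rather than predict, PostgreSQL's size functions report the real on-disk usage. A minimal sketch (blah is the demo table created below):

-- Heap only, excluding TOAST data and indexes:
SELECT pg_size_pretty(pg_relation_size('blah'));
-- Heap plus its TOAST table:
SELECT pg_size_pretty(pg_table_size('blah'));
-- Everything, including indexes:
SELECT pg_size_pretty(pg_total_relation_size('blah'));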
Here's a demo showing how octet_length can be greater than pg_column_size, illustrating where TOAST kicks in. First, the results in query output, where TOAST doesn't come into play:
regress=> SELECT octet_length(repeat('1234567890',(2^n)::integer)),
regress->        pg_column_size(repeat('1234567890',(2^n)::integer))
regress-> FROM generate_series(0,12) n;
 octet_length | pg_column_size
--------------+----------------
           10 |             14
           20 |             24
           40 |             44
           80 |             84
          160 |            164
          320 |            324
          640 |            644
         1280 |           1284
         2560 |           2564
         5120 |           5124
        10240 |          10244
        20480 |          20484
        40960 |          40964
(13 rows)
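Note the constant 4-byte difference above: a text value in query output carries a 4-byte varlena length header, which pg_column_size counts and octet_length does not. A quick check of that claim:

SELECT octet_length('1234567890'::text) AS payload,      -- 10 bytes of data
       pg_column_size('1234567890'::text) AS with_header; -- 14 = 10 + 4-byte header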
Now let's save the same query output to a table and get the size of the saved rows:
regress=> CREATE TABLE blah AS
regress-> SELECT repeat('1234567890',(2^n)::integer) AS data
regress-> FROM generate_series(0,12) n;
SELECT 13
regress=> SELECT octet_length(data), pg_column_size(data) FROM blah;
 octet_length | pg_column_size
--------------+----------------
           10 |             11
           20 |             21
           40 |             41
           80 |             81
          160 |            164
          320 |            324
          640 |            644
         1280 |           1284
         2560 |             51
         5120 |             79
        10240 |            138
        20480 |            254
        40960 |            488
(13 rows)
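Two things stand out in the stored rows: short values use a 1-byte short varlena header on disk (hence 10 | 11 and so on), and once values cross the ~2 kB TOAST threshold (between the 1280- and 2560-byte rows here) compression kicks in, which is why pg_column_size falls far below octet_length. To see stored sizes without compression, a hedged sketch (blah_ext is my name, not from the original demo): EXTENDED allows compression, while EXTERNAL stores large values out of line but uncompressed:

CREATE TABLE blah_ext (data text);
-- EXTERNAL: allow out-of-line storage but disable compression.
ALTER TABLE blah_ext ALTER COLUMN data SET STORAGE EXTERNAL;
INSERT INTO blah_ext
SELECT repeat('1234567890',(2^n)::integer)
FROM generate_series(0,12) n;
-- With compression off, pg_column_size should track octet_length
-- (plus a small per-value overhead) even for the large values.
SELECT octet_length(data), pg_column_size(data) FROM blah_ext;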
Craig Ringer