I would say that pg_column_size reports the compressed size of the TOASTed values, while octet_length reports the uncompressed sizes. I haven't verified this against the source code or the function definitions, but it would make sense, especially since strings of repeated digits will compress quite well. You're using EXTENDED storage, so the values are subject to TOAST compression. See the TOAST documentation.
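If you want to confirm which storage strategy applies to a column, one way (a sketch; blah is the demo table created further down) is to inspect pg_attribute, where attstorage 'x' means EXTENDED, 'e' EXTERNAL, 'm' MAIN, and 'p' PLAIN:

-- Show the storage strategy of each user column of the table.
SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'blah'::regclass
  AND attnum > 0
  AND NOT attisdropped;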
As for calculating the expected size of the database, that's really a separate question. As you can see from the demo below, it depends on how well your rows compress.
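If you just want to measure rather than predict, PostgreSQL's size functions report the real on-disk usage. A minimal sketch (blah is the demo table created below):

-- Heap only, excluding TOAST data and indexes:
SELECT pg_size_pretty(pg_relation_size('blah'));
-- Heap plus its TOAST table:
SELECT pg_size_pretty(pg_table_size('blah'));
-- Everything, including indexes:
SELECT pg_size_pretty(pg_total_relation_size('blah'));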
Here's a demo showing how octet_length can be greater than pg_column_size, illustrating where TOAST kicks in. First, the results in query output, where TOAST doesn't come into play:
regress=> SELECT octet_length(repeat('1234567890',(2^n)::integer)),
regress->        pg_column_size(repeat('1234567890',(2^n)::integer))
regress-> FROM generate_series(0,12) n;
 octet_length | pg_column_size
--------------+----------------
           10 |             14
           20 |             24
           40 |             44
           80 |             84
          160 |            164
          320 |            324
          640 |            644
         1280 |           1284
         2560 |           2564
         5120 |           5124
        10240 |          10244
        20480 |          20484
        40960 |          40964
(13 rows)
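Note the constant 4-byte difference above: a text value in query output carries a 4-byte varlena length header, which pg_column_size counts and octet_length does not. A quick check of that claim:

SELECT octet_length('1234567890'::text) AS payload,      -- 10 bytes of data
       pg_column_size('1234567890'::text) AS with_header; -- 14 = 10 + 4-byte header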
Now let's save the same query output to a table and get the size of the saved rows:
regress=> CREATE TABLE blah AS
regress-> SELECT repeat('1234567890',(2^n)::integer) AS data
regress-> FROM generate_series(0,12) n;
SELECT 13
regress=> SELECT octet_length(data), pg_column_size(data) FROM blah;
 octet_length | pg_column_size
--------------+----------------
           10 |             11
           20 |             21
           40 |             41
           80 |             81
          160 |            164
          320 |            324
          640 |            644
         1280 |           1284
         2560 |             51
         5120 |             79
        10240 |            138
        20480 |            254
        40960 |            488
(13 rows)
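Two things stand out in the stored rows: short values use a 1-byte short varlena header on disk (hence 10 | 11 and so on), and once values cross the ~2 kB TOAST threshold (between the 1280- and 2560-byte rows here) compression kicks in, which is why pg_column_size falls far below octet_length. To see stored sizes without compression, a hedged sketch (blah_ext is my name, not from the original demo): EXTENDED allows compression, while EXTERNAL stores large values out of line but uncompressed:

CREATE TABLE blah_ext (data text);
-- EXTERNAL: allow out-of-line storage but disable compression.
ALTER TABLE blah_ext ALTER COLUMN data SET STORAGE EXTERNAL;
INSERT INTO blah_ext
SELECT repeat('1234567890',(2^n)::integer)
FROM generate_series(0,12) n;
-- With compression off, pg_column_size should track octet_length
-- (plus a small per-value overhead) even for the large values.
SELECT octet_length(data), pg_column_size(data) FROM blah_ext;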
Craig Ringer