PostgreSQL: Automating VACUUM FULL for bloated tables

We have a product using a PostgreSQL database server that is deployed to several hundred clients. Some of them have accumulated tens of gigabytes of data over the years. In the next version we will therefore introduce automated cleanup procedures that gradually archive and DELETE old records during nightly batch jobs.

If I understand correctly, autovacuum will kick in, analyze the tables and reorganize the tuples, so performance will be roughly as if there were fewer records.

If I understand correctly, though, the actual disk space will not be released, since that only happens with VACUUM FULL, which autovacuum does not run.
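For what it's worth, the effect is easy to observe with the built-in size functions. A minimal check, with a placeholder table name, would be:

 -- "mytable" is a placeholder; the reported size stays roughly constant after plain
 -- VACUUM/autovacuum, and only drops after VACUUM FULL rewrites the table
 SELECT pg_size_pretty(pg_total_relation_size('mytable')) AS total_size,
        pg_size_pretty(pg_relation_size('mytable')) AS heap_size;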

So, I was thinking of an automatic process that would do this.

I found the bloat view that the Nagios check_postgres plugin uses at http://wiki.postgresql.org/wiki/Show_database_bloat .

Does this look right? Do I understand correctly that if tbloat is above 2, the table is a candidate for VACUUM FULL, and if ibloat is too high, the index is a candidate for REINDEX?
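To make that concrete, assuming the wiki query is saved as a view named bloatview (as used in the steps below), the selection I have in mind is roughly this (the ibloat threshold is just a guess):

 -- rough sketch: candidate tables for VACUUM FULL, candidate indexes for REINDEX
 SELECT tablename, tbloat FROM bloatview WHERE tbloat > 2 ORDER BY tbloat DESC;
 SELECT tablename, iname, ibloat FROM bloatview WHERE ibloat > 2 ORDER BY ibloat DESC;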

Any comments on the following job, to be run as a daily batch job?

  • vacuumdb -z mydatabase # vacuum with analyze
  • select tablename from bloatview order by tbloat desc limit 1
  • vacuumdb -f -t tablename mydatabase
  • select tablename, iname from bloatview order by ibloat desc limit 1
  • reindexdb -t tablename -i iname mydatabase

Of course, I still need to wrap this in a nice Perl script and run it from crontab (we use Ubuntu 12), unless PostgreSQL has some kind of built-in scheduler I could use for this.

Or is this complete overkill, and is there a simpler procedure?

+7
2 answers

You probably don't need it. It is worth doing once, after the first archiving job, to get the disk space back, but after that your daily archiving job and autovacuum will prevent dead tuples from bloating the tables.
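That one-time cleanup can be as simple as the following, run right after the first big archiving/DELETE pass (the table name is a placeholder):

 -- placeholder table name; VACUUM FULL rewrites the table and returns freed space to the OS
 VACUUM FULL VERBOSE my_big_table;
 -- refresh planner statistics after the rewrite
 ANALYZE my_big_table;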

In addition, instead of vacuum full it is often better to run cluster table_name using index_name; analyze table_name;. This rewrites the table with the rows ordered by the index, so related rows end up physically close together on disk. That can reduce disk seeks (which matter on classic spinning disks and are largely irrelevant on SSDs) and the number of blocks read for your typical queries.
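For example, with placeholder table and index names, you can also mark the clustering index on the table so that later maintenance runs only need a bare CLUSTER:

 -- placeholder names; CLUSTER rewrites the table in index order and reclaims
 -- dead space, much like VACUUM FULL
 ALTER TABLE my_big_table CLUSTER ON my_big_table_created_at_idx;
 CLUSTER my_big_table;
 ANALYZE my_big_table;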

And remember that both vacuum full and cluster take an exclusive lock, so the table is unavailable while they run.

+4

OK, I worked my way through it.

I simplified and reworked the view, splitting it into the following two:

 CREATE OR REPLACE VIEW bloat_datawidth AS
 SELECT
     ns.nspname AS schemaname,
     tbl.oid AS relid,
     tbl.relname,
     CASE
         WHEN every(avg_width IS NOT NULL)
         THEN SUM((1 - null_frac) * avg_width) + MAX(null_frac) * 24
         ELSE NULL
     END AS datawidth
 FROM pg_attribute att
 JOIN pg_class tbl ON att.attrelid = tbl.oid
 JOIN pg_namespace ns ON ns.oid = tbl.relnamespace
 LEFT JOIN pg_stats s
        ON s.schemaname = ns.nspname
       AND s.tablename = tbl.relname
       AND s.inherited = false
       AND s.attname = att.attname
 WHERE att.attnum > 0
   AND tbl.relkind = 'r'
 GROUP BY 1, 2, 3;

and

 CREATE OR REPLACE VIEW bloat_tables AS
 SELECT
     bdw.schemaname,
     bdw.relname,
     bdw.datawidth,
     cc.reltuples::bigint,
     cc.relpages::bigint,
     ceil(cc.reltuples * bdw.datawidth / current_setting('block_size')::numeric)::bigint AS expectedpages,
     100 - (cc.reltuples * 100 * bdw.datawidth) / (current_setting('block_size')::numeric * cc.relpages) AS bloatpct
 FROM bloat_datawidth bdw
 JOIN pg_class cc
   ON cc.oid = bdw.relid
  AND cc.relpages > 1
  AND bdw.datawidth IS NOT NULL;
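With both views in place, a quick manual check looks something like this (the limit is arbitrary):

 -- list the worst offenders; "wasted" pages = actual pages minus expected pages
 SELECT schemaname, relname, bloatpct, relpages - expectedpages AS wastedpages
 FROM bloat_tables
 ORDER BY bloatpct DESC
 LIMIT 10;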

And the cron job:

 #!/bin/bash

 MIN_BLOAT=65
 MIN_WASTED_PAGES=100
 LOG_FILE=/var/log/postgresql/bloat.log
 DATABASE=unity-stationmaster
 SCHEMA=public

 if [[ "$(id -un)" != "postgres" ]]
 then
     echo "You need to be user postgres to run this script."
     exit 1
 fi

 # pick the single most bloated table (most wasted pages) above the thresholds
 TABLENAME=`psql $DATABASE -t -A -c "select relname from bloat_tables where bloatpct > $MIN_BLOAT and relpages - expectedpages > $MIN_WASTED_PAGES and schemaname = '$SCHEMA' order by relpages - expectedpages desc limit 1"`

 if [[ -z "$TABLENAME" ]]
 then
     echo "No bloated tables." >> $LOG_FILE
     exit 0
 fi

 vacuumdb -v -f -t $TABLENAME $DATABASE >> $LOG_FILE
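The script is then scheduled from the postgres user's crontab; the path and time below are just examples (it should run after the nightly archiving job):

 # hypothetical location and schedule; the script appends to its own log file
 30 3 * * * /usr/local/bin/vacuum_bloated_table.sh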
+3
