I/O throttling for Postgres pg_dump?

So, we have a 32 GB production database on a machine with 16 GB of RAM. Thanks to caching, this is usually not a problem. But whenever I start a pg_dump of the database, queries from the application servers start queueing up, and after a few minutes the queue runs away and our application grinds to a halt.

I'll be the first to admit that we have query performance problems, and we are addressing them. In the meantime, I want to run pg_dump daily in a way that sips from the database and doesn't take our application down. I don't care if it takes hours. Our application doesn't run any DDL, so I'm not worried about lock contention.

In an attempt to fix the problem, I'm running pg_dump with both nice and ionice. Unfortunately, this doesn't solve the problem.

nice ionice -c2 -n7 pg_dump -Fc production_db -f production_db.sql 

Even with ionice, I still see the problem described above. It looks like waiting on I/O is what causes it.

 vmstat 1 

It shows iowait hovering around 20-25% and spiking to 40%. Actual CPU usage ranges between 2% and 5%, with spikes up to 70%.
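To put a number on iowait readings like these, a small awk filter can average vmstat's "wa" column. This is a sketch assuming the procps vmstat layout (a two-line header whose second line names the columns); the column is looked up by name because newer procps versions add columns. The function name avg_iowait is hypothetical.

```shell
#!/usr/bin/env bash
# Average the "wa" (iowait %) column of vmstat output read from stdin.
avg_iowait() {
  awk '
    # The column-name header line starts with "r b"; find the "wa" column.
    $1 == "r" && $2 == "b" {
      for (i = 1; i <= NF; i++) if ($i == "wa") col = i
      next
    }
    # Data lines start with a number. Skip the first one: vmstat reports
    # averages since boot there, not a live sample.
    col && $1 ~ /^[0-9]+$/ {
      if (seen++) { sum += $col; n++ }
    }
    END { if (n) printf "%.1f\n", sum / n; else print "n/a" }
  '
}

# Example: ten live 1-second samples (plus the since-boot line, skipped).
# vmstat 1 11 | avg_iowait
```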

I don't think locks are a likely culprit. When I run this query:

 select pg_class.relname,pg_locks.* from pg_class,pg_locks where pg_class.relfilenode=pg_locks.relation; 

I only see locks marked "=". We normally don't run DDL in production, so locks don't seem to be the problem.
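A variant of the lock query above that lists only locks still waiting to be granted. Two points differ from the original, both worth hedging as a sketch rather than a definitive diagnostic: pg_locks.relation holds the table's OID, so the join is on pg_class.oid (relfilenode is the on-disk file number and need not equal the OID, for example after TRUNCATE, CLUSTER or VACUUM FULL); and an UPDATE blocked by another transaction's row lock waits on a "transactionid" lock whose relation column is NULL, so a LEFT JOIN is used to keep those rows. The function name is hypothetical; production_db is the database name from the question.

```shell
#!/usr/bin/env bash
# Ungranted locks = sessions that are actually blocked right now.
LOCK_QUERY="SELECT l.locktype, c.relname, l.pid, l.mode
FROM pg_locks l
LEFT JOIN pg_class c ON c.oid = l.relation
WHERE NOT l.granted
ORDER BY l.pid;"

show_blocked_locks() {
  # $1: database name (defaults to the one from the question).
  psql -d "${1:-production_db}" -c "$LOCK_QUERY"
}

# Run it while the dump is in progress and the queue is building up:
# show_blocked_locks production_db
```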

Here is the output of ps with the WCHAN column enabled:

 PID  WCHAN           S TTY TIME     COMMAND
 3901 sync_page       D ?   00:00:50 postgres: [local] COPY
 3916 -               S ?   00:00:01 postgres: SELECT
 3918 sync_page       D ?   00:00:07 postgres: INSERT
 3919 semtimedop      S ?   00:00:04 postgres: SELECT
 3922 -               S ?   00:00:01 postgres: SELECT
 3923 -               S ?   00:00:01 postgres: SELECT
 3924 -               S ?   00:00:00 postgres: SELECT
 3927 -               S ?   00:00:06 postgres: SELECT
 3928 -               S ?   00:00:06 postgres: SELECT
 3929 -               S ?   00:00:00 postgres: SELECT
 3930 -               S ?   00:00:00 postgres: SELECT
 3931 -               S ?   00:00:00 postgres: SELECT
 3933 -               S ?   00:00:00 postgres: SELECT
 3934 -               S ?   00:00:02 postgres: SELECT
 3935 semtimedop      S ?   00:00:13 postgres: UPDATE waiting
 3936 -               R ?   00:00:12 postgres: SELECT
 3937 -               S ?   00:00:01 postgres: SELECT
 3938 sync_page       D ?   00:00:07 postgres: SELECT
 3940 -               S ?   00:00:07 postgres: SELECT
 3943 semtimedop      S ?   00:00:04 postgres: UPDATE waiting
 3944 -               S ?   00:00:05 postgres: SELECT
 3948 sync_page       D ?   00:00:05 postgres: SELECT
 3950 sync_page       D ?   00:00:03 postgres: SELECT
 3952 sync_page       D ?   00:00:15 postgres: SELECT
 3964 log_wait_commit D ?   00:00:04 postgres: COMMIT
 3965 -               S ?   00:00:03 postgres: SELECT
 3966 -               S ?   00:00:02 postgres: SELECT
 3967 sync_page       D ?   00:00:01 postgres: SELECT
 3970 -               S ?   00:00:00 postgres: SELECT
 3971 -               S ?   00:00:01 postgres: SELECT
 3974 sync_page       D ?   00:00:00 postgres: SELECT
 3975 -               S ?   00:00:00 postgres: UPDATE
 3977 -               S ?   00:00:00 postgres: INSERT
 3978 semtimedop      S ?   00:00:00 postgres: UPDATE waiting
 3981 semtimedop      S ?   00:00:01 postgres: SELECT
 3982 -               S ?   00:00:00 postgres: SELECT
 3983 semtimedop      S ?   00:00:02 postgres: UPDATE waiting
 3984 -               S ?   00:00:04 postgres: SELECT
 3986 sync_buffer     D ?   00:00:00 postgres: SELECT
 3988 -               R ?   00:00:01 postgres: SELECT
 3989 -               S ?   00:00:00 postgres: SELECT
 3990 -               R ?   00:00:00 postgres: SELECT
 3992 -               R ?   00:00:01 postgres: SELECT
 3993 sync_page       D ?   00:00:01 postgres: SELECT
 3994 sync_page       D ?   00:00:00 postgres: SELECT
  • Simplest: you can throttle pg_dump with pv.
  • Harder: change the backup procedure. For example:
      psql -c "SELECT pg_start_backup('daily');"
      rsync --checksum --archive /var/lib/pgsql /backups/pgsql
      psql -c "SELECT pg_stop_backup();"
    
    But keep in mind that you also need to configure continuous archiving for this to work, and that all WAL files created during the backup must be kept alongside the backup of the data files. (On PostgreSQL 15 and later these functions are named pg_backup_start() and pg_backup_stop(), and both must be called from the same session, so separate psql -c invocations will not work there.)
  • Even harder: set up a replica database (using, for example, log shipping) on an additional cheap disk, and back up the replica instead of the production database. Even if it falls behind by some transactions, it will eventually catch up. But check that the replica is up to date before starting the backup.
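The pv suggestion can be sketched as a small shell function, assuming pv is installed. pv -L caps the byte rate of the pipe and -q suppresses its progress display; once the pipe is full, pg_dump stops reading from its connection, so the server backend running the COPY is slowed down too. The function name throttled_dump and the 5M (5 MiB/s) default are illustrative, not recommendations.

```shell
#!/usr/bin/env bash
# Dump a database through a rate-limited pipe.
# Body runs in a subshell so "set -o pipefail" does not leak to the caller;
# pipefail makes a pg_dump failure fail the function, not just a pv failure.
throttled_dump() (
  set -o pipefail
  # $1: database, $2: output file, $3: rate limit for pv -L (default 5M)
  pg_dump -Fc "$1" | pv -q -L "${3:-5M}" > "$2"
)

# throttled_dump production_db production_db.dump 5M
```

Because -Fc output is compressed, the limit applies to compressed bytes; table data is read from disk correspondingly faster, so choose the limit with that in mind.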

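For the replica option, one way to check that the standby is up to date is to ask how long ago the last replayed transaction was committed. This sketch assumes PostgreSQL 9.1 or later (where pg_last_xact_replay_timestamp() exists) and that psql connects to the standby with default settings; the function name and the 300-second threshold are hypothetical.

```shell
#!/usr/bin/env bash
# Seconds since the last replayed commit; -1 if nothing has been replayed.
LAG_QUERY="SELECT COALESCE(EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp()), -1)"

# Returns 0 if lag is within the limit, 1 if not (or unknown), 2 if psql fails.
replica_fresh_enough() {
  local max_lag="${1:-300}"   # seconds; arbitrary example threshold
  local lag
  lag=$(psql -At -c "$LAG_QUERY") || return 2
  awk -v lag="$lag" -v max="$max_lag" 'BEGIN { exit !(lag >= 0 && lag <= max) }'
}

# replica_fresh_enough 300 && pg_dump -Fc production_db -f production_db.dump
```

On a quiet primary this value keeps growing even when the standby is fully caught up, so a large number is a prompt to investigate rather than proof of lag.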
Your ps output shows several UPDATE statements marked "waiting", which still suggests locking to me (your lock-checking query notwithstanding). I'm fairly sure you wouldn't see "waiting" in the ps output otherwise. You can check whether this query returns anything while the problem is happening:

 SELECT * FROM pg_stat_activity WHERE waiting; 

(You did not say which version of PostgreSQL you are using, so I'm not sure if this will work.)

If it returns anything (that is, rows with waiting = true), then you have a locking/transaction problem.
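Since the query depends on the server version, here is a sketch that picks the matching form. The boolean waiting column exists up to 9.5; 9.2 renamed procpid and current_query to pid and query; from 9.6 on, waiting was replaced by wait_event_type and wait_event, and a session blocked on a lock shows wait_event_type = 'Lock'. The function names are hypothetical, and psql is assumed to reach the production database with default connection settings.

```shell
#!/usr/bin/env bash
# Print the "who is waiting on a lock" query for a given server_version_num.
waiting_query() {
  local version_num="$1"   # e.g. 90105, 90400, 160002
  if [ "$version_num" -ge 90600 ]; then
    echo "SELECT pid, wait_event_type, wait_event, query FROM pg_stat_activity WHERE wait_event_type = 'Lock';"
  elif [ "$version_num" -ge 90200 ]; then
    echo "SELECT pid, query FROM pg_stat_activity WHERE waiting;"
  else
    echo "SELECT procpid, current_query FROM pg_stat_activity WHERE waiting;"
  fi
}

# Ask the server for its version, then run the matching query.
show_waiting() {
  local v
  v=$(psql -At -c "SHOW server_version_num") || return 1
  psql -c "$(waiting_query "$v")"
}
```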

