We have a 32 GB production database on a machine with 16 GB of RAM. Thanks to caching, this is usually not a problem. But whenever I start a pg_dump of the database, queries from the application servers start queueing up, and after a few minutes the queue runs away and our application grinds to a halt.
I'll be the first to admit that we have query performance problems, and we are working on them. Meanwhile, I want to be able to run pg_dump every day in a way that sips from the database and doesn't take our application down. I don't care if it takes hours. Our application doesn't run any DDL, so I'm not worried about lock contention.
In an attempt to fix the problem, I'm running pg_dump with both nice and ionice. Unfortunately, this doesn't solve it.
nice ionice -c2 -n7 pg_dump -Fc production_db -f production_db.sql
Even with ionice, I still see the problem described above. It looks like I/O wait and a lot of disk seeks are causing it.
vmstat 1
It shows iowait hovering around 20-25%, with spikes up to 40%. Real CPU usage ranges between 2-5%, with spikes up to 70%.
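To put a number on the iowait rather than eyeballing it, the vmstat output could be summarized with something like the sketch below. The function name avg_iowait is made up for this post; the only assumption about vmstat is that its column header row contains a field named wa, which the script looks up by name instead of by position.

```shell
# Summarize the iowait ("wa") column of vmstat output read from stdin.
# avg_iowait is a hypothetical helper name, not part of any tool.
avg_iowait() {
  awk '
    {
      # A row containing the literal field "wa" is the column header;
      # remember the position of that column and skip the row.
      is_header = 0
      for (i = 1; i <= NF; i++) if ($i == "wa") { col = i; is_header = 1 }
      if (is_header) next
    }
    # Data rows: the wa field must be a plain integer.
    col && $col ~ /^[0-9]+$/ {
      sum += $col; n++
      if ($col + 0 > max) max = $col + 0
    }
    END { if (n) printf "samples=%d avg_wa=%.1f max_wa=%d\n", n, sum / n, max }
  '
}

# Possible usage while pg_dump is running (60 one-second samples):
#   vmstat 1 60 | avg_iowait
# Note that the first vmstat report is an average since boot, so it
# slightly skews a short run.
```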
I don't think locks are a likely culprit. When I run this query:
select pg_class.relname,pg_locks.* from pg_class,pg_locks where pg_class.relfilenode=pg_locks.relation;
I see only locks marked granted = 't'. We don't usually run DDL in production, so locks don't seem to be the problem.
Here is the output from ps with the WCHAN column enabled:
 PID WIDE            S TTY     TIME COMMAND
3901 sync_page       D ?   00:00:50 postgres: [local] COPY
3916 -               S ?   00:00:01 postgres: SELECT
3918 sync_page       D ?   00:00:07 postgres: INSERT
3919 semtimedop      S ?   00:00:04 postgres: SELECT
3922 -               S ?   00:00:01 postgres: SELECT
3923 -               S ?   00:00:01 postgres: SELECT
3924 -               S ?   00:00:00 postgres: SELECT
3927 -               S ?   00:00:06 postgres: SELECT
3928 -               S ?   00:00:06 postgres: SELECT
3929 -               S ?   00:00:00 postgres: SELECT
3930 -               S ?   00:00:00 postgres: SELECT
3931 -               S ?   00:00:00 postgres: SELECT
3933 -               S ?   00:00:00 postgres: SELECT
3934 -               S ?   00:00:02 postgres: SELECT
3935 semtimedop      S ?   00:00:13 postgres: UPDATE waiting
3936 -               R ?   00:00:12 postgres: SELECT
3937 -               S ?   00:00:01 postgres: SELECT
3938 sync_page       D ?   00:00:07 postgres: SELECT
3940 -               S ?   00:00:07 postgres: SELECT
3943 semtimedop      S ?   00:00:04 postgres: UPDATE waiting
3944 -               S ?   00:00:05 postgres: SELECT
3948 sync_page       D ?   00:00:05 postgres: SELECT
3950 sync_page       D ?   00:00:03 postgres: SELECT
3952 sync_page       D ?   00:00:15 postgres: SELECT
3964 log_wait_commit D ?   00:00:04 postgres: COMMIT
3965 -               S ?   00:00:03 postgres: SELECT
3966 -               S ?   00:00:02 postgres: SELECT
3967 sync_page       D ?   00:00:01 postgres: SELECT
3970 -               S ?   00:00:00 postgres: SELECT
3971 -               S ?   00:00:01 postgres: SELECT
3974 sync_page       D ?   00:00:00 postgres: SELECT
3975 -               S ?   00:00:00 postgres: UPDATE
3977 -               S ?   00:00:00 postgres: INSERT
3978 semtimedop      S ?   00:00:00 postgres: UPDATE waiting
3981 semtimedop      S ?   00:00:01 postgres: SELECT
3982 -               S ?   00:00:00 postgres: SELECT
3983 semtimedop      S ?   00:00:02 postgres: UPDATE waiting
3984 -               S ?   00:00:04 postgres: SELECT
3986 sync_buffer     D ?   00:00:00 postgres: SELECT
3988 -               R ?   00:00:01 postgres: SELECT
3989 -               S ?   00:00:00 postgres: SELECT
3990 -               R ?   00:00:00 postgres: SELECT
3992 -               R ?   00:00:01 postgres: SELECT
3993 sync_page       D ?   00:00:01 postgres: SELECT
3994 sync_page       D ?   00:00:00 postgres: SELECT
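A listing like the one above is easier to read once it is tallied by wait channel and process state. The following is only a rough sketch: tally_wchan is a made-up helper name, and it assumes the column order shown in the listing (PID, wait channel, state, TTY, TIME, COMMAND) with a single header row.

```shell
# Count ps rows (read from stdin) per wait channel and process state,
# most frequent first. tally_wchan is a hypothetical helper name.
# Assumes field 2 is the wait channel and field 3 is the state.
tally_wchan() {
  awk '
    NR > 1 { count[$2 " " $3]++ }          # skip the header row
    END    { for (k in count) print count[k], k }
  ' | sort -rn
}

# Possible usage; the -o list is an assumption chosen to mirror the
# columns in the listing:
#   ps -C postgres -o pid,wchan:20,s,tty,time,cmd | tally_wchan
```

Rows in state D (uninterruptible sleep, typically waiting on disk) with a wait channel such as sync_page are the ones worth watching while the dump runs.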