Check data consistency between two PostgreSQL databases

This is specifically about having confidence, when using the various replication solutions, that you could fail over to another server without data loss. Or, in a master-master setup, that you would find out within a reasonable period of time if one of the databases had gone out of sync.

Are there any tools for this, or do people generally rely on the replication system itself to warn of inconsistencies? I'm currently most familiar with PostgreSQL WAL shipping in a master-standby setup, but I am considering setting up master-master with something like PgPool. However, since that solution is somewhat less directly tied to PostgreSQL itself (my basic understanding is that it provides the connection the application uses, intercepting the various SQL statements and sending them on to all the servers in its pool), it made me think more about actually verifying data consistency.

Specific Requirements:

  • I'm not talking about just the table structure. I want to know that the actual record data is the same, so that I find out if records were corrupted or missed (in which case I would reinitialize the bad database from the latest backup plus WAL files before returning it to the pool).

  • Databases are about 30-50 GB. I doubt that raw SELECT queries will work very well.

  • I do not see the need for real-time verification (although this, of course, would be nice). Hourly or even daily would be better than nothing.

  • Block-level checking will not work. These are two databases with independent storage.

Or is this type of verification simply not realistic?

+7
2 answers

You can check the current WAL locations on both machines. If they show the same value, the standby has received and replayed everything the primary has written, so the two databases are in sync.

    (do it on primary host)
    $ psql -c "SELECT pg_current_xlog_location()" -h192.168.0.10
     pg_current_xlog_location
    --------------------------
     0/2000000
    (1 row)

    (do it on standby host)
    $ psql -c "select pg_last_xlog_receive_location()" -h192.168.0.20
     pg_last_xlog_receive_location
    -------------------------------
     0/2000000
    (1 row)

    (do it on standby host)
    $ psql -c "select pg_last_xlog_replay_location()" -h192.168.0.20
     pg_last_xlog_replay_location
    ------------------------------
     0/2000000
    (1 row)
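When the two values differ, it helps to know by how much. The following is a minimal sketch, not part of the original answer, that turns two WAL locations into a lag in bytes. The function names `lsn_to_bytes` and `wal_lag_bytes` are made up for illustration, and the arithmetic assumes the location is a plain 64-bit byte position written as two hexadecimal halves, which as far as I know holds from PostgreSQL 9.3 on.

```shell
#!/bin/sh
# Convert a WAL location such as "0/2000000" into an absolute byte position.
# Assumption: the text form is "high 32 bits / low 32 bits", both in hex.
lsn_to_bytes() {
    lsn_hi=${1%/*}   # part before the slash
    lsn_lo=${1#*/}   # part after the slash
    echo $(( 0x$lsn_hi * 4294967296 + 0x$lsn_lo ))
}

# Print how many bytes of WAL the second location is behind the first.
# $1 = location on the primary, $2 = location on the standby
wal_lag_bytes() {
    echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}

# Hypothetical usage with the hosts from the example above
# (-A and -t make psql print the bare value without headers):
# primary=$(psql -At -h 192.168.0.10 -c "SELECT pg_current_xlog_location()")
# standby=$(psql -At -h 192.168.0.20 -c "SELECT pg_last_xlog_replay_location()")
# wal_lag_bytes "$primary" "$standby"
```

A result of 0 means the standby has replayed everything the primary reported at that moment. The server can also do this subtraction itself: `pg_xlog_location_diff()` exists from 9.2, and from PostgreSQL 10 on the functions are named `pg_current_wal_lsn()`, `pg_last_wal_receive_lsn()`, `pg_last_wal_replay_lsn()` and `pg_wal_lsn_diff()`.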

You can also check this using the walsender and walreceiver processes:

    [do it on primary]
    $ ps -ef | grep sender
    postgres  6879  6831  0 10:31 ?  00:00:00 postgres: wal sender process postgres 127.0.0.1(44663) streaming 0/2000000

    [do it on standby]
    $ ps -ef | grep receiver
    postgres  6878  6872  1 10:31 ?  00:00:01 postgres: wal receiver process streaming 0/2000000
+3

If you want to compare a whole table, you could do something like this (assuming the table fits comfortably in RAM):

 SELECT md5(array_to_string(array_agg(mytable ORDER BY id), ' ')) FROM mytable;

This gives you a single hash over the text representation of all the tuples in the table.

Note that you can break this down into ranges, etc. Depending on the type of replication, you can even break it down by ranges of pages (for streaming replication).
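Breaking the hash into ranges could look roughly like the sketch below. It is only an illustration under stated assumptions: the table has an integer `id` column, the table name comes from a trusted source (it is pasted into the SQL unescaped), and the helper names `run_sql`, `range_hash_sql` and `compare_ranges` as well as the host arguments are hypothetical.

```shell
#!/bin/sh
# Run one SQL statement on a host and print the bare result.
# $1 = host, $2 = SQL text
run_sql() {
    psql -At -h "$1" -c "$2"
}

# Build the SQL that hashes the rows of one id range, in id order.
# $1 = table, $2 = first id (inclusive), $3 = last id (exclusive)
range_hash_sql() {
    printf "SELECT md5(string_agg(t::text, ' ' ORDER BY id)) FROM %s AS t WHERE id >= %s AND id < %s;\n" "$1" "$2" "$3"
}

# Hash the table on both hosts chunk by chunk and report ranges that differ.
# $1 = table, $2 = highest id to check, $3 = chunk size, $4 = host A, $5 = host B
compare_ranges() {
    start=0
    while [ "$start" -le "$2" ]; do
        end=$(( start + $3 ))
        sql=$(range_hash_sql "$1" "$start" "$end")
        hash_a=$(run_sql "$4" "$sql")
        hash_b=$(run_sql "$5" "$sql")
        [ "$hash_a" = "$hash_b" ] || echo "mismatch in $1 for ids [$start, $end)"
        start=$end
    done
}

# Hypothetical usage:
# compare_ranges mytable 1000000 10000 192.168.0.10 192.168.0.20
```

Each chunk only needs memory for its own rows, and a mismatch points to the id range that needs a closer look. On a live system the two hosts may be queried at slightly different points in time, so a reported mismatch is worth rechecking before treating it as corruption.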

0
