PostgreSQL DB performance issues with thousands of connections and distributed transactions

We are trying to evaluate PostgreSQL DB as an alternative to the Oracle database in our application. We are using PostgreSQL 9.5, which is installed on a Linux machine with 128 GB of memory, 32 CPU cores and SSD storage. Connection pools and distributed transactions are managed by the JBoss 7 application server, SQL queries are generated / executed by Hibernate 4. Most tables have tens of millions of rows, one of which contains hundreds of millions of rows. In total, about 3,000 connections to the database (they are combined by the application server) are active and used simultaneously. We modified some queries, created indexes for slow ones, set up database and OS parameters based on documentation, etc. However, the throughput is several times slower, and ultimately the response time of the database increases by 10-20 times.

I did some searches, and I could not find information about anyone else (ab) using PostgreSQL DB in the same way:

  • using thousands of active database connections
  • using this large number of distributed transactions (PREPARED OPERATIONS)
  • saving billions of rows in one table

Oracle has no problem handling even higher loads. I would be happy to share my experiences, suggestions, links, etc.

thanks

+6
source share
2 answers

The solution was to upgrade the Linux kernel and reduce the number of DB connections in our Java connection pools from 3000 to 300. After this change, we could handle the same traffic as in Oracle DB.

By chance, I found valuable information that leads to the resolution of a problem in the comment section for a message. Did I say 32 kernels? What about 64? written by Robert Haas (Vice President, Chief Architect, @EnterpriseDB Database Server, PostgreSQL Principal Developer and Commander):

No, I say that to get good performance on a 64-core server, you will need PostgreSQL> = 9.2 and Linux> = 3.2. Most of the changes are valid on the PostgreSQL side, but the lseek value for scaling in the Linux kernel is also important.

0
source

The postgresql.conf file must have appropriate settings to handle a large number of connections. It can also be completed by pgpool2 for replication and load balancing. We use Postgres in a cluster env and it works well.

-one
source

All Articles