Why does CouchDB replication still fail with "increase max_dbs_open" after increasing max_dbs_open?

Our application uses filtered CouchDB replications to move data between user databases and the main database. As the number of users grows, replication fails with this message:

Source and target databases out of sync. Try to increase max_dbs_open at both servers. 

We did so, raising max_dbs_open to a ridiculously large number (10,000), but the errors and the message remained the same. Obviously, something else is wrong. Does anyone know what it is?

2 answers

As it turned out, the advice to increase max_dbs_open is at best a partial answer and at worst misleading. In our case, the problem was not the number of open databases but, apparently, the number of HTTP connections used by our many replications.

Each replication can use min(worker_processes + 1, http_connections) HTTP connections, where worker_processes is the number of workers assigned to each replication and http_connections is the maximum number of HTTP connections allocated to each replication, as described in the CouchDB configuration documentation.

Thus, the total number of connections used is

 number of replications * min(worker_processes + 1, http_connections) 

The default value of worker_processes is 4, and the default value of http_connections is 20, so each replication uses min(4 + 1, 20) = 5 connections. With 100 replications, the total number of HTTP connections used by replication is 500. A separate setting, max_connections, determines the maximum number of HTTP connections the CouchDB server allows, as described in the CouchDB configuration documentation. Its default value is 2048.
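To make the arithmetic concrete, here is a minimal Python sketch of the formula above. The function names are made up for illustration; the defaults are simply the values quoted in this answer, so check them against your own CouchDB version.

```python
def connections_per_replication(worker_processes: int = 4,
                                http_connections: int = 20) -> int:
    """HTTP connections one replication can use, per the formula above."""
    return min(worker_processes + 1, http_connections)


def total_replication_connections(replications: int,
                                  worker_processes: int = 4,
                                  http_connections: int = 20) -> int:
    """Total HTTP connections used by all replications together."""
    return replications * connections_per_replication(
        worker_processes, http_connections
    )


if __name__ == "__main__":
    # 100 replications with the default settings quoted above
    print(total_replication_connections(100))  # 500
```

Lowering worker_processes shrinks the per-replication figure directly, which is why it is the first knob worth trying for small replications.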

In our case, each user has two replications: one from the user database to the main database and one from the main database to the user database. So with the default settings, every user we added consumed another 10 HTTP connections, which eventually exceeded the default max_connections.
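As a rough back-of-the-envelope check, the sketch below estimates how many users fit under max_connections with two replications per user. The function name is hypothetical, and the estimate assumes replication is the only consumer of connections; ordinary client traffic would lower the real ceiling.

```python
def max_users_before_limit(max_connections: int = 2048,
                           replications_per_user: int = 2,
                           worker_processes: int = 4,
                           http_connections: int = 20) -> int:
    """Largest user count whose replications stay within max_connections.

    Assumes replication is the only source of HTTP connections.
    """
    per_replication = min(worker_processes + 1, http_connections)
    per_user = replications_per_user * per_replication
    return max_connections // per_user


if __name__ == "__main__":
    print(max_users_before_limit())  # 204
```

With the defaults, that puts the ceiling at roughly 200 users.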

Since our replications are minimal, with only a small amount of data moving from user to main and from main to user, we dialed down worker_processes and http_connections, increased max_connections, and all is well.

UPDATE

A couple of other findings:

  • We had to raise the ulimit for the CouchDB process so that it could hold more open connections.

  • Creating replications too quickly also caused problems. Dialing back how quickly I created new replications also helped alleviate the problem. YMMV.
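One way to dial back the creation rate is to pause between requests. The sketch below is only an illustration: `start` stands in for whatever function you use to create a replication (it is not a CouchDB API), and the 0.5 second delay is an arbitrary starting point, not a recommended value.

```python
import time
from typing import Callable, Iterable


def create_throttled(jobs: Iterable[dict],
                     start: Callable[[dict], None],
                     delay_seconds: float = 0.5,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Start each replication job, pausing between consecutive jobs.

    `start` is a caller-supplied (hypothetical) function that creates one
    replication. Returns the number of jobs started.
    """
    count = 0
    for job in jobs:
        if count:
            # Pause before every job except the first
            sleep(delay_seconds)
        start(job)
        count += 1
    return count
```

Passing `sleep` as a parameter keeps the helper easy to test without actually waiting.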


For me, this error occurred because the "instanceStartTime" value returned by the target database for GET /{targetDB}/ was not valid.
