Live copies for the development team in MongoDB

Q: What is the best architecture for live copies for testing and development?

Current setting:

We have two mongod servers on Amazon EC2, for example:

Machine A:

  • A production database (on an Amazon EC2 server; call it 'PROD')
  • Other databases ('OTHER')

Machine B:

  • A pre-production database (call it 'PRE')
  • A copy for developer 1's own tests (call it 'DEVEL-1')
  • A copy for developer 2 ('DEVEL-2')
  • … 'DEVEL-n'

The PRE database is intended for integration tests before deployment to production.

The DEVEL-n databases let each developer destroy their own data without disturbing the other developers.

From time to time we want to "restore" fresh data from PROD to the PRE and DEVEL-n databases.

We currently copy from PROD to PRE using the .copyDatabase() command. Then we issue .copyDatabase() "n" times to make copies from PRE to DEVEL-n.

Problem:

The copy takes far too long (1 hour per copy; the database is over 10 GB), and it usually also saturates mongod, so we have to restart the service.

We found out about:

  • Dump / restore (saturates mongod just like .copyDatabase())
  • Replica sets
  • Master / Slave (seems outdated)

Replica sets seem to be the winner, but we have serious doubts:

Suppose we want a replica set to sync A / PROD live into B / PRE (with A presumably as primary and B as secondary):

a) Can I select only "some" databases on A to replicate (PROD) but leave OTHER alone?

b) Can I have "additional" databases on B (for example, DEVEL-n) that are not on the primary?

c) Can I "stop replicating" so that we can deploy to PRE, test the software with fresh data, dirty the data during testing, and once the tests have finished "re-link" the replica, so that the changes in PRE are discarded and the changes in PROD are propagated to PRE?

d) Is there a better way than a replica set suitable for this case?

Thanks. Marina and Xavi.

+4
2 answers

Replica sets seem to be the winner, but we have serious doubts:

Suppose we want a replica set to sync A / PROD live into B / PRE (with A presumably as primary and B as secondary):

a) Can I select only "some" databases on A to replicate (PROD) but leave OTHER alone?

As of MongoDB 2.4, replication always includes all databases. The design intent is for all nodes to be eventually consistent replicas, so that you can fail over to another non-hidden secondary in the same replica set.

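b) Can I have "additional" databases on B (for example, DEVEL-n) that are not on the primary?

No. A replica set has only one primary, and secondaries only apply the writes replicated from it, so B cannot hold extra writable databases of its own.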

c) Can I "stop replicating" so that we can deploy to PRE, test the software with fresh data, dirty the data during testing, and once the tests have finished "re-link" the replica, so that the changes in PRE are discarded and the changes in PROD are propagated to PRE?

Since there can be only one primary, mixing production and test roles in the same replica set is not possible in the way you envision.

Best practice is to isolate your production and dev / staging environments so that there is no unexpected interaction between them.

d) Is there a better way than a replica set suitable for this case?

There are several approaches you can take to limit the amount of data you need to transfer, so that you do not copy the full database (10 GB) each time. Replica sets are suitable as part of the solution, but you need a separate standalone server or replica set for your PRE environment.

Some suggestions:

  • Use a replica set and add a hidden secondary in your development environment. You can take backups from this node without affecting your production application, and since the secondary replicates changes as they occur, you should be able to take a comparatively faster backup over your local network.

  • Implement your own partial replication scheme using a tailable cursor on the MongoDB oplog. The capped local.oplog.rs collection is the same mechanism that is used to relay changes to replica set members, and it contains the details of inserts, deletes, and updates. You could match on the relevant database namespaces and relay the matching changes from your production replica set into your isolated PRE environment.

Either of these approaches lets you control when changes are transferred from PROD to PRE, and lets you restart from a previous point after testing.
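As a rough illustration of the first suggestion, the sketch below builds a replica set configuration document with an extra hidden member. It only manipulates a plain dictionary; the set name `rs0` and both host names are hypothetical, and the result is merely the kind of document a reconfiguration (for example `rs.reconfig()` in the mongo shell) would be given.

```python
import copy


def with_hidden_member(config, host):
    """Return a copy of a replica set config with `host` added as a hidden member."""
    new_config = copy.deepcopy(config)  # leave the caller's config untouched
    next_id = max((m["_id"] for m in new_config["members"]), default=-1) + 1
    new_config["members"].append({
        "_id": next_id,
        "host": host,
        "priority": 0,   # a hidden member must never be eligible to become primary
        "hidden": True,  # not advertised to client applications
    })
    new_config["version"] += 1  # a new config needs a higher version than the current one
    return new_config


# Hypothetical starting point: a one-member set running on machine A.
current = {
    "_id": "rs0",
    "version": 1,
    "members": [{"_id": 0, "host": "machine-a.example:27017"}],
}
proposed = with_hidden_member(current, "machine-b.example:27017")
```

Backups for PRE would then be taken from the hidden member rather than from the node serving production traffic.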
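The namespace matching and "restart from a previous point" ideas can be sketched without a live server. The example below filters oplog-style documents by database and by a saved checkpoint. The field names (`ts`, `op`, `ns`, `o`, `o2`) follow the oplog entry layout, but the timestamps are modelled as plain `(seconds, increment)` tuples instead of BSON timestamps, and the sample entries are made up for illustration. Reading real entries with a tailable cursor and applying them to PRE are left out.

```python
def select_changes(entries, db_name, after_ts=None):
    """Keep insert/update/delete entries for one database, newer than a checkpoint."""
    prefix = db_name + "."  # namespaces look like "<database>.<collection>"
    selected = []
    for entry in entries:
        if entry["op"] not in ("i", "u", "d"):
            continue  # skip no-ops and commands in this simplified sketch
        if not entry["ns"].startswith(prefix):
            continue  # change belongs to another database, e.g. OTHER
        if after_ts is not None and entry["ts"] <= after_ts:
            continue  # already relayed before the saved checkpoint
        selected.append(entry)
    return selected


# Made-up sample entries; real ones would come from local.oplog.rs.
sample = [
    {"ts": (100, 1), "op": "i", "ns": "PROD.users", "o": {"_id": 1, "name": "a"}},
    {"ts": (100, 2), "op": "i", "ns": "OTHER.logs", "o": {"_id": 7}},
    {"ts": (101, 1), "op": "u", "ns": "PROD.users", "o2": {"_id": 1}, "o": {"$set": {"name": "b"}}},
    {"ts": (102, 1), "op": "n", "ns": "", "o": {"msg": "periodic noop"}},
    {"ts": (103, 1), "op": "d", "ns": "PROD.users", "o": {"_id": 1}},
]

all_prod = select_changes(sample, "PROD")
since_checkpoint = select_changes(sample, "PROD", after_ts=(100, 1))
```

Storing the `ts` of the last relayed entry is what would make it possible to pause the relay during a test run and pick it up again later.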

+1

In our setup, we use EBS snapshots to quickly replicate the production database into a staging environment. Snapshots are taken every few hours as part of the backup cycle. When a new database server starts up in staging, it looks for the latest database snapshot and uses it for its EBS disk. Taking a snapshot is nearly instant, and recovery is also very fast. This approach also scales very well; we actually use it in a huge MongoDB installation. The only downside is that you have to rely on AWS to implement it, which may be undesirable in some cases.
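The "look for the latest snapshot" step could be sketched as follows. This is an assumption about how such a lookup might work, not the setup described above: the records mimic the `SnapshotId`, `State`, and `StartTime` fields of an EC2 snapshot listing, and the IDs and dates are invented. Fetching the listing from AWS and creating a volume from the chosen snapshot are not shown.

```python
from datetime import datetime, timezone


def latest_completed_snapshot(snapshots):
    """Return the most recently started snapshot that has finished, or None."""
    completed = [s for s in snapshots if s["State"] == "completed"]
    if not completed:
        return None
    return max(completed, key=lambda s: s["StartTime"])


# Invented sample records standing in for a real snapshot listing.
snapshots = [
    {"SnapshotId": "snap-example-1", "State": "completed",
     "StartTime": datetime(2013, 5, 1, 0, 0, tzinfo=timezone.utc)},
    {"SnapshotId": "snap-example-2", "State": "completed",
     "StartTime": datetime(2013, 5, 1, 6, 0, tzinfo=timezone.utc)},
    {"SnapshotId": "snap-example-3", "State": "pending",
     "StartTime": datetime(2013, 5, 1, 12, 0, tzinfo=timezone.utc)},
]

chosen = latest_completed_snapshot(snapshots)
```

Skipping snapshots that are still pending avoids restoring a staging disk from an unfinished backup.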

0