Create a copy of the production database with only a sample dataset

We have two Oracle databases. One of them holds our production customer data. The other is a test database. The test database is a copy of production without any data ... just empty tables. I am trying to find an efficient, repeatable way to copy sample data from the production DB to test.

I need to start with a sample list of customers from the Customer table and write a process to populate the Customer table of the test database with this sample. Then I need to populate the child tables with the appropriate data based on the sample.

We used to have a Java process that would use a prepared statement to select data from production ... create a file based on the result set ... then use insert statements to load it into the test DB. It was terribly inefficient.

I am not a database administrator, so I know little about this world. I tried Googling for information on database links, but could not find very good examples.

Does anyone know a way to do this ... or at least give me a starting point?
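Since you asked about database links: here is a minimal sketch of the approach. All object names (`prod_link`, `readonly_user`, `sample_customer_ids`, `order_header`, the host and service names) are placeholders for illustration — substitute your own schema.

```sql
-- Run on the TEST database. Requires a user on production the link can
-- connect as; names and connection details below are assumptions.
CREATE DATABASE LINK prod_link
  CONNECT TO readonly_user IDENTIFIED BY secret
  USING '//prodhost:1521/PRODSVC';

-- Copy the sampled parent rows across the link.
INSERT INTO customer
  SELECT * FROM customer@prod_link
  WHERE customer_id IN (SELECT customer_id FROM sample_customer_ids);

-- Populate each child table from the parents already copied into test.
INSERT INTO order_header
  SELECT o.*
  FROM order_header@prod_link o
  WHERE o.customer_id IN (SELECT customer_id FROM customer);

COMMIT;
```

This keeps everything set-based inside the database, so there is no intermediate file and no row-by-row round-tripping like the old Java process.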

+4
source share
4 answers

If you have enough space, I would suggest copying all the production data to test. It will be much easier to administer, and it may also be a good opportunity to test your backups (restore a backup into a new instance).

From a developer's point of view, you cannot validate the behavior of your application without a representative dataset. That dataset should have the same properties as the production data (data volume, physical distribution, ...). The easiest way to achieve this is to have the same data in test as in production.

If you can afford downtime, you can shut down the production DB, copy the files to the test server, and start both databases. If you can't afford downtime, it might be a good idea to build up some DBA skills (and eventually learn about hot backups and restoring them to a new instance).
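A cold-copy, sketched as shell steps; the paths, host name, and file layout are placeholders, and on the test side you would typically also rename the database or recreate the control file, which is omitted here:

```shell
# 1. Shut the production instance down cleanly (downtime starts here).
sqlplus / as sysdba <<'EOF'
shutdown immediate;
EOF

# 2. Copy datafiles, control files, and redo logs to the test host
#    (placeholder paths -- use your actual ORADATA layout).
scp /u01/oradata/PROD/*.dbf /u01/oradata/PROD/*.ctl /u01/oradata/PROD/*.log \
    testhost:/u01/oradata/TEST/

# 3. Restart production, then start the copied database on the test
#    server once the files and DB name have been adjusted.
sqlplus / as sysdba <<'EOF'
startup;
EOF
```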


Update: if physically copying the database is not possible, you should look at bulk-copying the data using expdp and impdp (or the older exp/imp). You can either copy entire schemas or filter the data during export. In that case, you would supply the appropriate WHERE clause. Bulk export and import will be an order of magnitude faster than copying data row by row.
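For example, a Data Pump export filtered with the QUERY parameter, then an import into the existing empty tables. The schema (`app`), table names, directory object, and sample-ID table are assumptions for illustration:

```shell
# Export only the sampled customers and their orders from production.
expdp system@PROD directory=DATA_PUMP_DIR dumpfile=sample.dmp \
  tables=app.customer,app.order_header \
  query=app.customer:'"WHERE customer_id IN (SELECT customer_id FROM app.sample_ids)"'

# Import into the test database; the tables already exist, so append.
impdp system@TEST directory=DATA_PUMP_DIR dumpfile=sample.dmp \
  table_exists_action=append
```

Note the QUERY clause applies per table, so child tables may need their own predicates (or a subsequent cleanup) to stay consistent with the sampled parents.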

+5
source

I just want to point out the industry scenario and products .. not a solution, but input for reference, since @aintnoprophet mentioned that

"I am not a database administrator, so I don't know much of this realm. I tried Googling for some information on database links, but I couldn't find very good examples."

This use case falls under test data management (TDM) systems. The concept is about having a managed way to supply the test data that specific test cases and test scenarios need. Manually creating a subset of a large volume of data is very difficult and significantly increases the likelihood of human error, so re-provisioning the data for several test iterations quickly becomes unmanageable.

There are many use cases, especially in corporate settings -

http://www.compuware.com/resources/mainframe/Fileaid_clientserver.pdf

http://www.erphome.net/wdb/upload/forum1_f_8007_1227768589.pdf

These cover huge volumes of corporate data.

There is an open source TDM tool, http://sourceforge.net/projects/opentdm/ , but it is not written in Java (I'm not sure whether that is good or not).

If you are looking at products (for reference), some of them are linked above.

+3
source

Test databases that are complete copies of production work much better than a subset for debugging. Most people use a subset because they don't have the time and resources to make full copies, but all that has changed with virtual data and copy data management. In such systems, you can ingest one instance of production and make thin clone copies in minutes with virtually no additional storage. These systems also typically compress the data, so the first copy is about 1/3 of the original, and they have the advantage of a quick refresh in minutes, which is great for destructive testing. They are also usually integrated with data masking, since we probably don't want sensitive data leaving the production environment for dev and test.

There are several vendors in this area. Check out NetApp SMO, Actifio, Oracle Snap Clone, and Delphix for starters. Most of them require some specialized hardware, which can be an obstacle. Delphix is fully software-based and even has a free version called Delphix Express that you can download. A similar technology is used by Rubrik and Cohesity for VM images.

Best wishes, Kyle Hailey

+3
source

It sounds like you are looking for Jailer . It exports consistent, referentially intact row sets from relational databases.

+1
source
