Merging databases: how to handle duplicate PKs

We have three databases that are physically separated by region: one in LA, one in SF, and one in NY. All of them share the same schema, but each contains data specific to its region. We want to merge these databases into one and mirror it. We need to preserve the data for each region but combine it into a single database. This raises quite a few issues for us: for example, we will certainly have duplicate primary keys, and foreign keys will potentially be invalid.

I am hoping to find someone who has experience with a task like this and could offer some tips, strategies, and words of experience on how to accomplish the merge.

For example, one idea was to create compound keys and then change our code and stored procedures to look up data by the compound key (region / original PK). But this would require us to change all of our code and stored procedures.
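To make the compound-key idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for the real server. The `person` and `address` tables and their columns are hypothetical, chosen only for illustration. It shows why every lookup and every foreign key would have to carry the region as well as the original ID.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
db.executescript("""
    CREATE TABLE person (
        region TEXT    NOT NULL,
        id     INTEGER NOT NULL,
        name   TEXT    NOT NULL,
        PRIMARY KEY (region, id)            -- compound key
    );
    CREATE TABLE address (
        region        TEXT    NOT NULL,
        id            INTEGER NOT NULL,
        city          TEXT    NOT NULL,
        person_region TEXT,
        person_id     INTEGER,
        PRIMARY KEY (region, id),
        FOREIGN KEY (person_region, person_id) REFERENCES person (region, id)
    );
""")

# The same original id (1) can now exist once per region.
db.executemany("INSERT INTO person VALUES (?, ?, ?)",
               [("LA", 1, "Ann"), ("NY", 1, "Ned")])
db.execute("INSERT INTO address VALUES (?, ?, ?, ?, ?)",
           ("NY", 1, "New York", "NY", 1))

def person_name(region, person_id):
    """Every lookup needs both parts of the key."""
    row = db.execute("SELECT name FROM person WHERE region = ? AND id = ?",
                     (region, person_id)).fetchone()
    return row[0] if row else None

print(person_name("NY", 1))
```

The extra `region` parameter in `person_name` is the cost mentioned above: each query and stored procedure that takes an ID would need the same change.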

Another idea was to simply import the data and let the import generate new PKs, then update all FK references to point to the new PKs. That way we might not need to change any code.

Any experience is welcome!

+7
merge sql database
6 answers

I have no direct experience with this, but it seems to me that you should be able to map PK → New PK uniquely for each server. For example, generate new PKs so that data from the LA server has PK % 3 == 2, SF has PK % 3 == 1, and NY has PK % 3 == 0. And since, as I understand your question, each server only maintains FK relationships to its own data, you can update the FKs in the same way.

    NewLA = OldLA*3-1
    NewSF = OldSF*3-2
    NewNY = OldNY*3

Then you can merge them without getting duplicate PKs. As you said, this is really just generating new PKs, but structuring it this way lets you update your FKs trivially (assuming, as I did, that the data on each server is isolated). Good luck.
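As a rough illustration of that mapping, the sketch below applies the three formulas and also inverts them, so a merged key can be traced back to its region. The function names are made up for this example, and it assumes the original keys are positive integers.

```python
# Remainder of the new key modulo 3, per region, as described above.
REMAINDER = {"LA": 2, "SF": 1, "NY": 0}

def new_pk(region, old_pk):
    """Map a per-region key (assumed >= 1) to a key unique across regions."""
    offset = (3 - REMAINDER[region]) % 3   # LA -> 1, SF -> 2, NY -> 0
    return old_pk * 3 - offset

def origin(pk):
    """Recover (region, old_pk) from a merged key."""
    remainder = pk % 3
    region = next(name for name, r in REMAINDER.items() if r == remainder)
    offset = (3 - remainder) % 3
    return region, (pk + offset) // 3

for region in ("LA", "SF", "NY"):
    print(region, [new_pk(region, old) for old in (1, 2, 3)])
```

Because the same function is applied to a PK column and to every FK column that references it, the relationships inside each region stay intact. One trade-off worth noting: the new keys are roughly three times the old ones, so the usable range of the key type shrinks accordingly.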

+3

BEST: add a RegionCode column and include it in your PKs. But you don't want to do all that legwork.

HACK: if your IDs are INTs, a quick fix is to add a fixed, region-based value to each key on import. An INT can be as large as 2,147,483,647.

Local server data:

    LA IDs: 1,2,3,4,5,6
    SF IDs: 1,2,3,4,5
    NY IDs: 1,2,3,4,5,6,7,9

Add 100000000 to the LA IDs.

Add 200000000 to the SF IDs.

Add 300000000 to the NY IDs.

Combined server data:

    LA IDs: 100000001,100000002,100000003,100000004,100000005,100000006
    SF IDs: 200000001,200000002,200000003,200000004,200000005
    NY IDs: 300000001,300000002,300000003,300000004,300000005,300000006,300000007,300000009
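A small sketch of the offset hack follows, with a guard added. The guard is not part of the answer: it rests on the observation that the offsets are 100,000,000 apart, so any original ID of 100,000,000 or more would spill into the next region's range. The names `BLOCK`, `REGION_OFFSET`, and `shift_ids` are invented for this example.

```python
INT_MAX = 2_147_483_647   # largest value of a signed 32-bit INT
BLOCK = 100_000_000       # width of each region's ID range
REGION_OFFSET = {"LA": 1 * BLOCK, "SF": 2 * BLOCK, "NY": 3 * BLOCK}

def shift_ids(region, ids):
    """Add the region's offset to each ID, refusing IDs that would
    collide with another region's range."""
    offset = REGION_OFFSET[region]
    shifted = []
    for old_id in ids:
        if not 0 < old_id < BLOCK:
            raise ValueError(f"{region} ID {old_id} does not fit in a block of {BLOCK}")
        shifted.append(old_id + offset)
    return shifted

# The highest possible shifted ID still fits in an INT.
assert max(REGION_OFFSET.values()) + BLOCK - 1 <= INT_MAX

ny_ids = shift_ids("NY", [1, 2, 3, 4, 5, 6, 7, 9])
print(ny_ids)
```

The same offset has to be added to every FK column that points at a shifted PK. Rows inserted after the merge also need a home, for instance by continuing each region's sequence inside its own block.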
+1

I have done this, and my advice is to change your keys (pick a method) rather than your code. If you change the code, you will invariably either miss a stored procedure or introduce a bug. With data changes, it's pretty easy to write tests that look for orphaned records or verify that rows matched up correctly. With code changes, especially to code that currently works correctly, it's too easy to miss something.
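One such test could look like the sketch below: a query that lists child rows whose FK value has no matching parent. It runs against an in-memory SQLite database purely so that it is self-contained. The `customer` and `invoice` tables and the `find_orphans` helper are hypothetical.

```python
import sqlite3

def find_orphans(conn, child, fk, parent, pk):
    """Return the FK values in `child` that match no row in `parent`.

    Table and column names are interpolated into the SQL, so pass
    trusted names only.
    """
    sql = (
        f"SELECT c.{fk} FROM {child} AS c "
        f"LEFT JOIN {parent} AS p ON c.{fk} = p.{pk} "
        f"WHERE c.{fk} IS NOT NULL AND p.{pk} IS NULL"
    )
    return [row[0] for row in conn.execute(sql)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY);
    CREATE TABLE invoice  (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customer (id) VALUES (1), (2);
    INSERT INTO invoice (id, customer_id) VALUES (10, 1), (11, 2), (12, 99);
""")

orphans = find_orphans(conn, "invoice", "customer_id", "customer", "id")
print(orphans)  # [99]
```

Running a check like this for every FK after the merge, together with comparing per-table row counts against the sum of the three sources, would catch most key-remapping mistakes.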

+1

One thing you can do is change the tables that hold regional data to use GUIDs. That way the primary keys are unique across regions, and you can mix and match data (import data from one region into another). For tables that hold shared data (such as type tables), you can keep the primary keys as they are, since they should be the same everywhere.

Here is some information on GUIDs: http://www.sqlteam.com/article/uniqueidentifier-vs-identity

Perhaps SQL Server Management Studio makes it easy to convert columns to GUIDs. I hope so!

Good luck.

0

What I did in this situation:

  • create a new source db with the same schema, but tables only: no PKs, FKs, checks, etc.
  • transfer the data from DB1 into this source db
  • for each table in the target db, find the highest PK value
  • for each table in the source db, renumber the PKs, FKs, etc., starting from (highest value + 1) in the target db
  • for each table in the target db, set IDENTITY_INSERT ON
  • import the data from the source db into the target db
  • for each table in the target db, set IDENTITY_INSERT OFF
  • clear the source db
  • repeat for DB2
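The steps above can be sketched roughly as follows. This is a simplified variant: it shifts the keys while copying instead of first updating them in a scratch database, and it uses SQLite, which accepts explicit key values without an IDENTITY_INSERT switch. The `person`/`address` schema and the helper names are assumptions for illustration only, and source keys are assumed to start at 1 or higher.

```python
import sqlite3

SCHEMA = """
CREATE TABLE person  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE address (id INTEGER PRIMARY KEY, city TEXT NOT NULL,
                      person_id INTEGER REFERENCES person(id));
"""

def make_db(people=(), addresses=()):
    """Build an in-memory database with the sample schema and rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO person VALUES (?, ?)", people)
    db.executemany("INSERT INTO address VALUES (?, ?, ?)", addresses)
    return db

def merge_into(target, source):
    """Copy all rows of `source` into `target`, renumbering keys so they
    start just above the highest key already present in `target`."""
    p_shift = target.execute("SELECT COALESCE(MAX(id), 0) FROM person").fetchone()[0]
    a_shift = target.execute("SELECT COALESCE(MAX(id), 0) FROM address").fetchone()[0]

    people = source.execute("SELECT id, name FROM person").fetchall()
    addresses = source.execute("SELECT id, city, person_id FROM address").fetchall()

    target.executemany(
        "INSERT INTO person (id, name) VALUES (?, ?)",
        [(pid + p_shift, name) for pid, name in people])
    # The FK gets the same shift as the PK column it references.
    target.executemany(
        "INSERT INTO address (id, city, person_id) VALUES (?, ?, ?)",
        [(aid + a_shift, city, None if pid is None else pid + p_shift)
         for aid, city, pid in addresses])
    target.commit()

la = make_db([(1, "Ann"), (2, "Bob")], [(1, "Los Angeles", 2)])
sf = make_db([(1, "Cid")], [(1, "San Francisco", 1)])

target = make_db()
merge_into(target, la)
merge_into(target, sf)

rows = target.execute(
    "SELECT a.city, p.name FROM address a JOIN person p ON a.person_id = p.id "
    "ORDER BY a.id").fetchall()
print(rows)
```

Each table gets its own shift, so an FK column must always use the shift of the table it points to, not the shift of the table it lives in.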
0

As John said, I would use a GUID to solve the merge task. And I see two different solutions requiring a GUID:

1) Permanently change the database schema to use a GUID instead of INTEGER (IDENTITY) as the primary key.

This is a good solution in general, but if you have a lot of non-SQL code that depends on how your identifiers work, it may require a considerable number of code changes. Then again, since you are merging databases, you may need to update the application anyway so that it works with data from only one region, based on the logged-in user, etc.

2) Temporarily add GUIDs for migration purposes only, and remove them once the data has been migrated:

This approach is trickier, but once you have written the migration script you can (re)run it as many times as needed to merge the databases again if you get it wrong the first time. Here is an example:

    Table: PERSON (ID INT PRIMARY KEY, Name VARCHAR(100) NOT NULL)
    Table: ADDRESS (ID INT PRIMARY KEY, City VARCHAR(100) NOT NULL, PERSON_ID INT)

Your ALTER scripts (note that the GUIDs for all PKs are generated automatically):

    ALTER TABLE PERSON ADD UID UNIQUEIDENTIFIER NOT NULL DEFAULT (NEWID())
    ALTER TABLE ADDRESS ADD UID UNIQUEIDENTIFIER NOT NULL DEFAULT (NEWID())
    ALTER TABLE ADDRESS ADD PERSON_UID UNIQUEIDENTIFIER NULL

Then you update the GUID FKs so that they are consistent with the integer ones:

    -- set ADDRESS.PERSON_UID
    UPDATE ADDRESS
    SET ADDRESS.PERSON_UID = PERSON.UID
    FROM ADDRESS
    INNER JOIN PERSON
        ON ADDRESS.PERSON_ID = PERSON.ID

You do this for all PKs (automatically generated GUIDs) and all FKs (updated as shown above).

Now you create your target database. In the target database you also add the UID columns for all PKs and FKs. Also disable all FK constraints.

Now you insert from each of the source databases into the target (note: we do not insert PKs and integer FKs):

    INSERT INTO TARGET_DB.dbo.PERSON (UID, NAME)
    SELECT UID, NAME FROM SOURCE_DB1.dbo.PERSON

    INSERT INTO TARGET_DB.dbo.ADDRESS (UID, CITY, PERSON_UID)
    SELECT UID, CITY, PERSON_UID FROM SOURCE_DB1.dbo.ADDRESS

After you have inserted the data from all the source databases, you run the reverse of the earlier update to make the integer FKs consistent with the GUIDs in the target database:

    -- set ADDRESS.PERSON_ID
    UPDATE ADDRESS
    SET ADDRESS.PERSON_ID = PERSON.ID
    FROM ADDRESS
    INNER JOIN PERSON
        ON ADDRESS.PERSON_UID = PERSON.UID

Now you can drop all the UID columns:

    ALTER TABLE PERSON DROP COLUMN UID
    ALTER TABLE ADDRESS DROP COLUMN UID
    ALTER TABLE ADDRESS DROP COLUMN PERSON_UID

So in the end you should have a fairly long migration script that does the job for you. The point is that it IS DOABLE.
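For readers who want to see the whole GUID-bridged flow run end to end, here is a compact sketch on in-memory SQLite databases. It is not the T-SQL above: `uuid.uuid4()` stands in for `NEWID()`, correlated subqueries replace `UPDATE ... FROM`, the final column drop is left out, and the helper names are invented. Treat it as an illustration of the idea under those assumptions.

```python
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE person  (id INTEGER PRIMARY KEY, name TEXT NOT NULL, uid TEXT);
CREATE TABLE address (id INTEGER PRIMARY KEY, city TEXT NOT NULL,
                      person_id INTEGER, uid TEXT, person_uid TEXT);
"""

def make_source(people, addresses):
    """Build a regional database and add the temporary GUID columns."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO person (id, name) VALUES (?, ?)", people)
    db.executemany("INSERT INTO address (id, city, person_id) VALUES (?, ?, ?)",
                   addresses)
    # Give every row a GUID.
    for table in ("person", "address"):
        ids = [row[0] for row in db.execute(f"SELECT id FROM {table}").fetchall()]
        db.executemany(f"UPDATE {table} SET uid = ? WHERE id = ?",
                       [(str(uuid.uuid4()), i) for i in ids])
    # Mirror the integer FK as a GUID FK.
    db.execute("""UPDATE address SET person_uid =
                  (SELECT uid FROM person WHERE person.id = address.person_id)""")
    db.commit()
    return db

def merge(sources):
    """Copy GUID-keyed rows into a fresh target, then rebuild integer FKs."""
    target = sqlite3.connect(":memory:")
    target.executescript(SCHEMA)
    for src in sources:
        # Integer PKs and FKs are not copied; the target assigns new ids.
        target.executemany(
            "INSERT INTO person (uid, name) VALUES (?, ?)",
            src.execute("SELECT uid, name FROM person").fetchall())
        target.executemany(
            "INSERT INTO address (uid, city, person_uid) VALUES (?, ?, ?)",
            src.execute("SELECT uid, city, person_uid FROM address").fetchall())
    # Reverse step: derive the integer FK from the GUID link.
    target.execute("""UPDATE address SET person_id =
                      (SELECT id FROM person WHERE person.uid = address.person_uid)""")
    target.commit()
    return target

la = make_source([(1, "Ann")], [(1, "Los Angeles", 1)])
ny = make_source([(1, "Ned")], [(1, "New York", 1)])
merged = merge([la, ny])

rows = merged.execute(
    "SELECT p.name, a.city FROM address a JOIN person p ON a.person_id = p.id "
    "ORDER BY a.id").fetchall()
print(rows)
```

Both regions start with a person whose ID is 1, yet after the merge each address still points at its own person, because the relationship travelled through the GUIDs rather than the integers.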

NOTE: nothing written here has been tested.

0