Compare millions of records from Oracle to SQL Server

I have an Oracle database and a SQL Server database. Both contain a table, say Inventory, that holds millions of rows and continues to grow.

I want to compare Oracle table data with SQL Server data to find out which records are missing from the SQL Server table on a daily basis.

What is the best approach for this?

  • Create an SSIS package.
  • Create a Windows service.

I want to achieve this in as little time and with as few resources as possible.

For example: 18 million records in Oracle and 16 to 17 million in SQL Server.

There are two different databases because there are two different applications, one online and one offline.

EDIT: How about connecting to SQL Server from Oracle through the Oracle Gateway for SQL Server, and then:

1) Run a direct query against SQL Server from Oracle to add the missing records to SQL Server the first time.

2) Create an Oracle trigger that fires when a record is deleted and inserts the deleted record into a new Oracle table, A.

3) Create an SSIS package that joins the newly created Oracle table to SQL Server and updates the SQL Server records. This way SSIS only has to process a few records each day.

What do you think of this approach?

+8
database oracle sql-server ssis
5 answers

I would create an SSIS package and load the data from the Oracle table using a Data Flow with an OLE DB Source. If you have SQL Server Enterprise edition, the Attunity connectors are a little faster.

Then I would load the keys from the SQL Server table into a Lookup transformation, where I would match the two sources on the key and route the unmatched rows to a separate output.

Finally, I would send the unmatched-rows output to an OLE DB Command to update the SQL Server table.

This SSIS package will require a lot of memory, but since the matching is performed in memory with minimal I/O, it will probably outperform the other solutions on speed. You will need enough free memory to cache all the keys from the SQL Server table.
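The idea behind a fully cached lookup can be sketched outside SSIS. The snippet below is only an illustration in Python, not what SSIS runs internally: the function name `unmatched_rows`, the `id` key column, and the sample rows are all hypothetical. It caches the target keys in a set and streams the source rows past it, which is why memory grows with the number of keys while each row costs a single hash probe.

```python
from typing import Hashable, Iterable, Iterator, Mapping


def unmatched_rows(
    source_rows: Iterable[Mapping[str, object]],
    target_keys: Iterable[Hashable],
    key: str = "id",
) -> Iterator[Mapping[str, object]]:
    """Yield source rows whose key is absent from the target key set."""
    # Full-cache step: hold every target key in memory once.
    cache = set(target_keys)
    # Streaming step: each source row costs one hash probe, no extra I/O.
    for row in source_rows:
        if row[key] not in cache:
            yield row


# Hypothetical sample data standing in for the two Inventory tables.
oracle_rows = [
    {"id": 1, "item": "bolt"},
    {"id": 2, "item": "nut"},
    {"id": 3, "item": "washer"},
]
sql_server_keys = [1, 3]

missing = list(unmatched_rows(oracle_rows, sql_server_keys))
print(missing)
```

Only the keys are cached, not whole rows, so the memory needed depends on the key width and the SQL Server row count rather than on the full table size.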

SSIS also has the advantage of offering many other transformations if you need them later.

+1

What you basically want to do is replicate from Oracle to SQL Server.

You can do this in SSIS, a Windows service, or indeed on many other platforms. The real trick is choosing the right design pattern.

There are two general design patterns.

  • Snapshot replication

You take all the records from both systems and compare them somewhere (so far we have suggestions to compare in SSIS or on the Oracle side, but nobody has yet suggested comparing on SQL Server, although that is also valid).

You are comparing 18 million records here, so this is a lot of work.

  • Differential replication

You record the changes made on the publisher (i.e., Oracle) since the last replication, then apply those changes to the subscriber (i.e., SQL Server).

You can do this manually by implementing triggers and log tables on the Oracle side, then using a normal ETL process (SSIS, command-line tools, text files, etc.), possibly scheduled in SQL Server Agent, to apply the changes to SQL Server.

Or you can use out-of-the-box replication, configuring Oracle as the publisher and SQL Server as the subscriber: https://msdn.microsoft.com/en-us/library/ms151149(v=sql.105).aspx

You will have to try a few of them and see what works for you.
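To make the manual trigger-and-log-table variant concrete, here is a minimal sketch. It is an assumption-laden stand-in, not Oracle code: an in-memory SQLite database plays the publisher, a plain dict plays the subscriber table, and the names `inventory_log`, `apply_changes`, and the `seq` high-water mark are hypothetical. Oracle trigger syntax differs, but the shape of the pattern is the same: triggers append to a log, and each run replays only the entries newer than the last one it applied.

```python
import sqlite3

# In-memory SQLite stands in for Oracle (publisher); a dict stands in for
# the SQL Server table (subscriber). All names here are hypothetical.
publisher = sqlite3.connect(":memory:")
publisher.executescript(
    """
    CREATE TABLE inventory (id INTEGER PRIMARY KEY, item TEXT);
    CREATE TABLE inventory_log (
        seq  INTEGER PRIMARY KEY AUTOINCREMENT,
        op   TEXT NOT NULL,      -- 'I' = insert, 'D' = delete
        id   INTEGER NOT NULL,
        item TEXT
    );
    CREATE TRIGGER inventory_ai AFTER INSERT ON inventory
    BEGIN
        INSERT INTO inventory_log (op, id, item) VALUES ('I', NEW.id, NEW.item);
    END;
    CREATE TRIGGER inventory_ad AFTER DELETE ON inventory
    BEGIN
        INSERT INTO inventory_log (op, id, item) VALUES ('D', OLD.id, OLD.item);
    END;
    """
)


def apply_changes(conn, subscriber, last_seq):
    """Replay log entries newer than last_seq; return the new high-water mark."""
    rows = conn.execute(
        "SELECT seq, op, id, item FROM inventory_log WHERE seq > ? ORDER BY seq",
        (last_seq,),
    )
    for seq, op, row_id, item in rows:
        if op == "I":
            subscriber[row_id] = item
        else:
            subscriber.pop(row_id, None)
        last_seq = seq
    return last_seq


subscriber = {}
publisher.execute("INSERT INTO inventory VALUES (1, 'bolt')")
publisher.execute("INSERT INTO inventory VALUES (2, 'nut')")
mark = apply_changes(publisher, subscriber, 0)      # first daily run

publisher.execute("DELETE FROM inventory WHERE id = 1")
mark = apply_changes(publisher, subscriber, mark)   # next daily run
print(subscriber, mark)
```

Each run touches only the rows logged since the previous run, which is what keeps the daily workload small compared with a full comparison of both tables.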

Given this goal:

I want to achieve this in as little time and with as few resources as possible.

transactional replication is much more efficient, but more complex. For maintenance purposes, which platforms (.NET, SSIS, Python, etc.) are you most comfortable with?

+1

Other alternatives:

If you can use the Oracle gateway for SQL Server, you do not need to transfer the data and can query it directly.

If you cannot use the Oracle gateway, you can use Pentaho Data Integration or another ETL tool to compare the tables and get the results. It is easy to use.

0

I think the best approach is to use the Oracle gateway. Just follow its setup instructions. I have had the same experience.

For example, you can use this statement in your procedure.

  INSERT INTO "dbo"."sql_server_table"@dblink_name ("column1","column2"...."column5")
  SELECT column1,column2....column5 FROM oracle_table
  MINUS
  SELECT "column1","column2"...."column5" FROM "dbo"."sql_server_table"@dblink_name

Then create a scheduler job that runs the procedure daily.

When both databases are online, the missing records are inserted into SQL Server. If one of them is offline, the scheduled job fails and you can run the procedure manually later. This needs minimal resources.

0

I suggest a home-grown ETL solution.

  • Schedule an Oracle job to export the source table data (daily, based on application logic) to a regular CSV file.
  • Schedule a SQL Server job (with an acceptable delay after the Oracle job) to read this CSV file and import it into a staging table in SQL Server using BULK INSERT.
  • The last part of the SQL Server job reads the data from the staging table and applies the logic (insert into or update the target table). I suggest keeping another table that stores a report of each daily run.
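The three steps above can be sketched end to end in a few lines. This is a rough Python illustration rather than T-SQL, and every name in it is hypothetical: `load_staging` stands in for BULK INSERT into the staging table, `merge_staging` for the insert/update logic, and the returned dict for the row you would write to the report table.

```python
import csv
import io


def load_staging(csv_text):
    """Parse the exported CSV into a staging list of dict rows."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def merge_staging(staging, target, key="id"):
    """Insert new keys, update changed rows, and return a small run report."""
    report = {"inserted": 0, "updated": 0, "unchanged": 0}
    for row in staging:
        existing = target.get(row[key])
        if existing is None:
            target[row[key]] = row
            report["inserted"] += 1
        elif existing != row:
            target[row[key]] = row
            report["updated"] += 1
        else:
            report["unchanged"] += 1
    return report


# Hypothetical export produced by the Oracle job.
export = "id,item,qty\n1,bolt,10\n2,nut,5\n3,washer,7\n"

# Hypothetical current state of the SQL Server target table, keyed by id.
target = {
    "1": {"id": "1", "item": "bolt", "qty": "10"},
    "2": {"id": "2", "item": "nut", "qty": "4"},
}

report = merge_staging(load_staging(export), target)
print(report)
```

Keeping the counts from each run makes it easy to spot a day on which the export was empty or unexpectedly large.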
0