The fastest way to populate a database table with 10 million rows

What is the fastest way to populate a database table with 10 million rows? I am asking about the technique, as well as about any specific database engine that would let you do this as quickly as possible. I do not require the data to be indexed during this initial load.

+4
5 answers

Using SQL to load large amounts of data into a database usually results in poor performance. To make it fast, you need to bypass the SQL engine. Most databases (including Firebird, I think) have the ability to dump all of their data to a text file (or possibly XML) and to restore the entire database from such a dump file. Since the restore process requires no transactions and the data is not expressed as SQL, it is usually very fast.

I would write a script that generates the dump file by hand, and then use the database's restore utility to load the data.

After a short search I found FBExport, which seems to do just that: you create a CSV file, and then use the FBExport tool to import that data into your database.
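
For illustration, the CSV such a script emits can be as simple as the lines below; the table layout here is invented, and FBExport's exact import switches should be taken from its own documentation:

    /* hypothetical target table the CSV rows map onto */
    CREATE TABLE sales_import (
        id     INTEGER NOT NULL,
        name   VARCHAR(40),
        amount DECIMAL(9,2)
    );

    -- rows.csv, one line per row to insert:
    -- 1,alpha,10.50
    -- 2,beta,42.00
    -- 3,gamma,7.25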

+7

The fastest method is probably an INSERT ... SELECT FROM statement. I have created test data by populating tables from other databases, and even from the same database several times. But it all depends on the nature and availability of your own data. In my case I had enough rows of collected data that a few select/insert routines with random row selection, applied a couple of times to the real data, quickly produced decent test data. In some cases, where the data had to be uniquely identifiable, I used intermediate tables and sorted by frequency distribution to exclude things like unusual names (excluding instances where the count within a group was 2 or less).
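
A minimal sketch of both steps, with invented table and column names; each run of the first statement doubles the row count, so a few thousand seed rows reach 10 million within a dozen passes:

    -- assumes test_data already holds some seed rows;
    -- every execution doubles the table
    INSERT INTO test_data (name, created_at, amount)
    SELECT name, created_at, amount
    FROM test_data;

    -- frequency-distribution cleanup as described above:
    -- drop names whose group count is 2 or less
    DELETE FROM test_data
    WHERE name IN (
        SELECT name FROM test_data
        GROUP BY name
        HAVING COUNT(*) <= 2
    );

(Standard SQL; MySQL in particular requires the DELETE subquery to be wrapped in a derived table before it will read from the table being deleted from.)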

In addition, Red Gate provides a utility to do just what you ask. It is not free, and I think it is SQL Server-specific, but their tools are top notch and well worth it. There is also a free trial.

If you do not want to pay for their utility, you could build your own fairly quickly. What they do is not magic at all. A decent developer should be able to knock out a similar, albeit alpha/hard-coded, version of the application in a day or two...
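
As a sketch of how small such a generator can be, a cross join over a ten-row digits table produces 10^7 rows in a single statement (all names here are invented, and multi-row VALUES and the random-data functions vary slightly by engine):

    CREATE TABLE digits (d INTEGER);
    INSERT INTO digits VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

    -- 10 x 10 x 10 x 10 x 10 x 10 x 10 = 10,000,000 rows
    INSERT INTO test_data (id, name, amount)
    SELECT a.d + b.d*10 + c.d*100 + e.d*1000
         + f.d*10000 + g.d*100000 + h.d*1000000,
           'name',        -- swap in your engine's random-string expression
           a.d * 1.5      -- likewise for random numbers
    FROM digits a, digits b, digits c, digits e,
         digits f, digits g, digits h;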

+2

You may be interested in the answers to this question. It deals with loading a massive CSV file into a SQL Server (2005) database. For SQL Server, an SSIS (DTS) package seems to be the fastest way to bulk import data into a database.
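
Short of building a full SSIS package, T-SQL's BULK INSERT statement exercises the same bulk-load path and is easy to try; the table and file path below are placeholders:

    -- minimal bulk load of a CSV into an existing table;
    -- TABLOCK lets the engine take a bulk-update lock for speed
    BULK INSERT dbo.big_table
    FROM 'C:\data\rows.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);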

+1

It all depends on your database. Oracle, for example, has something called direct path loading ( http://download.oracle.com/docs/cd/B10501_01/server.920/a96652/ch09.htm ), which effectively disables index maintenance during the load and, if I understand correctly, builds the binary block structures to be written to disk on the client side rather than sending SQL.

Combined with partitioning and per-partition index rebuilds, we were able to load a 1 billion row database in relatively short order. 10 million rows is nothing.
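
Direct path loads are typically driven through SQL*Loader; a minimal control file and invocation could look like this (file names, table, and credentials are placeholders):

    -- load.ctl: SQL*Loader control file
    LOAD DATA
    INFILE 'rows.csv'
    APPEND INTO TABLE big_table
    FIELDS TERMINATED BY ','
    (id, name, amount)

    -- invoked from the shell with the direct path engaged:
    -- sqlldr scott/tiger control=load.ctl direct=true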

+1

Use MySQL or MS SQL and the built-in functions to generate records inside the database engine. Or create a text file (in CSV format, for example) and then use the bulk copy functions.
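
On the MySQL side, the bulk copy route looks like this (table and file path are placeholders); SQL Server's equivalent is the BULK INSERT statement shown earlier:

    -- server-side bulk load of a CSV file into an existing table
    LOAD DATA INFILE '/tmp/rows.csv'
    INTO TABLE big_table
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n';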

0
