Postgres Insertion Optimization

I have a script that generates tens of thousands of inserts into a Postgres database through a custom ORM. As you can imagine, it is pretty slow. It is only used for development purposes, to create dummy data. Is there a simple optimization I can make at the Postgres level to make this faster? This is the only script that runs against the database, so thread safety is not a concern.

Perhaps I can disable all locking, safety checks, triggers, etc.? I am just looking for a quick and dirty solution that will significantly speed up this process.

Thanks.

+6
sql postgresql bulkinsert
7 answers

If you do not need this functionality in a production environment, I would suggest disabling fsync in your PostgreSQL configuration. This will speed up the inserts dramatically.

Never turn fsync off on a production database.
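
As a minimal sketch (assuming PostgreSQL 9.4 or later for ALTER SYSTEM, and a superuser connection), turning fsync off on a throwaway development instance could look like this:

 -- Development only: with fsync off, a crash can corrupt the whole cluster.
 ALTER SYSTEM SET fsync = off;
 SELECT pg_reload_conf();  -- fsync can be changed with a reload, no restart needed

On older versions, set fsync = off in postgresql.conf and reload instead.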

+8

The fastest way to insert data is the COPY command, but that requires a flat file as its input; I assume generating a flat file is not an option.

Do not commit too often, and especially do not run this with autocommit enabled. "Tens of thousands" sounds like a single commit at the end would be just right.

If you can convince your ORM to use Postgres's multi-row insert syntax, that will also speed things up.

This is an example of a multi-row insert:

  insert into my_table (col1, col2) 
 values 
 (row_1_col_value1, row_1_col_value_2), 
 (row_2_col_value1, row_2_col_value_2), 
 (row_3_col_value1, row_3_col_value_2)

If you cannot generate that syntax and you are using Java, make sure you use batched statements instead of executing single inserts one by one (other database layers may allow something similar).

Edit:

jmz's post inspired me to add something:

You might also see an improvement if you increase wal_buffers to a larger value (e.g. 8 MB) and checkpoint_segments (e.g. 16).
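
A hedged sketch of those two settings (ALTER SYSTEM needs PostgreSQL 9.4+; checkpoint_segments only exists up to 9.4, max_wal_size replaces it from 9.5 on; wal_buffers only takes effect after a server restart):

 -- Adjust the values to your workload; on older versions edit postgresql.conf directly.
 ALTER SYSTEM SET wal_buffers = '8MB';        -- needs a server restart to take effect
 ALTER SYSTEM SET checkpoint_segments = 16;   -- pre-9.5 only; raise max_wal_size on newer versions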

+8

For inserts that number in the hundreds to thousands, batch them in a single transaction:

 begin;
 insert1 ...
 insert2 ...
 ...
 insert10k ...
 commit;

For inserts in the millions, use COPY:

 COPY test (ts) FROM stdin;
 2010-11-29 22:32:01.383741-07
 2010-11-29 22:32:01.737722-07
 ... 1 million rows ...
 \.

Also make sure that any column used as a foreign key to another table is indexed, if that other table is more than trivial in size.
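
For example (with hypothetical table and column names), if orders.customer_id is a foreign key to customers, indexing the referencing column looks like this:

 -- Hypothetical schema: orders.customer_id references customers(id).
 CREATE INDEX orders_customer_id_idx ON orders (customer_id);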

+6

One thing you can do is drop all the indexes, do your inserts, and then re-create the indexes.
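
A minimal sketch with a hypothetical index name; drop it before the bulk load and build it again afterwards:

 -- Hypothetical index on my_table(col1); rebuilding once is cheaper than maintaining it per row.
 DROP INDEX IF EXISTS my_table_col1_idx;
 -- ... run the bulk inserts here ...
 CREATE INDEX my_table_col1_idx ON my_table (col1);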

+3

Are you sending one batch of tens of thousands of INSERTs, or are you sending tens of thousands of individual INSERTs?

I know that with Hibernate you can batch up all your SQL statements and send them at the end in one big chunk, instead of paying the network and database overhead of issuing thousands of SQL statements individually.

+2

If you are just initializing constant test data, you could also put the test data into a staging table (or tables) and then simply copy the table contents using

 INSERT INTO... SELECT... 

This should be about as fast as using COPY (although I have not benchmarked it), with the advantage that you can copy data around using only SQL commands, without the hassle of setting up an external file as COPY requires.
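
A minimal sketch with hypothetical table names: create a staging table with the same shape as the target, populate it once with the dummy data, then copy it over in a single statement:

 -- Staging table matching my_table's columns (hypothetical names).
 CREATE TABLE my_table_staging (LIKE my_table INCLUDING DEFAULTS);
 -- ... populate my_table_staging with the generated dummy data ...
 INSERT INTO my_table SELECT * FROM my_table_staging;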

+2

Try to do as much as possible in a single statement!

 -- Use SELECT rather than VALUES: current PostgreSQL rejects set-returning functions like unnest() in a VALUES list.
 insert into my_table (col1, col2)
 select
   unnest(array[row_1_col_value_1, row_2_col_value_1, row_3_col_value_1]),
   unnest(array[row_1_col_value_2, row_2_col_value_2, row_3_col_value_2]);

This is similar to @a_horse_with_no_name's suggestion. The advantage of using unnest is this: you can use query parameters containing arrays!

 insert into my_table (col1, col2)
 select unnest(:col_values_1), unnest(:col_values_2);

By combining three inserts into one, you save more than 50% of the execution time. And using query parameters with 2000 values in a single insert, I get a speed factor of 150 in my application.

+2
