Postgres Insertion Optimization

I have a script that generates tens of thousands of inserts into a Postgres database through a custom ORM. As you can imagine, it is pretty slow. It is only used for development purposes, to create dummy data. Is there a simple optimization I can make at the Postgres level to make this faster? This is the only script that runs against the database, so thread safety is not a concern.

Perhaps I can disable all locking, safety checks, triggers, etc.? I am just looking for a quick and dirty solution that will significantly speed up this process.

Thanks.

+6
sql postgresql bulkinsert
7 answers

If you do not need this functionality in a production environment, I would suggest disabling fsync in your PostgreSQL configuration. This will speed up the inserts dramatically.

Never turn fsync off on a production database.
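
As a minimal sketch (assuming PostgreSQL 9.4 or later for ALTER SYSTEM, and a superuser connection), turning fsync off on a throwaway development instance could look like this:

 -- Development only: with fsync off, a crash can corrupt the whole cluster.
 ALTER SYSTEM SET fsync = off;
 SELECT pg_reload_conf();  -- fsync can be changed with a reload, no restart needed

On older versions, set fsync = off in postgresql.conf and reload instead.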

+8

The fastest way to insert data is the COPY command, but that requires a flat file as its input; I assume generating a flat file is not an option.

Do not commit too often, and especially do not run this with autocommit enabled. "Tens of thousands" sounds like a single commit at the end would be just right.

If you can convince your ORM to use Postgres's multi-row insert syntax, that will also speed things up.

This is an example of a multi-row insert:

  insert into my_table (col1, col2) 
 values 
 (row_1_col_value1, row_1_col_value_2), 
 (row_2_col_value1, row_2_col_value_2), 
 (row_3_col_value1, row_3_col_value_2)

If you cannot generate that syntax and you are using Java, make sure you use batched statements instead of executing single inserts one by one (other database layers may allow something similar).

Edit:

jmz's post inspired me to add something:

You might also see an improvement if you increase wal_buffers to a larger value (e.g. 8 MB) and checkpoint_segments (e.g. 16).
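
A hedged sketch of those two settings (ALTER SYSTEM needs PostgreSQL 9.4+; checkpoint_segments only exists up to 9.4, max_wal_size replaces it from 9.5 on; wal_buffers only takes effect after a server restart):

 -- Adjust the values to your workload; on older versions edit postgresql.conf directly.
 ALTER SYSTEM SET wal_buffers = '8MB';        -- needs a server restart to take effect
 ALTER SYSTEM SET checkpoint_segments = 16;   -- pre-9.5 only; raise max_wal_size on newer versions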

+8

For inserts that number in the hundreds to thousands, batch them in a single transaction:

 begin;
 insert1 ...
 insert2 ...
 ...
 insert10k ...
 commit;

For inserts in the millions, use COPY:

 COPY test (ts) FROM stdin;
 2010-11-29 22:32:01.383741-07
 2010-11-29 22:32:01.737722-07
 ... 1 million rows ...
 \.

Also make sure that any column used as a foreign key to another table is indexed, if that other table is more than trivial in size.
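
For example (with hypothetical table and column names), if orders.customer_id is a foreign key to customers, indexing the referencing column looks like this:

 -- Hypothetical schema: orders.customer_id references customers(id).
 CREATE INDEX orders_customer_id_idx ON orders (customer_id);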

+6

One thing you can do is drop all the indexes, do your inserts, and then re-create the indexes.
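
A minimal sketch with a hypothetical index name; drop it before the bulk load and build it again afterwards:

 -- Hypothetical index on my_table(col1); rebuilding once is cheaper than maintaining it per row.
 DROP INDEX IF EXISTS my_table_col1_idx;
 -- ... run the bulk inserts here ...
 CREATE INDEX my_table_col1_idx ON my_table (col1);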

+3

Are you sending one batch of tens of thousands of INSERTs, or are you sending tens of thousands of individual INSERTs?

I know that with Hibernate you can batch up all your SQL statements and send them at the end in one big chunk, instead of paying the network and database overhead of issuing thousands of SQL statements individually.

+2

If you are just initializing constant test data, you could also put the test data into a staging table (or tables) and then simply copy the table contents using

 INSERT INTO... SELECT... 

This should be about as fast as using COPY (although I have not benchmarked it), with the advantage that you can copy data around using only SQL commands, without the hassle of setting up an external file as COPY requires.
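
A minimal sketch with hypothetical table names: create a staging table with the same shape as the target, populate it once with the dummy data, then copy it over in a single statement:

 -- Staging table matching my_table's columns (hypothetical names).
 CREATE TABLE my_table_staging (LIKE my_table INCLUDING DEFAULTS);
 -- ... populate my_table_staging with the generated dummy data ...
 INSERT INTO my_table SELECT * FROM my_table_staging;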

+2

Try to do as much as possible in a single statement!

 -- Use SELECT rather than VALUES: current PostgreSQL rejects set-returning functions like unnest() in a VALUES list.
 insert into my_table (col1, col2)
 select
   unnest(array[row_1_col_value_1, row_2_col_value_1, row_3_col_value_1]),
   unnest(array[row_1_col_value_2, row_2_col_value_2, row_3_col_value_2]);

This is similar to @a_horse_with_no_name's suggestion. The advantage of using unnest is this: you can use query parameters containing arrays!

 insert into my_table (col1, col2)
 select unnest(:col_values_1), unnest(:col_values_2);

By combining three inserts into one, you save more than 50% of the execution time. And using query parameters with 2000 values in a single insert, I get a speed factor of 150 in my application.

+2
