Slow insert speed in a PostgreSQL in-memory tablespace

I have a requirement to store records in a database at a rate of 10,000 records per second (with indexes on several fields). Each record has 25 columns. I am doing batch inserts of 100,000 records in one transaction block. To improve insert speed I moved the tablespace from disk to RAM, but even so I can only achieve about 5,000 inserts per second.

I also made the following changes to the PostgreSQL configuration:

  • Indexes: none
  • fsync: off
  • logging: disabled

Additional Information:

  • Tablespace: RAM
  • Number of columns per row: 25 (mostly integer)
  • CPU: 4 cores, 2.5 GHz
  • RAM: 48 GB.

I am wondering why a single insert takes about 0.2 ms on average when the database is not writing anything to disk (since the tablespace is RAM-backed). Is there something I'm doing wrong?
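
Roughly, the setup is equivalent to the sketch below; the table, column names, and the tmpfs mount point are placeholders rather than my real schema:

 -- Tablespace backed by a RAM disk (assumes a tmpfs mounted at /mnt/ramdisk,
 -- with a directory owned by the postgres user)
 CREATE TABLESPACE ramspace LOCATION '/mnt/ramdisk/pgdata';

 -- Table with 25 mostly-integer columns, created without indexes
 CREATE TABLE records (
   id integer,
   f1 integer,
   f2 integer
   -- ... roughly 22 more columns ...
 ) TABLESPACE ramspace;

 -- One transaction block of 100,000 inserts
 BEGIN;
 INSERT INTO records (id, f1, f2) VALUES (1, 10, 100);
 INSERT INTO records (id, f1, f2) VALUES (2, 20, 200);
 -- ... 99,998 more rows ...
 COMMIT;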

Any help is appreciated.

Prashant

+6
postgresql
4 answers

Fast data loading

  • Convert your data to CSV.
  • Create a temporary table (as you noted, without indexes).
  • Run the COPY command: \COPY schema.temp_table FROM /tmp/data.csv WITH CSV
  • Insert the data from the temporary table into the permanent (non-temporary) table.
  • Create the indexes.
  • Set appropriate statistics (a sketch of the full sequence follows this list).
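
Putting those steps together, a minimal sketch might look like the following, using the climate.measurement table defined further down as a stand-in; the temporary table, index, and file names are illustrative only:

 -- 1. Load the CSV into an index-free temporary table
 CREATE TEMPORARY TABLE temp_measurement (LIKE climate.measurement INCLUDING DEFAULTS);
 \COPY temp_measurement FROM /tmp/data.csv WITH CSV

 -- 2. Insert the rows into the permanent table
 INSERT INTO climate.measurement SELECT * FROM temp_measurement;

 -- 3. Create indexes only after the data is in place
 CREATE INDEX measurement_taken_idx ON climate.measurement (taken);

 -- 4. Refresh planner statistics
 ANALYSE climate.measurement;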

Additional recommendations

For large amounts of data:

  • Split the data into child tables.
  • Insert the data in the order in which it will most often be read; in other words, try to align the physical model with the logical model.
  • Adjust the configuration settings.
  • Cluster the table on an index (most important column first). For example, as shown below; the note after the code covers actually running CLUSTER:
  CREATE UNIQUE INDEX measurement_001_stc_index
    ON climate.measurement_001
    USING btree
    (station_id, taken, category_id);
  ALTER TABLE climate.measurement_001 CLUSTER ON measurement_001_stc_index;
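
  Note that ALTER TABLE ... CLUSTER ON only records which index to use; the rows are physically reordered when CLUSTER is actually run, and statistics should be refreshed afterwards:

  -- Physically reorder the table along the recorded index
  CLUSTER climate.measurement_001;
  ANALYSE climate.measurement_001;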

Configuration settings

On a machine with 4 GB of RAM, I did the following ...

Kernel Configuration

Tell the kernel that it is OK for programs to use large amounts of shared memory:

 sysctl -w kernel.shmmax=536870912
 sysctl -p /etc/sysctl.conf

PostgreSQL configuration

  • Edit /etc/postgresql/8.4/main/postgresql.conf and set:
      shared_buffers = 1GB
     temp_buffers = 32MB
     work_mem = 32MB
     maintenance_work_mem = 64MB
     seq_page_cost = 1.0
     random_page_cost = 2.0
     cpu_index_tuple_cost = 0.001
     effective_cache_size = 512MB
     checkpoint_segments = 10 
  • Change the values as necessary and as appropriate for your environment; you may need to adjust them later for suitable read/write optimization.
  • Restart PostgreSQL.

Child tables

For example, let's say you have weather-based data divided into different categories. Instead of having one monstrous table, divide it into several tables (one for each category).

Master table

 CREATE TABLE climate.measurement (
   id bigserial NOT NULL,
   taken date NOT NULL,
   station_id integer NOT NULL,
   amount numeric(8,2) NOT NULL,
   flag character varying(1) NOT NULL,
   category_id smallint NOT NULL,
   CONSTRAINT measurement_pkey PRIMARY KEY (id)
 )
 WITH (
   OIDS=FALSE
 );

Child table

 CREATE TABLE climate.measurement_001 (
   -- Inherited from table climate.measurement: id bigint NOT NULL DEFAULT nextval('climate.measurement_id_seq'::regclass),
   -- Inherited from table climate.measurement: taken date NOT NULL,
   -- Inherited from table climate.measurement: station_id integer NOT NULL,
   -- Inherited from table climate.measurement: amount numeric(8,2) NOT NULL,
   -- Inherited from table climate.measurement: flag character varying(1) NOT NULL,
   -- Inherited from table climate.measurement: category_id smallint NOT NULL,
   CONSTRAINT measurement_001_pkey PRIMARY KEY (id),
   CONSTRAINT measurement_001_category_id_ck CHECK (category_id = 1)
 )
 INHERITS (climate.measurement)
 WITH (
   OIDS=FALSE
 );
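
With this layout, one common approach is for the application to insert directly into the child table that matches the category, while queries against the parent table still see every row. The values below are just an illustration:

 -- Rows for category 1 go straight into the matching child table
 INSERT INTO climate.measurement_001 (taken, station_id, amount, flag, category_id)
 VALUES ('2010-01-01', 42, 12.50, 'M', 1);

 -- A query against the parent table scans the children as well
 SELECT count(*) FROM climate.measurement WHERE taken = '2010-01-01';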

Table statistics

Increase table statistics for important columns:

 ALTER TABLE climate.measurement_001 ALTER COLUMN taken SET STATISTICS 1000;
 ALTER TABLE climate.measurement_001 ALTER COLUMN station_id SET STATISTICS 1000;

Do not forget to VACUUM and ANALYSE afterwards.
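
For example, for the child table above:

 VACUUM ANALYSE climate.measurement_001;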

+16

Are you doing your inserts as a series of single-row statements,

 INSERT INTO tablename (...) VALUES (...);
 INSERT INTO tablename (...) VALUES (...);
 ...

or as a single multi-row insert?

 INSERT INTO tablename (...) VALUES (...),(...),(...); 

The second form will be much faster for 100,000 rows.

source: http://kaiv.wordpress.com/2007/07/19/faster-insert-for-multiple-rows/

+4

Did you also put the xlog (the WAL segments) on your RAM disk? If not, you are still writing to disk. What about your settings for wal_buffers, checkpoint_segments, etc.? You should try to fit all of your 100,000 records (your single transaction) into wal_buffers. Increasing this setting may cause PostgreSQL to request more System V shared memory than your operating system's default configuration allows.
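
For illustration only, the relevant pieces might look like this; the values, data directory, and tmpfs mount point are assumptions that would need adapting, not recommendations:

 # postgresql.conf (illustrative values)
 wal_buffers = 16MB
 checkpoint_segments = 32

 # Shell: relocate pg_xlog to an assumed tmpfs mount, with the server stopped.
 # This trades durability for speed; a crash or power loss discards the WAL.
 mv /var/lib/postgresql/8.4/main/pg_xlog /mnt/ramdisk/pg_xlog
 ln -s /mnt/ramdisk/pg_xlog /var/lib/postgresql/8.4/main/pg_xlog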

+3

I suggest you use COPY instead of INSERT.
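
If the data is generated by an application rather than read from a file, the same idea works with COPY ... FROM STDIN. In psql, a session might look roughly like this (table and column names are placeholders):

 COPY tablename (col1, col2) FROM STDIN WITH CSV;
 1,10
 2,11
 \.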

You must also fine-tune your postgresql.conf file.

See http://wiki.postgresql.org/wiki/Performance_Optimization

+1
