Fast data loading
- Convert your data to CSV.
- Create a temporary table (as you noted, without indexes).
- Run the command COPY:
\COPY schema.temp_table FROM /tmp/data.csv WITH CSV
- Insert the data into a non-temporary table.
- Create indexes.
- Set the appropriate statistics.
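The steps above can be sketched as a single psql session. This is only an illustration, not a tested recipe: the staging table name measurement_staging, the column order of the CSV file, and the index name measurement_001_taken_idx are assumptions made for the example, and climate.measurement_001 is the child table defined further down.

```sql
-- 1. Staging table with no indexes or constraints (hypothetical name).
CREATE TEMPORARY TABLE measurement_staging (
  taken       date,
  station_id  integer,
  amount      numeric(8,2),
  flag        character varying(1),
  category_id smallint
);

-- 2. Bulk load from a client-side file. \copy is a psql meta-command
--    and must stay on one line. The CSV column order is assumed to
--    match the staging table.
\copy measurement_staging FROM '/tmp/data.csv' WITH CSV

-- 3. Move the rows into the permanent table. The id column is filled
--    by its sequence default.
INSERT INTO climate.measurement_001 (taken, station_id, amount, flag, category_id)
SELECT taken, station_id, amount, flag, category_id
FROM measurement_staging
WHERE category_id = 1;

-- 4. Build indexes only after the data is in place (hypothetical index).
CREATE INDEX measurement_001_taken_idx
  ON climate.measurement_001 (taken);

-- 5. Refresh planner statistics.
ANALYZE climate.measurement_001;
```

Loading into an unindexed table first and indexing afterwards means each index is built once over the full data set rather than being updated row by row.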
Additional recommendations
For large amounts of data:
- Split data into child tables.
- Insert the rows sorted by the column that most SELECT statements will filter on. In other words, try to align the physical model with the logical model.
- Adjust the configuration settings.
- Create a CLUSTER index (the most important column on the left). For example:
CREATE UNIQUE INDEX measurement_001_stc_index
ON climate.measurement_001
USING btree
(station_id, taken, category_id);
ALTER TABLE climate.measurement_001 CLUSTER ON measurement_001_stc_index;
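Note that ALTER TABLE ... CLUSTER ON only records which index to cluster by. The rows are physically reordered when CLUSTER itself is run. A minimal sketch, assuming the table and index above already exist:

```sql
-- Rewrite the table in the order of the index marked with CLUSTER ON.
-- This takes an ACCESS EXCLUSIVE lock for the duration of the rewrite.
CLUSTER climate.measurement_001;

-- Clustering is a one-time operation: rows inserted later are not kept
-- in index order. Refresh planner statistics after the rewrite.
ANALYZE climate.measurement_001;
```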
Configuration settings
On a machine with 4 GB of RAM, I did the following ...
Kernel Configuration
Tell the kernel that it is okay for programs to use large amounts of shared memory:
sysctl -w kernel.shmmax=536870912
sysctl -p /etc/sysctl.conf
PostgreSQL configuration
Child tables
For example, let's say you have weather-based data divided into different categories. Instead of having one monstrous table, divide it into several tables (one for each category).
Master table
CREATE TABLE climate.measurement
(
  id bigserial NOT NULL,
  taken date NOT NULL,
  station_id integer NOT NULL,
  amount numeric(8,2) NOT NULL,
  flag character varying(1) NOT NULL,
  category_id smallint NOT NULL,
  CONSTRAINT measurement_pkey PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);
Child table
CREATE TABLE climate.measurement_001
(
-- Inherited from table climate.measurement: id bigint NOT NULL DEFAULT nextval('climate.measurement_id_seq'::regclass),
-- Inherited from table climate.measurement: taken date NOT NULL,
-- Inherited from table climate.measurement: station_id integer NOT NULL,
-- Inherited from table climate.measurement: amount numeric(8,2) NOT NULL,
-- Inherited from table climate.measurement: flag character varying(1) NOT NULL,
-- Inherited from table climate.measurement: category_id smallint NOT NULL,
  CONSTRAINT measurement_001_pkey PRIMARY KEY (id),
  CONSTRAINT measurement_001_category_id_ck CHECK (category_id = 1)
)
INHERITS (climate.measurement)
WITH (
  OIDS=FALSE
);
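The CHECK constraint on each child table is what lets the planner skip children that cannot contain matching rows. A rough way to see this, assuming the tables above exist and that further children (for other categories) have been created in the same way:

```sql
-- 'partition' applies constraint exclusion to inheritance children;
-- it has been the default setting since PostgreSQL 8.4.
SET constraint_exclusion = partition;

-- Query through the master table. The plan should include
-- climate.measurement_001 and leave out children whose CHECK
-- constraint rules out category_id = 1. The master table itself
-- is still scanned, since it has no such constraint.
EXPLAIN
SELECT station_id, taken, amount
FROM climate.measurement
WHERE category_id = 1;
```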
Table statistics
Increase table statistics for important columns:
ALTER TABLE climate.measurement_001 ALTER COLUMN taken SET STATISTICS 1000;
ALTER TABLE climate.measurement_001 ALTER COLUMN station_id SET STATISTICS 1000;
Do not forget to VACUUM and ANALYSE after that.
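A sketch of that follow-up step, with a query against the standard pg_stats view to confirm that statistics were collected for the two columns. It assumes the child table above exists and contains data:

```sql
-- Reclaim dead rows and recompute planner statistics in one pass.
VACUUM ANALYZE climate.measurement_001;

-- Inspect what the planner now knows about the two columns.
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE schemaname = 'climate'
  AND tablename = 'measurement_001'
  AND attname IN ('taken', 'station_id');
```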
Dave Jarvis