Efficient way to import a large number of CSV files into a PostgreSQL database

I have seen many examples of importing a CSV file into a PostgreSQL database, but what I need is an efficient way to import 500,000 CSV files into a single PostgreSQL database. Each CSV is slightly larger than 500 KB (so about 272 GB of data in total).

The CSVs are identically formatted and there are no duplicate records (the data was generated programmatically from the original data source). I have been searching, and will keep searching, the Internet for options, but I would be grateful for any pointers on doing this in the most efficient way possible. I have some experience with Python, but I will dig into any other solution that seems appropriate.

Thanks!

+7
3 answers

If you start by reading the PostgreSQL documentation chapter "Populating a Database", you will find several tips:

  • Load the data in a single transaction.
  • Use COPY if at all possible.
  • Drop indexes, foreign key constraints, etc. before loading the data, and recreate them afterwards.

PostgreSQL's COPY statement already supports the CSV format:

 COPY table (column1, column2, ...) FROM '/path/to/data.csv' WITH (FORMAT CSV) 

so your best bet is either not to use Python at all, or to use Python only to generate the required sequence of COPY statements.
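
If you do script the load from Python, a minimal sketch of the "Python only drives COPY" idea could look like the following. It assumes the psycopg2 driver, a target table named my_table with three columns, and CSV files under /path/to/csvs/; all of those names are placeholders for your own schema and paths:

  import glob
  import psycopg2

  # Placeholder table and column names; replace with your own.
  COPY_SQL = "COPY my_table (column1, column2, column3) FROM STDIN WITH (FORMAT CSV)"

  conn = psycopg2.connect("dbname=mydb user=me")
  try:
      with conn.cursor() as cur:
          for path in sorted(glob.glob("/path/to/csvs/*.csv")):
              with open(path, "r") as f:
                  # Stream each file to the server over the COPY protocol.
                  cur.copy_expert(COPY_SQL, f)
      conn.commit()  # one transaction for the entire load
  except Exception:
      conn.rollback()
      raise
  finally:
      conn.close()

Using COPY ... FROM STDIN through the driver also avoids the server-side file access (and superuser or pg_read_server_files privileges) that COPY ... FROM '/path/to/data.csv' requires.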

+7

That's a nice chunk of data you have there. I'm not 100% sure about Postgres, but MySQL, at least, provides SQL commands for loading a CSV directly into a table. This bypasses the usual insert checks and so on, and is more than an order of magnitude faster than ordinary INSERT operations.

So perhaps the fastest way is to write a simple Python script that tells your Postgres server which CSV files, in which order, to hungrily devour into its endless tables.
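
A rough sketch of that idea, assuming psql is available and treating the table and column names below as placeholders: have Python generate a script with one client-side \copy command per file, then let psql execute it in a single transaction.

  import glob

  # Write one \copy command per CSV into a script, wrapped in one transaction.
  with open("load_all.sql", "w") as out:
      out.write("BEGIN;\n")
      for path in sorted(glob.glob("/path/to/csvs/*.csv")):
          out.write("\\copy my_table (column1, column2, column3) "
                    "from '%s' with (format csv)\n" % path)
      out.write("COMMIT;\n")

  # Then run it with something like:
  #   psql -d mydb -f load_all.sql

\copy reads the files on the client side, so the server does not need direct access to them.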

0

I use PHP and Postgres: I read the CSV file with PHP and pass the rows along as a string in the following format:

 { {line1 column1, line1 column2, line1 column3} , { line2 column1,line2 column2,line2 column3} } 

I take care of doing it in a single transaction by passing that string as a parameter to a PostgreSQL function.

I can validate every record, the formatting, the amount of data, etc., and end up importing 500,000 records in 3 minutes.

To read the data inside the PostgreSQL function:

  DECLARE
      d varchar[];
  BEGIN
      FOREACH d SLICE 1 IN ARRAY p_dados LOOP
          INSERT INTO schema.table (
              column1,
              column2,
              column3
          ) VALUES (
              d[1],
              d[2]::INTEGER, -- explicit conversion to INTEGER
              d[3]::BIGINT   -- explicit conversion to BIGINT
          );
      END LOOP;
  END;
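
A rough Python equivalent of the same approach might look like the sketch below; it assumes psycopg2 and a hypothetical PL/pgSQL function myschema.import_rows whose body resembles the loop above. psycopg2 adapts a nested Python list to the 2-D array parameter:

  import csv
  import glob
  import psycopg2

  conn = psycopg2.connect("dbname=mydb user=me")
  with conn, conn.cursor() as cur:  # the connection context commits on success
      for path in sorted(glob.glob("/path/to/csvs/*.csv")):
          with open(path, newline="") as f:
              rows = [row for row in csv.reader(f)]  # every row must have the same width
          # A nested list is sent as a 2-D array, matching the
          # {{...},{...}} format described above.
          cur.execute("SELECT myschema.import_rows(%s::varchar[])", (rows,))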
0
