Load big data into an R data.table from PostgreSQL

I store my data on a PostgreSQL server. I want to load a table with 15 million rows into a data.frame or data.table.

I use RPostgreSQL to load the data.

library(RPostgreSQL)
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, ...)

# Select data from a table
system.time(
df <- dbGetQuery(con, "SELECT * FROM 15mil_rows_table")
)

It took 20 minutes to load the data from the database into df. I am using a Google Cloud server with 60 GB of RAM and a 16-core processor.

What can I do to reduce the load time?

2 answers

I’m not sure how much this will reduce the load time, but it may well help, since both steps are quite efficient. You can leave a comment with your timings.

1. Dump the table to a CSV file with psql (bash):

COPY 15mil_rows_table TO '/path/15mil_rows_table.csv' DELIMITER ',' CSV HEADER;

2. Read the CSV into R:

library(data.table)
DT <- fread("/path/15mil_rows_table.csv")
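
The same two steps can also be driven from R itself, since the COPY statement is plain SQL that can be issued over the existing connection before calling fread(). A minimal sketch, assuming placeholder connection details and that the PostgreSQL server process can write to the chosen path (server-side COPY requires the appropriate privileges):

library(RPostgreSQL)
library(data.table)

drv <- dbDriver("PostgreSQL")
# Placeholder connection details; replace with your own.
con <- dbConnect(drv, dbname = "database", host = "localhost", user = "user")

# Ask the server to dump the table to CSV; the path must be writable
# by the PostgreSQL server process.
dbSendQuery(con, "COPY 15mil_rows_table TO '/path/15mil_rows_table.csv' DELIMITER ',' CSV HEADER")

# Read the dump with data.table's multi-threaded fread().
DT <- fread("/path/15mil_rows_table.csv")

dbDisconnect(con)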

Following @Jan Gorecki's answer, but compressing the dump with gzip:

1. Dump the table to a compressed CSV:

psql -h localhost -U user -d 'database' -c "COPY 15mil_rows_table TO stdout DELIMITER ',' CSV HEADER" | gzip > 15mil_rows_table.csv.gz &

2. Read it into R with fread:

DT <- fread('zcat 15mil_rows_table.csv.gz')
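
One note on the fread() call above: newer data.table versions prefer shell commands to be passed via the cmd argument, and if the R.utils package is installed fread() can also decompress the .gz file directly. Both variants, assuming the file produced in step 1:

library(data.table)

# Variant 1: let fread() handle the .gz file itself
# (requires the R.utils package to be installed).
DT <- fread("15mil_rows_table.csv.gz")

# Variant 2: pipe through zcat explicitly via the cmd argument,
# the preferred way to pass a shell command in newer data.table.
DT <- fread(cmd = "zcat 15mil_rows_table.csv.gz")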
