H2 Database Download csv data faster

Question

H2 Database Download csv data faster

I want to load about 2 million lines from a CSV file into a database and run an SQL query for analysis, and then delete the data. The file size is 2 GB. Data is a web server log message. Was there any research and found that the H2 memory database is faster because it stores data in memory. When I try to load the data received an OutOfMemory error message due to 32 bit java. Planning to use 64-bit Java.

I am looking for all the optimization options for quick loading and running SQL.

Test.sql

CREATE TABLE temptable ( f1 varchar(250) NOT NULL DEFAULT '', f2 varchar(250) NOT NULL DEFAULT '', f3 reponsetime NOT NULL DEFAULT '' ) as select * from CSVREAD('log.csv');

It is executed as in 64-bit java:

 java -Xms256m -Xmx4096m -cp h2*.jar org.h2.tools.RunScript -url 'jdbc:h2:mem:test;LOG=0;CACHE_SIZE=65536;LOCK_MODE=0;UNDO_LOG=0' -script test.sql

If any other database is available for use on AIX, please let me know.

thanks

+6

h2

sfgroups Apr 3 '13 at 1:00

source share

1 answer

Thomas mueller · Answer 1 · 2013-04-03T04:46:21+0000

If the CSV file is 2 GB, then using a clean database in memory will require more than 4 GB of heap memory. The exact memory requirements largely depend on how redundant the data is. If the same values appear again and again, then the database will need less memory, as ordinary objects will be reused (regardless of whether it is a string, length, timestamp, ...).

Note that when using create table as select , LOCK_MODE=0 , UNDO_LOG=0 and LOG=0 are not needed. Also, CACHE_SIZE does not help when using the mem: prefix (but it helps for file systems in memory).

I suggest first using the in-memory file system ( memFS: instead of mem: , which is slightly slower than mem: but usually requires less memory:

 jdbc:h2:memFS:test;CACHE_SIZE=65536

If this is not enough, try the compressed mode in memory ( memLZF: , which is again slower but uses even less memory:

 jdbc:h2:memLZF:test;CACHE_SIZE=65536

If this is still not enough, I suggest you try the usual constant mode and see how fast it is:

 jdbc:h2:~/data/test;CACHE_SIZE=65536

H2 Database Download csv data faster

More articles: