In my application, I need to significantly improve insert performance. Example: a file containing about 21K records takes more than 100 minutes to insert. There are reasons why this can take some time, say 20 minutes or so, but more than 100 minutes is too long.
Data is inserted into 3 tables (a many-to-many relationship). Identifiers are generated from a sequence; I have already googled around and set hibernate.id.new_generator_mappings = true and the generator's increment size (plus the matching sequence increment) to 1000.
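For reference, a minimal sketch of the kind of mapping this describes (entity and sequence names are invented for illustration; with hibernate.id.new_generator_mappings = true, an allocationSize above 1 makes Hibernate use a pooled optimizer, so it only hits the sequence once per block of ids):

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.SequenceGenerator;

// Hypothetical entity mapping matching the description: the allocationSize
// (1000) matches the database sequence's INCREMENT, so Hibernate fetches a
// new id block only once per 1000 inserts instead of once per row.
@Entity
public class ImportedRecord {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "rec_seq")
    @SequenceGenerator(name = "rec_seq", sequenceName = "rec_seq", allocationSize = 1000)
    private Long id;

    // ... other fields omitted
}
```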
Also, the amount of data is not at all unusual: the file is 90 MB.
I checked with VisualVM that most of the time is spent in the JDBC driver (PostgreSQL) and in sleeping/waiting. I think the problem is the unique constraint on the child table. The service layer performs a manual check (= SELECT) before each insert; if the record already exists, it is reused instead of letting the constraint violation be thrown.
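A rough sketch of that check-then-insert pattern, assuming a hypothetical Child entity with a unique natural key (the actual service code is not shown in the question):

```java
import javax.persistence.EntityManager;

// Hypothetical sketch of the service-level check described above.
public class ChildLookupService {

    private final EntityManager em;

    public ChildLookupService(EntityManager em) {
        this.em = em;
    }

    // One SELECT per record before any INSERT: reuse the existing row if the
    // unique key is already present, otherwise persist a new one, so the
    // unique constraint is never actually violated.
    public Child findOrCreate(String uniqueKey) {
        return em.createQuery(
                        "select c from Child c where c.uniqueKey = :key", Child.class)
                .setParameter("key", uniqueKey)
                .getResultStream()
                .findFirst()
                .orElseGet(() -> {
                    Child c = new Child(uniqueKey);
                    em.persist(c);
                    return c;
                });
    }
}
```

This is the per-record SELECT round trip that the profiling seems to point at.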
So, to summarize: for this particular file there will be 1 insert per record per table (it may differ, but not for this file, which is the ideal (fastest) case). That means roughly 60k inserts plus 20k selects in total. More than 100 minutes still seems very long (yes, hardware counts, and this is a plain PC with a 7200 rpm drive, no SSD or RAID). However, this is supposed to be an improved version of a previous application (plain JDBC), in which the same insert on the same hardware took about 15 minutes. Given that in both cases about 4-5 minutes are spent on "pre-processing", the increase is massive.
Any tips on what could be improved? Is there some kind of batch insert feature?
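For context on that last question: Hibernate does expose JDBC batching, but it is disabled by default. A minimal sketch of the relevant properties follows; the values are illustrative, not recommendations, and they would go wherever the application already configures Hibernate (persistence.xml, a properties file, etc.). Note that batching can apply with sequence-generated ids as described above, whereas IDENTITY ids cause Hibernate to disable it.

```java
import java.util.Properties;

// Illustrative Hibernate settings for JDBC batching, which is off by default.
public final class BatchingSettings {

    public static Properties hibernateBatching() {
        Properties props = new Properties();
        props.put("hibernate.jdbc.batch_size", "50"); // send inserts in JDBC batches
        props.put("hibernate.order_inserts", "true"); // group inserts by entity type
        props.put("hibernate.order_updates", "true");
        return props;
    }
}
```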