How to insert data as quickly as possible with Hibernate

I read a file, create objects from it, and save them in a PostgreSQL database. The file contains 100,000 documents that I read, split, and finally store in the database. I cannot build a List<> and keep all the documents in it, because my RAM is small. My code for reading the file and writing to the database is below, but my JVM heap fills up and I cannot continue storing more documents. How can I read the file efficiently and store it in the database?

    public void readFile() {
        StringBuilder wholeDocument = new StringBuilder();
        try {
            bufferedReader = new BufferedReader(new FileReader(files));
            String line;
            int count = 0;
            while ((line = bufferedReader.readLine()) != null) {
                if (line.contains("<page>")) {
                    wholeDocument.append(line);
                    while ((line = bufferedReader.readLine()) != null) {
                        wholeDocument = wholeDocument.append("\n" + line);
                        if (line.contains("</page>")) {
                            System.out.println(count++);
                            addBodyToDatabase(wholeDocument.toString());
                            wholeDocument.setLength(0);
                            break;
                        }
                    }
                }
            }
            wikiParser.commit();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                bufferedReader.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    public void addBodyToDatabase(String wholeContent) {
        Page page = new Page(new Timestamp(System.currentTimeMillis()), wholeContent);
        database.addPageToDatabase(page);
    }

    public static int counter = 1;

    public void addPageToDatabase(Page page) {
        session.save(page);
        if (counter % 3000 == 0) {
            commit();
        }
        counter++;
    }
+6
5 answers

I used @RookieGuy's answer: fooobar.com/questions/112483/...

I use

    session.flush();
    session.clear();

and, once all documents have been read and saved to the database, finally

    tx.commit();
    session.close();

and change

 wholeDocument = wholeDocument.append("\n" + line); 

to

 wholeDocument.append("\n" + line); 
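
Putting these pieces together, a rough sketch of how the periodic flush/clear could look inside the question's addPageToDatabase method (session, counter, tx and commit() are assumed to be the same fields and methods as in the question):

    public void addPageToDatabase(Page page) {
        session.save(page);
        if (counter % 3000 == 0) {
            session.flush();  // push the pending inserts to the database
            session.clear();  // detach the saved entities so they can be garbage-collected
        }
        counter++;
    }

    // after the whole file has been processed:
    // tx.commit();
    // session.close();
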
+1

First of all, you should apply a fork/join approach here.

The main task parses the file and sends batches of no more than 100 elements to an ExecutorService. The ExecutorService should have a number of worker threads equal to the number of available database connections. If you have 4 processor cores, let's say the database can take 8 concurrent connections without too much context switching.

Then you should configure the DataSource connection pool, with minSize equal to maxSize and both set to 8. Try HikariCP or ViburDBCP for the connection pool.

Then you need to configure JDBC batching. If you use MySQL, the IDENTITY generator will disable batching. If you use a database that supports sequences, make sure you also use the enhanced identifier generators (they are the default in Hibernate 5.x).
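
For example, a minimal configuration sketch along these lines (the property values and the sequence name page_seq are illustrative assumptions, not taken from the question):

    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.GenerationType;
    import javax.persistence.Id;
    import javax.persistence.SequenceGenerator;

    // In hibernate.cfg.xml / persistence.xml (illustrative values):
    //   hibernate.jdbc.batch_size = 50
    //   hibernate.order_inserts   = true

    // A sequence-based identifier keeps JDBC insert batching enabled (unlike IDENTITY):
    @Entity
    public class Page {

        @Id
        @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "page_seq_gen")
        @SequenceGenerator(name = "page_seq_gen", sequenceName = "page_seq", allocationSize = 50)
        private Long id;

        // timestamp and content fields as in the question
    }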

This way, the entity insertion process is parallelized and decoupled from the main parsing thread. The main thread should wait until the ExecutorService has finished processing all tasks before shutting it down.
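
A minimal sketch of this arrangement (the PageDao type and its saveBatch method are hypothetical placeholders for whatever code actually writes one batch per transaction):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class ParallelImporter {

        private static final int WORKERS = 8;      // matches the connection pool size
        private static final int BATCH_SIZE = 100; // at most 100 elements per task

        private final ExecutorService executor = Executors.newFixedThreadPool(WORKERS);
        private final PageDao pageDao;             // hypothetical DAO saving one batch per transaction
        private final List<Page> currentBatch = new ArrayList<>(BATCH_SIZE);

        public ParallelImporter(PageDao pageDao) {
            this.pageDao = pageDao;
        }

        // Called by the main parsing thread for every parsed <page> block.
        public void add(Page page) {
            currentBatch.add(page);
            if (currentBatch.size() == BATCH_SIZE) {
                List<Page> batch = new ArrayList<>(currentBatch);
                currentBatch.clear();
                executor.submit(() -> pageDao.saveBatch(batch)); // insert runs on a worker thread
            }
        }

        // The main thread waits for all pending inserts before shutting down.
        public void finish() throws InterruptedException {
            if (!currentBatch.isEmpty()) {
                executor.submit(() -> pageDao.saveBatch(new ArrayList<>(currentBatch)));
            }
            executor.shutdown();
            executor.awaitTermination(1, TimeUnit.HOURS);
        }
    }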

+8

Honestly, it is hard to make suggestions for you without doing real profiling and finding out what actually makes your code slow or inefficient.

However, there are a few things we can see from your code.

  • You use StringBuilder inefficiently

    wholeDocument.append("\n" + line); should instead be written as wholeDocument.append("\n").append(line);

    Because what you wrote originally is translated by the compiler into wholeDocument.append(new StringBuilder("\n").append(line).toString()). You can see how many unnecessary StringBuilder objects you are creating :)

  • Considerations for Using Hibernate

    I'm not sure how you manage your session or how you implemented your commit(). Assuming you did that right, there are more things to consider:

    • Have you set the batch size correctly in Hibernate (hibernate.jdbc.batch_size)? By default JDBC batching is effectively disabled because the property is not set, so you may want to give it a reasonably large value so that Hibernate sends the inserts to the database in larger batches.

    • Given that you do not need the entities in the first-level cache for later use, you can do an intermittent session flush() + clear() to:

      • trigger the batch inserts mentioned in the previous point
      • clear the first-level cache
  • Consider not using Hibernate for this feature.

    Hibernate is cool, but it is not a panacea for everything. Given that in this function you simply save records to the database based on the contents of a text file, you do not need any entity behavior, nor do you need the first-level cache for later processing, so there is no reason to use Hibernate here given the extra processing and memory overhead. Plain JDBC with manual batch handling can save you a lot of trouble (see the sketch below).
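
As mentioned in the last point, a minimal plain-JDBC sketch with manual batch management could look like this (the table and column names are assumptions for illustration, not taken from the question):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.sql.Timestamp;

    public class JdbcPageWriter implements AutoCloseable {

        private static final int BATCH_SIZE = 50;

        private final Connection connection;
        private final PreparedStatement insert;
        private int pending = 0;

        public JdbcPageWriter(Connection connection) throws SQLException {
            this.connection = connection;
            connection.setAutoCommit(false);
            // table/column names are illustrative only
            this.insert = connection.prepareStatement(
                    "INSERT INTO page (created_at, content) VALUES (?, ?)");
        }

        public void addPage(String wholeContent) throws SQLException {
            insert.setTimestamp(1, new Timestamp(System.currentTimeMillis()));
            insert.setString(2, wholeContent);
            insert.addBatch();
            if (++pending % BATCH_SIZE == 0) {
                insert.executeBatch();  // send the accumulated inserts in one round trip
                connection.commit();
            }
        }

        @Override
        public void close() throws SQLException {
            insert.executeBatch();      // flush any remaining rows
            connection.commit();
            insert.close();
        }
    }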

+2

I am not very sure about the structure of your data file. It would be easier to understand if you could provide a sample of the file.

The main reason for the memory consumption is the way the file is read/iterated: once something is read, it stays in memory. You should rather use either java.io.FileInputStream or org.apache.commons.io.FileUtils.

Here is sample code to iterate using java.io.FileInputStream

    try (FileInputStream inputStream = new FileInputStream("/tmp/sample.txt");
         Scanner sc = new Scanner(inputStream, "UTF-8")) {
        while (sc.hasNextLine()) {
            String line = sc.nextLine();
            addBodyToDatabase(line);
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }

Here is sample code to iterate using org.apache.commons.io.FileUtils

    File file = new File("/tmp/sample.txt");
    LineIterator it = FileUtils.lineIterator(file, "UTF-8");
    try {
        while (it.hasNext()) {
            String line = it.nextLine();
            addBodyToDatabase(line);
        }
    } finally {
        LineIterator.closeQuietly(it);
    }
0

You should begin the transaction, perform the save operations, and then commit the transaction (do not start the transaction after saving!). You can also try using a StatelessSession to avoid first-level cache consumption.

And use a lower value, for example 20, in this code:

 if (counter % 20 == 0) 

You can also try passing the StringBuilder as a method argument wherever possible.
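
Returning to the StatelessSession suggestion above, a rough sketch of such an insert loop (sessionFactory and the pages collection are assumptions standing in for the question's own objects):

    import org.hibernate.StatelessSession;
    import org.hibernate.Transaction;

    StatelessSession statelessSession = sessionFactory.openStatelessSession();
    Transaction tx = statelessSession.beginTransaction();
    try {
        for (Page page : pages) {           // e.g. one Page per parsed <page> block
            statelessSession.insert(page);  // no first-level cache, so nothing accumulates in memory
        }
        tx.commit();
    } catch (RuntimeException e) {
        tx.rollback();
        throw e;
    } finally {
        statelessSession.close();
    }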

0
