JPA inserts slowly with large object graph

I am trying to cascade persistence on a large object graph using JPA. For example (my object graph is slightly larger, but close enough):

    @Entity
    @Table(name = "a")
    public class A {
        @Id
        private long id;

        @OneToMany(cascade = CascadeType.ALL, mappedBy = "a")
        private Collection<B> bs;
    }

    @Entity
    @Table(name = "b")
    public class B {
        @Id
        private long id;

        @ManyToOne
        private A a;
    }

So I'm trying to save A, which has a collection of 100+ B. The code is just

    em.persist(a);

The problem is that this is SLOW. My save takes about 1300 ms. I looked at the generated SQL and it is terribly inefficient. Something like this:

    select a_seq.nextval from dual;
    select b_seq.nextval from dual;
    select b_seq.nextval from dual;
    select b_seq.nextval from dual;
    ...
    insert into a (id) values (1);
    insert into b (id, fk) values (1, 1);
    insert into b (id, fk) values (2, 1);
    insert into b (id, fk) values (3, 1);
    ...

I am currently using TopLink as the persistence provider, but I have also tried EclipseLink and Hibernate. The backend is Oracle 11g. The problem is how the SQL is issued: each of these statements is executed discretely rather than in bulk, so with a network latency of even 5 ms between my application server and the database server, performing 200 discrete operations adds a full second. I tried increasing the allocation size of my sequences, but that only helps a little. I also tried direct JDBC as a batch statement:

    PreparedStatement statement = connection.prepareStatement(sql);
    for (...) {
        // bind parameters for the current row
        statement.addBatch();
    }
    statement.executeBatch();

For my data model, run as a direct JDBC batch, this takes about 33 ms. Oracle itself spends only 5 ms for the 100+ inserts.

Is there a way to make JPA (I'm stuck with 1.0 right now...) faster, without delving into vendor-specific things like Hibernate's bulk insert handling?

Thanks!

+4

3 answers

The solution would be to enable JDBC batching and to flush and clear the EntityManager at regular intervals (the same as the batch size), but I don't know of a vendor-neutral way to do this:

  • With Hibernate, you need to set the hibernate.jdbc.batch_size configuration property. See Chapter 13, Batch Processing.

  • With EclipseLink, the equivalent is batch writing. See Jeff Sutherland's post in this thread (the batch size should also be specified).

  • According to the comments on this blog post, batch writing is not available in TopLink Essentials :(
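As an illustration, the flush-and-clear-every-N pattern looks roughly like this. This is a minimal sketch, not a definitive implementation: the Em interface is a stand-in for javax.persistence.EntityManager so the snippet is self-contained, and the batch size of 100 is an assumption chosen to match the JDBC batch size.

```java
import java.util.List;

// Stand-in for javax.persistence.EntityManager, just so this sketch is
// self-contained; in real code you would use the real EntityManager.
interface Em {
    void persist(Object entity);
    void flush();
    void clear();
}

class BatchPersist {
    static final int BATCH_SIZE = 100; // assumption: keep equal to the JDBC batch size

    static void persistAll(Em em, List<?> entities) {
        for (int i = 0; i < entities.size(); i++) {
            em.persist(entities.get(i));
            if ((i + 1) % BATCH_SIZE == 0) {
                em.flush(); // push the pending INSERTs as one JDBC batch
                em.clear(); // detach flushed entities to keep the context small
            }
        }
        em.flush(); // flush the remaining partial batch
    }
}
```

Clearing after each flush matters on large graphs: without it, the persistence context keeps every managed entity and dirty checking gets progressively slower.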

+2

Curious why you find the INCREMENT BY increase dirty? It is an optimization that reduces the number of round trips to the database to get the next sequence value, and it is a common pattern in database clients where the id value is assigned on the client before the INSERT. I do not see this as a JPA or ORM problem, and it should have the same cost in the JDBC comparison, since JDBC also needs to get a new sequence number for every new row before the INSERT. If you take a different approach in your JDBC case, we should be able to get EclipseLink JPA to follow the same approach.

The cost of JPA is probably most obvious in an isolated INSERT scenario, because you get no benefit from repeated reads through the transactional or shared cache, and, depending on your cache configuration, you pay a price to put these new objects into the cache on flush/commit.

Note that there is also a cost for creating the first EntityManager, which is where all of the metadata processing, class loading, possible weaving, and metamodel initialization happen. Make sure you keep this time out of your comparison. In your real application this happens once, and all subsequent EntityManagers benefit from the shared metadata.

If you have other scenarios where these entities are read, then the cost of putting them in the cache can reduce the cost of retrieving them. In my experience, I can make an application as a whole much faster than a typical hand-written JDBC solution, but it is a balance across the entire set of concurrent use cases, not in an isolated test case.

Hope this helps. Happy to provide any further guidance on EclipseLink JPA and its performance and scalability options.

Arc

+2

Thanks to Pascal for the answer. I ran several tests and was able to significantly improve performance.

Without any optimizations, my insert took about 1100 ms. Using EclipseLink, I added the following to persistence.xml:

    <property name="eclipselink.jdbc.batch-writing" value="JDBC"/>
    <property name="eclipselink.jdbc.batch-writing.size" value="1000"/>

I tried other values (Oracle-JDBC, etc.), but JDBC seemed to give the best performance gain. This brought the insert down to about 900 ms, a rather modest improvement of 200 ms. The big savings came from increasing the sequence allocation size. I am not a big fan of this: I find it dirty to increase the INCREMENT BY of my sequences just to accommodate JPA. But increasing it brought the insert down to about 600 ms. So, in total, about 500 ms were shaved off by these improvements.
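For reference, the allocation size can also be raised on the JPA side with a sequence generator mapping. This is a sketch with illustrative names (a_gen is hypothetical); the allocationSize must match the sequence's INCREMENT BY, or ids will collide or leave gaps:

```java
@Entity
@Table(name = "a")
public class A {
    @Id
    @SequenceGenerator(name = "a_gen", sequenceName = "a_seq", allocationSize = 50)
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "a_gen")
    private long id;
    ...
}
```

With allocationSize = 50, the provider does one `select a_seq.nextval from dual` round trip per 50 new entities instead of one per entity.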

This is all fine and dandy, but still much slower than the JDBC batch. JPA exacts a pretty high price for ease of coding.

+1

Source: https://habr.com/ru/post/1313704/
