Why do JPA / Eclipselink run redundant database queries even with the BatchFetchType.IN prompt?

Summary:

I am trying to minimize the number of queries generated by a JPA-based java application in a database. I pointed out the @BatchFetch(BatchFetchType.IN) optimization hint, but I still see some additional requests that I find redundant and unnecessary.

Details:

Given a simple domain model: We have an invoice system. The invoice relates to OneToOne order. We also have a customer who has OneToMany relationship with Orders. (Customer 1-> M Order 1 <-1 Invoice) . Read more about here . Find the full source code here . Here is the entity definition that is currently worth:

Client.java (excluding getters and setter ):

  @Entity(name = "CUSTOMER") public class Customer { @Id //signifies the primary key @Column(name = "CUST_ID", nullable = false) @GeneratedValue(strategy = GenerationType.AUTO) private long custId; @Column(name = "FIRST_NAME", length = 50) private String firstName; @OneToMany(mappedBy="customer",targetEntity=Order.class, fetch=FetchType.LAZY) private Collection<Order> orders; } 

Order.java (excluding getters and setter ):

  @Entity(name = "ORDERS") public class Order { @Id @Column(name = "ORDER_ID", nullable = false) @GeneratedValue(strategy = GenerationType.AUTO) private long orderId; @Column(name = "TOTAL_PRICE", precision = 2) private double totPrice; @OneToOne(fetch = FetchType.LAZY, optional = false, cascade = CascadeType.ALL, mappedBy = "order") private Invoice invoice; @ManyToOne(optional = false) @JoinColumn(name = "CUST_ID", referencedColumnName = "CUST_ID") private Customer customer; @ManyToMany(fetch = FetchType.LAZY) @JoinTable(name = "ORDER_DETAIL", joinColumns = @JoinColumn(name = "ORDER_ID", referencedColumnName = "ORDER_ID"), inverseJoinColumns = @JoinColumn(name = "PROD_ID", referencedColumnName = "PROD_ID")) private List<Product> productList; } 

Invoice.java (excluding getters and setter ):

 @Entity(name = "ORDER_INVOICE") public class Invoice { @Id // signifies the primary key @Column(name = "INVOICE_ID", nullable = false) @GeneratedValue(strategy = GenerationType.AUTO) private long invoiceId; @Column(name = "AMOUNT_DUE", precision = 2) private double amountDue; @OneToOne(optional = false, fetch = FetchType.LAZY) @JoinColumn(name = "ORDER_ID") private Order order; } 

With this model, I conducted a simple test to get all customer orders.

 EntityManagerFactory entityManagerFactory = Persistence.createEntityManagerFactory("testjpa"); EntityManager em = entityManagerFactory.createEntityManager(); Customer customer = em.find(Customer.class, 100L); Collection<Order> orders = customer.getOrders(); for(Order order: orders){ System.out.println(order.getInvoice().getInvoiceId()); } em.close(); 

Since everything was lazy, we received four requests, as shown:

 1398882535950|1|1|statement|SELECT CUST_ID, APPT, CITY, EMAIL, FIRST_NAME, LAST_NAME, STREET, LAST_UPDATED_TIME, ZIP_CODE FROM CUSTOMER WHERE (CUST_ID = ?)|SELECT CUST_ID, APPT, CITY, EMAIL, FIRST_NAME, LAST_NAME, STREET, LAST_UPDATED_TIME, ZIP_CODE FROM CUSTOMER WHERE (CUST_ID = 100) 1398882535981|0|1|statement|SELECT ORDER_ID, OREDER_DESC, ORDER_DATE, TOTAL_PRICE, LAST_UPDATED_TIME, CUST_ID FROM ORDERS WHERE (CUST_ID = ?)|SELECT ORDER_ID, OREDER_DESC, ORDER_DATE, TOTAL_PRICE, LAST_UPDATED_TIME, CUST_ID FROM ORDERS WHERE (CUST_ID = 100) 1398882535995|1|1|statement|SELECT INVOICE_ID, AMOUNT_DUE, DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID = ?)|SELECT INVOICE_ID, AMOUNT_DUE, DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID = 111) 1398882536004|0|1|statement|SELECT INVOICE_ID, AMOUNT_DUE, DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID = ?)|SELECT INVOICE_ID, AMOUNT_DUE, DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID = 222) DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID =?) | SELECT INVOICE_ID, AMOUNT_DUE, DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID 1398882535950|1|1|statement|SELECT CUST_ID, APPT, CITY, EMAIL, FIRST_NAME, LAST_NAME, STREET, LAST_UPDATED_TIME, ZIP_CODE FROM CUSTOMER WHERE (CUST_ID = ?)|SELECT CUST_ID, APPT, CITY, EMAIL, FIRST_NAME, LAST_NAME, STREET, LAST_UPDATED_TIME, ZIP_CODE FROM CUSTOMER WHERE (CUST_ID = 100) 1398882535981|0|1|statement|SELECT ORDER_ID, OREDER_DESC, ORDER_DATE, TOTAL_PRICE, LAST_UPDATED_TIME, CUST_ID FROM ORDERS WHERE (CUST_ID = ?)|SELECT ORDER_ID, OREDER_DESC, ORDER_DATE, TOTAL_PRICE, LAST_UPDATED_TIME, CUST_ID FROM ORDERS WHERE (CUST_ID = 100) 1398882535995|1|1|statement|SELECT INVOICE_ID, AMOUNT_DUE, DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID = ?)|SELECT INVOICE_ID, AMOUNT_DUE, DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID = 111) 1398882536004|0|1|statement|SELECT INVOICE_ID, AMOUNT_DUE, DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID = ?)|SELECT INVOICE_ID, AMOUNT_DUE, DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID = 222) 

Since I do not need N + 1 calls to receive invoices, I thought about using batch sampling and reduced the number of requests to 4 (one request to extract invoices for all customer orders). To do the same, I updated my Order object as shown below:

Updated - Order.java, the addition of a BatchFetch for an invoice. (Excluding getters and setter ):

  @Entity(name = "ORDERS") public class Order { @Id @Column(name = "ORDER_ID", nullable = false) @GeneratedValue(strategy = GenerationType.AUTO) private long orderId; @Column(name = "TOTAL_PRICE", precision = 2) private double totPrice; @BatchFetch(BatchFetchType.IN) @OneToOne(fetch = FetchType.LAZY, optional = false, cascade = CascadeType.ALL, mappedBy = "order") private Invoice invoice; @ManyToOne(optional = false) @JoinColumn(name = "CUST_ID", referencedColumnName = "CUST_ID") private Customer customer; @ManyToMany(fetch = FetchType.LAZY) @JoinTable(name = "ORDER_DETAIL", joinColumns = @JoinColumn(name = "ORDER_ID", referencedColumnName = "ORDER_ID"), inverseJoinColumns = @JoinColumn(name = "PROD_ID", referencedColumnName = "PROD_ID")) private List<Product> productList; } 

I repeated the same test again and assumed 3 requests would be received to receive the data. (One for the customer, One for Orders and One for batch invoice collection). However, an eclipse generates 5 queries for the same. The following are the queries:

 1398883197009|1|1|statement|SELECT CUST_ID, APPT, CITY, EMAIL, FIRST_NAME, LAST_NAME, STREET, LAST_UPDATED_TIME, ZIP_CODE FROM CUSTOMER WHERE (CUST_ID = ?)|SELECT CUST_ID, APPT, CITY, EMAIL, FIRST_NAME, LAST_NAME, STREET, LAST_UPDATED_TIME, ZIP_CODE FROM CUSTOMER WHERE (CUST_ID = 100) 1398883197030|0|1|statement|SELECT ORDER_ID, OREDER_DESC, ORDER_DATE, TOTAL_PRICE, LAST_UPDATED_TIME, CUST_ID FROM ORDERS WHERE (CUST_ID = ?)|SELECT ORDER_ID, OREDER_DESC, ORDER_DATE, TOTAL_PRICE, LAST_UPDATED_TIME, CUST_ID FROM ORDERS WHERE (CUST_ID = 100) 1398883197037|1|1|statement|SELECT INVOICE_ID, AMOUNT_DUE, DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID IN (?,?))|SELECT INVOICE_ID, AMOUNT_DUE, DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID IN (111,222)) 1398883197042|1|1|statement|SELECT ORDER_ID, OREDER_DESC, ORDER_DATE, TOTAL_PRICE, LAST_UPDATED_TIME, CUST_ID FROM ORDERS WHERE (ORDER_ID = ?)|SELECT ORDER_ID, OREDER_DESC, ORDER_DATE, TOTAL_PRICE, LAST_UPDATED_TIME, CUST_ID FROM ORDERS WHERE (ORDER_ID = 222) 1398883197045|0|1|statement|SELECT INVOICE_ID, AMOUNT_DUE, DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID = ?)|SELECT INVOICE_ID, AMOUNT_DUE, DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID = 222) DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID IN (,)??) | SELECT INVOICE_ID, AMOUNT_DUE, DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM 1398883197009|1|1|statement|SELECT CUST_ID, APPT, CITY, EMAIL, FIRST_NAME, LAST_NAME, STREET, LAST_UPDATED_TIME, ZIP_CODE FROM CUSTOMER WHERE (CUST_ID = ?)|SELECT CUST_ID, APPT, CITY, EMAIL, FIRST_NAME, LAST_NAME, STREET, LAST_UPDATED_TIME, ZIP_CODE FROM CUSTOMER WHERE (CUST_ID = 100) 1398883197030|0|1|statement|SELECT ORDER_ID, OREDER_DESC, ORDER_DATE, TOTAL_PRICE, LAST_UPDATED_TIME, CUST_ID FROM ORDERS WHERE (CUST_ID = ?)|SELECT ORDER_ID, OREDER_DESC, ORDER_DATE, TOTAL_PRICE, LAST_UPDATED_TIME, CUST_ID FROM ORDERS WHERE (CUST_ID = 100) 1398883197037|1|1|statement|SELECT INVOICE_ID, AMOUNT_DUE, DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID IN (?,?))|SELECT INVOICE_ID, AMOUNT_DUE, DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID IN (111,222)) 1398883197042|1|1|statement|SELECT ORDER_ID, OREDER_DESC, ORDER_DATE, TOTAL_PRICE, LAST_UPDATED_TIME, CUST_ID FROM ORDERS WHERE (ORDER_ID = ?)|SELECT ORDER_ID, OREDER_DESC, ORDER_DATE, TOTAL_PRICE, LAST_UPDATED_TIME, CUST_ID FROM ORDERS WHERE (ORDER_ID = 222) 1398883197045|0|1|statement|SELECT INVOICE_ID, AMOUNT_DUE, DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID = ?)|SELECT INVOICE_ID, AMOUNT_DUE, DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID = 222) 

I did not understand why the last two queries are being generated. Any help to explain what was going on would be helpful.

Thanks!

+8
performance optimization jpa eclipselink
source share
1 answer

It looks like an error / problem in EclipseLink due to bypassing impatient relationships in the object model, which allows you to load the second invoice in "in" before the order that this link is loading. This forces Invoice to request an order in the database instead of finding it in the cache.

You can fix this by using the lazy sampling regarding the invoice for the order. This delay allows EclipseLink to fully build the object model so that it is in the cache when it is accessed. The code in the question shows that this relationship is marked as Lazy, but this is just a hint at JPA providers that cannot work in EclipseLink without using the agent track or byte code, as described here: https://wiki.eclipse.org/EclipseLink / UserGuide / JPA / Advanced_JPA_Development / Performance / Weaving https://wiki.eclipse.org/EclipseLink/UserGuide/JPA/Advanced_JPA_Development/Performance/Weaving/Dynamic_Weaving

Weaving is not required for lazy collections, only for 1: 1 and other optimizations.

+12
source share

All Articles