EclipseLink JPA generates count queries using COUNT(id) instead of COUNT(*)

I use EclipseLink, Spring Data and PostgreSQL. In my project, I noticed that when using the paged results provided by the Spring Data repositories, queries like this are issued:

SELECT COUNT(id) FROM table WHERE [part generated according to specification] 

where "id" is the primary key of "table". Digging into the EXPLAIN output, I noticed that COUNT(id) is about 10 times slower than COUNT(*) on a very large table: COUNT(id) counts the non-null values in the id column, whereas COUNT(*) just returns the number of rows matching the criteria. In addition, COUNT(*) can use indexes, while COUNT(id) cannot.
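To make the semantic difference concrete, here is a minimal Java sketch that models a column as a list of nullable values. The class and method names (CountSemantics, countStar, countColumn) are made up for illustration; the sketch only models what the two aggregates return, not how PostgreSQL plans or executes them.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical model of SQL COUNT semantics; not part of any library.
final class CountSemantics {

    // COUNT(*): the number of rows, whatever the column values are.
    static long countStar(List<Integer> column) {
        return column.size();
    }

    // COUNT(col): the number of rows whose value in col is not NULL.
    static long countColumn(List<Integer> column) {
        return column.stream().filter(Objects::nonNull).count();
    }

    public static void main(String[] args) {
        // A nullable column: the two aggregates disagree (4 rows, 2 non-null values).
        List<Integer> nullable = Arrays.asList(1, null, 3, null);
        System.out.println("nullable: COUNT(*) = " + countStar(nullable)
                + ", COUNT(col) = " + countColumn(nullable));

        // A NOT NULL column such as a primary key: both aggregates return 3.
        List<Integer> primaryKey = Arrays.asList(1, 2, 3);
        System.out.println("primary key: COUNT(*) = " + countStar(primaryKey)
                + ", COUNT(id) = " + countColumn(primaryKey));
    }
}
```

Because a primary key is declared NOT NULL, both forms return the same number for id. The difference in speed therefore does not come from the result, but from which columns the database has to read to compute it.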

I traced the Spring Data base repository class, and it seems that the JPA implementation alone is responsible for generating this query.

  • What is the reason for using COUNT(id) instead of the faster COUNT(*)?
  • Can I change this behavior (by any means, even by extending the existing components)?

Any help is appreciated.

- [edit] -

Here is the table, followed by the plans for both queries:

 \d ord_order
                         Table "public.ord_order"
          Column          |           Type           |                        Modifiers
 -------------------------+--------------------------+----------------------------------------------------------
  id                      | integer                  | NOT NULL DEFAULT nextval('ord_order_id_seq'::regclass)
  test_order              | boolean                  | DEFAULT false
  ...
 Indexes:
     "pk_order" PRIMARY KEY, btree (id)
     "idx_test_order" btree (test_order)

 # explain SELECT COUNT(*) FROM ord_order WHERE (test_order = false);
                                 QUERY PLAN
 --------------------------------------------------------------------------
  Aggregate  (cost=89898.79..89898.80 rows=1 width=0)
    ->  Index Only Scan using idx_test_order on ord_order  (cost=0.43..85375.37 rows=1809366 width=0)
          Index Cond: (test_order = false)
          Filter: (NOT test_order)
 (4 rows)

 # explain SELECT COUNT(id) FROM ord_order WHERE (test_order = false);
                                 QUERY PLAN
 --------------------------------------------------------------------------
  Aggregate  (cost=712924.52..712924.53 rows=1 width=4)
    ->  Seq Scan on ord_order  (cost=0.00..708401.10 rows=1809366 width=4)
          Filter: (NOT test_order)
 (3 rows)

Now the difference in estimated cost is ~90k versus ~713k, and an index-only scan versus a sequential (full table) scan.

2 answers

I managed to plug in a custom implementation of the Spring Data repository base class and the repository factory. As a result, the generated count queries now have the form:

 SELECT COUNT(1) FROM table 

which has the same plan as COUNT(*). This seems like a good solution, and it applies globally to all of the concrete repositories in the application.

I did not know how to generate COUNT(*). COUNT(1) was much simpler, because the COUNT function expects an expression as its parameter, and I can pass a static value: 1.


count(*) can use the index because only one column (test_order) is referenced in the query, and that column is in idx_test_order. count(id) references two columns, so Postgres must read both the id column and the test_order column to produce the result.

As I said, some people think that count(id) is faster than count(*) when there are no restrictions on the query. That is a myth that has never been true for any DBMS with a decent optimizer. I assume this is why your obfuscation layer uses count(id) instead of count(*).

Assuming you don't want to get rid of the ORM (in order to regain control over the SQL your application uses), the only workaround I see is to create a partial index that Postgres can use:

 create index on ord_order (id) where test_order = false; 