Eclipselink jpa generates count requests using COUNT (id) instead of COUNT (*)

Question

Eclipselink jpa generates count requests using COUNT (id) instead of COUNT (*)

I use Eclipselink, Spring Data and Postgresql. In my project, I noticed that when using the paged results provided by the SpringData repositories, there are queries such as:

SELECT COUNT(id) FROM table WHERE [part generated according to specification]

where "id" is the primary key of the "table". Delving into the explanation, I noticed that COUNT (id) is about 10 times slower than COUNT () for a very large table (count (id) looks for nonzero values in the id column, and count () just returns the number of rows matching criteria), also count (*) can use indexes, while count (id) is not.

I traced the SpringData base repository class, and it seems that only the JPA implementation responds to generate this request.

What is the reason for using count (id) instead of faster COUNT (*)?
Can I change this behavior (in any case - even increasing the existing components)?

any help appreciated

- [edit] -

there is a table:

 \d ord_order Table "public.ord_order" Column | Type | Modificators -------------------------+--------------------------+---------------------------------------------------------- id | integer | NOT NULL DEFAULT nextval('ord_order_id_seq'::regclass) test_order | boolean | DEFAULT false ... Indexes: "pk_order" PRIMARY KEY, btree (id) "idx_test_order" btree (test_order) # explain SELECT COUNT(*) FROM ord_order WHERE (test_order = false); QUERY PLAN -------------------------------------------------------------------------- Aggregate (cost=89898.79..89898.80 rows=1 width=0) -> Index Only Scan using idx_test_order on ord_order (cost=0.43..85375.37 rows=1809366 width=0) Index Cond: (test_order = false) Filter: (NOT test_order) (4 wiersze) # explain SELECT COUNT(id) FROM ord_order WHERE (test_order = false); QUERY PLAN -------------------------------------------------------------------------- Aggregate (cost=712924.52..712924.53 rows=1 width=4) -> Seq Scan on ord_order (cost=0.00..708401.10 rows=1809366 width=4) Filter: (NOT test_order) (3 wiersze)

now the difference is ~ 90k versus ~ 713k and index scan versus full scan

+1

java spring-data-jpa postgresql hibernate jpa

redguy Nov 02 '15 at 14:53

source share

2 answers

count(*) can use an index because only one column ( test_order ) is specified in the query. count(id) refers to two columns, and Postgres must select the column id column and test_order to publish the result.

As I said, some people think that count(id) faster than count(*) - when there are no restrictions on the request. A myth that has never been right for any DBMS with a decent optimizer. I assume that your obfuscation layer uses count(id) instead of count(*) .

Assuming you don't want to get rid of ORM (in order to gain control over the SQL used by your application again), the only workaround that I see is to create a partial index that Postgres can use:

 create index on ord_order (id) where test_order = false;

0

a_horse_with_no_name Nov 02 '15 at 15:55

source share

redguy · Accepted Answer · 2015-11-03T11:39:21+0000

I managed to complete the usual implementation of the Spring and factory repository base class using this implementation. As a result, the generated counter requests now have the form:

 SELECT COUNT(1) FROM table

which has the same plan as COUNT (*). This seems like a great solution and works around the world for all of the specific repositories in the application.

I did not know how to generate COUNT (*), COUNT (1) was much simpler because the COUNT function expects some expressions as parameters, and I can put a static value - 1

Eclipselink jpa generates count requests using COUNT (id) instead of COUNT (*)

More articles: