From what I understand in the semantics of your query, you can simplify:
select count(*) from emails where ( select "references"[1] from emails where message_id = e.message_id ) = ANY ("references") or message_id = ( select "references"[1] from emails where message_id = e.message_id )
in
select count(*) from emails where e."references"[1] = ANY ("references") OR message_id = e."references"[1]
Indeed, message_id is not necessarily unique, but if you have different lines for the given message_id value, your request will not be executed.
This simplification, however, does not significantly change the cost of the request. In fact, the problem here is that you need two full table mail scans to complete the query (as well as an index scan on emails_message_id_index). You can save one full scan using the index in an array of links.
You would create an index like this with:
CREATE INDEX emails_references_index ON emails USING GIN ("references");
An index alone helps to greatly simplify the initial query: provided there are up-to-date statistics, as with a sufficiently large number of rows, PostgreSQL will scan indexes. However, you should modify the subquery as follows to help the scheduler scan the raster index at that array index:
select count(*) from emails where ARRAY[e."references"[1]] <@ "references" OR message_id = e."references"[1]
The final request will read:
SELECT DISTINCT e.id, folder, subject, in_reply_to, message_id, "references", e.updated_at, ( select count(*) from emails where ARRAY[e."references"[1]] <@ "references" OR message_id = e."references"[1] ) FROM "emails" e INNER JOIN "email_participants" ON ("email_participants"."email_id" = e."id") WHERE (("user_id" = 220) AND ("folder" = 'INBOX')) ORDER BY e."updated_at" DESC LIMIT 10 OFFSET 0;
To illustrate the expected profit, some tests were conducted in a dummy environment:
- with lines of 10,000 in tabular letters (and corresponding lines in the email_participants table), the initial query is executed in 787 ms, while the index scan is reduced to 399 ms, and the proposed query is executed in 12 ms;
- with an initial row request of 100,000, it takes 9,200 ms, and the index scan is reduced to 4,251 ms, and the proposed query is performed at 637 ms.