Keeping Rank for Postgres Contests

Question

Keeping Rank for Postgres Contests

I am trying to determine if there is a "low cost" for the next request. We have implemented a system in which “tickets” earn “points” and, therefore, can be ranked. In order to support the analytical type of requests, we keep the rating of each ticket (tickets may be associated) with the ticket.

I found that the scale of updating this rank is very slow. I am trying to run the script below on a set of "tickets", which is about 20 thousand tickets.

I hope someone can help determine the cause and offer some help.

We are on postgres 9.3.6

Here's a simplified ticket scheme:

ogs_1=> \d api_ticket Table "public.api_ticket" Column | Type | Modifiers ------------------------------+--------------------------+--------------------------------------------------------- id | integer | not null default nextval('api_ticket_id_seq'::regclass) status | character varying(3) | not null points_earned | integer | not null rank | integer | not null event_id | integer | not null user_id | integer | not null Indexes: "api_ticket_pkey" PRIMARY KEY, btree (id) "api_ticket_4437cfac" btree (event_id) "api_ticket_e8701ad4" btree (user_id) "api_ticket_points_earned_idx" btree (points_earned) "api_ticket_rank_idx" btree ("rank") Foreign-key constraints: "api_ticket_event_id_598c97289edc0e3e_fk_api_event_id" FOREIGN KEY (event_id) REFERENCES api_event(id) DEFERRABLE INITIALLY DEFERRED (user_id) REFERENCES auth_user(id) DEFERRABLE INITIALLY DEFERRED

Here is the query that I am executing:

 UPDATE api_ticket t SET rank = ( SELECT rank FROM (SELECT Rank() over ( Partition BY event_id ORDER BY points_earned DESC ) as rank, id FROM api_ticket tt WHERE event_id = t.event_id AND tt.status != 'x' ) as r WHERE r.id = t.id ) WHERE event_id = <EVENT_ID> AND t.status != 'x';

Here is a set of approximately 10 thousand lines:

 Update on api_ticket t (cost=0.00..1852176.70 rows=9646 width=88) (actual time=1254035.623..1254035.623 rows=0 loops=1) -> Seq Scan on api_ticket t (cost=0.00..1852176.70 rows=9646 width=88) (actual time=121.611..1253148.416 rows=9748 loops=1) Filter: (((status)::text <> 'x'::text) AND (event_id = 207)) Rows Removed by Filter: 10 SubPlan 1 -> Subquery Scan on r (cost=159.78..191.97 rows=1 width=8) (actual time=87.466..128.537 rows=1 loops=9748) Filter: (r.id = t.id) Rows Removed by Filter: 9747 -> WindowAgg (cost=159.78..178.55 rows=1073 width=12) (actual time=46.389..108.954 rows=9748 loops=9748) -> Sort (cost=159.78..162.46 rows=1073 width=12) (actual time=46.370..66.163 rows=9748 loops=9748) Sort Key: tt.points_earned Sort Method: quicksort Memory: 799kB -> Index Scan using api_ticket_4437cfac on api_ticket tt (cost=0.29..105.77 rows=1073 width=12) (actual time=2.698..26.448 rows=9748 loops=9748) Index Cond: (event_id = t.event_id) Filter: ((status)::text <> 'x'::text) Total runtime: 1254036.583 ms

+5

sql sql-update postgresql postgresql-performance correlated-subquery

Brock haywood Jul 20 '15 at 18:38

source share

1 answer

Erwin brandstetter · Accepted Answer · 2015-07-20T19:29:05+0000

A correlated subquery must be executed for each row (20k times in your example). This only makes sense for a small number of lines or where it requires computation.

This view is computed once before joining it:

 UPDATE api_ticket t SET rank = tt.rnk FROM ( SELECT tt.id , rank() OVER (PARTITION BY tt.event_id ORDER BY tt.points_earned DESC) AS rnk FROM api_ticket tt WHERE tt.status <> 'x' AND tt.event_id = <EVENT_ID> ) tt WHERE t.id = tt.id AND t.rank <> tt.rnk; -- avoid empty updates

It should be a little faster. :)

Additional improvements

The last condition I added excludes empty updates:
- How can I (or can I) select SELECT DISTINCT in multiple columns?
It only makes sense if the new rank can be the old rank at least occasionally. Else delete it.
We do not need to repeat AND t.status != 'x' in the outer query, since we are joining the PK id column with the same value on both sides.
_{And the standard SQL inequality operator is <> , even if Postgres supports != , But rather uses the standard operator.}
Click the predicate event_id = <EVENT_ID> in the subquery. There is no need to calculate numbers for other event_id . This was transferred from an external request in your original. In the rewritten query, we best pull the predicate into the subquery. Since we are using PARTITION BY tt.event_id , it will not be randomly ranked.

Keeping Rank for Postgres Contests

Additional improvements

More articles: